bioanalyzeR

Joe Foley

2020-06-25

Introduction

This package imports electrophoresis data from the Agilent Bioanalyzer and TapeStation systems and includes functions to graph and analyze the data.

Features

Why is this useful?

Exporting data from the Agilent software

Bioanalyzer

In the 2100 Expert software, open your data file (.xad) in the “Data” context. Select “File->Export…” from the top menu. Check the “Export to XML” box and no others. Click “Export” and then save the file wherever you like.

TapeStation

In the TapeStation Analysis Software, open your data file (.D1000, .HSD1000, .RNA, .gDNA, etc.). You need to export both the metadata (in XML format) and the gel image (PNG).

Metadata XML

Select “File->Export Data->Export to XML”. You do not need to export the gel image or individual EPG images at this point. Select your destination and then click “Export” to save the file.

Gel image PNG

It is important to follow these unusual directions carefully!

  1. In the “Home” tab (top), verify that there is a ladder lane (“Electronic Ladder” is okay) and that the markers are correctly identified in every sample (except failed lanes, which are okay).
  2. Select the “Gel” context (top left button).
  3. Select “Show All Lanes” if the button is not grayed out.
  4. Unselect “Aligned”, “Scale to Sample”, and “Scale to MW Range”, skipping any of them that are grayed out.
  5. Leave the contrast slider in the middle.
  6. Maximize the window and drag the lower end of the gel image area to make it as tall as possible. At this point you should see all lanes from the run, with the marker bands present but unaligned.
  7. Right-click on any lane of the gel and in the context menu, uncheck “Show Marker Annotations”.
  8. Right-click on a lane near the left end of the gel (it doesn’t matter which) to bring up the context menu again.
  9. Move your cursor onto “Snapshot”, making sure the cursor is also over a lane to the right of the one you right-clicked on.
  10. Left-click “Snapshot”. This copies the gel image to your system clipboard, and you should also see the newly selected lane become highlighted in light blue.
  11. Open any image editor (e.g. Paint) and paste the image from your clipboard. You should see one lane highlighted in light blue and unaligned lanes with no green or purple marker bands. If not, try taking the snapshot again.
  12. Save the gel image as a file in PNG format, preferably with the same name as the XML file and the .png extension. E.g. if your XML file is batch1.xml then the PNG file should be batch1.png.

Suggested workflow to process many files quickly:

  1. Copy the gel image into Paint but don’t save it yet.
  2. Export the XML file and click “Close” for that file instead of returning to “Home”.
  3. Return to Paint and click “Save”, then start typing the name of the XML file; autocomplete should fill in the rest, so you only have to change .xml to .png.
  4. Press Ctrl-N to start a new file in Paint.
  5. Repeat.
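Because each TapeStation XML export needs a PNG with the same base name, it can be worth checking a folder once a batch of exports is done. A minimal base-R sketch, assuming the naming convention above (batch1.xml or batch1.xml.gz next to batch1.png); find.missing.gel.images is a hypothetical helper, not part of bioanalyzeR:

```r
# Hypothetical helper (not part of bioanalyzeR): list the XML exports in a
# directory that have no gel image with the same base name.
find.missing.gel.images <- function(directory) {
    # Match both uncompressed (.xml) and gzip-compressed (.xml.gz) exports
    xml.files <- list.files(directory, pattern = "\\.xml(\\.gz)?$")
    if (length(xml.files) == 0) return(character(0))
    # batch1.xml or batch1.xml.gz is expected to have batch1.png next to it
    expected.png <- paste0(sub("\\.xml(\\.gz)?$", "", xml.files), ".png")
    xml.files[!file.exists(file.path(directory, expected.png))]
}
```

Calling find.missing.gel.images(".") lists the exports in the working directory that still lack a gel image. Bioanalyzer exports have no gel image, so the check only makes sense for a folder of TapeStation files.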

Data storage and transfer

This package can read XML files compressed with gzip. Although the Agilent software does not automatically compress its exported XML files, it may be helpful to compress them yourself for long-term storage, particularly the large Bioanalyzer XML files that contain all the raw data.

Uncompressed XML files are assumed to have the extension .xml and compressed ones .xml.gz. TapeStation gel image files are assumed to have the extension .png.
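Compressing an export needs nothing beyond base R. A minimal sketch, with a hypothetical helper name, that writes batch1.xml.gz next to batch1.xml so the compressed copy has the .xml.gz extension this package expects:

```r
# Hypothetical helper (not part of bioanalyzeR): gzip-compress an exported
# XML file, writing <name>.xml.gz next to it and returning the new path.
gzip.xml <- function(xml.file) {
    gz.file <- paste0(xml.file, ".gz")
    # Read the whole file as raw bytes so its content is copied unchanged
    contents <- readBin(xml.file, what = "raw", n = file.info(xml.file)$size)
    con <- gzfile(gz.file, open = "wb")
    on.exit(close(con))
    writeBin(contents, con)
    gz.file
}
```

For example, gzip.xml("batch1.xml") creates batch1.xml.gz and leaves the original file in place.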

The filename you import can be a local path or a URL, so in principle you can directly open data stored on a remote server. However, opening from URLs works with some servers and fails with others because of problems with file handling in the XML package, which are outside the scope of this package to fix.
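When opening a URL directly fails, one workaround is to download the file first and import the local copy. A sketch under that assumption; the helper name and the URL are hypothetical:

```r
library(bioanalyzeR)

# Hypothetical helper (not part of bioanalyzeR): download a remote export to a
# temporary directory, keeping its file name, then import the local copy.
read.electrophoresis.via.download <- function(url) {
    local.file <- file.path(tempdir(), basename(url))
    # mode = "wb" keeps gzip-compressed files intact on Windows
    download.file(url, local.file, mode = "wb")
    read.electrophoresis(local.file)
}

# Example with a made-up URL:
# batch1 <- read.electrophoresis.via.download("https://example.org/data/batch1.xml.gz")
```

For TapeStation data the matching .png would presumably also have to be downloaded into the same directory, since the gel image is paired with the XML file by its shared base name.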

Importing data

This package includes the read.bioanalyzer and read.tapestation functions to import data from the different platforms, but it is easier to use the wrapper function read.electrophoresis, which automatically determines the type of data and can read multiple files into a single object.

Demo data included in this package

For a demonstration, we can use some pre-exported example data from the Agilent software. This package’s extdata subdirectory contains the pre-exported data from every supported demo file included with the Bioanalyzer 2100 Expert and TapeStation Analysis software (with some typos in the sample names corrected so they are easy to parse):

find.package("bioanalyzeR")
#> [1] "/tmp/RtmpFVZKTw/temp_libpathe6434e03df79/bioanalyzeR"
list.files(paste0(find.package("bioanalyzeR"), "/extdata"), recursive = TRUE)
#>  [1] "bioanalyzer/Demo DNA 1000 Series II.xml.gz"                 
#>  [2] "bioanalyzer/Demo DNA 12000 Series II.xml.gz"                
#>  [3] "bioanalyzer/Demo DNA 7500 Series II.xml.gz"                 
#>  [4] "bioanalyzer/Demo Eukaryote Total RNA Nano Series II.xml.gz" 
#>  [5] "bioanalyzer/Demo Eukaryote Total RNA Pico Series II.xml.gz" 
#>  [6] "bioanalyzer/Demo High Sensitivity DNA.xml.gz"               
#>  [7] "bioanalyzer/Demo mRNA Nano Series II.xml.gz"                
#>  [8] "bioanalyzer/Demo mRNA Pico Series II.xml.gz"                
#>  [9] "bioanalyzer/Demo Plant RNA Nano.xml.gz"                     
#> [10] "bioanalyzer/Demo Plant RNA Pico.xml.gz"                     
#> [11] "bioanalyzer/Demo Prokaryote Total RNA Nano Series II.xml.gz"
#> [12] "bioanalyzer/Demo Prokaryote Total RNA Pico Series II.xml.gz"
#> [13] "tapestation/cfDNA-Plate-96.png"                             
#> [14] "tapestation/cfDNA-Plate-96.xml.gz"                          
#> [15] "tapestation/cfDNA-Tubes-16.png"                             
#> [16] "tapestation/cfDNA-Tubes-16.xml.gz"                          
#> [17] "tapestation/D1000-Plate-96.annotations.csv"                 
#> [18] "tapestation/D1000-Plate-96.png"                             
#> [19] "tapestation/D1000-Plate-96.xml.gz"                          
#> [20] "tapestation/D1000-Tubes-16.png"                             
#> [21] "tapestation/D1000-Tubes-16.xml.gz"                          
#> [22] "tapestation/D5000-Plate-96.png"                             
#> [23] "tapestation/D5000-Plate-96.xml.gz"                          
#> [24] "tapestation/D5000-Tubes-16.png"                             
#> [25] "tapestation/D5000-Tubes-16.xml.gz"                          
#> [26] "tapestation/Eukaryotic RNA-Plate-64.png"                    
#> [27] "tapestation/Eukaryotic RNA-Plate-64.xml.gz"                 
#> [28] "tapestation/Eukaryotic RNA-Tubes-16.png"                    
#> [29] "tapestation/Eukaryotic RNA-Tubes-16.xml.gz"                 
#> [30] "tapestation/gDNA-Plate-96.png"                              
#> [31] "tapestation/gDNA-Plate-96.xml.gz"                           
#> [32] "tapestation/gDNA-Tubes-16.png"                              
#> [33] "tapestation/gDNA-Tubes-16.xml.gz"                           
#> [34] "tapestation/High Sensitivity D1000-Tubes-16.png"            
#> [35] "tapestation/High Sensitivity D1000-Tubes-16.xml.gz"         
#> [36] "tapestation/High Sensitivity D5000-Tubes-16.png"            
#> [37] "tapestation/High Sensitivity D5000-Tubes-16.xml.gz"         
#> [38] "tapestation/High Sensitivity Eukaryotic RNA-Plate-96.png"   
#> [39] "tapestation/High Sensitivity Eukaryotic RNA-Plate-96.xml.gz"
#> [40] "tapestation/High Sensitivity Eukaryotic RNA-Tubes-16.png"   
#> [41] "tapestation/High Sensitivity Eukaryotic RNA-Tubes-16.xml.gz"
#> [42] "tapestation/HSD1000-HaloplexHS-4.png"                       
#> [43] "tapestation/HSD1000-HaloplexHS-4.xml.gz"                    
#> [44] "tapestation/HSD5000-SureSelectQXT-3.png"                    
#> [45] "tapestation/HSD5000-SureSelectQXT-3.xml.gz"                 
#> [46] "tapestation/Prokaryotic RNA-Plate-32.png"                   
#> [47] "tapestation/Prokaryotic RNA-Plate-32.xml.gz"

So let’s import a file and see what we have:
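The command that produced the following table is missing here. A reconstruction consistent with the rest of this vignette (the object name and file path are the ones used in the next section), assuming the table shows the $samples member of the imported object:

```r
library(bioanalyzeR)
# Import one Bioanalyzer demo export from the package's extdata directory
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Per-sample metadata: one row per well
dna1000$samples
```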

batch well.number sample.name sample.observations sample.comment ladder.well
Demo DNA 1000 Series II 1 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 2 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 3 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 4 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 5 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 6 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 7 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 8 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 9 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 10 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 11 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 12 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 13 Ladder Ladder DNA 1000 13

Or we can read several files into one object:
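The corresponding call is also missing. A sketch that would produce the three batches in the table below, assuming read.electrophoresis accepts several file names as separate arguments; demo.file is a hypothetical convenience function:

```r
library(bioanalyzeR)
# Hypothetical convenience function: path to a Bioanalyzer demo export
demo.file <- function(name) {
    system.file("extdata", "bioanalyzer", name, package = "bioanalyzeR")
}
# Assumption: multiple files are passed as separate arguments
dna.batches <- read.electrophoresis(
    demo.file("Demo DNA 1000 Series II.xml.gz"),
    demo.file("Demo DNA 7500 Series II.xml.gz"),
    demo.file("Demo DNA 12000 Series II.xml.gz")
)
dna.batches$samples
```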

batch well.number sample.name sample.observations sample.comment ladder.well
Demo DNA 1000 Series II 1 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 2 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 3 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 4 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 5 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 6 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 7 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 8 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 9 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 10 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 11 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 12 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 13 Ladder Ladder DNA 1000 13
Demo DNA 7500 Series II 1 ladder 1 Ladder 1 13
Demo DNA 7500 Series II 2 ladder 2 Ladder 2 13
Demo DNA 7500 Series II 3 ladder 3 Ladder 3 13
Demo DNA 7500 Series II 4 ladder DNA 7500 DNA 7500 ladder 13
Demo DNA 7500 Series II 5 ladder 5 Ladder 5 13
Demo DNA 7500 Series II 6 ladder 6 Ladder 6 13
Demo DNA 7500 Series II 7 ladder 1 Ladder 1 13
Demo DNA 7500 Series II 8 ladder 2 Ladder 2 13
Demo DNA 7500 Series II 9 ladder 3 Ladder 3 13
Demo DNA 7500 Series II 10 ladder DNA 7500 DNA 7500 ladder 13
Demo DNA 7500 Series II 11 ladder 5 Ladder 5 13
Demo DNA 7500 Series II 12 ladder 6 Ladder 6 13
Demo DNA 7500 Series II 13 Ladder DNA 7500 ladder 13
Demo DNA 12000 Series II 1 Ladder 3 Ladder 3 13
Demo DNA 12000 Series II 2 Ladder x 13
Demo DNA 12000 Series II 3 Ladder 6 Ladder 6 13
Demo DNA 12000 Series II 4 Ladder 2 Ladder 2 13
Demo DNA 12000 Series II 5 Ladder 1 13
Demo DNA 12000 Series II 6 Ladder 4 Ladder 4 13
Demo DNA 12000 Series II 7 Ladder 3 Ladder 3 13
Demo DNA 12000 Series II 8 Ladder x 13
Demo DNA 12000 Series II 9 Ladder 6 Ladder 6 13
Demo DNA 12000 Series II 10 Ladder 2 Ladder 2 13
Demo DNA 12000 Series II 11 Ladder 1 Ladder 1 13
Demo DNA 12000 Series II 12 Ladder 4 Ladder 4 13
Demo DNA 12000 Series II 13 Ladder DNA 12000_Ladder 13

The electrophoresis class

Let’s have a closer look at what we get when we open a data file:

dna1000 <- read.electrophoresis(system.file(
    "extdata",
    "bioanalyzer",
    "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
class(dna1000)
#> [1] "electrophoresis"
names(dna1000)
#> [1] "data"               "assay.info"         "samples"           
#> [4] "peaks"              "regions"            "mobility.functions"
head(dna1000$data)
sample.index   time  fluorescence  aligned.time  length  concentration  molarity
           1  30.00    -0.0004425      33.42105      NA             NA        NA
           1  30.05    -0.2573700      33.47368      NA     -0.0008645        NA
           1  30.10    -0.0841217      33.52632      NA     -0.0011433        NA
           1  30.15    -0.1443634      33.57895      NA     -0.0007638        NA
           1  30.20    -0.1733398      33.63158      NA     -0.0010604        NA
           1  30.25    -0.2204742      33.68421      NA     -0.0013123        NA
head(dna1000$samples)
batch well.number sample.name sample.observations sample.comment ladder.well
Demo DNA 1000 Series II 1 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 2 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 3 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 4 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 5 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 6 Ladder 6 Ladder 6 13
head(dna1000$peaks)
sample.index peak.observations length time aligned.time lower.time upper.time lower.aligned.time upper.aligned.time area concentration molarity lower.length upper.length
1 Lower Marker 15.0000 39.10 43.00000 37.55 40.50 41.36842 44.47368 63.487170 4.2000000 424.242000 19.51728 11.50747
1 Possible Co-Migration of 4 Peaks 48.7340 43.15 47.26316 42.55 44.20 46.63158 48.36842 7.176994 0.5163261 16.052680 61.14507 40.51514
1 Possible Co-Migration of 2 Peaks 103.0442 49.95 54.42105 48.80 50.95 53.21052 55.47368 11.265350 0.6640159 9.763611 109.63968 95.11741
1 Questionable Peak 144.7399 55.10 59.84211 54.30 55.35 59.00000 60.10526 2.983823 0.1636650 1.713260 146.42629 137.12854
1 149.5976 55.70 60.47368 55.35 56.50 60.10526 61.31579 5.005250 0.2723388 2.758294 156.73152 146.42629
1 Possible Co-Migration of 2 Peaks 200.6069 61.35 66.42105 60.40 62.80 65.42105 67.94736 27.163960 1.2741050 9.623101 213.73246 191.99486
head(dna1000$regions)
#> NULL

Each electrophoresis object contains the main data in its $data member and has several other members with metadata like the sample names and software-reported peaks and regions of interest (though this particular example had no regions in it).

You’ll notice that many of the values at the beginning of dna1000$data are NA. This is because the mobility model (the standard curve relating migration speed to molecule length) does not extrapolate to observations below the lower marker or above the upper marker, so those points have no length estimate, and the molarity estimate depends on the length. We can see some better examples if we look farther down:

head(subset(dna1000$data, ! is.na(length)))
     sample.index   time  fluorescence  aligned.time    length  concentration  molarity
183             1  39.10      95.47542      43.00000  15.00000      0.4897247  52.83526
184             1  39.15      94.49809      43.05263  15.11272      0.4953069  53.04567
185             1  39.20      89.56355      43.10526  15.22582      0.4793073  50.95727
186             1  39.25      81.28615      43.15789  15.33966      0.4443601  46.89712
187             1  39.30      70.84047      43.21053  15.45462      0.3951816  41.40179
188             1  39.35      59.44926      43.26316  15.57107      0.3380439  35.15516

Subsetting an electrophoresis object

Because the electrophoresis class is complex, it has its own special subset method (subset.electrophoresis) to simplify subsetting. The principle is that you request a subset of the samples, and all members are automatically updated.
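The call behind the tables below is not shown. Judging from the output (wells 1 and 7, both named Ladder 1) and the object name dna1000.ladder1 used afterwards, it was probably something like this sketch, which assumes the condition is evaluated against the columns of $samples:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
dna1000$samples                 # all samples in the batch
# Keep only the samples named "Ladder 1"; the other members are reduced to match
dna1000.ladder1 <- subset(dna1000, sample.name == "Ladder 1")
dna1000.ladder1$samples
dna1000.ladder1$peaks           # peaks from the remaining samples only
```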

batch well.number sample.name sample.observations sample.comment ladder.well
Demo DNA 1000 Series II 1 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 2 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 3 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 4 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 5 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 6 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 7 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 8 Ladder 2 Ladder 2 13
Demo DNA 1000 Series II 9 Ladder 3 Ladder 3 13
Demo DNA 1000 Series II 10 DNA 1000 ladder Ladder DNA 1000 13
Demo DNA 1000 Series II 11 Ladder 5 Ladder 5 13
Demo DNA 1000 Series II 12 Ladder 6 Ladder 6 13
Demo DNA 1000 Series II 13 Ladder Ladder DNA 1000 13
batch well.number sample.name sample.observations sample.comment ladder.well
Demo DNA 1000 Series II 1 Ladder 1 Ladder 1 13
Demo DNA 1000 Series II 7 Ladder 1 Ladder 1 13
sample.index peak.observations length time aligned.time lower.time upper.time lower.aligned.time upper.aligned.time area concentration molarity lower.length upper.length
1 Lower Marker 15.00000 39.10 43.00000 37.55 40.50 41.36842 44.47368 63.487170 4.2000000 424.242000 19.51728 11.50747
1 Possible Co-Migration of 4 Peaks 48.73400 43.15 47.26316 42.55 44.20 46.63158 48.36842 7.176994 0.5163261 16.052680 61.14507 40.51514
1 Possible Co-Migration of 2 Peaks 103.04420 49.95 54.42105 48.80 50.95 53.21052 55.47368 11.265350 0.6640159 9.763611 109.63968 95.11741
1 Questionable Peak 144.73990 55.10 59.84211 54.30 55.35 59.00000 60.10526 2.983823 0.1636650 1.713260 146.42629 137.12854
1 149.59760 55.70 60.47368 55.35 56.50 60.10526 61.31579 5.005250 0.2723388 2.758294 156.73152 146.42629
1 Possible Co-Migration of 2 Peaks 200.60690 61.35 66.42105 60.40 62.80 65.42105 67.94736 27.163960 1.2741050 9.623101 213.73246 191.99486
1 247.80470 66.55 71.89474 66.00 67.60 71.31579 73.00000 15.190050 0.6459550 3.949556 257.16736 242.67650
1 298.17930 72.10 77.73684 71.40 73.25 77.00000 78.94736 22.842370 0.8833409 4.488554 308.73868 291.75125
1 399.03180 81.45 87.57895 80.65 82.65 86.73684 88.84211 37.251460 1.1971060 4.545490 416.11561 387.95512
1 500.72600 87.70 94.15790 86.75 89.10 93.15789 95.63158 60.457780 1.7825220 5.393745 526.07298 483.71376
1 599.27960 92.35 99.05263 91.25 93.95 97.89474 100.73680 48.356040 1.3856860 3.503410 623.64741 567.08634
1 691.47530 96.70 103.63160 95.70 97.25 102.57890 104.21050 51.946870 1.4504850 3.178280 703.92033 664.42627
1 733.62730 97.80 104.78950 97.25 99.55 104.21050 106.63160 85.765930 2.3990110 4.954644 798.56214 703.92033
1 886.34890 100.55 107.68420 99.55 101.35 106.63160 108.52630 78.290610 2.1990900 3.759187 970.57550 798.56214
1 1062.70300 102.10 109.31580 101.35 104.50 108.52630 111.84210 111.507300 3.0126590 4.295300 1359.72492 970.57550
1 Upper Marker 1500.00000 105.60 113.00000 104.60 107.00 111.94740 114.47370 81.828600 2.1000000 2.121210 1678.82832 1372.44499
2 Lower Marker 15.00000 38.10 43.00000 36.50 39.80 41.22925 44.88142 60.471110 4.2000000 424.242000 21.86306 11.20956
2 Possible Co-Migration of 4 Peaks 49.37152 42.00 47.31620 41.45 43.00 46.70751 48.42292 7.704427 0.4866122 14.933520 61.69016 41.50816
2 Possible Co-Migration of 2 Peaks 104.15310 48.55 54.56522 47.45 49.40 53.34782 55.50593 12.328830 0.6406598 9.319897 109.86789 95.92045
2 Questionable Peak 144.58620 53.30 59.82213 52.50 53.55 58.93676 60.09881 3.557921 0.1724338 1.806971 146.37161 136.60235
2 149.69360 53.90 60.48617 53.55 54.70 60.09881 61.37154 5.903490 0.2836979 2.871497 157.20764 146.37161
2 Possible Co-Migration of 2 Peaks 200.96380 59.30 66.46245 58.45 60.45 65.52174 67.73518 29.403260 1.2173120 9.177821 211.90866 192.86213
2 247.24690 64.15 71.83004 63.60 64.90 71.22134 72.66008 17.541650 0.6596678 4.042501 254.24046 241.86454
2 297.82420 69.45 77.69566 68.40 70.45 76.53360 78.80238 25.624100 0.8758825 4.455963 307.46568 287.70168
2 398.68740 78.35 87.54546 77.40 79.55 86.49407 88.87353 41.578320 1.1809730 4.488104 416.56440 384.95290
2 Possible Co-Migration of 2 Peaks 501.28730 84.35 94.18578 83.60 85.75 93.35574 95.73518 93.347700 2.4307890 7.347105 527.89321 487.03267
2 595.99160 88.60 98.88933 87.90 90.25 98.11463 100.71540 60.722940 1.5385310 3.911305 623.19480 571.23498
2 689.58220 92.80 103.53760 91.80 93.40 102.43080 104.20160 66.136880 1.6321430 3.586145 703.69439 661.00692
2 734.58020 93.95 104.81030 93.40 95.50 104.20160 106.52570 98.885480 2.4436170 5.040221 792.05827 703.69439
2 875.31400 96.45 107.57710 95.55 97.25 106.58100 108.46250 96.875570 2.4111390 4.173631 963.47126 795.41612
2 1046.80200 97.90 109.18180 97.25 100.50 108.46250 112.05930 128.535800 3.0735300 4.448652 1385.97377 963.47126
2 Upper Marker 1500.00000 101.35 113.00000 100.55 102.75 112.11460 114.54940 92.634490 2.1000000 2.121210 1688.01425 1392.66356

You can see that in addition to dna1000.ladder1$samples, $peaks and other members have also been reduced to data from the remaining samples even though they don’t contain the sample.observations variable themselves. Instead, the sample.index column has been updated to point to the new row numbers of those samples in the $samples table.
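Because sample.index is a row number in $samples, per-sample metadata can be attached to any other member by plain indexing. A small sketch (copying the peaks into a separate data frame so the object itself is left untouched):

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
peaks <- dna1000$peaks
# Look up each peak's sample by using sample.index as a row number into $samples
peaks$sample.name <- dna1000$samples$sample.name[peaks$sample.index]
head(peaks[, c("sample.index", "sample.name", "length", "molarity")])
```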

Combining electrophoresis objects

The electrophoresis class also has a special method for combining multiple instances into one: rbind (rbind.electrophoresis), a natural choice because most members are data frames and multiple instances should have the same columns. Unlike an ordinary rbind, however, this method automatically updates the sample.index columns and concatenates the other members that are not data frames (such as $assay.info).

dna1000 <- read.electrophoresis(system.file(
    "extdata",
    "bioanalyzer",
    "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
dna7500 <- read.electrophoresis(system.file(
    "extdata",
    "bioanalyzer",
    "Demo DNA 7500 Series II.xml.gz",
    package = "bioanalyzeR"
))
unique(dna1000$data$sample.index)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
unique(dna7500$data$sample.index)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
combined.batches <- rbind(dna1000, dna7500)
unique(combined.batches$data$sample.index)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26
combined.batches$assay.info
#> $`Demo DNA 1000 Series II`
#> $`Demo DNA 1000 Series II`$file.name
#> [1] "C:\\Program Files (x86)\\Agilent\\2100 bioanalyzer\\2100 expert\\data\\samples\\demo\\electrophoresis\\Demo DNA 1000 Series II.xad"
#> 
#> $`Demo DNA 1000 Series II`$creation.date
#> [1] "2005-12-14T08:03:40.000"
#> 
#> $`Demo DNA 1000 Series II`$assay.name
#> [1] "DNA 1000 Series II"
#> 
#> $`Demo DNA 1000 Series II`$assay.type
#> [1] "DNA"
#> 
#> $`Demo DNA 1000 Series II`$length.unit
#> [1] "bp"
#> 
#> $`Demo DNA 1000 Series II`$concentration.unit
#> [1] "ng/µl"
#> 
#> $`Demo DNA 1000 Series II`$molarity.unit
#> [1] "nM"
#> 
#> $`Demo DNA 1000 Series II`$fit
#> [1] "spline"
#> 
#> 
#> $`Demo DNA 7500 Series II`
#> $`Demo DNA 7500 Series II`$file.name
#> [1] "C:\\Program Files (x86)\\Agilent\\2100 bioanalyzer\\2100 expert\\data\\samples\\demo\\electrophoresis\\Demo DNA 7500 Series II.xad"
#> 
#> $`Demo DNA 7500 Series II`$creation.date
#> [1] "2005-12-14T09:56:52.000"
#> 
#> $`Demo DNA 7500 Series II`$assay.name
#> [1] "DNA 7500 Series II"
#> 
#> $`Demo DNA 7500 Series II`$assay.type
#> [1] "DNA"
#> 
#> $`Demo DNA 7500 Series II`$length.unit
#> [1] "bp"
#> 
#> $`Demo DNA 7500 Series II`$concentration.unit
#> [1] "ng/µl"
#> 
#> $`Demo DNA 7500 Series II`$molarity.unit
#> [1] "nM"
#> 
#> $`Demo DNA 7500 Series II`$fit
#> [1] "spline"

The rbind method is automatically used to combine multiple batches in read.electrophoresis, so if you use that function to import the batches at the same time, you probably will not need to rbind them later.

Drawing electropherograms

qplot.electrophoresis

The members of an electrophoresis object that contain graphable data are data frames compatible with ggplot2. However, the metadata by sample are in the $samples member while the actual electrophoresis data are kept separately in the $data member. To simplify graphing, this package includes the qplot.electrophoresis function, which is analogous to ggplot2::qplot but has slightly different syntax: in particular, the first argument is the electrophoresis object, and the x- and y-variables have defaults.
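In its simplest form only the object is needed. This sketch relies on the default x- and y-variables and assumes that, as with ggplot2::qplot, the return value is a ggplot object that can be printed or modified further:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Default x- and y-variables, one facet per sample
p <- qplot.electrophoresis(dna1000)
print(p)
```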

This produces plots analogous to the electropherograms in the Agilent software. However, with the default settings there are several differences; these settings are described in the next sections.

Data, peaks, and regions

qplot.electrophoresis displays the software-reported peaks as filled area under the curve. You can stop displaying peaks by setting show.peaks = FALSE:
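A sketch of the same plot with the peak shading turned off:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Draw only the curves, without the software-reported peaks
p <- qplot.electrophoresis(dna1000, show.peaks = FALSE)
print(p)
```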

Reported regions of interest are shown as semitransparent gray rectangles; you can modify the transparency of these or set it to NA to stop displaying regions:
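The argument name is not given here; this sketch assumes it is region.alpha (check ?qplot.electrophoresis). The DNA 1000 demo contains no regions, so the setting only has a visible effect on data that does:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Assumption: region.alpha sets the opacity of the region rectangles
p.darker <- qplot.electrophoresis(dna1000, region.alpha = 0.5)
# Assumption: NA hides the regions altogether
p.no.regions <- qplot.electrophoresis(dna1000, region.alpha = NA)
print(p.no.regions)
```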

Finally, the readings from the samples themselves are plotted by default as a continuous curve (ggplot2::geom_line). The other supported geom is geom_area, which (as in qplot) you get by setting geom = "area":
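For example, to fill the area under each curve instead of drawing a line:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# geom_area instead of the default geom_line
p <- qplot.electrophoresis(dna1000, geom = "area")
print(p)
```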

Data ranges

You can easily zoom in on an interesting feature by setting the xlim or ylim arguments (note that you can leave a limit as NA to let the software choose):
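A sketch that zooms in on the short end of the DNA 1000 demo, leaving the lower x-limit and both y-limits to the software (the 300 bp cutoff is an arbitrary choice):

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# NA leaves that limit to be chosen automatically
p <- qplot.electrophoresis(dna1000, xlim = c(NA, 300))
print(p)
```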

Or you can use the scales argument, which is passed to ggplot2::facet_wrap or ggplot2::facet_grid, to allow different facets to automatically get different axis scales:
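For example, letting each facet choose its own y-axis range ("free_y" is one of the values ggplot2's facet functions accept):

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Passed through to the faceting function: independent y-axes per sample
p <- qplot.electrophoresis(dna1000, scales = "free_y")
print(p)
```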

Another difference from the electropherograms in the software is that, by default, the x-axis (molecule length) is in a linear scale. In the Agilent software, data points are simply graphed as they appeared to the instrument and the x-axis labels are roughly (but not exactly; see stdcrv.mobility below) logarithmic. You can log-scale the axes with the log argument that behaves the same as in ggplot2::qplot:
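A sketch with a logarithmic x-axis, which brings the plot closer to the spacing in the Agilent software:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Same convention as ggplot2::qplot: "x", "y", or "xy"
p <- qplot.electrophoresis(dna1000, log = "x")
print(p)
```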

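Of course, log-scaling the y-axis gives weird results when the values are fractional, and values that are zero or negative (like some of the baseline readings in the tables above) cannot be log-transformed at all.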

Changing the axis variables

By now you have probably noticed that the y-axis displays concentration rather than fluorescence. In fact it displays concentration per length (the concentration estimate for each point is scaled with the differential.scale function) so that the area under a curve, between any two x-values, is directly proportional to the concentration of molecules in that range.
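The reason for dividing by length can be seen with toy numbers. This base-R sketch illustrates the principle with made-up values; it is not the algorithm used by differential.scale. Each reading is assumed to account for the molecules in one length interval, and the intervals have unequal widths:

```r
# Made-up example: each reading covers one length interval (bp) of unequal width
lower <- c(100, 110, 125, 145, 170)
upper <- c(110, 125, 145, 170, 200)
concentration <- c(0.2, 0.6, 1.0, 0.5, 0.3)  # ng/µl attributed to each reading

# Concentration per bp: divide each reading by the width of its interval
width <- upper - lower
per.bp <- concentration / width

# The area under the resulting step curve is the total concentration...
total <- sum(per.bp * width)
# ...and the area between 110 and 170 bp is the concentration in that range
in.range <- lower >= 110 & upper <= 170
partial <- sum(per.bp[in.range] * width[in.range])
```

Readings 2 and 4 have similar raw values (0.6 and 0.5), but per base pair the first is twice as dense (0.04 versus 0.02) because its interval is narrower, which is exactly the distortion that plotting raw per-point values would hide.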

If you change the y-variable to y = "fluorescence" you can see something closer to the electropherograms from the Agilent software:
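A sketch of that change:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# Raw fluorescence on the y-axis, length still on the x-axis
p <- qplot.electrophoresis(dna1000, y = "fluorescence")
print(p)
```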

This is still not quite the same as the electropherograms from the Agilent software because the x-axis is linear. You could log-scale it as above, but to get a truly analogous graph you can simply change the x-variable to relative.distance, which for TapeStation data is the migration distance normalized to the markers (the Bioanalyzer has aligned.time instead):
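A sketch for both platforms. For the TapeStation example it assumes read.electrophoresis finds D1000-Tubes-16.png next to the XML file, as in the extdata listing above:

```r
library(bioanalyzeR)
# Bioanalyzer: aligned migration time instead of length
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
p.bioanalyzer <- qplot.electrophoresis(dna1000, x = "aligned.time", y = "fluorescence")
print(p.bioanalyzer)

# TapeStation: migration distance normalized to the markers
d1000 <- read.electrophoresis(system.file(
    "extdata", "tapestation", "D1000-Tubes-16.xml.gz",
    package = "bioanalyzeR"
))
p.tapestation <- qplot.electrophoresis(d1000, x = "relative.distance", y = "fluorescence")
print(p.tapestation)
```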

Note that the x-axis is automatically reversed when the x-variable is a distance (such as relative.distance), to keep the plots in the same orientation.

Even though differential-scaled concentration is more directly informative than raw fluorescence, for many experiments the real variable of interest is molarity. Rather than the total mass of molecules, we want to know the number of molecules. The problem is that the mass concentration, and therefore the fluorescence, depends on the length of the molecule: a 2 kb fragment has approximately twice the mass and twice the fluorescence of a 1 kb fragment, even if the copy number is the same. So, compare concentration and molarity for one sample:
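A sketch of that comparison, choosing well 1 of the DNA 1000 demo as the example sample and assuming subset accepts a condition on the well.number column of $samples:

```r
library(bioanalyzeR)
dna1000 <- read.electrophoresis(system.file(
    "extdata", "bioanalyzer", "Demo DNA 1000 Series II.xml.gz",
    package = "bioanalyzeR"
))
# A single sample keeps the two plots easy to compare
one.sample <- subset(dna1000, well.number == 1)
p.concentration <- qplot.electrophoresis(one.sample, y = "concentration")
p.molarity <- qplot.electrophoresis(one.sample, y = "molarity")
print(p.concentration)
print(p.molarity)
```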

If our variable of interest were molarity, the concentration graph would have been very misleading! And the original fluorescence vs. distance graph even more so.
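The size of the effect can be checked by hand. Assuming an average of 660 g/mol per base pair of double-stranded DNA (a common approximation that reproduces the marker rows in the peaks tables above: 4.2 ng/µl of the 15 bp lower marker is listed as about 424.24 nM, and 2.1 ng/µl of the 1500 bp upper marker as about 2.12 nM), mass concentration converts to molarity as in this sketch; the function name is hypothetical:

```r
# Hypothetical helper: convert dsDNA mass concentration (ng/µl) to molarity (nM),
# assuming an average molecular weight of 660 g/mol per base pair.
ng.per.ul.to.nM <- function(concentration, length.bp, g.per.mol.per.bp = 660) {
    # 1 ng/µl = 1e-3 g/l; dividing by g/mol gives mol/l, and 1 mol/l = 1e9 nM
    concentration * 1e-3 / (length.bp * g.per.mol.per.bp) * 1e9
}

ng.per.ul.to.nM(4.2, 15)           # lower marker
ng.per.ul.to.nM(2.1, 1500)         # upper marker
# The same mass of 2 kb fragments is half as many molecules as of 1 kb fragments
ng.per.ul.to.nM(1, c(1000, 2000))
```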

Or sometimes the absolute variable is irrelevant and what we really want to know is the size distribution within each sample. For that you can set normalize = TRUE, which uses the normalize.proportion function to scale the variable of interest to a proportion of the sample’s total, i.e. the total area under every curve (between the markers) is 1. Notice how the different dilutions of the same sample become roughly equal after normalization:
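The idea behind normalization can be sketched in base R with made-up numbers: dividing each curve by its own total area turns a two-fold dilution into an identical profile. This illustrates the principle only; it is not the code of normalize.proportion, and it ignores the restriction to the range between the markers:

```r
# Made-up per-bp concentrations on a shared grid of equal 10 bp intervals
width <- rep(10, 5)
undiluted <- c(0.02, 0.06, 0.10, 0.05, 0.01)
diluted <- undiluted / 2  # the same library at half the concentration

# Scale a curve so that the total area under it is 1
to.proportion <- function(per.bp, width) per.bp / sum(per.bp * width)

p.undiluted <- to.proportion(undiluted, width)
p.diluted <- to.proportion(diluted, width)
```

In the package itself, the corresponding plot comes from passing normalize = TRUE to qplot.electrophoresis.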