Extra data for our PacBio paper (Nature Biotech 2013)



  1. Visualizing reads on the UCSC genome browser

    • all consensus-split-mapped-molecules ("CSMMs", meaning that each intron in the mapping respects the splice-site consensus; including those of Tilgner et al., GGG, 2013 and of Tilgner et al., PNAS, 2014) can be inspected visually on the UCSC browser using this link



  2. well-mapped and spliced alignments ("CSMMs") for the human Organ panel

    • are here
    • note that alignments overlapping ribosomal RNA genes have been removed
    • note also that this file contains a comment line for display on the UCSC browser. Uploading all reads to the UCSC browser at once may, however, take a long time. We advise writing the comment line and all alignments in a genomic region of interest into a smaller file and uploading that smaller file to the UCSC browser
    • on a Linux terminal, you can select all alignments overlapping the region between 10 Mb and 11 Mb on chr11 using the following commands (and then submit the result to the UCSC browser)
      1. zcat supplementalFileS1.gff.gz | awk '{if($1=="track"){print; next;} if($1!="chr11" || $5<10000000 || $4>11000000){next;} print ; }' > test.gff
      2. gzip test.gff
    • similarly processed data from the 454 platform for the human cell lines K562 and HelaS3 can be found here
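
The region selection shown above for chr11 can be generalized to any chromosome and coordinate range. The following is only a minimal sketch: the function name extract_region and the output file name are hypothetical, and it assumes the same layout as the command above (a leading "track" comment line, chromosome in column 1, start in column 4, end in column 5).

```shell
# Hypothetical helper (not part of the original pipeline): keep the "track"
# comment line plus every alignment overlapping CHROM:START-END.
# Reads GFF on stdin and writes the selected lines to stdout.
extract_region() {
    awk -v chrom="$1" -v start="$2" -v end="$3" '
        $1 == "track" { print; next }      # keep the UCSC browser comment line
        # overlap test: the feature ends at or after START and starts at or before END
        $1 == chrom && $5 + 0 >= start + 0 && $4 + 0 <= end + 0 { print }
    '
}

# Example usage (assumed file names):
# zcat supplementalFileS1.gff.gz | extract_region chr11 10000000 11000000 | gzip > chr11_10Mb_11Mb.gff.gz
```

Reading from standard input keeps the helper independent of the file name and lets it be chained with zcat and gzip, as in the commented usage line.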



  3. input fasta data

    • you should be able to download the raw input data from the SRA under accession PRJEB3969.
    • To make your life easier, you can also get all the CCS for the human Organ panel (combined in one file) in fasta format or in fastq format
    • And subread data for the human Organ panel (in one file combined) in fasta format
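
After downloading the combined files, a quick record count can serve as a sanity check that the transfer completed. This is a minimal sketch, not part of the original analysis: the function names and the file names in the usage comments are hypothetical, and the fastq count assumes the common four-lines-per-record layout.

```shell
# Hypothetical helpers: count records in fasta or fastq data read from stdin.
count_fasta_records() {
    grep -c '^>'                        # one header line (">") per fasta record
}

count_fastq_records() {
    awk 'END { print int(NR / 4) }'     # assumes exactly 4 lines per fastq record
}

# Example usage (assumed file names):
# zcat ccs_organ_panel.fasta.gz | count_fasta_records
# zcat ccs_organ_panel.fastq.gz | count_fastq_records
```

If the fasta and fastq files hold the same CCS reads, the two counts would be expected to agree.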