Output Format
SpliceMap generates three types of output files. This page details each of them.
SAM format (.sam)
The SAM format (see specifications) output details the alignment of each read. If a read is multiply mapped, there will be multiple entries in the SAM file, one per mapped location. The important columns are described below.
QNAME | FLAG | RNAME | POS | MAPQ | CIGAR | MRNM | MPOS | ISIZE | SEQ | QUAL | OPTIONAL
- QNAME
- Name of each query copied from the FASTA or FASTQ file. A unique index is attached to the name.
- FLAG
- An explanation of each flag can be found at Explain Flags.
- MAPQ
- 255 if uniquely mapped and 0 if multiply mapped.
- OPTIONAL
- XS -- Strand of junction
- XC -- Number of bases clipped
- NM -- Number of mismatches in the read
- MD -- Describes location of mismatches
- NH -- Number of multi-hits
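As an illustration of how the columns above might be read programmatically, here is a minimal Python sketch. The function name `parse_sam_line` and the example record are hypothetical and not taken from SpliceMap. The sketch assumes tab-separated fields and optional tags in the standard SAM `TAG:TYPE:VALUE` form.

```python
# Names of the 11 mandatory columns, in the order listed above.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into named fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("FLAG", "POS", "MAPQ", "MPOS", "ISIZE"):
        record[key] = int(record[key])

    # Optional fields are assumed to follow the SAM TAG:TYPE:VALUE layout.
    tags = {}
    for item in fields[11:]:
        tag, value_type, value = item.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    record["tags"] = tags

    # Per the MAPQ description above: 255 means uniquely mapped.
    record["unique"] = record["MAPQ"] == 255
    return record


# Hypothetical record, made up for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "20M50N16M", "*", "0", "0",
    "A" * 36, "I" * 36, "XS:A:+", "NM:i:1", "NH:i:1",
])
rec = parse_sam_line(example)
print(rec["unique"], rec["tags"]["XS"], rec["tags"]["NH"])
```

Header lines (those starting with `@`) are not handled by this sketch and would need to be skipped before calling the function.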
BED format (.bed)
The BED files store junctions. This format is described in the UCSC genome browser help. SpliceMap tags each junction with a snippet of text describing the reliability of the junction. The format of this snippet is (nR)[width_nNR](nUR/nMR), where:
nR -- Number of reads supporting this junction.
width -- Range of different right lengths supporting this junction; the larger the better.
nNR -- Number of non-redundant reads supporting this junction.
nUR -- Number of uniquely mappable reads supporting this junction.
nMR -- Number of multiply mappable reads supporting this junction.
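The snippet format can be unpacked with a regular expression. The following Python sketch assumes that all five fields are non-negative integers. The function name `parse_junction_tag` and the example string are hypothetical.

```python
import re

# Matches (nR)[width_nNR](nUR/nMR), assuming every field is an integer.
JUNCTION_TAG = re.compile(r"\((\d+)\)\[(\d+)_(\d+)\]\((\d+)/(\d+)\)")


def parse_junction_tag(text):
    """Return the five reliability counts as a dict, or None if absent."""
    match = JUNCTION_TAG.search(text)
    if match is None:
        return None
    n_r, width, n_nr, n_ur, n_mr = (int(g) for g in match.groups())
    return {"nR": n_r, "width": width, "nNR": n_nr, "nUR": n_ur, "nMR": n_mr}


# Hypothetical tag, made up for illustration only.
tag = parse_junction_tag("(12)[30_8](10/2)")
print(tag)
```

A parsed tag like this could then be used to filter junctions, for example by keeping only those with at least one uniquely mappable read (`nUR > 0`).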
BEDgraph format (.wig)
The BEDgraph format generated by SpliceMap complies with the UCSC specifications. The columns are separated by tabs and are described as follows.
Chromosome name | Start position | End position | Value
The start and end positions are inclusive.
The value is determined by the number of reads covering that position and is negative for multiply mapped reads. In addition, each multiply mapped read contributes only a fraction of the coverage at each position. For example, a read mapped to 7 locations contributes a coverage value of only 1/7 at each of them.
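The fractional contribution described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not SpliceMap's implementation. The function name `fractional_coverage` and the input tuples are hypothetical, positions are treated as inclusive as stated above, and the negative sign convention for multiply mapped reads is not modeled.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_coverage(alignments):
    """Accumulate per-position coverage from (start, end, n_hits) tuples.

    start and end are inclusive. n_hits is the number of locations the
    read maps to, so each alignment adds 1/n_hits to every position it
    covers.
    """
    coverage = defaultdict(Fraction)  # missing positions start at 0
    for start, end, n_hits in alignments:
        share = Fraction(1, n_hits)
        for pos in range(start, end + 1):  # end + 1 because end is inclusive
            coverage[pos] += share
    return dict(coverage)


# Hypothetical input: one uniquely mapped read, and one of the 7
# alignments of a multiply mapped read.
cov = fractional_coverage([(10, 12, 1), (12, 13, 7)])
print(cov[12])
```

Exact fractions are used here to avoid floating-point rounding when many small contributions are summed. Converting with `float()` gives the decimal value that would appear in a file.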