SpliceMap

Manual

Installation

No explicit installation is required for SpliceMap. You may copy the SpliceMap binaries to any location as long as all the binaries (including Bowtie) are in the same directory or path.

Instructions for building SpliceMap from source are in the "INSTALL" file included in the package. Building from source may be necessary if your C++ standard libraries are older or newer than the ones the binaries expect. In short, running "./install.sh" and "./install-bowtie.sh" from the src directory installs SpliceMap and Bowtie from source to a directory of your choice.

Obtaining genome files

In order to run SpliceMap with Bowtie, you will need to obtain two kinds of genome files.

The first is the Bowtie index, which can be downloaded here (download the same version as your genome files, probably UCSC). If the genome you are interested in is not listed, you may need to build your own index by following these instructions.

The second is the same genome in FASTA format, with each chromosome in a separate file. These can be obtained from UCSC--Species--Full data set--chromFa.tar.gz. After extracting the archive, you will have each chromosome in its own .fa file. In general you will also want to delete the "*_random.fa" and "*_hap*.fa" files.

Make sure you place all the genome files of different organisms in separate directories.
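
The FASTA preparation described above can be sketched as a small shell function. This is only an illustration: the function name prepare_genome_dir and its two arguments are made up for this example, and it assumes the archive unpacks into a flat list of .fa files, which may not hold for every assembly.

```shell
# Sketch: unpack a UCSC chromFa archive into its own genome directory and
# drop the "_random" and "_hap" sequence files.
# prepare_genome_dir and its arguments are illustrative names, not part of SpliceMap.
prepare_genome_dir() {
    archive="$1"      # e.g. a chromFa.tar.gz downloaded from UCSC
    genome_dir="$2"   # one directory per organism

    mkdir -p "$genome_dir" || return 1
    # Extract every chromosome .fa file into the genome directory.
    tar -xzf "$archive" -C "$genome_dir" || return 1
    # rm -f silently ignores patterns that match nothing.
    rm -f -- "$genome_dir"/*_random.fa "$genome_dir"/*_hap*.fa
}
```

For example, "prepare_genome_dir chromFa.tar.gz genome/hg18" would leave only the main chromosome files in genome/hg18.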

Using SpliceMap

First, see the tutorial on how to use SpliceMap on some example data.

In order to use SpliceMap on your own data:

1. Obtain the genome files in the format "chr1.fa, chr2.fa, ..." and the corresponding Bowtie index.
2. Create an empty directory. This will be the working directory.
3. Copy "run.cfg" from the SpliceMap package to the working directory.
4. Edit run.cfg to include the paths to your data files and genome directories. You may also want to adjust the default settings.
5. Execute "runSpliceMap run.cfg" from within your working directory.
6. When execution finishes, you can find the results in the "output" directory.
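
As a rough sketch, the working-directory setup from the steps above could be scripted as follows. The function name setup_workdir and all paths are placeholders chosen for this example. Editing run.cfg remains a manual step.

```shell
# Sketch: create a fresh working directory for one SpliceMap run and copy
# the template run.cfg into it.
# setup_workdir, package_dir and work_dir are illustrative names.
setup_workdir() {
    package_dir="$1"   # where the SpliceMap package was unpacked
    work_dir="$2"      # new working directory; must not exist yet

    if [ -e "$work_dir" ]; then
        echo "refusing to reuse existing path: $work_dir" >&2
        return 1
    fi
    mkdir -p "$work_dir" || return 1
    cp "$package_dir/run.cfg" "$work_dir/run.cfg"
}

# Typical use (all paths are placeholders):
#   setup_workdir ~/SpliceMap ~/runs/sample1
#   cd ~/runs/sample1
#   "$EDITOR" run.cfg        # set the read files and genome directories
#   runSpliceMap run.cfg     # results appear in ./output
```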

Viewing Output

Please see the Viewing output section.

Module: runSpliceMap

"runSpliceMap" is the main program in the SpliceMap package. It calls the other modules to run the full junction detection on your data. Output is written to the "output" folder. Details of the output are described in file formats. Its options are described below. Please note that in future versions, the command line options will be replaced by a configuration file for your convenience.

runSpliceMap run.cfg

run.cfg: the .cfg file which defines the run parameters. For details, see .cfg format

Example:

runSpliceMap run.cfg

Module: findNovelJunctions

Finds the junctions that do not exist in a reference file. If you wish to find the junctions that differ between two bed files, you should run this tool twice with each file as reference.

findNovelJunction refFlat.txt junction.bed
or
findNovelJunction refFlat.bed junction.bed

refFlat.txt or refFlat.bed: The reference .bed or .txt file. Junctions that exist in this file will not be output.
junction.bed: The .bed file to search. All junctions that exist in this file, but not in the reference, will be written to a .new.bed file.

Example:

findNovelJunction refFlat.txt junction.bed
or
findNovelJunction junction1.bed junction2.bed
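
The two-way comparison mentioned above amounts to running the tool twice with the roles of the files swapped. A minimal sketch, assuming findNovelJunction (spelled as in the usage line) is on your PATH; the wrapper name compare_junctions is made up for this example:

```shell
# Sketch: report the junctions unique to each of two BED files by running
# findNovelJunction twice with the arguments swapped.
# compare_junctions is an illustrative name, not part of SpliceMap.
compare_junctions() {
    first_bed="$1"
    second_bed="$2"
    # Junctions in second_bed that are absent from first_bed.
    findNovelJunction "$first_bed" "$second_bed" || return 1
    # Junctions in first_bed that are absent from second_bed.
    findNovelJunction "$second_bed" "$first_bed"
}
```

Calling "compare_junctions junction1.bed junction2.bed" should then produce one .new.bed file per direction.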

Module: statSpliceMap

"statSpliceMap" computes some useful statistics on the junctions found. These can give you an idea of the quality of the data. Its options are described below.

statSpliceMap junction.bed [newjun.bed]

junction.bed: The primary junction file in bed format. Its statistics will be outputted to the screen.
newjun.bed (optional): The novel junctions found from junction.bed using findNovelJunctions.

Example:

statSpliceMap junctions.bed junctions.bed.new.bed

Module: colorJunction

The bed files outputted from mergeGoodList are black and white. "colorJunction" gives them a splash of color and also highlights novel junctions. Its options are described below.

colorJunction infile.bed [newjun.bed]

infile.bed: This file will be colored and outputted as infile_color.bed.
newjun.bed (optional): The novel junctions found from junction.bed using findNovelJunctions. They will be highlighted in a different color.

Example:

colorJunction junctions.bed junctions.bed.new.bed
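
The three tools above are often useful together: find the novel junctions, print statistics, then color the result. The sketch below chains them. The wrapper name annotate_junctions is invented for this example, and the name of the novel-junction file (input name plus ".new.bed") is an assumption inferred from the examples in this manual.

```shell
# Sketch: find novel junctions against a reference, then print statistics
# and write a colored BED file.
# annotate_junctions is an illustrative name, not part of SpliceMap.
annotate_junctions() {
    reference="$1"   # refFlat.txt or a reference .bed file
    junctions="$2"   # junction .bed file from a SpliceMap run
    # ASSUMPTION: findNovelJunction writes <input>.new.bed, as the
    # "junctions.bed.new.bed" examples in this manual suggest.
    novel="${junctions}.new.bed"

    findNovelJunction "$reference" "$junctions" || return 1
    statSpliceMap "$junctions" "$novel" || return 1
    colorJunction "$junctions" "$novel"
}
```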

Module: subseq

"subseq" extracts a portion of a chromosome then displays its sequence and its reverse complement. Its options are described below.

subseq chr_file start end

chr_file: The path to the chromosome of interest.
start: Start position of interest. The extracted sequence includes this position.
end: End position of interest. The extracted sequence includes this position.

Example:

subseq ../genome/hg18/chr21.fa 214572 214687

Module: uniqueJunctionFilter

One of the optional filters as described in the paper. This removes all junctions that are only supported by multiply mapped reads with no redundant reads.

uniqueJunctionFilter infile.bed outfile.bed

infile.bed: Input bed file.
outfile.bed: Output bed file.

Example:

uniqueJunctionFilter junctions.bed junctions_nUM.bed

Module: nnrFilter

An optional filter that removes junctions based on the number of non-redundant supporting reads. It removes all junctions that have fewer than "limit" non-redundant supporting reads.

nnrFilter infile.bed outfile.bed limit

infile.bed: Input bed file.
outfile.bed: Output bed file.
limit: Minimum number of non-redundant supporting reads a junction needs in order to be kept. Suggested value is 2.

Example:

nnrFilter infile.bed outfile.bed 2

Module: neighborFilter

One of the optional filters as described in the paper. This removes all isolated junctions. A junction is considered isolated if no other exonic reads map within a given number of nucleotides upstream and downstream of it.

neighborFilter good_hits.sam infile.bed outfile.bed [limit]

good_hits.sam: SAM file that contains all mapped reads.
infile.bed: Input bed file.
outfile.bed: Output bed file.
limit (optional): The number of nucleotides upstream and downstream of the junction to check. Default = 80.

Example:

neighborFilter good_hits.sam infile.bed outfile.bed 160
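
The three optional filters can be applied one after another, each reading the previous filter's output. The sketch below shows one possible chain. The order, the wrapper name filter_junctions, and the intermediate file names are assumptions made for this example; the limits 2 and 80 are the suggested and default values listed above.

```shell
# Sketch: apply the three optional filters in sequence.
# filter_junctions and the output file names are illustrative; the order
# of the filters is an assumption, not a documented requirement.
filter_junctions() {
    sam="$1"          # SAM file with all mapped reads, e.g. good_hits.sam
    in_bed="$2"       # junction .bed file to filter
    out_prefix="$3"   # prefix for the intermediate and final .bed files

    uniqueJunctionFilter "$in_bed" "${out_prefix}_unique.bed" || return 1
    nnrFilter "${out_prefix}_unique.bed" "${out_prefix}_nnr.bed" 2 || return 1
    neighborFilter "$sam" "${out_prefix}_nnr.bed" "${out_prefix}_filtered.bed" 80
}
```

Because every filter is optional, any single step can be dropped by passing its input file straight to the next one.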

Module: wig2barwig

Converts the .wig file to a .barwig file for input to barloader. This step is needed for viewing with the cisGenome Browser.

wig2barwig infile.wig outfile.barwig

infile.wig: Input wig file.
outfile.barwig: Output barwig file.

Example:

wig2barwig coverage.wig coverage.barwig

Module: barloader

Converts the .barwig file to a .bar file, which is the input format needed for viewing with the cisGenome Browser.

barloader infile.barwig

infile.barwig: Input .barwig file. It will output a file named infile.barwig.bar.

Example:

barloader coverage.barwig
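
Since wig2barwig and barloader are always run back to back, the conversion can be wrapped in one step. A minimal sketch, assuming both tools are on your PATH; the wrapper name wig_to_bar is made up for this example, and the final file name follows the "infile.barwig.bar" convention described above.

```shell
# Sketch: convert a .wig file to a .bar file in one call.
# wig_to_bar is an illustrative name, not part of SpliceMap.
wig_to_bar() {
    wig="$1"
    barwig="${wig%.wig}.barwig"   # coverage.wig -> coverage.barwig

    wig2barwig "$wig" "$barwig" || return 1
    barloader "$barwig" || return 1
    # barloader is documented to write <infile.barwig>.bar.
    echo "${barwig}.bar"
}
```

The function prints the expected name of the resulting .bar file so that it can be passed on to other commands.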

File formats

See Output Format

Configuring Eland

If you have the Eland sources, you will need to build a version that can align 25bp reads. Rename the resulting binary to "eland_25" and copy it to the same directory as the SpliceMap package binaries.

Note that Eland is proprietary software, so we cannot distribute it with SpliceMap. However, if you are licensed to use Eland, contact us and we can help compile a copy and email it to you.

Execution Time

If you have many reads, SpliceMap can take a long time to execute. The following execution times are rough estimates based on runs on our servers using one thread. Your figures may differ greatly depending on your system configuration.

40 million 100bp x2 reads - 60 hours
36 million 70bp x2 reads - 40 hours
23 million 50bp x2 reads - 12 hours

This speed should be comparable to or faster than similar tools.
