Manual
- Installation
- Obtaining genome files
- Using SpliceMap
- Viewing the output
- Module descriptions
- File formats
- Configuring Eland
- Execution Time
Installation
No explicit installation is required for SpliceMap. You may copy the SpliceMap binaries to any location as long as all the binaries (including Bowtie) are in the same directory or path.
Instructions on how to build SpliceMap from source are included in the package in the file "INSTALL". Building from source may be necessary if you have an older or newer version of the C++ standard libraries. Essentially, executing "./install.sh" and "./install-bowtie.sh" from the src directory will install SpliceMap and Bowtie from source to a directory of your choice.
Obtaining genome files
In order to run SpliceMap with Bowtie, you will need to obtain two kinds of genome files.
The first is the Bowtie index, which can be downloaded here (download the same version as your genome files, probably UCSC). If the genome you are interested in is not listed, you may need to build your own index by following these instructions.
The second is the same genome in FASTA format, with each chromosome in a separate file. These can be obtained from UCSC--Species--Full data set--chromFa.tar.gz. After extracting the archive, you will get each chromosome in its own .fa file. In general, you will also want to delete the "*_random.fa" and "*_hap*.fa" files.
Make sure you place all the genome files of different organisms in separate directories.
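The clean-up of the FASTA directory described above can be sketched in Python. This is only an illustration: the function name select_chromosome_files is hypothetical and not part of SpliceMap, and the sketch only reports which files to keep rather than deleting anything.

```python
from fnmatch import fnmatchcase

# Patterns named in the text above for files you will usually want to drop.
EXCLUDE_PATTERNS = ("*_random.fa", "*_hap*.fa")


def select_chromosome_files(filenames):
    """Return the .fa files to keep (sorted), skipping *_random.fa and *_hap*.fa.

    Hypothetical helper for illustration; it is not part of SpliceMap.
    """
    keep = []
    for name in filenames:
        if not name.endswith(".fa"):
            continue  # not a per-chromosome FASTA file
        if any(fnmatchcase(name, pattern) for pattern in EXCLUDE_PATTERNS):
            continue  # matches one of the patterns to delete
        keep.append(name)
    return sorted(keep)
```

To check a real genome directory, the list returned by os.listdir() for that directory can be passed in directly.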
Using SpliceMap
Firstly, see the tutorial on how to use SpliceMap on some example data.
In order to use SpliceMap on your own data:
- Obtain the genome files in the format "chr1.fa, chr2.fa, ..." and also the corresponding Bowtie index.
- Create an empty directory. This will be the working directory.
- Copy "run.cfg" from the SpliceMap package to the working directory.
- Edit run.cfg to include paths to your data files and genome directories. You may also want to configure the default settings.
- Execute "runSpliceMap run.cfg" while in your working directory.
- Wait for execution to finish. The results are written to the "output" directory.
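The directory set-up in the steps above could be scripted roughly as follows. This is a sketch under assumptions: prepare_working_dir is a hypothetical helper, and it deliberately does not launch SpliceMap, because run.cfg must still be edited by hand first.

```python
import shutil
from pathlib import Path


def prepare_working_dir(package_dir, work_dir):
    """Create an empty working directory and copy run.cfg into it.

    Hypothetical helper for illustration. Returns the command, as an
    argument list, that should later be run from inside work_dir.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    if any(work.iterdir()):
        raise ValueError(f"{work} is not empty")
    # Copy the template configuration shipped with the SpliceMap package.
    shutil.copy(Path(package_dir) / "run.cfg", work / "run.cfg")
    # run.cfg still needs the paths to your data and genome directories.
    return ["runSpliceMap", "run.cfg"]
```

Once run.cfg has been edited, the returned command can be started from the working directory, for example with subprocess.run(cmd, cwd=work_dir, check=True).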
Viewing the output
Please see the Viewing output section.
Module: runSpliceMap
"runSpliceMap" is the main program in the SpliceMap package. It calls the other modules to run the full junction detection on your data. Output is written to the "output" folder. Details of the output are described in file formats. Its options are described below. Please note that in future versions, the command line options will be replaced by a configuration file for your convenience.
runSpliceMap run.cfg
- run.cfg
- the .cfg file which defines the run parameters. For details, see .cfg format
Example:
runSpliceMap run.cfg
Module: findNovelJunctions
Finds the junctions that do not exist in a reference file. If you wish to find the junctions that differ between two bed files, run this tool twice, using each file in turn as the reference.
findNovelJunction refFlat.txt junction.bed
or
findNovelJunction refFlat.bed junction.bed
- refFlat.txt or refFlat.bed
- The reference .bed or .txt file. Junctions that exist in this file will not be output.
- junction.bed
- The .bed file to search. All junctions that exist in this file, but not in the reference, will be written to a .new.bed file.
Example:
findNovelJunction refFlat.txt junction.bed
or
findNovelJunction junction1.bed junction2.bed
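The idea behind this module is a set difference, which can be illustrated with a short Python sketch. It is not the tool's actual implementation. As an assumption made for illustration, a junction is identified here by the first three BED columns (chrom, chromStart, chromEnd); the real tool may compare junctions differently, and the refFlat.txt format is not handled. All function names are hypothetical.

```python
def junction_key(bed_line):
    """Key a BED line by (chrom, chromStart, chromEnd).

    Assumption for illustration only; the real tool may identify
    junctions differently, for example by intron boundaries.
    """
    fields = bed_line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def is_record(line):
    """True for data lines; skips blanks, comments and track/browser headers."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("#", "track", "browser"))


def novel_junctions(reference_lines, junction_lines):
    """Return the lines of junction_lines whose key is absent from the reference."""
    known = {junction_key(line) for line in reference_lines if is_record(line)}
    return [
        line
        for line in junction_lines
        if is_record(line) and junction_key(line) not in known
    ]
```

Because the difference is one-directional, comparing two junction files takes two calls, novel_junctions(a, b) and novel_junctions(b, a), which mirrors running the tool twice with each file as the reference.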
Module: statSpliceMap
"statSpliceMap" computes some useful statistics on the junctions found. These can give you an idea of the quality of the data. Its options are described below.
statSpliceMap junction.bed [newjun.bed]
- junction.bed
- The primary junction file in bed format. Its statistics will be printed to the screen.
- newjun.bed (optional)
- The novel junctions found from junction.bed using findNovelJunctions.
Example:
statSpliceMap junctions.bed junctions.bed.new.bed
Module: colorJunction
The bed files produced by mergeGoodList are black and white. "colorJunction" gives them a splash of color and also highlights novel junctions. Its options are described below.
colorJunction infile.bed [newjun.bed]
- infile.bed
- This file will be colored and written out as infile_color.bed.
- newjun.bed (optional)
- The novel junctions found from junction.bed using findNovelJunctions. They will be highlighted in a different color.
Example:
colorJunction junctions.bed junctions.bed.new.bed
Module: subseq
"subseq" extracts a portion of a chromosome then displays its sequence and its reverse complement. Its options are described below.
subseq chr_file start end
- chr_file
- The path to the chromosome of interest.
- start
- Start position of interest. The extracted sequence includes this position.
- end
- End position of interest. The extracted sequence includes this position.
Example:
subseq ../genome/hg18/chr21.fa 214572 214687
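What this module computes can be illustrated with a minimal Python sketch operating on a sequence already loaded into a string (reading the FASTA file is omitted). It assumes 1-based coordinates, which the manual does not state explicitly; only the inclusiveness of both ends comes from the option descriptions above. The function names are chosen for illustration.

```python
# Translation table for complementing DNA bases, preserving case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def subseq(chrom_seq, start, end):
    """Return (forward, reverse complement) for start..end, both inclusive.

    Assumes 1-based coordinates; this is an assumption, not something
    the manual states.
    """
    if not 1 <= start <= end <= len(chrom_seq):
        raise ValueError("coordinates out of range")
    forward = chrom_seq[start - 1:end]  # convert to a 0-based, half-open slice
    return forward, reverse_complement(forward)
```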
Module: uniqueJunctionFilter
One of the optional filters as described in the paper. This removes all junctions that are only supported by multiply mapped reads with no redundant reads.
uniqueJunctionFilter infile.bed outfile.bed
- infile.bed
- Input bed file.
- outfile.bed
- Output bed file.
Example:
uniqueJunctionFilter junctions.bed junctions_nUM.bed
Module: nnrFilter
An optional filter that removes junctions based on the number of non-redundant supporting reads. It removes all junctions that have fewer than "limit" non-redundant supporting reads.
nnrFilter infile.bed outfile.bed limit
- infile.bed
- Input bed file.
- outfile.bed
- Output bed file.
- limit
- Minimum number of non-redundant supporting reads a junction must have to be kept. Suggested value is 2.
Example:
nnrFilter infile.bed outfile.bed 2
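The thresholding rule can be illustrated with a small Python sketch. It rests on two assumptions that the manual does not confirm: that "non-redundant" means supporting reads with distinct alignment start positions, and that each junction's supporting reads are available as a list of start positions. The real tool works on bed files, and how it obtains the count is not shown here. All names are hypothetical.

```python
def count_non_redundant(read_starts):
    """Number of distinct alignment start positions among supporting reads.

    Assumption: reads sharing a start position count as redundant.
    """
    return len(set(read_starts))


def nnr_filter(junction_reads, limit=2):
    """Keep junctions with at least `limit` non-redundant supporting reads.

    junction_reads maps a junction id to the start positions of its reads.
    """
    return {
        junction: starts
        for junction, starts in junction_reads.items()
        if count_non_redundant(starts) >= limit
    }
```

With the suggested limit of 2, a junction supported only by several copies of the same read would be removed under this interpretation.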
Module: neighborFilter
One of the optional filters as described in the paper. This removes all isolated junctions. A junction is considered isolated if there are no other exonic reads within a given number of nucleotides upstream and downstream of it.
neighborFilter good_hits.sam infile.bed outfile.bed [limit]
- good_hits.sam
- SAM file that contains all mapped reads.
- infile.bed
- Input bed file.
- outfile.bed
- Output bed file.
- limit (optional)
- This limit defines the number of nucleotides upstream and downstream to check. Default = 80.
Example:
neighborFilter good_hits.sam infile.bed outfile.bed 160
Module: wig2barwig
Converts the .wig file to a .barwig file for input to barloader. This is needed for cisGenome browser.
wig2barwig infile.wig outfile.barwig
- infile.wig
- Input wig file.
- outfile.barwig
- Output barwig file.
Example:
wig2barwig coverage.wig coverage.barwig
Module: barloader
Converts the .barwig file to a .bar file. This is needed for viewing with cisGenome browser.
barloader infile.barwig
- infile.barwig
- Input .barwig file. It will output a file named infile.barwig.bar.
Example:
barloader coverage.barwig
File formats
See Output Format
Configuring Eland
If you have the Eland sources, you will need to build a version that can align 25bp reads. Then rename this to "eland_25" and copy it to the same directory as the SpliceMap package binaries.
Note that Eland is proprietary software, so we cannot distribute it with SpliceMap. However, if you are licensed to use Eland, contact us and we can help compile a copy and email it to you.
Execution Time
If you have many reads, SpliceMap can take a long time to execute. The following execution times are rough estimates based on running times on our servers with one thread. These figures will differ greatly depending on your system configuration.
- 40 million 100bp x2 reads - 60 hours
- 36 million 70bp x2 reads - 40 hours
- 23 million 50bp x2 reads - 12 hours
This speed should be comparable to or faster than similar tools.
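To compare the quoted timings on a common scale, they can be converted to sequenced bases per hour. The sketch below assumes that "x2" denotes paired-end data with two mates per read, which is an interpretation rather than something stated above; the function name is chosen for illustration.

```python
# (reads in millions, read length in bp, hours), as quoted in the list above.
RUNS = [(40, 100, 60), (36, 70, 40), (23, 50, 12)]


def bases_per_hour(million_reads, read_length, hours, mates=2):
    """Sequenced bases processed per hour of single-threaded run time.

    mates=2 assumes "x2" means two mates per read (paired-end).
    """
    total_bases = million_reads * 1_000_000 * read_length * mates
    return total_bases / hours


if __name__ == "__main__":
    for million_reads, read_length, hours in RUNS:
        rate = bases_per_hour(million_reads, read_length, hours)
        print(f"{million_reads}M x {read_length}bp: {rate / 1e6:.1f} million bases/hour")
```

Under that assumption the three runs work out to roughly 133, 126 and 192 million bases per hour, so run time does not scale strictly linearly with the amount of sequence, and any extrapolation to your own data should be treated as a rough guide.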