Tutorial
This tutorial will help you get started with SpliceMap by demonstrating how to search for junctions in a sample of 100k RNA-seq reads, each 100 bp long, from chromosome 21 of the human genome (hg18). If you experience any problems following these steps, please don't hesitate to contact us.
Step 1 - Download and extract the example files
Download the example:
SpliceMap 3.3.5.2 example (Linux-x86 64bit) | This is the recommended version for everyone.
SpliceMap 3.3.5.2 example (Linux-x86 32bit) | This is the 32-bit version, if your system requires it.
SpliceMap 3.3.5.2 example (OSX 64bit) | This is the Mac OSX (Intel 64-bit) version.
Extract the example to an empty folder of your choice. After extraction, the folder should contain the following files and folders:
dn800c9107:SpliceMap3352_example_OSX-64 moo$ ls
INSTALL  data  src  LICENSE  genome  temp  all.gene.refFlat.txt  output  bin  run.cfg
The tutorial will be given with the OSX version. However, the steps are the same for all versions.
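As a quick sanity check on the extraction, the listing above can be compared against the folder contents. The following is a minimal sketch, not part of SpliceMap: the function name `missing_entries` and the `EXPECTED_ENTRIES` list are hypothetical helpers, and the list simply mirrors the names shown in the `ls` output above.

```python
from pathlib import Path

# Names taken from the `ls` output of the extracted example folder.
# This list is an assumption based on that listing, not a SpliceMap API.
EXPECTED_ENTRIES = [
    "INSTALL", "LICENSE", "all.gene.refFlat.txt", "run.cfg",
    "bin", "data", "genome", "output", "src", "temp",
]


def missing_entries(folder, expected=EXPECTED_ENTRIES):
    """Return the expected names that are absent from `folder`, sorted."""
    root = Path(folder)
    return sorted(name for name in expected if not (root / name).exists())


if __name__ == "__main__":
    import sys

    # Check the folder given on the command line, or the current directory.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_entries(folder)
    if missing:
        print("Missing from " + folder + ": " + ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

Run it from inside the example folder, or pass the folder path as the first argument.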
Step 2 - Build SpliceMap from source (optional)
Try running "./bin/runSpliceMap" in the example folder. If you see the following output, the supplied binary works on your system and you may skip this step.
dn800c9107:SpliceMap3352_example_OSX-64 moo$ ./bin/runSpliceMap
---== Welcome to SpliceMap 3.3.5.2 (55) ==---
Developed by Kin Fai Au and John C. Mu
http://www.stanford.edu/group/wonglab/SpliceMap/
__________
usage: ./runSpliceMap run.cfg
run.cfg -- Configuration options for this run, see comments in file for details
See website for further details
However, if you see something like the following:
[johnmu@solomon-0-10 SpliceMap3352_example_linux-32]$ ./bin/runSpliceMap
./bin/runSpliceMap: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./bin/runSpliceMap)
./bin/runSpliceMap: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by ./bin/runSpliceMap)
then the C++ standard libraries in your Linux distribution are not compatible with the supplied binary, and you need to build SpliceMap from source by following these instructions (for 64-bit systems):
- Navigate to the "src" directory in the example folder
- Type "./install.sh ../bin". This installs SpliceMap into the example bin directory for the purposes of this tutorial. Of course, you can install it anywhere you like in future.
- Type "./install-bowtie.sh ../bin". This installs Bowtie into the example bin directory.
or these instructions (for 32-bit systems):
- Navigate to the "src" directory in the example folder
- Type "./install-32.sh ../bin". This installs SpliceMap into the example bin directory for the purposes of this tutorial. Of course, you can install it anywhere you like in future.
- Type "./install-bowtie-32.sh ../bin". This installs Bowtie into the example bin directory.
SpliceMap is now ready to run and you are ready to move to the next step!
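The decision in this step (skip the build, or build from source) can also be made by a small script. The sketch below is only an illustration: `needs_source_build` and `check_binary` are hypothetical helper names, and the detection relies solely on the two kinds of output shown above (a "usage:" line versus "GLIBCXX_... not found" errors).

```python
import subprocess


def needs_source_build(output):
    """Return True if the text contains a libstdc++ version error
    of the form: version `GLIBCXX_x.y.z' not found."""
    return any(
        "GLIBCXX_" in line and "not found" in line
        for line in output.splitlines()
    )


def check_binary(path="./bin/runSpliceMap"):
    """Run the binary with no arguments and classify what it prints."""
    try:
        result = subprocess.run([path], capture_output=True, text=True)
    except OSError as exc:
        # Raised, for example, when the file is missing or not executable.
        return "could not run {}: {}".format(path, exc)
    text = result.stdout + result.stderr
    if needs_source_build(text):
        return "incompatible libstdc++: build from source (Step 2)"
    if "usage:" in text:
        return "binary works: Step 2 can be skipped"
    return "unrecognized output: inspect it manually"


if __name__ == "__main__":
    print(check_binary())
```

Any output that matches neither pattern is reported as unrecognized rather than guessed at, since other failure modes are not covered in this tutorial.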
Step 3 - Examine the example directory contents
Before we continue, it will be helpful to learn the purpose of each file in this example. When you run SpliceMap on your data, all of these files can be in separate locations if you wish.
- run.cfg
- This is the most important file. It is a text file that contains the path to your sequencer reads, path to your genome files and the configuration settings. Please see .cfg file format for details. It is simple to edit and you will need to edit it once for each data-set.
- genome directory
- This directory contains all of the chromosomes of your organism and the Bowtie index of the same genome. It may be read-only. In this example, we only have chr21 and its associated Bowtie index in the genome directory. For instructions on how to obtain the genome and Bowtie index files, see the manual.
- data directory
- This directory contains all of the sequencer reads in the example. In your case, this directory could be anywhere and it may be read-only.
- temp directory
- This is a temporary directory created during the execution of SpliceMap. The results of the initial short-read mapping are stored here, so this directory can be quite large.
- output directory
- This directory stores all the useful output after executing SpliceMap. It is also created during the execution of SpliceMap.
- all.gene.refFlat.txt
- This file contains all the known (hg18) gene annotations from Ensembl, RefSeq and knowngene. It is provided for your convenience and may be used to find novel junctions.
- bin directory
- This directory stores all of the SpliceMap binaries. It is important that all the binaries are in the same location. No installation is required! Simply copy this directory to a location convenient for you.
- src directory
- This directory stores all of the SpliceMap/Bowtie sources.
Step 4 - Run SpliceMap on the example data
Only one command is needed to start SpliceMap.
Make sure your terminal is pointed to the example folder and type the following in one line:
./bin/runSpliceMap run.cfg
You should then see some output:
dn800c9107:SpliceMap3352_example_OSX-64 moo$ ./bin/runSpliceMap run.cfg
---== Welcome to SpliceMap 3.3.5.2 (55) ==---
Developed by Kin Fai Au and John C. Mu
http://www.stanford.edu/group/wonglab/SpliceMap/
__________
Loading configuration file... run.cfg
output directory exists
temp directory exists
Scaning genome: genome/chr*.fa
List of chromosomes to be searched:
chr21 | genome/chr21.fa | pos:7 - 47883217
Please check that these are correct... continuing in 7 s ... < control-c > to exit
If they are not correct please check chromosome_wildcard: chr*.fa
__________
Temp directory: temp/
Output directory: output/
Maximum number of multiple mapped reads allowed: 10
Maximum number of mismatches allowed in 25-mer seed: 1
Maximum number of mismatches allowed in full read: 2
Maximum number of bases SpliceMap is allowed to clip: 40
Mapper used: bowtie
(25th-percentile) intron size: 20000
(99th-percentile) intron size: 400000
Annotations path: name: all.gene.refFlat.txt
Package path: ./bin/ name: runSpliceMap
Read format: RAW
Number of threads: 2
Number of chromosomes to run together: 2
Will print Cufflinks compatible SAM file
Reads List 1: data/long_reads_1_100K.txt.seq
Reads List 2: data/long_reads_2_100K.txt.seq
Preparing the reads!...
Bases removed from front: 0
Using as many bases as possible.
Extracting 25-mers...
...
...
At this point, feel free to take a break. After about 3-4 minutes, the mapping and junction search will be completed.
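SpliceMap asks you to check the list of chromosomes it found before it continues. If you save the terminal output to a file, that list can be pulled out for inspection. The sketch below assumes each chromosome entry is printed on its own line in the form "chr21 | genome/chr21.fa | pos:7 - 47883217", as in the example run; `parse_chromosome_lines` is a hypothetical helper, and the two numbers after "pos:" are returned as-is without interpreting them.

```python
import re

# Matches entries such as: chr21 | genome/chr21.fa | pos:7 - 47883217
# The layout is inferred from the example output only.
CHROM_LINE = re.compile(r"^(\S+)\s*\|\s*(\S+)\s*\|\s*pos:(\d+)\s*-\s*(\d+)$")


def parse_chromosome_lines(log_text):
    """Return a list of (name, fasta_path, first_pos, last_pos) tuples
    for every chromosome entry found in the captured output."""
    rows = []
    for line in log_text.splitlines():
        match = CHROM_LINE.match(line.strip())
        if match:
            name, path, first, last = match.groups()
            rows.append((name, path, int(first), int(last)))
    return rows


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py saved_output.txt
    with open(sys.argv[1]) as handle:
        for row in parse_chromosome_lines(handle.read()):
            print(row)
```

If the returned list is empty or lacks a chromosome you expected, the chromosome_wildcard setting mentioned in the output is the first thing to check.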
Step 5 - Examining the output
All of the output from SpliceMap is automatically copied to the "output" directory. After this execution, it should contain:
dnab4167d9:output moo$ ls
coverage_all.wig    junction_color.bed
coverage_down.wig   junction_color.new.bed
coverage_up.wig     junction_nUM_color.bed
debug_logs          junction_nUM_color.new.bed
good_hits.sam       log
The following is a description of each output file:
- junction_color.bed
- This file contains the junctions found on all chromosomes. Novel junctions are highlighted in red. Faded junctions are not well supported. The file may be displayed in the UCSC genome browser or the cisGenome browser. The tag associated with each junction is explained in the output formats.
- junction_color.new.bed
- This file contains the junctions from junction_color.bed that are not found in all.gene.refFlat.txt.
- junction_nUM_color.bed
- This file contains the junctions from junction_color.bed that are supported by at least one uniquely mappable read.
- junction_nUM_color.new.bed
- This file contains the junctions from junction_nUM_color.bed that are not found in all.gene.refFlat.txt.
- coverage_up.wig
- This file contains the coverage supported by uniquely mappable reads at each chromosome position.
- coverage_down.wig
- This file contains the coverage supported by only multiply mappable reads at each chromosome position.
- coverage_all.wig
- This file contains the coverage of all mapped reads at each chromosome position.
- good_hits.sam
- This file contains all mapped reads in SAM format.
- debug_logs
- This folder contains the logs of the output. If you experience problems, please send us the contents of this folder. Otherwise, you can safely ignore it.
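To get a quick summary of the junction files without a genome browser, they can be read as ordinary BED files. The following is a minimal sketch that assumes the standard BED layout (chromosome, start, end, then an optional name column, separated by whitespace) and skips "track", "browser" and comment lines. `read_bed_records` and `count_per_chromosome` are hypothetical helpers; the SpliceMap-specific tag in the name column is passed through untouched, since its meaning is described in the output formats.

```python
from collections import Counter


def read_bed_records(path):
    """Yield (chrom, start, end, name) for each feature line in a BED file."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and BED header or comment lines.
            if not line or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            if len(fields) < 3:
                continue  # not a valid BED feature line
            name = fields[3] if len(fields) > 3 else ""
            yield fields[0], int(fields[1]), int(fields[2]), name


def count_per_chromosome(path):
    """Count the number of BED features on each chromosome."""
    return Counter(chrom for chrom, _start, _end, _name in read_bed_records(path))


if __name__ == "__main__":
    # Compare all junctions with the subset not found in the annotation.
    for bed in ("output/junction_color.bed", "output/junction_color.new.bed"):
        print(bed, dict(count_per_chromosome(bed)))
```

Run from the example folder after Step 4, this prints the number of junctions per chromosome in junction_color.bed next to the number of those that are absent from all.gene.refFlat.txt.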