-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
157 lines (113 loc) · 5.32 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
SToRM - Seed-based Read Mapping tool for SOLiD or Illumina sequencing data
[ABOUT]
SToRM is a read mapping tool designed first for the SOLiD technology, but
also available now for the classical Illumina technology. First, SToRM is
based on seeding techniques adapted to the statistical characteristics of
reads: the seeds and the alignment algorithms are designed to comply with
the properties of the SOLiD color encoding or Illumina substitution model
as well as the observed reading error distribution along the read.
More information at http://bioinfo.cristal.univ-lille.fr/storm/storm.php
[CONTENTS OF THIS ARCHIVE]
README
Makefile
*.c, *.h - source code
seeds/ - several seed families adapted for SOLiD/Illumina reads
[COMPILING THE SOURCES]
Requirements:
gcc (>=4.2), OpenMP support, Linux or MacOS
Command line:
"make" / "make storm-color" / "make storm-nucleotide" (multithreaded)
[RUNNING SToRM]
Command line:
"./storm -M 1 -g <path_2_reference.fas> -r <path_2_reads.fastq> [opts]"
"./storm -h" for full parameter list
A classic command line:
"./storm-color -M 1 -g reference.fas -r reads.cs -o out.sam"
"./storm-nucleotide -M 1 -g reference.fas -r reads.fq -o out.sam"
Advanced command line:
for "faster" /or/ "more sensitive" mapping :
[0] obviously, use -M <int> to have all mapable reads being mapped !!
Otherwise, only the best read is kept per reference position,
which implies that not all the reads (duplicate, lower quality
ones at the same reference position) are mapped.
[1] use "larger" /or/ "smaller" scoring thresholds
...
-z 150 -t 150
-z 120 -t 120
-z 100 -t 100
...
By default, setting the same value for both -z/-t is a good idea,
unless you have ">15 indels" or "SOLiD color correction" reasons.
...
-z 105 -t 120
-z 85 -t 100
-z 75 -t 90
...
note that E-value "may be somehow" checked with the ALP tool:
http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/software/program.html?uid=6
[2] "increase" /or/ "decrease" the number of indels :
...
-i 1
-i 3 (default value)
-i 7
-i 15
for full SIMD support.
...
-i 20
...
for final alignment support only : in that case, the SIMD code
generates a warning explaining that it will compute at most 15
diagonals : you can thus change the SIMD threshold -z <int> to
a smaller value than the final alignment threshold -t <int>.
(see [1] before)
[3] use "less" /or/ "more" seeds of "large" /or/ "small" weight:
...
-s "####-##-#-####;###-#--##-##--###;####-#---#----#--####"
-s "##-##-###-###;###--#-#--#--####;####--#--#---#---###
...
note first that iedera can help you in the seed design :
http://bioinfo.cristal.univ-lille.fr/yass/iedera.php
./iedera -spaced -n 3 -w 11,11 -s 11,22 -r 10000 -k -l 40
./iedera -spaced -n 3 -w 10,10 -s 10,20 -r 10000 -k -l 40
=> you can send requests to Laurent Noé for specific seeding
note also that "chain processing" can be now applied in SToRM :
unmmaped reads from the first step can be mapped in extra steps
...
"./storm-nucleotide -M 1 -s <... big seed(s) ...> -g ref.fas -r reads.fq -o out.sam -O reads2.fq"
"./storm-nucleotide -M 1 -s <... small seed(s) ...> -g ref.fas -r reads2.fq -o out2.sam -O reads3.fq"
...
e.g. an example for single seed:
"###-##--#-##-#-###" (weight 12)
-then->
"####-#-#--##-###" (weight 11)
-then->
"#####-##-###" (weight 10)
theses 3 seeds are generated with :
./iedera -spaced -l 50 -w 12,12 -s 12,24 -r 10000 -k
-> "###-##--#-##-#-###"
./iedera -spaced -l 45 -w 11,11 -s 11,22 -r 10000 -k -mx "###-##--#-##-#-###"
-> "####-#-#--##-###"
./iedera -spaced -l 40 -w 10,10 -s 10,20 -r 10000 -k -mx "###-##--#-##-#-###,####-#-#--##-###"
-> "#####-##-###"
[4] "use" the -l option with smaller value (e.g. -l 10) /or/ "use"
with bigger value (e.g. -l 1000)/"disable" with "<= 0" value
[5] check your quality and change the -b 33 to -b 0 to disable the
score scaling algorithm, or to -b 64 (for Illumina 1.5 to 1.7)
accordingly ...
[READING THE RESULT]
SToRM generates its output in the SAM format. To visualise the result, we
recommend use of the SAMtools package (http://samtools.sourceforge.net/ - http://www.htslib.org/).
For both a SToRM output file "out.sam" and a reference file "ref.fas", the
samtools command lines usually are:
samtools faidx ref.fas
samtools view -bT ref.fas out.sam > out.bam
##
## Only when needed, merge several bam files into one single :
samtools merge -f out.bam out{1}.bam out{2}.bam ... out{42}.bam
##
## Only if the "-M <int>" option is set: the "out.sam"/"out.bam"
## are "read sorted" and not "ref sorted" in that case, so :
samtools sort out.bam -o out_sorted.bam
samtools index out_sorted.bam
samtools tview out_sorted.bam ref.fas
Please refer to the SAMtools documentation for more usage details.