...
- Set up a directory for indexed target sequences. I used '$HOME/BOWTIE.idx'. After building a reference DB, move its index files to this directory.
- Set 'BOWTIE_INDEXES' as an environment variable that points to this directory. If you use bash (if you are not sure, type 'echo $SHELL' at the command line), add an export line to your '~/.bashrc' file, for example 'export BOWTIE_INDEXES="$HOME/BOWTIE.idx/"'. Alternatively, you can set it in an automation shell script, as in the bash script below.
- Naming convention (optional). If you want to analyze both normal base-space reads and color-space reads, it is a good idea to distinguish the two target DBs with a flag. I put '_c' at the end of the DB name if it was built with the '-C' option.
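The setup steps above can be sketched as two small shell functions. This is a minimal sketch, not the script used on this page: the function names `setup_bowtie_indexes` and `install_index` are hypothetical, the default directory is the one mentioned above, and the `.ebwt` suffix assumes the index files written by bowtie 1's `bowtie-build`.

```shell
# Hypothetical helper: create the index directory and export BOWTIE_INDEXES.
# Usage: setup_bowtie_indexes [directory]   (default: $HOME/BOWTIE.idx)
setup_bowtie_indexes() {
    idx_dir="${1:-$HOME/BOWTIE.idx}"
    mkdir -p "$idx_dir" || return 1
    # Trailing slash, as in the export line of the bash script on this page.
    export BOWTIE_INDEXES="$idx_dir/"
}

# Hypothetical helper: move the index files of one DB into $BOWTIE_INDEXES.
# Assumes bowtie-build wrote <DB name>.*.ebwt into the current directory.
install_index() {
    db="$1"
    moved=0
    for f in "$db".*.ebwt; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv "$f" "$BOWTIE_INDEXES" || return 1
        moved=$((moved + 1))
    done
    [ "$moved" -gt 0 ]               # fail if no index files were found
}
```

A typical use would be to call `setup_bowtie_indexes` once, run `bowtie-build` in the working directory, and then call `install_index` with the DB name (for example a name ending in '_c' for a color-space DB).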
Color-space reads
Prepare the target (DB) sequence.
$ bowtie-build -C <FASTA file> <DB name>
Run bowtie. If you use a fastq file (read sequences with quality scores):
$ bowtie -a -C -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>
If you want to ignore the quality file and use 'fasta' format reads:
$ bowtie -a -C -f -t --suppress 6 <DB name> <Query csfasta filename> <output filename>
Normal base reads
Prepare the target (DB) sequence.
$ bowtie-build <FASTA file> <DB name>
Run bowtie. If you use a fastq file (read sequences with quality scores):
$ bowtie -a -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>
If you want to ignore the quality file and use 'fasta' format reads:
$ bowtie -a -f -t --suppress 6 <DB name> <Query fasta filename> <output filename>
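The four bowtie commands above differ only in '-C' (color-space reads) and in '-q' or '-f' (fastq or fasta reads). A minimal sketch that assembles the command line from those two choices is shown here; the function name `bowtie_cmd` is hypothetical, and it only prints the command instead of running it.

```shell
# Hypothetical helper: print the bowtie command for a given read type.
# Usage: bowtie_cmd <color|base> <fastq|fasta> <DB name> <query file> <output file>
bowtie_cmd() {
    space="$1"; format="$2"; db="$3"; query="$4"; out="$5"
    case "$space" in
        color) space_flag="-C " ;;   # the DB must also be built with -C
        base)  space_flag="" ;;
        *) echo "unknown read space: $space" >&2; return 1 ;;
    esac
    case "$format" in
        fastq) format_flag="-q" ;;   # reads with quality scores
        fasta) format_flag="-f" ;;   # reads without quality scores
        *) echo "unknown read format: $format" >&2; return 1 ;;
    esac
    printf 'bowtie -a %s%s -t --suppress 6 %s %s %s\n' \
        "$space_flag" "$format_flag" "$db" "$query" "$out"
}

bowtie_cmd color fastq DROME_E57_cdna_c SRR034220.called.fastq out.bowtie_c
```

Printing the command first makes it easy to check that a color-space query is paired with a '_c' DB before starting a long run.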
bash script
I normally use the following bash script, after modifying each variable to match the data.
#!/bin/bash
export BOWTIE_INDEXES="/home/taejoon/BOWTIE.idx/"
BOWTIE="/home/taejoon/src64/bowtie/bowtie-0.12.5/bowtie"
DB="DROME_E57_cdna_c"
QUERY="SRR034220.called.fastq"

OUT="SRR034220_called.$DB.bowtie_c"
time $BOWTIE -a -C -q -t --suppress 6 $DB $QUERY $OUT

OUT="SRR034220_called.$DB.trim5_bowtie_c"
time $BOWTIE -a -C -q -t --trim5 5 --trim3 5 --suppress 6 $DB $QUERY $OUT
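The output names in the script follow one pattern: the query name without its extension and with dots replaced by underscores, then the DB name, then a suffix for the run. A minimal sketch of that pattern follows; the function name `out_name` is hypothetical, and it only builds the name without running bowtie.

```shell
# Hypothetical helper: build an output file name like the ones in the script.
# Usage: out_name <query file> <DB name> <suffix>
out_name() {
    query="$1"; db="$2"; suffix="$3"
    base="${query%.*}"                            # drop the last extension
    base="$(printf '%s' "$base" | tr '.' '_')"    # replace remaining dots
    printf '%s.%s.%s\n' "$base" "$db" "$suffix"
}

DB="DROME_E57_cdna_c"
QUERY="SRR034220.called.fastq"
out_name "$QUERY" "$DB" bowtie_c
out_name "$QUERY" "$DB" trim5_bowtie_c
```

Deriving OUT this way avoids editing the read name in three places when QUERY changes. It assumes QUERY is a plain file name; a path whose directories contain dots would need extra handling.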
Available on
User documentation
- To get started using bowtie, check out the bowtie manual.
...
cs2mbs ref.csfasta > ref.m.fasta
where
ref.fasta : reference in base space
ref.csfasta : reference in color space (for temporary purposes)
ref.m.fasta : reference in mock base space
2. Create bowtie indexes for the reference genome
bowtie-build ref.m.fasta refindex
where
ref.m.fasta : reference in mock base space
refindex : basename for bowtie indexes
3. Convert the reads to mock base space
cs2mbs -d -r in.csfasta > in.m.fasta
where
in.csfasta : reads file in color space
in.m.fasta : reads file in mock base space
-d : drop the first color-space base during conversion. This ignores the first color-space base, which is part of the primer.
-r : for each read, include the reverse of the mock base space sequence.
4. Convert the reads to fastq
fasta2fastq in.m.fasta in_QV.qual > in.m.fastq
where
in.m.fasta : reads file in mock base space
in_QV.qual : corresponding quality file
in.m.fastq : output fastq file
5. Align using bowtie
bowtie -q -n 3 --best --norc refindex in.m.fastq out
where
refindex : base name for the bowtie index of the reference
in.m.fastq : input fastq file
out : mapping output file
-q : indicates use of a fastq file
-n 3 : mismatches allowed in the seed (-v 3 can be used instead to indicate mismatches allowed in the entire alignment)
--norc : do not report reverse complement matches
--best : make bowtie search until it finds the best alignment (based on the number of mismatches and the quality values at mismatched positions)
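The mock base space steps above can be chained in one script. The following is a minimal sketch that uses only the commands listed on this page; the function names `run`, `run_to` and `mock_base_pipeline` and the DRY_RUN switch are hypothetical, and it assumes that the color-space reference, the color-space reads and the quality file already exist. By default it only prints the commands.

```shell
# Set DRY_RUN=0 to execute the commands; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

# run_to <output file> <command...>: run a command with stdout redirected.
run_to() {
    out_file="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "$* > $out_file"
    else
        "$@" > "$out_file"
    fi
}

# run <command...>: run a command as is.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around steps 1 to 5; stops at the first failing step.
# Usage: mock_base_pipeline <ref prefix> <reads prefix> <index basename> <output>
# Assumes <ref prefix>.csfasta, <reads prefix>.csfasta and
# <reads prefix>_QV.qual are present in the current directory.
mock_base_pipeline() {
    ref="$1"; reads="$2"; index="$3"; out="$4"
    run_to "$ref.m.fasta" cs2mbs "$ref.csfasta" &&
    run bowtie-build "$ref.m.fasta" "$index" &&
    run_to "$reads.m.fasta" cs2mbs -d -r "$reads.csfasta" &&
    run_to "$reads.m.fastq" fasta2fastq "$reads.m.fasta" "${reads}_QV.qual" &&
    run bowtie -q -n 3 --best --norc "$index" "$reads.m.fastq" "$out"
}

mock_base_pipeline ref in refindex out
```

With the default DRY_RUN=1 the call prints the five commands with the same file names used in the steps above, which is a cheap way to check the file naming before a real run.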