Bowtie

Summary

Bowtie is a fast, memory-efficient short-read aligner based on the Burrows-Wheeler transform. It handles color-space reads from version 0.12 onward.

How to use Bowtie

Basic configuration

  • Set up a directory for the indexed target sequences; I used '$HOME/BOWTIE.idx'. After building a reference DB, move its index files to this directory.
  • Set 'BOWTIE_INDEXES' as an environment variable pointing to that directory. If you use bash (if you are not sure, type 'echo $SHELL' at the command line), add the line 'export BOWTIE_INDEXES="$HOME/BOWTIE.idx/"' to your '~/.bashrc' file. Alternatively, set it in an automation shell script, as in the bash script section below.
  • Naming convention (optional). If you analyze both normal base-space reads and color-space reads, it is a good idea to distinguish the two kinds of target DB with a flag. I put '_c' at the end of the DB name if it was built with the '-C' option.
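
The directory and environment-variable setup described above could be sketched as a small bash function. This is only an illustration: the function name 'setup_bowtie_indexes' is hypothetical, and the default path is simply the one mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: create the index directory and point BOWTIE_INDEXES at it.
setup_bowtie_indexes() {
  local dir="${1:-$HOME/BOWTIE.idx}"   # default taken from the text above
  mkdir -p "$dir" || return 1           # create the directory if it is missing
  export BOWTIE_INDEXES="${dir%/}/"     # always end with exactly one trailing slash
}

# After 'bowtie-build <FASTA file> <DB name>', the index files it wrote
# (named '<DB name>.*.ebwt' in bowtie 1) would be moved into "$BOWTIE_INDEXES".
```

Calling 'setup_bowtie_indexes' with no argument uses '$HOME/BOWTIE.idx'; passing a path selects another location.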

Color-space reads

  • Prepare the target (DB) sequence.

    $ bowtie-build -C <FASTA file> <DB name>
  • Run bowtie. If you use a fastq file (read sequences with quality scores),

    $ bowtie -a -C -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>

    If you want to ignore the quality file and use 'fasta' format reads,

    $ bowtie -a -C -f -t --suppress 6 <DB name> <Query csfasta filename> <output filename>

Normal base reads

  • Prepare the target (DB) sequence.

    $ bowtie-build <FASTA file> <DB name>
  • Run bowtie. If you use a fastq file (read sequences with quality scores),

    $ bowtie -a -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>

    If you want to ignore the quality file and use 'fasta' format reads,

    $ bowtie -a -f -t --suppress 6 <DB name> <Query fasta filename> <output filename>
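
The four bowtie commands above differ only in whether '-C' is present and whether '-q' or '-f' is used. As a rough sketch, that choice could be derived from the read file name. The function name 'bowtie_flags' and the list of recognized extensions are assumptions made for this example, not part of bowtie.

```shell
#!/bin/bash
# Hypothetical helper: print the option string used above for a given read file.
#   $1 = read file name, $2 = "yes" for color-space reads (default "no")
bowtie_flags() {
  local query="$1" colorspace="${2:-no}" flags="-a"
  if [ "$colorspace" = "yes" ]; then
    flags="$flags -C"                                # color-space index and reads
  fi
  case "$query" in
    *.fastq|*.fq)           flags="$flags -q" ;;     # reads with quality scores
    *.fasta|*.fa|*.csfasta) flags="$flags -f" ;;     # reads without quality scores
    *) echo "unrecognized read file extension: $query" >&2; return 1 ;;
  esac
  echo "$flags -t --suppress 6"
}
```

For example, 'bowtie $(bowtie_flags reads.csfasta yes) <DB name> reads.csfasta <output filename>' would expand to the second color-space command shown earlier.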

bash script

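I normally use the following bash script, modifying each variable to match the data.

#!/bin/bash
# Directory that holds the bowtie index files
export BOWTIE_INDEXES="/home/taejoon/BOWTIE.idx/"
# Path to the bowtie executable
BOWTIE="/home/taejoon/src64/bowtie/bowtie-0.12.5/bowtie"
# Color-space index name ('_c' suffix: built with 'bowtie-build -C')
DB="DROME_E57_cdna_c"

QUERY="SRR034220.called.fastq"

# Run 1: align the full-length reads
OUT="SRR034220_called.$DB.bowtie_c"
time "$BOWTIE" -a -C -q -t --suppress 6 "$DB" "$QUERY" "$OUT"

# Run 2: trim 5 bases from each end of every read before aligning
OUT="SRR034220_called.$DB.trim5_bowtie_c"
time "$BOWTIE" -a -C -q -t --trim5 5 --trim3 5 --suppress 6 "$DB" "$QUERY" "$OUT"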
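
When several read files need the same treatment, the script above can be generalized into a loop. The following is a minimal sketch under a few assumptions: the helper name 'out_name' is hypothetical, read files end in '.fastq', and bowtie is on the PATH unless 'BOWTIE' is set. The output names follow the pattern used in the script above (dots in the file stem become underscores).

```shell
#!/bin/bash
# Hypothetical helper: derive an output name from a query file name.
#   $1 = query fastq file, $2 = DB name, $3 = output suffix (default "bowtie_c")
out_name() {
  local query="$1" db="$2" suffix="${3:-bowtie_c}"
  local stem
  stem=$(basename "$query" .fastq)    # SRR034220.called.fastq -> SRR034220.called
  echo "${stem//./_}.$db.$suffix"     # dots in the stem become underscores
}

BOWTIE="${BOWTIE:-bowtie}"            # assumption: bowtie is on PATH unless overridden
DB="${DB:-DROME_E57_cdna_c}"          # index name from the script above

# Align every fastq file given on the command line; does nothing without arguments.
for query in "$@"; do
  time "$BOWTIE" -a -C -q -t --suppress 6 "$DB" "$query" "$(out_name "$query" "$DB")"
done
```

Saved under a name of your choice, the script would be run with the read files as arguments.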

Available on

Fourierseq

Phylocluster

User documentation

How to run bowtie

This pipeline is for bowtie versions that do not handle color space data. With such a version, the only way to use bowtie with color space reads is to convert both the reads and the reference to mock base space.

Example pipeline for running bowtie using colorspace reads: (when dealing with base space reads, follow step 3 onwards)

1. Convert the reference to mock base space.

bs2cs ref.fasta > ref.csfasta

cs2mbs ref.csfasta > ref.m.fasta

where

ref.fasta : reference in base space
ref.csfasta : reference in color space (for temporary purposes)
ref.m.fasta : reference in mock base space
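
To make the base-space to color-space conversion more concrete, here is a small bash sketch of the standard SOLiD dinucleotide encoding, in which each color is the XOR of the 2-bit codes (A=0, C=1, G=2, T=3) of two adjacent bases. It is not the implementation of bs2cs, whose exact output format is not described here; the function names are hypothetical, and writing '.' for any pair that contains a non-ACGT base is an assumption.

```shell
#!/bin/bash
# Map one base to its 2-bit code; print -1 for anything that is not A, C, G or T.
base_code() {
  case "$1" in
    A|a) echo 0 ;;
    C|c) echo 1 ;;
    G|g) echo 2 ;;
    T|t) echo 3 ;;
    *)   echo -1 ;;
  esac
}

# Print the color string for a base-space sequence (one color per adjacent pair).
to_colors() {
  local seq="$1" out="" i a b
  for (( i = 0; i + 1 < ${#seq}; i++ )); do
    a=$(base_code "${seq:i:1}")
    b=$(base_code "${seq:i+1:1}")
    if (( a < 0 || b < 0 )); then
      out+="."                 # assumption: unknown color for ambiguous bases
    else
      out+=$(( a ^ b ))        # color = XOR of the two base codes
    fi
  done
  echo "$out"
}

to_colors ATGGC   # prints 3103
```

A sequence of n bases yields n-1 colors, which is why color-space reads carry a leading primer base to anchor the decoding.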

2. Create bowtie indexes for the reference genome

bowtie-build ref.m.fasta refindex

where

ref.m.fasta : reference in mock base space
refindex : basename for bowtie indexes

3. Convert the reads to mock base space

cs2mbs -d -r in.csfasta > in.m.fasta

where

in.csfasta : reads file in color space
in.m.fasta : reads file in mock base space
-d : drop the first color space base during conversion. This ignores the first color space base, which is part of the primer.
-r : for each read, include the reverse of the mock base space sequence.

4. Convert the reads to fastq

fasta2fastq in.m.fasta in_QV.qual > in.m.fastq

where

in.m.fasta : reads file in mock base space
in_QV.qual : corresponding quality file
in.m.fastq : output fastq file

5. Align using bowtie

bowtie -q -n 3 --best --norc refindex in.m.fastq out

where

refindex : base name for the bowtie index of the reference
in.m.fastq : input fastq file
out : mapping output file
-q : indicates use of a fastq file
-n 3 : mismatches allowed in the seed (-v 3 can be used instead to indicate mismatches allowed in the entire alignment)
--norc : do not report reverse complement matches
--best : make bowtie search until it finds the best alignment (based on the number of mismatches and the quality values at mismatched positions)
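
The five steps above can be strung together. The sketch below only prints the commands, in order, so they can be reviewed before anything runs. The function name 'pipeline_commands' is hypothetical, the intermediate file names follow the naming used in the steps, and file names containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print the commands of steps 1-5 for the given inputs.
#   $1 = reference fasta   $2 = reads csfasta   $3 = quality file
#   $4 = index basename    $5 = mapping output file
pipeline_commands() {
  local ref="$1" reads="$2" qual="$3" index="$4" out="$5"
  local ref_stem="${ref%.fasta}" reads_stem="${reads%.csfasta}"
  cat <<EOF
bs2cs $ref > $ref_stem.csfasta
cs2mbs $ref_stem.csfasta > $ref_stem.m.fasta
bowtie-build $ref_stem.m.fasta $index
cs2mbs -d -r $reads > $reads_stem.m.fasta
fasta2fastq $reads_stem.m.fasta $qual > $reads_stem.m.fastq
bowtie -q -n 3 --best --norc $index $reads_stem.m.fastq $out
EOF
}
```

For example, 'pipeline_commands ref.fasta in.csfasta in_QV.qual refindex out' prints the six commands listed in the steps; piping that output to 'bash -e' would execute them and stop at the first failure.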