...

  • Set up a directory for the indexed target sequences. I used '$HOME/BOWTIE.idx'. After building a reference DB, move its index files to this directory.
  • Set 'BOWTIE_INDEXES' as an environment variable. If you use bash (if you are not sure, type 'echo $SHELL' at the command line), add a line such as 'export BOWTIE_INDEXES="$HOME/BOWTIE.idx/"' to your '~/.bashrc' file. Alternatively, you can use an automation shell script as below.
  • Naming convention (optional). If you want to analyze both normal base-space reads and color-space reads, it is a good idea to distinguish the two kinds of target DB with a flag. I put '_c' at the end of the DB name if it was prepared with the '-C' option.
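
The setup steps above might look like the following in practice. This is a minimal sketch rather than the author's exact procedure: the function names `setup_bowtie_indexes` and `install_index` are hypothetical, and it assumes a bowtie 1 build whose index files are named `<DB name>.*.ebwt`.

```shell
# Hypothetical helper: create the index directory and point BOWTIE_INDEXES at it.
# Defaults to the directory used on this page ($HOME/BOWTIE.idx).
setup_bowtie_indexes() {
    index_dir="${1:-$HOME/BOWTIE.idx}"
    mkdir -p "$index_dir" || return 1
    # Trailing slash kept to match the export line in the bash script on this page
    export BOWTIE_INDEXES="$index_dir/"
}

# Hypothetical helper: move the index files of one DB into BOWTIE_INDEXES.
# Assumes bowtie 1 index files named <DB name>.*.ebwt in the current directory.
install_index() {
    db_name="$1"
    mv -- "$db_name".*.ebwt "$BOWTIE_INDEXES"
}

# Example (commented out so that sourcing this file has no side effects):
#   setup_bowtie_indexes
#   bowtie-build -C target.fasta target_c   # 'target.fasta' is a placeholder name
#   install_index target_c
```

Keeping the two steps as separate functions lets the same directory be reused for several DBs, including both the base-space and the '_c' color-space variants.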

Color-space reads

  • Prepare the target (DB) sequence.

    Code Block
    $ bowtie-build -C <FASTA file> <DB name>
  • Run bowtie. If your reads are in a fastq file (read sequences with quality scores),

    Code Block
    $ bowtie -a -C -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>

    If you want to ignore the quality file and use 'fasta' format reads,

    Code Block
    $ bowtie -a -C -f -t --suppress 6 <DB name> <Query csfasta filename> <output filename>

Normal base reads

  • Prepare the target (DB) sequence.

    Code Block
    $ bowtie-build <FASTA file> <DB name>
  • Run bowtie. If your reads are in a fastq file (read sequences with quality scores),

    Code Block
    $ bowtie -a -q -t --suppress 6 <DB name> <Query fastq filename> <output filename>

    If you want to ignore the quality file and use 'fasta' format reads,

    Code Block
    $ bowtie -a -f -t --suppress 6 <DB name> <Query fasta filename> <output filename>
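
The four command variants above differ only in '-C' (color space) and '-q' / '-f' (read format), so the choice can be sketched as a small helper. This is an illustration under assumptions, not part of bowtie: the function name `bowtie_command` is hypothetical, the '_c' test relies on the optional naming convention described earlier, and the mapping from file extension to read format is an assumption. The helper only prints the command, and it assumes file names without spaces.

```shell
# Hypothetical helper: print the bowtie command line for one DB / read file pair.
# It prints the command instead of running it, so it can be inspected first.
bowtie_command() {
    db="$1"; query="$2"; out="$3"
    opts="-a"
    # Optional naming convention from this page: a '_c' suffix means "built with -C"
    case "$db" in
        *_c) opts="$opts -C" ;;
    esac
    # Assumed mapping from file extension to read format
    case "$query" in
        *.fastq|*.fq)           opts="$opts -q" ;;
        *.csfasta|*.fasta|*.fa) opts="$opts -f" ;;
        *) echo "unrecognized read file extension: $query" >&2; return 1 ;;
    esac
    printf '%s\n' "bowtie $opts -t --suppress 6 $db $query $out"
}

# Example (commented out):
#   bowtie_command DROME_E57_cdna_c SRR034220.called.fastq out.bowtie_c
```

Printing rather than executing keeps the sketch safe to try on a machine without bowtie installed.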

bash script

I normally use the following bash script, after adjusting each variable to the data at hand.

Code Block
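#!/bin/bash

# Directory that holds the bowtie index files
export BOWTIE_INDEXES="/home/taejoon/BOWTIE.idx/"

# Path to the bowtie executable
BOWTIE="/home/taejoon/src64/bowtie/bowtie-0.12.5/bowtie"

# Target DB (the '_c' suffix marks a color-space index)
DB="DROME_E57_cdna_c"

# Query reads in fastq format
QUERY="SRR034220.called.fastq"

# Run 1: full-length reads
OUT="SRR034220_called.$DB.bowtie_c"
time "$BOWTIE" -a -C -q -t --suppress 6 "$DB" "$QUERY" "$OUT"

# Run 2: trim 5 bases from each end of every read
OUT="SRR034220_called.$DB.trim5_bowtie_c"
time "$BOWTIE" -a -C -q -t --trim5 5 --trim3 5 --suppress 6 "$DB" "$QUERY" "$OUT"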
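
The output names in the script follow a visible pattern: the query file name without its extension, with dots turned into underscores, then the DB name, then a suffix describing the run. A minimal sketch of that pattern is shown here; the function name `output_name` is hypothetical, and the pattern is inferred from the two OUT values in the script rather than stated by the author.

```shell
# Hypothetical helper: build an output name in the style used by the script,
# i.e. <query stem with '.' replaced by '_'>.<DB name>.<suffix>
output_name() {
    query="$1"; db="$2"; suffix="$3"
    stem="${query%.*}"                        # drop the last extension, e.g. '.fastq'
    stem=$(printf '%s' "$stem" | tr '.' '_')  # 'SRR034220.called' -> 'SRR034220_called'
    printf '%s\n' "$stem.$db.$suffix"
}

# Example (commented out):
#   OUT=$(output_name "$QUERY" "$DB" bowtie_c)
```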

Available on

Fourierseq

Phylocluster

User documentation

...

cs2mbs ref.csfasta > ref.m.fasta

where

Code Block
ref.fasta : reference in base space
ref.csfasta : reference in color space (for temporary purposes)
ref.m.fasta : reference in mock base space

2. Create bowtie indexes for the reference genome

bowtie-build ref.m.fasta refindex

where

Code Block
ref.m.fasta : reference in mock base space
refindex : basename for bowtie indexes

3. Convert the reads to mock base space

cs2mbs -d -r in.csfasta > in.m.fasta

where

Code Block
in.csfasta : reads file in color space
in.m.fasta : reads file in mock base space
-d : drop the first color-space base during conversion. This ignores the first color-space base, which is part of the primer.
-r : for each read, include the reverse of the mock base space sequence.

4. Convert the reads to fastq

fasta2fastq in.m.fasta in_QV.qual > in.m.fastq

where

Code Block
in.m.fasta : reads file in mock base space
in_QV.qual : corresponding quality file
in.m.fastq : output fastq file

5. Align using bowtie

bowtie -q -n 3 --best --norc refindex in.m.fastq out

where

Code Block
refindex : base name for the bowtie index of the reference
in.m.fastq : input fastq file
out : mapping output file
-q : indicates use of a fastq file
-n 3 : mismatches allowed in the seed (-v 3 can be used instead to set the mismatches allowed in the entire alignment)
--norc : do not report reverse-complement matches
--best : make bowtie search until it finds the best alignment (based on the number of mismatches and the quality values at mismatched positions)
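
The commands shown in the steps above can be chained into one script. The following is a sketch under assumptions: the names `run_step`, `mock_base_pipeline` and `DRY_RUN` are hypothetical, the pipeline starts from a reference that is already in color space (ref.csfasta), and the commands and options are copied from this page without further checks. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# Hypothetical wrapper: run a command, optionally redirecting stdout to a file.
# usage: run_step <stdout file, or "-" for none> <command> [args...]
run_step() {
    target="$1"; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it
        if [ "$target" = "-" ]; then
            printf '%s\n' "$*"
        else
            printf '%s\n' "$* > $target"
        fi
    elif [ "$target" = "-" ]; then
        "$@"
    else
        "$@" > "$target"
    fi
}

# Hypothetical driver for the mock base space steps shown on this page.
# usage: mock_base_pipeline <ref.csfasta> <reads.csfasta> <reads.qual> <index name> <output>
mock_base_pipeline() {
    ref_cs="$1"; reads_cs="$2"; qual="$3"; index="$4"; aln_out="$5"
    ref_m="${ref_cs%.csfasta}.m.fasta"        # reference in mock base space
    reads_m="${reads_cs%.csfasta}.m.fasta"    # reads in mock base space
    reads_fq="${reads_cs%.csfasta}.m.fastq"   # reads in mock base space, with qualities

    run_step "$ref_m"    cs2mbs "$ref_cs" &&
    run_step -           bowtie-build "$ref_m" "$index" &&
    run_step "$reads_m"  cs2mbs -d -r "$reads_cs" &&
    run_step "$reads_fq" fasta2fastq "$reads_m" "$qual" &&
    run_step -           bowtie -q -n 3 --best --norc "$index" "$reads_fq" "$aln_out"
}

# Example (commented out): print the commands without running them
#   DRY_RUN=1 mock_base_pipeline ref.csfasta in.csfasta in_QV.qual refindex out
```

Chaining the steps with '&&' stops the pipeline at the first failing command, so a later step never runs on a missing or partial intermediate file.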