Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
titleStart an idev session
idev -m 180 -p normal -A UT-2015-05-18 -N 1 -n 68 
# Answer "y" to the question about using the BIO_DATA_week_1 reservation
# But if that hangs ("PD") waiting for a node:
#   - Ctrl-C to cancel the idev request
#   - execute the idev command above again
#   - but answer "n" to the reservation question

Go ahead and load the bowtie2 module so we can examine some help pages and options.

Code Block
languagebash
module biocontainers
module load bowtie

 Now that it's loaded, check out the options. There are a lot of them! In fact for the full range of options and their meaning, Google "Bowtie2 manual" and bring up that page (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). The Table of Contents is several pages long! Ouch!

This is the key to using bowtie2 - it allows you to control almost everything about its behavior, which make it the go-to aligner for specialized alignment tasks (e.g. aligning miRNA or other small reads). But it also makes it is much more challenging to use than bwa – and it's easier to screw things up too!

Building the bowtie2 vibCho index

Before the alignment, of course, we've got to build a bowtie2- specific index using bowtie2-build. Go ahead and check out its options. Unlike for the aligner itself, we only need to worry about a few things here:

  • reference_in file is just the vibCho.O395.fa FASTA we built from GenBank records
  • bt2_index_base is the prefix of all the bowtie2-build output file

Here, to build the reference index for alignment, we only need the FASTA file. (This is not always true - extensively spliced transcriptomes requires splice junction annotations to align RNA-seq data properly.)

First create a directory specifically for the bowtie2 index, then build the index using bowtie-build.

Code Block
languagebash
titlePrepare Bowtie2 index files for vibCho

Go ahead and load the bowtie2 module so we can examine some help pages and options.

Code Block
languagebash
module biocontainers
module load bowtie

 Now that it's loaded, check out the options. There are a lot of them! In fact for the full range of options and their meaning, Google "Bowtie2 manual" and bring up that page (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). The Table of Contents is several pages long! Ouch!

This is the key to using bowtie2 - it allows you to control almost everything about its behavior, which make it the go-to aligner for specialized alignment tasks (e.g. aligning miRNA or other small reads). But it also makes it is much more challenging to use than bwa – and it's easier to screw things up too!

Building the bowtie2 vibCho index

Before the alignment, of course, we've got to build a bowtie2- specific index using bowtie2-build. Go ahead and check out its options. Unlike for the aligner itself, we only need to worry about a few things here:

  • reference_in file is just the vibCho.O395.fa FASTA we built from GenBank records
  • bt2_index_base is the prefix of all the bowtie2-build output file

Here, to build the reference index for alignment, we only need the FASTA file. (This is not always true - extensively spliced transcriptomes requires splice junction annotations to align RNA-seq data properly.)

First create a directory specifically for the bowtie2 index, then build the index using bowtie-build.

Code Block
languagebash
titlePrepare Bowtie2 index files for vibCho
mkdir -p $SCRATCH/core_ngs/references/bt2/vibCho
cd $SCRATCH/core_ngs/references/bt2/vibCho

# Symlink to the fasta file you created
ln -sf $SCRATCH/core_ngs/references/vibCho.O395.fa
# or, to catch up:
ln -sf $CORENGS/references/vibCho.O395.fa

bowtie2-build vibCho.O395.fa vibCho.O395

This should also go pretty fast. You can see the resulting files using ls like before.

Performing the bowtie2 alignment

We'll set up a new directory to perform the V. cholerae data alignment. But first make sure you have the FASTQ file to align and the vibCho bowtie2 index:

Code Block
languagebash
# Get the FASTQ to align
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/

# Set up the bowtie2 index
mkdir -p $SCRATCH/core_ngs/references/bt2/vibCho
cdcp $SCRATCH$CORENGS/core_ngsidx/references/bt2/vibCho

# Symlink to the fasta file you created
ln -sf/*.*  $SCRATCH/core_ngs/references/vibCho.O395.fa
# or, to catch up:
ln -sf $CORENGS/references/vibCho.O395.fa

bowtie2-build vibCho.O395.fa vibCho.O395

This should also go pretty fast. You can see the resulting files using ls like before.

Performing the bowtie2 alignment

...

/bt2/vibCho

Make sure you're in an idev session with the bowtie2 BioContainers module loaded:

Code Block
languagebash
idev -p normal -m 120 -A UT-2015-05-18 -N 1 -n 68
module load biocontainers
module load bowtie

Now set up a directlry to do this alignment, with symbolic links to the bowtie2 index directory and the directory containing the FASTQ to align:

Code Block
languagebash
mkdir -p $SCRATCH/core_ngs/alignment/vibCho
cd $SCRATCH/core_ngs/alignment
ln -s -f $WORK/core_ngs/references/bt2/vibCho vibCho/alignment/vibCho
ln -sf ../../references/bt2/vibCho
ln -sf ../../alignment/fastq fq

We'll be aligning the V. cholerae reads now in ./fq/cholera_rnaseq.fastq.gz (how many sequences does it contain?)

Note that here the data is from standard mRNA sequencing, meaning that the DNA fragments are typically longer than the reads. There is likely to be very little contamination that would require using a local rather than global alignment, or many other pre-processing steps (e.g. adapter trimming). Thus, we will run bowtie2 with default parameters, omitting options other than the input, output, and reference index. This performs a global alignment.

As you can tell from looking at the bowtie2 help message, the general syntax looks like this:

Code Block
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

So our command would look like thisexecute this bowtie2 global, single-end alignment command:

Code Block
languagebash
cd $SCRATCH/core_ngs/alignment/vibCho
bowtie2 -x vibCho/vibCho.O395 -U fastqfq/cholera_rnaseq.fastq.gz -S cholera_rnaseq.sam
Expand
titleWhat's going on?
Parameters are
 2>&1 | tee aln.log

Notes:

  • -x  vibCho/vibCho.O395.fa – prefix path of

...

  • index files
  • -U

...

  • fq/cholera_rnaseq.fastq.gz – FASTQ file for single-end (Unpaired) alignment
  • -S cholera_rnaseq.sam – tells bowtie2 to report alignments in

...

  • SAM format to the specified file
  • 2>&1 redirects standard error to standard output
    • while the alignment data is being written to the cholera_rnaseq.sam file, bowtie2 will report its progress to standard error.
  • | tee aln.log takes the bowtie2 progress output and pipes it to the tee program
    • tee takes its standard input and writes it to the specified file

...

Create a commands file called bt2_vibCho.cmds with this task definition then generate and submit a batch job for it (time 1 hour, development queue).

...

titleWhat's going on?

Use nano to create the bt2_vibCho.cmds file. Then:

...

languagebash
titleLocal bowti2 alignment of miRNA data

...

    • and also to standard output
    • that way, you can see the progress output now, but also save it to review later (or supply to MultiQC)

Since the FASTQ file is not large, this should not take too long, and you will see progress output like this:

Code Block
89006 reads; of these:
  89006 (100.00%) were unpaired; of these:
    20675 (23.23%) aligned 0 times
    38226 (42.95%) aligned exactly 1 time
    30105 (33.82%) aligned >1 times
76.77% overall alignment rate

When the job is complete you should have a cholera_rnaseq.sam file that you can examine using whatever commands you like.  Remember, to further process it downstream, you should create a sorted, indexed BAM file from this SAM output.

Exercise #5: Bowtie2 local alignment - Human microRNA-seq

...