Mapping with BWA

Objectives

In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The main objective of this lab is to:

Learn how BWA works and how to use it.

Introduction

BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It is the successor to another aligner you might have used or heard of, MAQ (Mapping and Assembly with Quality). As the name suggests, it uses the Burrows-Wheeler transform (BWT) of the reference genome to align reads in a time- and memory-efficient manner.
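To get a feel for what the transform does, here is a toy sketch that computes the BWT of a short string by sorting all of its rotations and reading off the last character of each one. It is only an illustration: BWA builds its index with far more efficient suffix-array algorithms, and the function name bwt is hypothetical. It assumes the input ends with a unique $ sentinel, which sorts before the letters in the C locale.

```sh
# Toy Burrows-Wheeler transform: list every rotation of the input,
# sort the rotations, and print the last character of each one.
# Assumes the input ends with a unique '$' sentinel, which sorts before
# letters when LC_ALL=C (plain byte order).
bwt() {
  printf '%s\n' "$1" |
    awk '{
      n = length($0)
      for (i = 1; i <= n; i++)          # rotation starting at position i
        print substr($0, i) substr($0, 1, i - 1)
    }' |
    LC_ALL=C sort |
    awk '{ printf "%s", substr($0, length($0), 1) } END { print "" }'
}

bwt 'GATTACA$'   # prints ACTGA$TA
```

BWA stores the BWT of the reference as an FM-index, which lets it find exact matches for a read by backward search instead of scanning the whole genome.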

BWA Variants

BWA has three different algorithms:

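For reads up to 100 bp long:
- BWA-backtrack: bwa aln, followed by bwa samse (single-end) or bwa sampe (paired-end)
For reads from 70 bp up to 1 Mbp long:
- BWA-SW: bwa bwasw
- BWA-MEM: bwa mem. Newer! Typically faster and more accurate, and it also tends to outperform BWA-backtrack on 70-100 bp Illumina reads.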
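The read length decides which algorithm fits. If you are not sure how long your reads are, this small sketch prints the longest sequence in a FASTQ file. It assumes the usual four lines per record with no line-wrapped sequences; the function name is hypothetical.

```sh
# Print the length of the longest read in a FASTQ file.
# Assumes standard 4-line records: the sequence is line 2 of each record.
max_read_length() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
       END { print max + 0 }' "$1"
}

# Usage: max_read_length SRR030257_1.fastq
```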

Run BWA

Load the module:

module load bwa

There are multiple versions of BWA on TACC, so check which one you have loaded; you will need the version number when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.

Here are some commands that could help...

module spider bwa
module list
bwa
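Running bwa with no arguments prints a usage banner, on standard error, that includes the version. Here is a small sketch that pulls out just the version string, assuming the banner contains a line starting with "Version:" as recent releases print; the function name is hypothetical.

```sh
# Extract the version string from bwa's usage banner read on stdin.
# Assumes a banner line of the form "Version: <version>".
bwa_version() {
  awk '/^Version:/ { print $2; exit }'
}

# Usage (bwa writes its banner to stderr, so redirect it into the pipe):
#   bwa 2>&1 | bwa_version
```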

Make sure you are back in your main intro_to_mapping directory, then create a fresh output directory:

mkdir bwa

You can see the different commands available under the bwa package from the command line help:

bwa

Part 1. Create an index of your reference

bwa index -a bwtsw reference/genome.fa
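In recent BWA releases (0.7.x), bwa index writes five files next to the FASTA, with the extensions .amb, .ann, .bwt, .pac and .sa. The later bwa commands take the FASTA path as the index prefix. A minimal sketch (hypothetical function name) to confirm the index is complete before you submit alignment jobs:

```sh
# Check that every index file bwa index writes (0.7.x naming) exists and is
# non-empty for the given reference FASTA path (the index prefix).
bwa_index_ready() {
  for ext in amb ann bwt pac sa; do
    if [ ! -s "$1.$ext" ]; then
      echo "missing or empty: $1.$ext" >&2
      return 1
    fi
  done
}

# Usage: bwa_index_ready reference/genome.fa && echo "index looks complete"
```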

Part 2a. Align the samples to reference using bwa aln/samse/sampe

You will need to run this set of commands in this order (with options that you should try to figure out). The index only has to be built once per reference; bwa aln and then bwa samse or sampe are run on each sample:

bwa index
bwa aln
bwa samse or sampe

What's going on at each step?

Remember to use the option that enables multithreading, if there is one, for each BWA command.

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by qsub.

If you need help figuring out the options, put this in your commands file:

bwa aln -t 6 -f bwa/SRR030257_1.sai bwa/NC_012967.1.fasta SRR030257_1.fastq
bwa aln -t 6 -f bwa/SRR030257_2.sai bwa/NC_012967.1.fasta SRR030257_2.fastq

Why did we use -t 6 instead of -t 12 for multithreading? Both commands will run at the same time on a single Lonestar node, so they split its 12 cores between them.
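The arithmetic behind -t 6 generalizes: divide the node's cores by the number of commands that will run on it at the same time. Here is a sketch that does that division and writes the two bwa aln lines above into a commands file. The variable names and the file name "commands" are assumptions, so adjust them to your setup.

```sh
# Split one node's cores evenly among the commands that run on it together,
# then write one "bwa aln" line per mate into a launcher commands file.
# CORES, SAMPLE, REF and the file name "commands" are assumptions.
CORES=12                        # cores per Lonestar node, as stated above
SAMPLE=SRR030257
REF=bwa/NC_012967.1.fasta
NCMDS=2                         # one bwa aln per mate of the pair
THREADS=$((CORES / NCMDS))      # 12 / 2 = 6 threads each

: > commands                    # start with an empty commands file
for mate in 1 2; do
  echo "bwa aln -t $THREADS -f bwa/${SAMPLE}_${mate}.sai $REF ${SAMPLE}_${mate}.fastq" >> commands
done
cat commands
```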

Again, take a look at your output directory using ls bwa to see what new files have appeared. What is a *.sai file? It contains, in a binary format specific to BWA, the suffix array (SA) coordinates of the candidate alignments that bwa aln found for each read. Many programs produce this kind of "intermediate" file in their own format and then provide tools for converting it into a "community" format shared by many downstream programs.

We still need to convert these SA coordinates into positions on the reference, choose the best hit for each read (pairing up the mates for paired-end data), and write the output in SAM format.

Do we use sampe or samse?

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by qsub.

If you need help figuring out the options, put this in your commands file:

bwa sampe -f bwa/SRR030257.sam bwa/NC_012967.1.fasta bwa/SRR030257_1.sai bwa/SRR030257_2.sai SRR030257_1.fastq SRR030257_2.fastq
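bwa sampe needs both .sai files from the bwa aln job, so it has to wait until that job has finished. A small check before you submit, assuming the file layout used in the commands above; the function name is hypothetical.

```sh
# Confirm the inputs for "bwa sampe" exist and are non-empty:
# both .sai files from bwa aln and both FASTQ files of the pair.
# The sample name is the only argument (e.g. SRR030257).
sampe_inputs_ready() {
  s=$1
  for f in "bwa/${s}_1.sai" "bwa/${s}_2.sai" "${s}_1.fastq" "${s}_2.fastq"; do
    if [ ! -s "$f" ]; then
      echo "not ready: $f" >&2
      return 1
    fi
  done
}

# Usage: sampe_inputs_ready SRR030257 && echo "ready for bwa sampe"
```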

Part 2b. Align the samples to reference using bwa mem

With bwa mem, alignment is a single step: it aligns both mates of each pair directly and writes SAM to standard output.

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by qsub.

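If you need help figuring out the options, put this in your commands file. Since this single command has a 12-core Lonestar node to itself, it can use all 12 threads; the output file name is only a suggestion.

bwa mem -t 12 bwa/NC_012967.1.fasta SRR030257_1.fastq SRR030257_2.fastq > bwa/SRR030257.mem.sam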
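With six samples, typing each bwa mem line by hand gets tedious. Here is a sketch that writes one line per sample, assuming the mates are named <sample>_1.fastq and <sample>_2.fastq in the current directory; the function name and the bwa/<sample>.mem.sam output names are hypothetical.

```sh
# Write one "bwa mem" line per paired-end sample in the current directory.
# Assumes mates are named <sample>_1.fastq and <sample>_2.fastq.
# Arguments: reference FASTA (index prefix) and thread count.
make_mem_commands() {
  ref=$1
  threads=$2
  for r1 in *_1.fastq; do
    [ -e "$r1" ] || continue          # no match: the glob stays literal
    s=${r1%_1.fastq}                  # strip the mate suffix to get the sample
    echo "bwa mem -t $threads $ref ${s}_1.fastq ${s}_2.fastq > bwa/${s}.mem.sam"
  done
}

# Usage: make_mem_commands bwa/NC_012967.1.fasta 12 > commands
```

If the launcher runs several of these lines on the same node at once, lower the thread count so they share the cores, as in the bwa aln step.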

Assessing Mapping Results

Use samtools view, sort, and index to convert the SAM output into a sorted, indexed BAM file.
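samtools flagstat is the usual way to get mapping statistics from an alignment file. To show what such a count is based on, here is a minimal awk sketch (hypothetical function name) that tallies mapped reads straight from the FLAG column of a plain-text SAM file. Treat it as a sanity check, not a replacement for samtools.

```sh
# Count primary read records in a plain-text SAM file and how many are mapped,
# using FLAG bits from the SAM specification:
#   0x4 = read unmapped, 0x100 = secondary, 0x800 = supplementary.
# Mates of a pair are separate records, so each is counted as one read.
sam_mapping_summary() {
  awk -F '\t' '
    /^@/ { next }                              # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1) next       # secondary alignment
      if (int(flag / 2048) % 2 == 1) next      # supplementary alignment
      total++
      if (int(flag / 4) % 2 == 0) mapped++     # 0x4 not set: read is mapped
    }
    END { printf "%d of %d reads mapped\n", mapped, total }
  ' "$1"
}

# Usage: sam_mapping_summary bwa/SRR030257.sam
```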

run BWA pipeline