Objectives
In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The objectives of this lab are mainly to:
...
- Learn how BWA works and how to use it.
Introduction
BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It is the successor to another aligner you might have used or heard of, MAQ (Mapping and Assembly with Quality). As the name suggests, it uses the Burrows-Wheeler transform of the reference genome to build a compact index, which lets it align reads in a time- and memory-efficient manner.
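To get a feel for the transform itself, here is a naive sketch in bash that computes the BWT of a short string by sorting all of its rotations and keeping the last character of each. It is an illustration only: the function name bwt is hypothetical, and BWA builds its index (bwa index) with much more efficient suffix-array algorithms rather than by sorting rotations.

```shell
# Naive Burrows-Wheeler transform, for illustration only (hypothetical helper,
# not how "bwa index" builds its index). Steps:
#   1. append "$" as an end-of-string marker and list every rotation, one per line
#   2. sort the rotations in byte order (LC_ALL=C, so "$" sorts before letters)
#   3. output the last character of each sorted rotation
bwt() {
  local s="$1\$" n i
  n=${#s}
  for ((i = 0; i < n; i++)); do
    printf '%s\n' "${s:i}${s:0:i}"
  done | LC_ALL=C sort | while IFS= read -r rot; do
    printf '%s' "${rot: -1}"
  done
  printf '\n'
}

bwt banana   # prints annb$aa
```

Identical characters tend to cluster in the output, which is part of why a BWT-based index of a whole genome can be stored compactly and searched quickly.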
BWA Variants
BWA has three different algorithms:
- For reads up to 100 bp long:
  - BWA-backtrack: bwa aln/samse/sampe
- For reads 70 bp to 1 Mbp long:
  - BWA-SW: bwa bwasw
  - BWA-MEM: bwa mem. Newer! Typically faster and more accurate.
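If you are unsure which algorithm fits your data, check the read length first. The sketch below uses two hypothetical helpers, max_read_len and suggest_bwa, that scan a FASTQ file and apply the ranges from the BWA manual (BWA-backtrack for reads up to about 100 bp; BWA-SW and BWA-MEM for reads from about 70 bp up to 1 Mbp). The thresholds are rules of thumb, not hard limits.

```shell
# Hypothetical helper: length of the longest read in a FASTQ file.
# In FASTQ, the sequence is line 2 of every 4-line record.
max_read_len() {
  awk 'NR % 4 == 2 && length($0) > max { max = length($0) } END { print max + 0 }' "$1"
}

# Hypothetical helper: suggest BWA algorithm(s) from the longest read length,
# using the approximate ranges described in the BWA manual.
suggest_bwa() {
  local len
  len=$(max_read_len "$1")
  if (( len < 70 )); then
    echo "${len} bp: BWA-backtrack (bwa aln + samse/sampe)"
  elif (( len <= 100 )); then
    echo "${len} bp: BWA-backtrack or BWA-MEM"
  else
    echo "${len} bp: BWA-MEM (or BWA-SW)"
  fi
}
```

For this lab's 75 bp reads, suggest_bwa data/GSM794483_C1_R1_1.fq should report that both BWA-backtrack and BWA-MEM are suitable.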
Get your data
Raw paired-end data for six samples have been provided for all of our further RNA-seq analysis (two FASTQ files per sample, one for each read of the pair):
- c1_r1, c1_r2, and c1_r3 from the first biological condition
- c2_r1, c2_r2, and c2_r3 from the second biological condition
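Get set up for the exercises:

```shell
cds                  # TACC alias: change to your $SCRATCH directory
cd my_rnaseq_course
# Copy in the foreground (no trailing "&") so the folder is complete before the next cd
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/bwa_exercise .
cd bwa_exercise
```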
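Once the copy has finished, it is worth a quick sanity check that every read-1 file has its read-2 mate before you start aligning. The function check_pairs below is a hypothetical helper; it assumes the <sample>_1.fq / <sample>_2.fq naming used in this exercise and defaults to the data directory.

```shell
# Hypothetical helper: confirm every <sample>_1.fq in a directory has a <sample>_2.fq mate.
# Prints a summary and returns non-zero if any mate is missing.
check_pairs() {
  local dir=${1:-data} r1 r2 n=0 missing=0
  for r1 in "$dir"/*_1.fq; do
    [ -e "$r1" ] || continue          # glob matched nothing
    r2=${r1%_1.fq}_2.fq
    n=$((n + 1))
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      missing=$((missing + 1))
    fi
  done
  echo "pairs: $n, missing mates: $missing"
  [ "$missing" -eq 0 ]
}
```

From inside bwa_exercise, check_pairs should report 6 pairs and 0 missing mates.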
Run BWA
Load the module:
```shell
module load bwa/0.7.7
```
There are multiple versions of BWA on TACC, so check which one you have loaded; you will need to report it when you write up the awesome publication made possible by your analysis of next-gen sequencing data.
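One way to check: module list shows which modules (and versions) are loaded in your session, and running bwa with no arguments prints its usage text, which includes a Version: line. The hypothetical helper below pulls out just that version string, so you can confirm which bwa binary you will actually run.

```shell
# Hypothetical helper: print the version of the first bwa on your PATH.
# With no arguments, bwa prints its usage (including a "Version:" line) to
# stderr and exits non-zero, so capture stderr and ignore the exit status.
bwa_version() {
  { bwa 2>&1 || true; } | awk '/^Version:/ { print $2 }'
}

# Usage on TACC, after "module load bwa/0.7.7":
#   module list      # modules and versions loaded in this session
#   bwa_version      # version of the bwa binary you will actually run
```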
...
Warning: Submit to the TACC queue or run in an idev shell.

Create a commands file and use launcher_creator.py followed by qsub. Open a commands file (nano commands.bwa) and put this in it:

```shell
bwa aln -f GSM794483_C1_R1_1.sai reference/genome.fa data/GSM794483_C1_R1_1.fq
bwa aln -f GSM794483_C1_R1_2.sai reference/genome.fa data/GSM794483_C1_R1_2.fq
bwa aln -f GSM794484_C1_R2_1.sai reference/genome.fa data/GSM794484_C1_R2_1.fq
bwa aln -f GSM794484_C1_R2_2.sai reference/genome.fa data/GSM794484_C1_R2_2.fq
bwa aln -f GSM794485_C1_R3_1.sai reference/genome.fa data/GSM794485_C1_R3_1.fq
bwa aln -f GSM794485_C1_R3_2.sai reference/genome.fa data/GSM794485_C1_R3_2.fq
bwa aln -f GSM794486_C2_R1_1.sai reference/genome.fa data/GSM794486_C2_R1_1.fq
bwa aln -f GSM794486_C2_R1_2.sai reference/genome.fa data/GSM794486_C2_R1_2.fq
bwa aln -f GSM794487_C2_R2_1.sai reference/genome.fa data/GSM794487_C2_R2_1.fq
bwa aln -f GSM794487_C2_R2_2.sai reference/genome.fa data/GSM794487_C2_R2_2.fq
bwa aln -f GSM794488_C2_R3_1.sai reference/genome.fa data/GSM794488_C2_R3_1.fq
bwa aln -f GSM794488_C2_R3_2.sai reference/genome.fa data/GSM794488_C2_R3_2.fq
```

Use this launcher_creator.py command, then submit the launcher script it writes:

```shell
launcher_creator.py -n aln -t 04:00:00 -j commands.bwa -q normal -a CCBB -m "module load bwa/0.7.7" -l bwa_launcher.sge
qsub bwa_launcher.sge
```
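Typing twelve nearly identical lines is error-prone. As an alternative, a small loop can generate commands.bwa from the FASTQ files themselves. This is a sketch: make_aln_commands is a hypothetical name, and it assumes you run it from inside bwa_exercise so that data/ and reference/genome.fa resolve as in the commands above.

```shell
# Hypothetical helper: one "bwa aln" line per FASTQ file in data/.
# The .sai output takes the FASTQ base name (e.g. GSM794483_C1_R1_1.sai).
make_aln_commands() {
  local fq base
  for fq in data/*_[12].fq; do
    [ -e "$fq" ] || continue          # glob matched nothing
    base=$(basename "$fq" .fq)
    echo "bwa aln -f ${base}.sai reference/genome.fa ${fq}"
  done
}
```

Running make_aln_commands > commands.bwa should produce the same 12 lines as above, in the same order, because the shell sorts the glob matches by name.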
A .sai file holds, in a BWA-specific binary format, the suffix-array (SA) coordinates of the candidate hits that bwa aln found for each read. We still need to convert these into genome positions and full alignments, choose the best matches, and write the output in SAM format. Do we use sampe or samse?
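One way to decide: samse turns .sai files into SAM for single-end reads, while sampe takes the .sai files of both mates and pairs them, so the choice depends on whether each read-1 file has a mate. The hypothetical helper below makes that check, assuming the _1.fq/_2.fq naming used here.

```shell
# Hypothetical helper: choose the BWA-backtrack SAM step for a read-1 FASTQ file.
# Prints "sampe" if the matching _2.fq mate exists (paired-end), else "samse".
backtrack_step() {
  local r2=${1%_1.fq}_2.fq
  if [ -e "$r2" ]; then
    echo sampe
  else
    echo samse
  fi
}
```

Every sample in data/ has both files, so backtrack_step data/GSM794483_C1_R1_1.fq prints sampe.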
...
Warning: Submit to the TACC queue or run in an idev shell.

Create a commands file and use launcher_creator.py followed by qsub.

If you need some help figuring out the options, open a commands file (nano commands.bwa.sampe) and put this in it:

```shell
bwa sampe -f C1_R1.sam reference/genome.fa GSM794483_C1_R1_1.sai GSM794483_C1_R1_2.sai data/GSM794483_C1_R1_1.fq data/GSM794483_C1_R1_2.fq
bwa sampe -f C1_R2.sam reference/genome.fa GSM794484_C1_R2_1.sai GSM794484_C1_R2_2.sai data/GSM794484_C1_R2_1.fq data/GSM794484_C1_R2_2.fq
bwa sampe -f C1_R3.sam reference/genome.fa GSM794485_C1_R3_1.sai GSM794485_C1_R3_2.sai data/GSM794485_C1_R3_1.fq data/GSM794485_C1_R3_2.fq
bwa sampe -f C2_R1.sam reference/genome.fa GSM794486_C2_R1_1.sai GSM794486_C2_R1_2.sai data/GSM794486_C2_R1_1.fq data/GSM794486_C2_R1_2.fq
bwa sampe -f C2_R2.sam reference/genome.fa GSM794487_C2_R2_1.sai GSM794487_C2_R2_2.sai data/GSM794487_C2_R2_1.fq data/GSM794487_C2_R2_2.fq
bwa sampe -f C2_R3.sam reference/genome.fa GSM794488_C2_R3_1.sai GSM794488_C2_R3_2.sai data/GSM794488_C2_R3_1.fq data/GSM794488_C2_R3_2.fq
```
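As with the aln step, you can generate this file instead of typing it. The sketch below (make_sampe_commands is a hypothetical name) derives the sample name by stripping the GSM ID from each read-1 file name, so GSM794483_C1_R1 becomes C1_R1, matching the output names above.

```shell
# Hypothetical helper: one "bwa sampe" line per sample, built from the read-1 files in data/.
make_sampe_commands() {
  local r1 pre sample
  for r1 in data/*_1.fq; do
    [ -e "$r1" ] || continue          # glob matched nothing
    pre=$(basename "$r1" _1.fq)       # e.g. GSM794483_C1_R1
    sample=${pre#*_}                  # drop the GSM ID, e.g. C1_R1
    echo "bwa sampe -f ${sample}.sam reference/genome.fa ${pre}_1.sai ${pre}_2.sai data/${pre}_1.fq data/${pre}_2.fq"
  done
}
```

Usage: make_sampe_commands > commands.bwa.sampe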
...
Warning: Submit to the TACC queue or run in an idev shell.

Create a commands file and use launcher_creator.py followed by qsub.

If you need some help figuring out the options, open a commands file (nano commands.mem) and put this in it:

```shell
bwa mem reference/genome.fa data/GSM794483_C1_R1_1.fq data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem reference/genome.fa data/GSM794484_C1_R2_1.fq data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam
bwa mem reference/genome.fa data/GSM794485_C1_R3_1.fq data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam
bwa mem reference/genome.fa data/GSM794486_C2_R1_1.fq data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam
bwa mem reference/genome.fa data/GSM794487_C2_R2_1.fq data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam
bwa mem reference/genome.fa data/GSM794488_C2_R3_1.fq data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam
```
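The BWA-MEM commands file can be generated the same way. In this sketch (make_mem_commands is a hypothetical name), each line sends its SAM output to a file with a shell redirect, exactly like the lines above.

```shell
# Hypothetical helper: one "bwa mem" line per sample, built from the read-1 files in data/.
make_mem_commands() {
  local r1 pre sample
  for r1 in data/*_1.fq; do
    [ -e "$r1" ] || continue          # glob matched nothing
    pre=$(basename "$r1" _1.fq)       # e.g. GSM794483_C1_R1
    sample=${pre#*_}                  # drop the GSM ID, e.g. C1_R1
    echo "bwa mem reference/genome.fa data/${pre}_1.fq data/${pre}_2.fq > ${sample}.mem.sam"
  done
}
```

Usage: make_mem_commands > commands.mem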
Since these will take a while to run, you can look at already generated results in /corral-repl/utexas/BioITeam/rnaseq_course_2015/bwa_exercise/results/bwa
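For a quick first look at any of these SAM files, the hypothetical function below counts records whose FLAG lacks the 0x4 (unmapped) bit, skipping secondary (0x100) and supplementary (0x800) records as defined in the SAM specification. Each mate of a pair counts as its own record. It is a rough sketch, not a substitute for a proper assessment of mapping results.

```shell
# Hypothetical helper: summarize how many primary SAM records are mapped.
# FLAG bits are tested with integer division, which works in any POSIX awk.
sam_mapping_summary() {
  awk -F'\t' '
    /^@/ { next }                                        # header lines
    int($2 / 256) % 2 || int($2 / 2048) % 2 { next }     # secondary / supplementary
    { total++; if (int($2 / 4) % 2 == 0) mapped++ }      # 0x4 unset means mapped
    END { printf "%d/%d read records mapped (%.1f%%)\n", mapped, total, total ? 100 * mapped / total : 0 }
  ' "$1"
}
```

Usage, for example: sam_mapping_summary C1_R1.mem.sam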
Help! I have a large number of reads. Make BWA go faster!
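Both bwa aln and bwa mem accept -t to set the number of threads. The sketch below (add_threads is a hypothetical helper) rewrites an existing commands file so that each bwa aln or bwa mem line runs with N threads, leaving other lines such as bwa sampe untouched. If the launcher runs several commands on the same node at once, keep the total number of threads per node at or below its core count.

```shell
# Hypothetical helper: add "-t N" to every "bwa aln" / "bwa mem" line of a commands file.
# Other lines are printed unchanged; the result goes to stdout.
add_threads() {
  local n=$1 file=$2
  sed -E "s/^bwa (aln|mem) /bwa \1 -t ${n} /" "$file"
}

# Usage: add_threads 4 commands.mem > commands.mem.t4
```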
Now that we are done mapping, let's look at how to assess mapping results.