...
In this lab, you will explore a popular fast mapper called Bowtie2BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The objectives of this lab is mainly to:
...
- For reads upto 100 bp long:
- BWA-backtrack : BWA aln/samse/sampe
- For reads upto 1 Mbp long:
- BWA-SW
- BWA-MEM : Newer! Typically faster and more accurate.
Get your data
Code Block |
---|
title | Get oriented for the exercises |
---|
|
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_exercise . &
cd bwa_exercise |
Run BWA
Load the module:
Code Block |
---|
module load bwa
|
...
Expand |
---|
title | Here are some commands that could help... |
---|
|
Code Block |
---|
module spider bwa
module list
bwa
|
|
Create a fresh output directory. Be sure you are back in your main intro_to_mapping
directory. Then:
You can see the different commands available under the bwa package from the command line help:
Part 1. Create a index of your reference
...
What's going on at each step?Remember to use the option that enables multithreading, if there is one, for each BWA command.
Warning |
---|
title | Submit to the TACC queue or run in an idev shell |
---|
|
Create a commands file and use launcher_creator.py followed by qsub. Expand |
---|
title | I need some help figuring out the options... |
---|
| Put this in your commands file: codebwa aln -f GSM794483_C1_R1_1.sai reference/genome.fa GSM794483_C1_R1_1.fq bwa aln -t 6 -f bwa/SRR030257_1.sai bwa/NC_012967.1.fasta SRR030257_1.fastq
bwa aln -t 6 -f bwa/SRR030257_2.sai bwa/NC_012967.1.fasta SRR030257_2.fastq
Why did we use -t 6 instead of -t 12 for multithreading? Both of our commands are going to go to a single node on Lonestar, so they should share the 12 available cores. |
|
...
-f GSM794483_C1_R1_2.sai reference/genome.fa GSM794483_C1_R1_2.fq bwa aln -f GSM794484_C1_R2_1.sai reference/genome.fa GSM794484_C1_R2_1.fq bwa aln -f GSM794484_C1_R2_2.sai reference/genome.fa GSM794484_C1_R2_2.fq bwa aln -f GSM794485_C1_R3_1.sai reference/genome.fa GSM794485_C1_R3_1.fq bwa aln -f GSM794485_C1_R3_2.sai reference/genome.fa GSM794485_C1_R3_2.fq bwa aln -f GSM794486_C2_R1_1.sai reference/genome.fa GSM794486_C2_R1_1.fq bwa aln -f GSM794486_C2_R1_2.sai reference/genome.fa GSM794486_C2_R1_2.fq bwa aln -f GSM794487_C2_R2_1.sai reference/genome.fa GSM794487_C2_R2_1.fq bwa aln -f GSM794487_C2_R2_2.sai reference/genome.fa GSM794487_C2_R2_2.fq |
|
Since this will take a while to run, you can look at already generated results at: /corral-repl/utexas/BioITeam/rnaseq_course/bwa_exercise/results
What is a *.sai file? It's a file containing "alignment seeds" in a file format specific to BWA. Many programs produce this kind of "intermediate" file in their own format and then at the end have tools for converting things to a "community" format shared by many downstream programs.
...
Warning |
---|
title | Submit to the TACC queue or run in an idev shell |
---|
|
Create a commands file and use launcher_creator.py followed by qsub. Expand |
---|
title | I need some help figuring out the options... |
---|
| Put this in your commands file: Code Block | bwa sampe -f bwa/SRR030257.sam bwa/NC_012967.1.fasta bwa/SRR030257_1.sai bwa/SRR030257_2.sai SRR030257_1.fastq SRR030257_2.fastq
bwa sampe -f C1_R1.sam reference/genome.fa GSM794483_C1_R1_1.sai GSM794483_C1_R1_2.sai GSM794483_C1_R1_1.fq GSM794483_C1_R1_2.fq bwa sampe -f C1_R2.sam reference/genome.fa GSM794484_C1_R2_1.sai GSM794484_C1_R2_2.sai GSM794484_C1_R2_1.fq GSM794484_C1_R2_2.fq bwa sampe -f C1_R3.sam reference/genome.fa GSM794485_C1_R3_1.sai GSM794485_C1_R3_2.sai GSM794485_C1_R3_1.fq GSM794485_C1_R3_2.fq bwa sampe -f C2_R1.sam reference/genome.fa GSM794486_C2_R1_1.sai GSM794486_C2_R1_2.sai GSM794486_C2_R1_1.fq GSM794486_C2_R1_2.fq bwa sampe -f C2_R2.sam reference/genome.fa GSM794487_C2_R2_1.sai GSM794487_C2_R2_2.sai GSM794487_C2_R2_1.fq GSM794487_C2_R2_2.fq bwa sampe -f C2_R3.sam reference/genome.fa GSM794488_C2_R3_1.sai GSM794488_C2_R3_2.sai GSM794488_C2_R3_1.fq GSM794488_C2_R3_2.fq |
|
Part 2b. Align the samples to reference using bwa mem
Alternatively, lets also try running alignment using the newest and greatest, BWA MEM. Alignment is just one single sep step with bwa mem.
Warning |
---|
title | Submit to the TACC queue or run in an idev shell |
---|
|
Create a commands file and use launcher_creator.py followed by qsub. Expand |
---|
title | I need some help figuring out the options... |
---|
| Put this in your commands file: Code Block |
---|
bwa mem reference/genome.fa GSM794483_C1_R1_1.fq GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem reference/genome.fa GSM794484_C1_R2_1.fq GSM794484_C1_R2_2.fq > C1_R2.mem.sam
bwa mem reference/genome.fa GSM794485_C1_R3_1.fq GSM794485_C1_R3_2.fq > C1_R3.mem.sam
bwa mem reference/genome.fa GSM794486_C2_R1_1.fq GSM794486_C2_R1_2.fq > C2_R1.mem.sam
bwa mem reference/genome.fa GSM794487_C2_R2_1.fq GSM794487_C2_R2_2.fq > C2_R2.mem.sam
bwa mem reference/genome.fa GSM794488_C2_R3_1.fq GSM794488_C2_R3_2.fq > C2_R3.mem.sam |
|
|
Assessing Mapping Results
samtools view, sort, index
...