Mapping with BWA
Objectives
In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contain 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulate two biological groups with three biological replicates per group (6 samples total). The main objective of this lab is to:
- Learn how BWA works and how to use it.
Introduction
BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It is an unspliced mapper. As the name suggests, it uses the Burrows-Wheeler transform to perform alignment in a time- and memory-efficient manner.
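To make the transform itself concrete, here is a toy sketch that computes the Burrows-Wheeler transform of a short string by sorting all of its rotations and reading off the last column. The function name bwt and the use of $ as the end-of-string marker are choices made for this illustration; BWA does not build its index this way, since sorting every rotation would be far too slow for a real reference.

```bash
# Toy Burrows-Wheeler transform: append "$", list every rotation,
# sort the rotations bytewise, and keep the last character of each.
bwt() {
  printf '%s\n' "$1" |
    awk '{ s = $0 "$"; n = length(s)
           for (i = 1; i <= n; i++) print substr(s, i) substr(s, 1, i - 1) }' |
    LC_ALL=C sort |
    awk '{ out = out substr($0, length($0), 1) } END { print out }'
}

bwt banana   # prints annb$aa
```

Note how identical letters end up next to each other in the output. That clustering is what makes the transformed text compressible and quick to search.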
Get your data
Raw data for six samples have been provided for all our further RNA-seq analysis (the reads are paired-end, so each sample has two FASTQ files, one per read of the pair):
- c1_r1, c1_r2, c1_r3 from the first biological condition
- c2_r1, c2_r2, and c2_r3 from the second biological condition
cds
cd my_rnaseq_course
cd day_2/bwa_exercise
Let's look at the data files and reference files:
ls ../data
ls ../reference
#transcriptome
head ../reference/transcripts.fasta
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf
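The read files can be inspected in the same spirit as the reference files. The sketch below counts reads in each FASTQ file, assuming the files are uncompressed and use the standard four lines per read. The helper name count_fastq_reads is made up for this example.

```bash
# Count reads in an uncompressed FASTQ file (assumes 4 lines per read).
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $((lines / 4))
}

# Report the read count for every .fq file in ../data, if any exist.
for fq in ../data/*.fq; do
  [ -e "$fq" ] || continue
  printf '%s\t%s\n' "$fq" "$(count_fastq_reads "$fq")"
done
```

For paired-end data, the _1.fq and _2.fq files of a sample should report the same number of reads.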
Run BWA
Load the module:
module load biocontainers
module load bwa
#to get the full path for bwa
type bwa
You can see the different commands available under the bwa package from the command line help:
#this may need to run in an idev session since biocontainer modules cannot be run on the login nodes.
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa
Part 1. Create an index of your reference
NO NEED TO RUN THIS NOW. YOUR INDEX HAS ALREADY BEEN BUILT!
All the files starting with the prefix transcripts.fasta are your BWA index files.
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa index -a bwtsw ../reference/transcripts.fasta
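Since the index is already built, a quick check that it is complete can save a failed job later. bwa index writes five files next to the reference, with the extensions .amb, .ann, .bwt, .pac, and .sa. The sketch below confirms that each one exists and is non-empty; the helper name check_bwa_index is made up for this example.

```bash
# Verify that all five BWA index files exist and are non-empty
# for a given reference prefix. Returns 0 if complete, 1 otherwise.
check_bwa_index() {
  local prefix="$1" ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty: $prefix.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_bwa_index ../reference/transcripts.fasta; then
  echo "BWA index looks complete"
fi
```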
Part 2. Align the samples to reference using bwa mem
Submit to the TACC queue or run in an idev session
Create a commands file and use launcher_creator.py followed by sbatch.
Make sure each command is one line in your commands file.
nano commands.mem
#Enter these lines into the file
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R1.mem.sam ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R2.mem.sam ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R3.mem.sam ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R1.mem.sam ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R2.mem.sam ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R3.mem.sam ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq
#ctrl+X to exit nano
#Y, followed by enter to save file
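Typing six long lines by hand invites typos, so as an alternative the commands file can be generated with a small loop. This is only a sketch: the function name make_bwa_commands is made up, and it assumes the GSM accession numbers run consecutively from GSM794483 (C1_R1) to GSM794488 (C2_R3), as in the commands listed for this exercise.

```bash
# Write one "bwa mem" command per sample into a commands file.
make_bwa_commands() {
  local out="${1:-commands.mem}"
  # Single quotes keep ${BIOCONTAINER_DIR} literal so it is expanded
  # when the job runs, not when this file is written.
  local img='${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg'
  local ref='../reference/transcripts.fasta'
  local gsm=794483 c r s
  : > "$out"
  for c in 1 2; do
    for r in 1 2 3; do
      s="C${c}_R${r}"
      printf 'singularity exec %s bwa mem -o %s.mem.sam %s ../data/GSM%d_%s_1.fq ../data/GSM%d_%s_2.fq\n' \
        "$img" "$s" "$ref" "$gsm" "$s" "$gsm" "$s" >> "$out"
      gsm=$((gsm + 1))
    done
  done
}

make_bwa_commands commands.mem
```

Check the result with cat commands.mem before handing it to launcher_creator.py; each command should occupy exactly one line.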
Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome
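For a first look at any of these SAM files, the sketch below counts header lines (which start with @), alignment records, and records whose FLAG has the 0x4 "read unmapped" bit set. The function name sam_summary is made up, and the file name C1_R1.mem.sam inside the results directory is an assumption based on the -o names used in the commands for this exercise.

```bash
# Summarize a SAM file: header lines, alignment records, and
# records flagged as unmapped (FLAG bit 0x4, tested arithmetically).
sam_summary() {
  awk -F'\t' '
    /^@/ { header++; next }
    { records++; if (int($2 / 4) % 2 == 1) unmapped++ }
    END { printf "header=%d records=%d unmapped=%d\n", header, records, unmapped }
  ' "$1"
}

# Assumed location of one pre-generated result file.
sam=bwa_mem_results_transcriptome/C1_R1.mem.sam
if [ -f "$sam" ]; then
  sam_summary "$sam"
fi
```

The record count includes any secondary or supplementary alignments, so it can exceed the number of input reads.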
Alternatively, we can also use bwa to map to the genome (reference/genome.fa).
Now that we are done mapping, let's look at how to assess mapping results.