Objectives
In this lab, you will explore BWA, a popular fast read mapper. Simulated RNA-seq data will be provided to you: 75 bp paired-end reads generated in silico to replicate real gene count data from Drosophila. The data simulate two biological groups with three biological replicates per group (6 samples total). The main objectives of this lab are to:
...
Code Block |
---|
ls ../data
ls ../reference
#transcriptome
head ../reference/transcripts.fasta
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf |
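As a small illustration of why `grep -c '^>'` counts sequences: every FASTA record begins with exactly one header line starting with `>`, so counting those lines counts records. The tiny file below is made up for this demonstration and is not part of the lab data.

```shell
# Build a throwaway FASTA file (hypothetical content, not lab data)
demo_fasta=$(mktemp)
printf '>tx1\nACGT\nGGCC\n>tx2\nTTAA\n' > "$demo_fasta"

# Each record has one header line beginning with '>', so this counts records
n_records=$(grep -c '^>' "$demo_fasta")
echo "records: $n_records"

# Clean up the temporary file
rm -f "$demo_fasta"
```

Sequence lines can wrap across several lines per record, which is why counting all lines with `wc -l` would not give the number of sequences.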
Run BWA
Load the module:
Code Block |
---|
module load biocontainers
module load bwa
#to get the full path for bwa
type bwa |
You can see the different commands available under the bwa package from the command line help:
Code Block |
---|
#this may need to be run in an idev session, since biocontainer modules cannot be run on the login nodes
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa |
...
All the files starting with the prefix transcripts.fasta are your BWA index files.
Code Block |
---|
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa index -a bwtsw ../reference/transcripts.fasta |
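Once indexing finishes, it can be useful to confirm that the index is complete before mapping. The sketch below assumes the standard set of five files that `bwa index` writes next to the reference (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); the helper name `check_bwa_index` is made up for this lab and is not part of BWA.

```shell
# Hypothetical helper: report whether the five files written by `bwa index`
# exist (and are non-empty) for a given reference.
# Usage: check_bwa_index path/to/reference.fasta
check_bwa_index() {
    ref=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}"
            missing=1
        fi
    done
    # Exit status 0 means all five index files were found
    return "$missing"
}
```

For example, `check_bwa_index ../reference/transcripts.fasta && echo "index looks complete"` prints the confirmation only when every index file is present.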
...
Warning |
---|
When you create your commands file, make sure each command is on one line. |
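The one-command-per-line rule above can be sketched by generating the file with a loop, so that each sample gets exactly one `bwa mem` line. The sample names, the `_R1.fastq`/`_R2.fastq` naming, and the output file names below are assumptions for illustration only; check `ls ../data` for the real names. The plain `bwa` call may also need to be replaced with the full `singularity exec` invocation shown earlier.

```shell
# Assumed names throughout: adjust groups, replicates, and FASTQ naming to match ../data
ref="../reference/transcripts.fasta"
commands_file="commands"

# Start with an empty commands file
: > "$commands_file"

for group in groupA groupB; do
    for rep in 1 2 3; do
        s="${group}_rep${rep}"
        # One complete bwa mem call per line: reference, read 1, read 2, SAM output
        echo "bwa mem $ref ../data/${s}_R1.fastq ../data/${s}_R2.fastq > ${s}.sam" >> "$commands_file"
    done
done

# One line per sample: this should report 6
wc -l < "$commands_file"
```

Building the file with `echo` inside a loop avoids accidental line breaks in the middle of a command, which is the usual way a hand-edited commands file goes wrong.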
Since this will take a while to run, you can look at the pre-generated results in: bwa_mem_results_transcriptome
Alternatively, we can use bwa to map to the genome (reference/genome.fa).
Now that we are done mapping, let's look at how to assess mapping results.