...
After bowtie2 came out with a local alignment option, it wasn't long before bwa developed its own local alignment algorithm called BWA-MEM (for Maximal Exact Matches), implemented by the bwa mem command. bwa mem has the following advantages:
- It provides the simplicity of using bwa without the complexities of local alignment, enabling straightforward alignment of datasets like the mirbase data we just examined
- It can align different portions of a read to different locations on the genome
- In a total RNA-seq experiment, reads will (at some frequency) span a splice junction themselves, or a pair of reads in a paired-end library will fall on either side of a splice junction.
- We want to be able to align these splice-adjacent reads for many reasons, from accurate transcript quantification to novel fusion transcript discovery.
...
A word about real splice-aware aligners
Using bwa mem for RNA-seq alignment is sort of a "poor man's" RNA-seq alignment method. Real splice-aware aligners like tophat2, hisat2 or STAR have more complex algorithms (as shown below), and take a lot more time!
...
BWA-MEM does not know about the exon structure of the genome. But it can align different sub-sections of a read to two different locations, producing two alignment records from one input read. One of the two will be marked as secondary (the 0x100 flag).
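To make the flag arithmetic concrete, here is a minimal sketch that decodes a few SAM FLAG bits with shell arithmetic. The helper name `describe_flag` is made up for illustration (it is not part of bwa or samtools), and it only checks four of the bits defined by the SAM format.

```shell
#!/usr/bin/env bash
# Decode a handful of SAM FLAG bits using bash integer arithmetic.
# describe_flag is a hypothetical helper, not a bwa or samtools command.
describe_flag() {
    local flag=$1
    local labels=()
    if (( flag & 0x4 ));   then labels+=("unmapped"); fi
    if (( flag & 0x10 ));  then labels+=("reverse-strand"); fi
    if (( flag & 0x100 )); then labels+=("secondary"); fi
    if (( flag & 0x800 )); then labels+=("supplementary"); fi
    # None of the bits we check are set
    if (( ${#labels[@]} == 0 )); then labels+=("none"); fi
    local IFS=,
    echo "${labels[*]}"
}

describe_flag 0      # prints: none
describe_flag 256    # prints: secondary (256 is 0x100)
describe_flag 272    # prints: reverse-strand,secondary (256 + 16)
```

So a read that produced two alignment records would show up twice under the same read name, with the 0x100 bit set on one of the two records.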
...
First set up our working directory for this alignment. Since it takes a long time to build a bwa index for a large genome (here human hg38/GRCh38), we'll use one that the BioITeam maintains in its /work/projects/BioITeam/ref_genome area.
    # Make sure you're in an idev session
    idev -m 120 -N 1 -A OTH21164 -r CoreNGSday4 -n 68

    # Load the modules we'll need
    module load biocontainers
    module load bwa
    module load samtools

    # Copy over the FASTQ data if needed
    mkdir -p $SCRATCH/core_ngs/alignment/fastq
    cp $CORENGS/alignment/*.gz $SCRATCH/core_ngs/alignment/fastq/

    # Make a new alignment directory for running these scripts
    cds
    mkdir -p $SCRATCH/core_ngs/alignment/bwamem
    cd $SCRATCH/core_ngs/alignment/bwamem
    ln -sf ../fastq
    ln -sf /work/projects/BioITeam/ref_genome/bwa/bwtsw/hg38
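Since we are relying on a pre-built index reached through a symbolic link, it can be worth confirming that the index files are actually visible before starting a long alignment. The sketch below checks for the five files that bwa index writes next to the FASTA prefix (.amb, .ann, .bwt, .pac, .sa). The helper name `check_bwa_index` is made up, and the prefix in the usage comment is only a guess at what the hg38 directory contains.

```shell
#!/usr/bin/env bash
# Check that all five bwa index files exist for a given index prefix.
# check_bwa_index is a hypothetical helper, not part of bwa.
check_bwa_index() {
    local prefix=$1 ext missing=0
    for ext in amb ann bwt pac sa; do
        # -e follows symbolic links, so a linked index directory works too
        if [[ ! -e "${prefix}.${ext}" ]]; then
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (the prefix name is an assumption; list the hg38
# directory first to see the real one):
#   check_bwa_index hg38/hg38.fa && echo "index looks complete"
```

The function returns 0 only when every file is present, so it can be chained with `&&` in front of the alignment command.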
...
Tip |
---|
Be aware that some downstream tools (for example the Picard suite, often used before SNP calling) do not like it when a read name appears more than once in the SAM file. Such reads can be filtered, but only if they can be identified as secondary by specifying the bwa mem -M option as we did above. This option reports the longest alignment normally but marks additional alignments for the read as secondary (the 0x100 BAM flag). This designation also allows you to easily filter the secondary reads with samtools view -F 0x104 if desired. The 0x104 mask combines 0x4 (unmapped) and 0x100 (secondary), so records with either bit set are excluded. |
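To see what that filter mask does, the following sketch mimics the record selection of samtools view -F 0x104 on a few made-up, abbreviated SAM records using only bash. It is an illustration of the bit test, not a replacement for samtools; `filter_sam_exclude` and `toy_sam` are hypothetical names, and the toy records carry only the first four SAM columns.

```shell
#!/usr/bin/env bash
# Keep only records whose FLAG has none of the bits in the mask set,
# which is the selection rule behind `samtools view -F <mask>`.
# filter_sam_exclude is a hypothetical helper; input must be header-less.
filter_sam_exclude() {
    local mask=$1 qname flag rest
    while IFS=$'\t' read -r qname flag rest; do
        if (( (flag & mask) == 0 )); then
            printf '%s\t%s\t%s\n' "$qname" "$flag" "$rest"
        fi
    done
}

# Made-up records: read name, FLAG, reference name, position
toy_sam() {
    printf 'read1\t0\tchr1\t100\n'      # primary alignment
    printf 'read1\t256\tchr1\t5000\n'   # secondary alignment of the same read
    printf 'read2\t4\t*\t0\n'           # unmapped read
    printf 'read3\t16\tchr2\t700\n'     # primary, reverse strand
}

# 0x104 = 0x4 (unmapped) + 0x100 (secondary)
toy_sam | filter_sam_exclude 0x104 | cut -f1,2
# prints read1 with flag 0 and read3 with flag 16
```

After filtering, each remaining read name appears only once, which is the property that tools like Picard expect.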
...