...
Now that you've done everything the hard way, let's see how to do run an alignment pipeline using a BWA alignment script maintained by the BioITeam, /work2work/projects/BioITeam/common/script/align_bwa_illumina.sh. Type in the script name to see its usage.
...
BWA MEM does not know about the exon structore of the genome. But it can align different sub-sections of a read to two different locations, producing two alignment records from one input read (one of the two will be marked as secondary (0x100 flag).
BWA MEM splits junction-spanning reads into two alignment records |
---|
Setup for BWA mem
First set up our working directory for this alignment. Since it takes a long time to build a bwa index for a large genome (here human hg38/GRCh38), we'll use one that the BioITeam maintains in its /work2/projects/BioITeam/ref_genome area.
...
Now take a look at bwa mem usage (type bwa mem with no arguments). The most important parameters are the following:
Option | Effect |
---|---|
-k | Controls the minimum seed length (default = 19) |
-w | Controls the "gap bandwidth", or the length of a maximum gap. This is particularly relevant for MEM, since it can determine whether a read is split into two separate alignments or is reported as one long alignment with a long gap in the middle (default = 100) |
-M | For split reads, mark the shorter read as secondary |
-r | Controls how long an alignment must be relative to its seed before it is re-seeded to try to find a best-fit local match (default = 1.5, e.g. the value of -k multiplied by 1.5) |
-c | Controls how many matches a MEM must have in the genome before it is discarded (default = 10000) |
-t | Controls the number of threads to use |
RNA-seq alignment with bwa mem
...