Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Now that you've done everything the hard way, let's see how to do run an alignment pipeline using a BWA alignment script maintained by the BioITeam,  /work2work/projects/BioITeam/common/script/align_bwa_illumina.sh. Type in the script name to see its usage.

...

BWA MEM does not know about the exon structore of the genome. But it can align different sub-sections of a read to two different locations, producing two alignment records from one input read (one of the two will be marked as secondary (0x100 flag).

BWA MEM splits junction-spanning reads into two alignment records

Image Modified

Setup for BWA mem

First set up our working directory for this alignment. Since it takes a long time to build a bwa index for a large genome (here human hg38/GRCh38), we'll use one that the BioITeam maintains in its /work2/projects/BioITeam/ref_genome area.

...

Now take a look at bwa mem usage (type bwa mem with no arguments). The most important parameters are the following:

OptionEffect
-kControls the minimum seed length (default = 19)
-wControls the "gap bandwidth", or the length of a maximum gap. This is particularly relevant for MEM, since it can determine whether a read is split into two separate alignments or is reported as one long alignment with a long gap in the middle (default = 100)
-MFor split reads, mark the shorter read as secondary
-rControls how long an alignment must be relative to its seed before it is re-seeded to try to find a best-fit local match (default = 1.5, e.g. the value of -k multiplied by 1.5)
-cControls how many matches a MEM must have in the genome before it is discarded (default = 10000)
-tControls the number of threads to use

RNA-seq alignment with bwa mem

...