Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

alignment typealigner optionspro'scon's
global with bwa 

single end reads:

  • bwa aln <R1>
  • bwa samse

paired end reads:

  • bwa aln <R1>
  • bwa aln <R2>
  • bwa sampe
  • simple to use (take default options)
  • good for basic global alignment
  • multiple steps needed
global with bowtie2bowtie2 
  • extremely configurable
  • can be used for RNAseq alignment (after adapter trimming) because of its many options
  • complex (many options)
local with bwa bwa mem
  • simple to use (take default options)
  • very fast
  • no adapter trimming needed
  • good for simple RNAseq analysis
    • the secondary alignments it reports can provide splice junction information
  • always produces alignments with secondary reads
    • must be filtered if not desired
local with bowtie2bowtie2 --local
  • extremely configurable
  • no adapter trimming needed
  • good for small RNA alignment because of its many options
  • complex – many options

...

We're going to skip the trimming step for now and see how it goes. We'll perform steps 2 - 5 now and leave , leaving samtools for a later exercise since steps 6 - 10 are common to nearly all post-alignment workflows.

...

Code Block
languagebash
titleStart an idev session
idev -m 180 -N 1 -A OTH21164 -r CoreNGS      # or -A TRA23004

idev -m 120 -N 1 -A OTH21164 -p development  # or -A TRA23004


Code Block
languagebash
module load biocontainers  # takes a while
module load bwa
bwa

...

Expand
titleAnswer

The last few lines of bwa's execution output should look something like this:

Code Block
languagebash
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 50.76 sec
[bwa_aln_core] write to the disk... 0.07 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 50.35 sec
[bwa_aln_core] write to the disk... 0.07 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 13.64 sec
[bwa_aln_core] write to the disk... 0.01 sec
[bwa_aln_core] 592180 sequences have been processed.
[main] Version: 0.7.17-r1188
[main] CMD: /usr/local/bin/bwa aln sacCer3/sacCer3.fa fastq/Sample_Yeast_L005_R1.cat.fastq.gz
[main] Real time: 7885.185584 sec; CPU: 7783.598825 sec

So the R2 alignment took ~78 ~85 seconds (~1.3 4 minutes).

Since you have your own private compute node, you can use all its resources. It has 128 cores, so re-run the R2 alignment asking for 60 execution threads.

...

Expand
titleAnswer

The last few lines of bwa's execution output should look something like this:

Code Block
languagebash
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 266.70 sec
[bwa_aln_core] write to the disk... 0.04 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 268.94 sec
[bwa_aln_core] write to the disk... 0.03 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 72.26 sec
[bwa_aln_core] write to the disk... 0.01 sec
[bwa_aln_core] 592180 sequences have been processed.
[main] Version: 0.7.17-r1188
[main] CMD: /usr/local/bin/bwa aln -t 60 sacCer3/sacCer3.fa fastq/Sample_Yeast_L005_R2.cat.fastq.gz
[main] Real time: 57.013931 sec; CPU: 142179.813153 sec

So the R2 alignment took only ~5 ~8 seconds (real time), or 1510+ times as fast as with only one processing thread.

Note, though, that the CPU time with 60 threads was greater (142.8 ~180 sec) than with only 1 thread (77.6 ~85 sec). That's because of the thread management overhead when using multiple threads.

...

Code Block
languagebash
titleCut syntax for a single field
tail yeast_pe.sam | cut -f 3

By default cut assumes the field delimiter is Tab, which is the delimiter used in the majority of NGS file formats. You can specify a different delimiter with the -d option.

...

Code Block
languagebash
titleCount aligned SAM records
grep -v -P '^@' yeast_pe.sam | cut -f 3 | grep -v '*' | wc -l

Read more at Some Linux commands: Advanced commands

Exercise: About how many records represent aligned sequences? What alignment rate does this represent?

Expand
titleAnswer

The expression above returns 612,968. There were 1,184,360 records total, so the percentage is:

Code Block
languagebash
titleCalculate alignment rate
awk 'BEGIN{print 612968/1184360}'

or about 51%. Not great.

Note we perform this calculation in awk's BEGIN block, which is always executed, instead of the body block, which is only executed for lines of input. And here we call awk without piping it any input. See Linux fundamentals: cut,sort,uniq,grep,awk

Exercise: What might we try in order to improve the alignment rate?

...

Expand
titleMake sure you're in a idev session


Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Thu     # or -A TRA23004

idev -m 90 -N 1 -A OTH21164 -p development  # or -A TRA23004



Code Block
languagebash
# If not already loaded
module load biocontainers  # takes a while

module load samtools
samtools

...

Exercise: What samtools view option will include the header records in its output? Which option would show only the header records?

Expand
titleHint

Note that samtools (like bwa) writes its help to standard error, but less and more only accept input on standard input. So the syntax redirecting standard error to standard input must be used before the pipe to less or more.

samtools view 2>&1 | less

then search for "header" ( /header )


Expand
titleAnswer

samtools view -h shows header records along with alignment records.

samtools view -H shows header records only.

...

Here we use the tee command which reports its standard input outputto standard output before also writing it to the specified file.

...