...
alignment type | aligner options | pro's | con's |
---|---|---|---|
global with bwa | single end reads:
paired end reads:
|
|
|
global with bowtie2 | bowtie2 |
|
|
local with bwa | bwa mem |
|
|
local with bowtie2 | bowtie2 --local |
|
|
...
We're going to skip the trimming step for now and see how it goes. We'll perform steps 2 - 5 now and leave , leaving samtools for a later exercise since steps 6 - 10 are common to nearly all post-alignment workflows.
...
Code Block | ||||
---|---|---|---|---|
| ||||
idev -m 180 -N 1 -A OTH21164 -r CoreNGS # or -A TRA23004 idev -m 120 -N 1 -A OTH21164 -p development # or -A TRA23004 |
Code Block | ||
---|---|---|
| ||
module load biocontainers # takes a while module load bwa bwa |
...
Expand | |||||
---|---|---|---|---|---|
| |||||
The last few lines of bwa's execution output should look something like this:
So the R2 alignment took ~78 ~85 seconds (~1.3 4 minutes). |
Since you have your own private compute node, you can use all its resources. It has 128 cores, so re-run the R2 alignment asking for 60 execution threads.
...
Expand | |||||
---|---|---|---|---|---|
| |||||
The last few lines of bwa's execution output should look something like this:
So the R2 alignment took only ~5 ~8 seconds (real time), or 1510+ times as fast as with only one processing thread. Note, though, that the CPU time with 60 threads was greater (142.8 ~180 sec) than with only 1 thread (77.6 ~85 sec). That's because of the thread management overhead when using multiple threads. |
...
Code Block | ||||
---|---|---|---|---|
| ||||
tail yeast_pe.sam | cut -f 3 |
By default cut assumes the field delimiter is Tab, which is the delimiter used in the majority of NGS file formats. You can specify a different delimiter with the -d option.
...
Code Block | ||||
---|---|---|---|---|
| ||||
grep -v -P '^@' yeast_pe.sam | cut -f 3 | grep -v '*' | wc -l |
Read more at Some Linux commands: Advanced commands
Exercise: About how many records represent aligned sequences? What alignment rate does this represent?
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
The expression above returns 612,968. There were 1,184,360 records total, so the percentage is:
or about 51%. Not great. Note we perform this calculation in awk's BEGIN block, which is always executed, instead of the body block, which is only executed for lines of input. And here we call awk without piping it any input. See Linux fundamentals: cut,sort,uniq,grep,awk |
Exercise: What might we try in order to improve the alignment rate?
...
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
|
Code Block | ||
---|---|---|
| ||
# If not already loaded module load biocontainers # takes a while module load samtools samtools |
...
Exercise: What samtools view option will include the header records in its output? Which option would show only the header records?
Expand | ||
---|---|---|
| ||
Note that samtools (like bwa) writes its help to standard error, but less and more only accept input on standard input. So the syntax redirecting standard error to standard input must be used before the pipe to less or more. samtools view 2>&1 | less then search for "header" ( /header ) |
Expand | ||
---|---|---|
| ||
samtools view -h shows header records along with alignment records. samtools view -H shows header records only. |
...
Here we use the tee command which reports its standard input outputto standard output before also writing it to the specified file.
...