
...

Expand

title	Command break down

Command portion	Purpose
-t	Print wall clock time each step takes.
-p 8	Use 8 processors. As discussed above and below, this value is selected so the command will finish in a reasonable amount of time.
-x bowtie2/NC_012967.1	Location and name of the index we created above with the bowtie2-build command
-1 SRR030257_1.fastq	Read 1 file name (note that without the -1 and -2 options, reads would not be mapped in paired-end mode)
-2 SRR030257_2.fastq	Read 2 file name (note that without the -1 and -2 options, reads would not be mapped in paired-end mode)
-S bowtie2/SRR030257.sam	Output mapped reads in SAM format at the given location with the given name

Your final output file is in SAM format. It's just a text file, so you can peek at it and see what it's like inside. Two warnings though:

...

Expand

title	See if you can figure out how to re-run this using all 68 cores. Click here for a hint

You need to use the -p ("processors") option, since we have 68 cores available to our job.

Code Block

language	bash
title	click here to check your answer
collapse	true

bowtie2 -t -p 68 -x bowtie2/NC_012967.1 -1 SRR030257_1.fastq -2 SRR030257_2.fastq -S bowtie2/SRR030257.sam

Try it out and compare the speed of execution by looking at the log files.

Expand

title	How much faster was it using all 68 processors?

8 processors took a little over 5 minutes; 68 processors took ~57 seconds. Can you think of any reasons why it was only ~5x faster rather than ~8x faster?

Expand

title	Answer

Any time you use multiprocessing correctly, things will go faster. But even if a program can divide the input perfectly among all available processors and combine the outputs back together perfectly, there is "overhead" in dividing things up and recombining them. These are the types of considerations you may have to make with your data: When is it better to give more processors to a single sample? How fast do I actually need the data to come back?

An additional note from the Stampede2 user manual: while there are 68 cores available, and each core is capable of hyperthreading (4 processors per core), using all 272 processors is rarely the go-to solution. While I am sure that this is more rigorously and appropriately tested in some other manner, I ran a test using different numbers of processors, with the following results:

-p option	time (min:sec)
272	1:54
136	1:13
68	0:57
34	1:14
17	2:25
8	5:12
4	9:01
2
1
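A test like the one in the table could be scripted roughly as follows. This is only a sketch, not how the numbers above were actually produced: the per-run output names (SRR030257.p<N>.sam), the log file names, and the DRY_RUN switch are all assumptions added for illustration. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch: run the same bowtie2 mapping at several -p values.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Build the bowtie2 command for one -p value.
# The per-run SAM name is an assumption so that runs do not overwrite each other.
bowtie2_cmd() {
  local threads="$1"
  echo "bowtie2 -t -p ${threads} -x bowtie2/NC_012967.1 -1 SRR030257_1.fastq -2 SRR030257_2.fastq -S bowtie2/SRR030257.p${threads}.sam"
}

for threads in 4 8 17 34 68 136 272; do
  cmd="$(bowtie2_cmd "$threads")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # 'time' reports the wall clock time of the whole run;
    # the log file name is hypothetical.
    { time $cmd ; } > "bowtie2/p${threads}.log" 2>&1
  fi
done
```

Each run needs the whole node to itself for the timings to be comparable, so the runs are done one after another rather than in the background.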

Again, while there are almost certainly better ways to benchmark this, there are 2 things of note that are illustrated here:

Doubling the number of processors does not cut the time in half, and while some applications may use hyperthreading on the individual cores appropriately, assuming that a program can or will do so may actually make things take longer.
Working on your laptop (which likely has at most 4-8 processors available) would significantly increase the amount of time these tutorials take.
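To put numbers on the first point, the timings in the table can be converted to seconds and compared against the slowest measured run (-p 4). The sketch below is one way to do that. The function name speedup_table and the "efficiency" column (speedup divided by the increase in processor count) are choices made here for illustration, not part of the original test.

```bash
#!/usr/bin/env bash
# Timings copied from the table above: "<-p value> <min:sec>".
# The -p 2 and -p 1 rows are left out because no times were recorded for them.
timings="272 1:54
136 1:13
68 0:57
34 1:14
17 2:25
8 5:12
4 9:01"

# Read "<procs> <min:sec>" lines on stdin and print, tab separated:
# procs, seconds, speedup vs. the smallest -p value, and efficiency
# (speedup divided by how many times more processors were used).
speedup_table() {
  sort -n | awk '
    {
      split($2, t, ":")
      secs = t[1] * 60 + t[2]
      if (NR == 1) { base_p = $1; base_s = secs }
      speedup = base_s / secs
      ideal = $1 / base_p
      printf "%d\t%d\t%.2f\t%.2f\n", $1, secs, speedup, speedup / ideal
    }'
}

printf "procs\tseconds\tspeedup\tefficiency\n"
echo "$timings" | speedup_table
```

With these timings the efficiency stays close to 0.9 up to -p 34, drops to roughly 0.56 at -p 68, and falls further once hyperthreading is used at -p 136 and -p 272.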

...

- In the bowtie2 example, we mapped in --local mode. Try mapping in --end-to-end mode (aka global mode).
- Do the BWA tutorial so you can compare their outputs (note BWA has a conda package making it even easier to try).
- Did bowtie2 or BWA map more reads?
- In our examples, we mapped in paired-end mode. Try to figure out how to map the reads in single-end mode and create this output.
- Which aligner took less time to run? Are there any options you can change that:
  - Lead to a larger percentage of the reads being mapped? (increase sensitivity)
  - Speed up run time without causing many fewer reads to be mapped? (increase performance)

Here is a link to help you return to the GVA 2021 course schedule.
