...
Warning

Create a
Your final output file is in SAM format. It's just a text file, so you can peek at it and see what it's like inside. Two warnings though:
...
    head bowtie2/SRR030257.sam
What do you think the 4th and 8th columns mean?
More reading about SAM files
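If the header lines get in the way when you peek at the file, a minimal sketch like the one below skips them and prints only the read name plus the 4th and 8th columns. The two-read sample file and the `peek_columns` helper are made up for illustration so the snippet runs on its own; in the tutorial you would pass `bowtie2/SRR030257.sam` instead.

```shell
# Hypothetical sample data (not from the tutorial) so the sketch is self-contained.
sam_file=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam_file"
printf '@SQ\tSN:chr_demo\tLN:1000\n' >> "$sam_file"
printf 'read1\t99\tchr_demo\t101\t42\t4M\t=\t251\t154\tACGT\tIIII\n' >> "$sam_file"
printf 'read1\t147\tchr_demo\t251\t42\t4M\t=\t101\t-154\tACGT\tIIII\n' >> "$sam_file"

# Header lines start with "@"; drop them, then keep columns 1, 4 and 8
# of the first few alignment lines.
peek_columns() {
    grep -v '^@' "$1" | cut -f 1,4,8 | head -n 5
}

peek_columns "$sam_file"
```

Comparing the two printed lines for the same read pair is a useful hint for the question above.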
...
We have actually massively under-utilized Lonestar in this example. We submitted a job that reserved a single node on the cluster, but that node has 12 processors. Bowtie was only using one of those processors (a single "thread")! For programs that support multithreaded execution (and most mappers do because they are obsessed with speed) we could have sped things up by using all 12 processors for the bowtie process.
What's the command line option to enable multithreaded execution in bowtie?
You need to use the
Try it out and compare the speed of execution by looking at the log files.
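As a rough sketch of what a multithreaded run could look like, the snippet below assembles a bowtie2 command that uses the `-p` (`--threads`) option documented in the bowtie2 manual and prints it rather than running it. The index prefix, FASTQ names, and output name are placeholders and not the files used earlier in this tutorial; `build_bowtie2_cmd` is a hypothetical helper.

```shell
# Placeholder names (assumptions, not taken from the tutorial): adjust to your files.
index_prefix="my_index"
reads_1="reads_1.fastq"
reads_2="reads_2.fastq"
out_sam="multithreaded.sam"
threads=12   # one thread per processor on the node described above

# Print the command instead of running it, so it can be checked first.
build_bowtie2_cmd() {
    echo "bowtie2 -p $1 --local -x $index_prefix -1 $reads_1 -2 $reads_2 -S $out_sam"
}

cmd=$(build_bowtie2_cmd "$threads")
echo "$cmd"
# Once the printed command looks right, it could be run and timed with: time $cmd
```

Running the same command with `-p 1` and with `-p 12` gives two timings to compare alongside the log files.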
If you want to launch many processes as part of one job, so that they are distributed one per node and use the maximum number of processors available, then you need to learn about the "wayness" of how you request nodes on Lonestar and possibly edit your *.sge script.
One consequence of using multithreading that might be confusing is that the aligned reads might appear in your output SAM file in a different order than they were in the input FASTQ. This happens because small sets of reads get continuously packaged, "sent" to the different processors, and whichever set "returns" fastest is written first. You can force them to appear in the same order (at a slight cost in speed) by adding the --reorder flag to your command, but this is typically only necessary if the reads are already ordered or you intend to do some comparison between the input and output.
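To check whether the output order actually differs from the input order, one option is to compare the read names in both files. The sketch below is an illustration under stated assumptions: the tiny FASTQ and SAM files are made up, the helper names are hypothetical, and it assumes read names match exactly between the two files (no `/1` or `/2` suffixes) and that mates of a pair sit next to each other in the SAM.

```shell
# Made-up three-read input and output so the sketch runs on its own.
work=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' > "$work/in.fastq"
printf '@HD\tVN:1.0\n' > "$work/out.sam"
printf 'r2\t0\tchr_demo\t5\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$work/out.sam"
printf 'r1\t0\tchr_demo\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$work/out.sam"
printf 'r3\t0\tchr_demo\t9\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$work/out.sam"

# Input order: first line of every 4-line FASTQ record, without the leading "@".
fastq_names() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); print $1 }' "$1"
}

# Output order: column 1 of non-header lines; uniq collapses adjacent mates.
sam_names() {
    grep -v '^@' "$1" | cut -f 1 | uniq
}

same_order() {
    if [ "$(fastq_names "$1")" = "$(sam_names "$2")" ]; then
        echo "same order"
    else
        echo "different order"
    fi
}

same_order "$work/in.fastq" "$work/out.sam"
```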
Optional Exercises
- In the bowtie2 example, we mapped in --local mode. Try mapping in --end-to-end mode (aka global mode).
- Do the BWA tutorial so you can compare their outputs.
- Did bowtie2 or BWA map more reads?
- In our examples, we mapped in paired-end mode. Try to figure out how to map the reads in single-end mode and create this output.
- Which aligner took less time to run? Are there any options you can change that:
- Lead to a larger percentage of the reads being mapped? (increase sensitivity)
- Speed up performance without causing many fewer reads to be mapped? (increase performance)
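For the "which mapper mapped more reads" comparison, one simple approach is to count SAM records whose FLAG (column 2) does not have the "unmapped" bit (value 4) set. The sketch below uses a made-up three-record file and a hypothetical `count_mapped` helper. It counts records rather than reads, so it may overcount for aligners that write more than one record per read.

```shell
# Made-up sample: FLAG 0 and 16 are mapped, FLAG 4 is unmapped.
sam_file=$(mktemp)
printf '@HD\tVN:1.0\n' > "$sam_file"
printf 'r1\t0\tchr_demo\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam_file"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$sam_file"
printf 'r3\t16\tchr_demo\t9\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam_file"

# int(FLAG / 4) % 2 tests the bit with value 4 without needing bitwise operators.
count_mapped() {
    awk -F '\t' '!/^@/ { total++; if (int($2 / 4) % 2 == 0) mapped++ }
        END { printf "%d of %d records mapped\n", mapped, total }' "$1"
}

count_mapped "$sam_file"
```

Running the same function on the bowtie2 and BWA output files gives two numbers that can be compared directly.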
From here...
From here you can use the output SAM files to predict genome variation in the SNV Calling Tutorial (SAMtools) or view your mapped reads in the Integrative Genomics Viewer (IGV) Tutorial.