...
- Become comfortable with the basic steps of indexing a reference genome, mapping reads, and converting output to
SAM/BAM
format for downstream analysis. - Use bowtie2 to map reads from an E. coli Illumina data set to a reference genome and compare the output.
Theory
Please see the Introduction the Introduction to mapping presentation on presentation on the course outline for more details of the theory behind read mapping algorithms and critical considerations for using these tools and references correctly.
...
Expand | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
We just went thought a lot of work to make sure we installed the version that we wanted, but sometimes we need to work the other way and figure out what version we have already been working with such as yesterday with cutadapt.
The above should show you now have access to version 2.4.5. If you have a different version listed (such as 2.3.5 or 2.3.2) make sure you are using the conda installation with access to conda forge and not relying on the TACC module, and then get my attention for clarification. |
Building an index
For many read mappers, the first step in mapping reads to a genome is quite often indexing the reference file. Put the output of this command into the bowtie
directory we created a minute ago. The command you need is:
Code Block |
---|
bowtie2-build
|
Try typing this alone in the terminal and figuring out what to do from the help show just from typing the command by itself.
Expand | ||
---|---|---|
| ||
The command requires 2 arguments. The first argument is the reference sequence in FASTA format. The second argument is the "base" file name to use for the created index files. It will create a bunch of files beginning bowtie/NC_012967.1*. |
...
title | IMPORTANT |
---|
The following command is extremely taxing to the head node and thus means we should not run it on the head node (especially when all of us are doing it at once). In fact, in previous years, TACC has noticed the spike in usage when multiple students forgot to make sure they were on idev nodes and complained pretty forcefully to us about it. Let's not have this be one of those years. Use the hostname
or showq -u
command to make sure you are on an idev node.
...
-bash: showq: command not found
...
If you are not sure if you are on an idev node or are seeing other output with one or both commands, speak up on zoom and I'll show(q) -u what you are looking for. Yes, your instructor likes bad puns. My apologies.
...
Warning | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
The following command is extremely taxing to the head node and thus means we should not run it on the head node (especially when all of us are doing it at once). In fact, in previous years, TACC has noticed the spike in usage when multiple students forgot to make sure they were on idev nodes and complained pretty forcefully to us about it. Let's not have this be one of those years. Use the
If you are not sure if you are on an idev node or are seeing other output with one or both commands, speak up on zoom and I'll show(q) -u what you are looking for. Yes, your instructor likes bad puns. My apologies. If you are not on an idev node, and need help to relaunch it, click over to the idev tutorial. |
For many read mappers, the first step in mapping reads to a genome is quite often indexing the reference file. Put the output of this command into the bowtie
directory we created a minute ago. The command you need is:
Code Block |
---|
bowtie2-build
|
Try typing this alone in the terminal and figuring out what to do from the help show just from typing the command by itself.
Expand | ||
---|---|---|
| ||
The command requires 2 arguments. The first argument is the reference sequence in FASTA format. The second argument is the "base" file name to use for the created index files. It will create a bunch of files beginning bowtie/NC_012967.1*. |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
bowtie2-build NC_012967.1.fasta bowtie2/NC_012967.1 |
...
In the bowtie2 example, we mapped in
--local
mode. Try mapping in--end-to-end
mode (aka global mode).- Do the BWA tutorial so you can compare their outputs (note BWA has a conda package making it even easier to try).
- Did bowtie2 or BWA map more reads?
- In our examples, we mapped in paired-end mode. Try to figure out how to map the reads in single-end mode and create this output.
- Which aligner took less time to run? Are there any options you can change that:
- Lead to a larger percentage of the reads being mapped? (increase sensitivity)
- Speed up run time without causing many fewer reads to be mapped? (increase performance)
...