Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Become comfortable with the basic steps of indexing a reference genome, mapping reads, and converting output to SAM/BAM format for downstream analysis.
  2. Use bowtie2 to map reads from an E. coli Illumina data set to a reference genome and compare the output.

Theory

Please see the Introduction the Introduction to mapping presentation on presentation on the course outline for more details of the theory behind read mapping algorithms and critical considerations for using these tools and references correctly.

...

Expand
titleHow to check what version of bowtie2 was loaded?

We just went thought a lot of work to make sure we installed the version that we wanted, but sometimes we need to work the other way and figure out what version we have already been working with such as  yesterday with cutadapt.

Code Block
languagebash
titleRecall many programs have a --version flag that can be used to retrieve this information, and conda has a 'list' function that takes additional optional arguments
collapsetrue
bowtie2 --version
conda list bowtie2

The above should show you now have access to version 2.4.5. If you have a different version listed (such as 2.3.5 or 2.3.2) make sure you are using the conda installation with access to conda forge and not relying on the TACC module, and then get my attention for clarification.

Building an index

For many read mappers, the first step in mapping reads to a genome is quite often indexing the reference file. Put the output of this command into the bowtie directory we created a minute ago. The command you need is:

Code Block
bowtie2-build

Try typing this alone in the terminal and figuring out what to do from the help show just from typing the command by itself.

Expand
titleIf you're stuck click here for an explanation of what arguments the command does need

The command requires 2 arguments. The first argument is the reference sequence in FASTA format. The second argument is the "base" file name to use for the created index files. It will create a bunch of files beginning bowtie/NC_012967.1*.

...

titleIMPORTANT

The following command is extremely taxing to the head node and thus means we should not run it on the head node (especially when all of us are doing it at once). In fact, in previous years, TACC has noticed the spike in usage when multiple students forgot to make sure they were on idev nodes and complained pretty forcefully to us about it. Let's not have this be one of those years. Use the hostname or  showq -u command to make sure you are on an idev node.

...

-bash: showq: command not found

...

If you are not sure if you are on an idev node or are seeing other output with one or both commands, speak up on zoom and I'll show(q) -u what you are looking for. Yes, your instructor likes bad puns. My apologies.

...

Warning
titleIMPORTANT

The following command is extremely taxing to the head node and thus means we should not run it on the head node (especially when all of us are doing it at once). In fact, in previous years, TACC has noticed the spike in usage when multiple students forgot to make sure they were on idev nodes and complained pretty forcefully to us about it. Let's not have this be one of those years. Use the hostname or  showq -u command to make sure you are on an idev node.

Commandon idev nodeon head node
hostnamelists a compute node starting with a C followed by a number before "stampede2.tacc.utexas.edu"lists a login node plus number before "stampede2.tacc.utexas.edu"
showq -u

-bash: showq: command not found

shows you a summary of jobs you have. (very likely empty during these tutorials)

If you are not sure if you are on an idev node or are seeing other output with one or both commands, speak up on zoom and I'll show(q) -u what you are looking for. Yes, your instructor likes bad puns. My apologies.

If you are not on an idev node, and need help to relaunch it, click over to the idev tutorial.

For many read mappers, the first step in mapping reads to a genome is quite often indexing the reference file. Put the output of this command into the bowtie directory we created a minute ago. The command you need is:

Code Block
bowtie2-build

Try typing this alone in the terminal and figuring out what to do from the help show just from typing the command by itself.

Expand
titleIf you're stuck click here for an explanation of what arguments the command does need

The command requires 2 arguments. The first argument is the reference sequence in FASTA format. The second argument is the "base" file name to use for the created index files. It will create a bunch of files beginning bowtie/NC_012967.1*.


Code Block
languagebash
titleClick here to check your work, or for the answer if needed
collapsetrue
bowtie2-build NC_012967.1.fasta bowtie2/NC_012967.1

...

  • In the bowtie2 example, we mapped in --local mode. Try mapping in --end-to-end mode (aka global mode).

  • Do the BWA tutorial so you can compare their outputs (note BWA has a conda package making it even easier to try).
    • Did bowtie2 or BWA map more reads?
    • In our examples, we mapped in paired-end mode. Try to figure out how to map the reads in single-end mode and create this output.
    • Which aligner took less time to run? Are there any options you can change that:
      • Lead to a larger percentage of the reads being mapped? (increase sensitivity)
      • Speed up run time without causing many fewer reads to be mapped? (increase performance)

...