Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Become comfortable with the basic steps of indexing a reference genome, mapping reads, and converting output to SAM/BAM format for downstream analysis.
  2. Use bowtie2 to map reads from an E. coli Illumina data set to a reference genome and compare the output.

Theory

Please see the Introduction the Introduction to mapping presentation for presentation on the course outline for more details of the theory behind read mapping algorithms and critical considerations for using these tools and references correctly.

...

Bowtie2 is a complete rewrite of an older program bowtie. In terms of configurability, sensitivity, and speed it is useful for a wide range of projects. After years of teaching bwa mapping along with bowtie2, bowtie2 alone is now taught as I never recommend anyone use bwa, and based on positive feedback we continue with this set up. For some more details about how read mappers work see the bonus presentationpresentation about read mapping details and file formats on the course home page, and if you find a compelling reason to use bwa (or any other read mapper) rather than bowtie2 after the course is over, I'd love to hear from you.

...

  1. SAM files can be enormously humongous text files (potentially >1 gigabytes). Attempting to open the entire file at once can cause your computer to lock up or your text editor to crash. You are generally safer only looking at a portion at a time using linux commands like head or grep or more or using a viewer like IGV, which we will cover in a later tutorial.
  2. SAM files have some rather complicated information encoded as text, like a binary encoded FLAGS field and CIGAR strings. We'll take a look at some of these later, if we have time, or they are covered in the bonus presentation about read mapping and file formats which you can find on the home page.

Still, you should recognize some of the information on a line in a SAM file from the input FASTQ, and some of the other information is relatively straightforward to understand, like the position where the read mapped. Give this a try:

...