Assessing Mapping Results I 2014

Objectives

In this lab, you will explore the mapping results generated previously. You will learn how to assess mapping results using samtools and unix commands. You will also look at the differences between mapping with BWA and mapping with Tophat2.


Get your data

To try out exercises, you'll need to access your mapping results. 

Get set up for the exercises
# If using your own BWA mapped results (Sam files), they should be at:
$SCRATCH/my_rnaseq_course/bwa_exercise 

 
# If not, you can get the results from: 
/corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results
 
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results .

Samtools

Utilities for parsing and manipulating alignment files in SAM and BAM formats.  It is useful for:

  • Sorting alignment files
  • Merging multiple alignment files
  • Converting from SAM to BAM and vice versa.
  • Retrieving reads based on different criteria : reads mapping to a particular region, reads mapping with a certain quality, unmapped reads, etc.
  • Collecting statistics about your mapping result. 

With mapping results in SAM format, we need to convert it to BAM format, sort the BAM file and create and index for the BAM file. 

Exercise 1: Convert SAM file to BAM format.

samtools view syntax
samtools view -b -S samfile > bamfile


Exercise 2: Sort and index newly created BAM file.

samtools sort syntax
samtools sort bamfile sortedbamfile
samtools index sortedbamfile

 

WE ARE NOT GOING TO DO THIS RIGHT NOW, BUT TAKE A LOOK!

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by qsub.

 Show me the commands...

Put this in your commands file:

samtools view -b -S C1_R1.mem.sam > C1_R1.mem.bam && samtools sort C1_R1.mem.bam C1_R1.mem && samtools index C1_R1.mem.bam

samtools view -b -S C1_R2.mem.sam > C1_R2.mem.bam && samtools sort C1_R2.mem.bam C1_R2.mem && samtools index C1_R2.mem.bam

samtools view -b -S C1_R3.mem.sam > C1_R3.mem.bam && samtools sort C1_R3.mem.bam C1_R3.mem && samtools index C1_R3.mem.bam

samtools view -b -S C2_R1.mem.sam > C2_R1.mem.bam && samtools sort C2_R1.mem.bam C2_R1.mem && samtools index C2_R1.mem.bam

samtools view -b -S C2_R1.mem.sam > C2_R1.mem.bam && samtools sort C2_R1.mem.bam C2_R1.mem && samtools index C2_R1.mem.bam

Get already created results
#For bwa mem results
   cp $SCRATCH/my_rnaseq_course/bwa_exercise/results/bwa_mem/C1_R1.mem.bam* .

 
# If not, you can get the results from: 
/corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results
 
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results .

 

Exercise 3: Let's get some statistics: Samtools flagstat

Samtools flagstat can generate useful statistics about a mapped BAM file. Let's try it on one of our samples- C1_R1. Let's look at the BAM files for this sample, generated from bwa aln/sampe and generated from bwa mem. The two files are:  C1_R1.bam  and C1_R1.mem.bam

samtools flagstat command
samtools flagstat C1_R1.bam
samtools flagstat C1_R1.mem.bam


Exercise 4: Let's get some statistics: Samtools idxstats

samtools flagstat command
samtools idxstats C1_R1.bam
samtools idxstats C1_R1.mem.bam

RNASEQC

A java program that generates rnaseq specific metrics for an input BAM file. Not available on TACC yet.  Metrics like:

• total read number, number of unique reads, and number of duplicate reads
• duplication rate (number of duplicates/total reads)
• number of reads mapped/aligned 
• number of unique reads mapped 
• number of reads that are mapped to rRNA regions
• intragenic rate (reads mapped to intragenic regions/mapped unique reads)
• exonic rate (reads mapped to exonic regions/mapped unique reads)
• coding rate (reads mapped to coding regions/mapped unique reads)
• intergenic rate (reads mapped to intergenic regions/mapped unique reads)
• strand specificity metrics
• coverage metrics

 

Images from: http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf

BACK TO COURSE OUTLINE