Assessing Mapping Results I 2014
Objectives
In this lab, you will explore the mapping results generated previously. You will learn how to assess mapping results using samtools and unix commands. You will also look at the differences between mapping with BWA and mapping with Tophat2.
Get your data
To try out exercises, you'll need to access your mapping results.
# If using your own BWA mapped results (Sam files), they should be at: $SCRATCH/my_rnaseq_course/bwa_exercise # If not, you can get the results from: /corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results cds cd my_rnaseq_course cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results .
Samtools
Utilities for parsing and manipulating alignment files in SAM and BAM formats. It is useful for:
- Sorting alignment files
- Merging multiple alignment files
- Converting from SAM to BAM and vice versa.
- Retrieving reads based on different criteria : reads mapping to a particular region, reads mapping with a certain quality, unmapped reads, etc.
- Collecting statistics about your mapping result.
With mapping results in SAM format, we need to convert it to BAM format, sort the BAM file and create and index for the BAM file.
Exercise 1: Convert SAM file to BAM format.
samtools view -b -S samfile > bamfile
Exercise 2: Sort and index newly created BAM file.
samtools sort bamfile sortedbamfile samtools index sortedbamfile
WE ARE NOT GOING TO DO THIS RIGHT NOW, BUT TAKE A LOOK!
Submit to the TACC queue or run in an idev shell
Create a commands
file and use launcher_creator.py followed by qsub.
#For bwa mem results cp $SCRATCH/my_rnaseq_course/bwa_exercise/results/bwa_mem/C1_R1.mem.bam* . # If not, you can get the results from: /corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results cds cd my_rnaseq_course cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results .
Exercise 3: Let's get some statistics: Samtools flagstat
Samtools flagstat can generate useful statistics about a mapped BAM file. Let's try it on one of our samples- C1_R1. Let's look at the BAM files for this sample, generated from bwa aln/sampe and generated from bwa mem. The two files are: C1_R1.bam and C1_R1.mem.bam
samtools flagstat C1_R1.bam samtools flagstat C1_R1.mem.bam
Exercise 4: Let's get some statistics: Samtools idxstats
samtools idxstats C1_R1.bam samtools idxstats C1_R1.mem.bam
RNASEQC
A java program that generates rnaseq specific metrics for an input BAM file. Not available on TACC yet. Metrics like:
• total read number, number of unique reads, and number of duplicate reads
• duplication rate (number of duplicates/total reads)
• number of reads mapped/aligned
• number of unique reads mapped
• number of reads that are mapped to rRNA regions
• intragenic rate (reads mapped to intragenic regions/mapped unique reads)
• exonic rate (reads mapped to exonic regions/mapped unique reads)
• coding rate (reads mapped to coding regions/mapped unique reads)
• intergenic rate (reads mapped to intergenic regions/mapped unique reads)
• strand specificity metrics
• coverage metrics
Images from: http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf
BACK TO COURSE OUTLINE
Welcome to the University Wiki Service! Please use your IID (yourEID@eid.utexas.edu) when prompted for your email address during login or click here to enter your EID. If you are experiencing any issues loading content on pages, please try these steps to clear your browser cache.