Assessing Mapping Results II

Let's compare the results of BWA (unspliced) and HISAT2 (spliced) by looking at their respective BAM/SAM alignment files.


Exercise 5a: Examine a BAM/SAM file

Examine a few lines of the bwa alignment file.

BWA output
cds
cd my_rnaseq_course/day_2
head bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam
# Many of these are header lines, which start with @; skip them to see the alignment records
grep -v '^@' bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam | head
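Each alignment record has 11 mandatory tab-separated fields. The sketch below labels them with the field names from the SAM specification for the first alignment record in a file. So that it runs on its own, it builds a hypothetical one-record toy file; set "sam" to the BWA file above to label a real record instead.

```shell
# Hypothetical one-record toy SAM file, only so the sketch is self-contained.
# To use real data: sam=bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam
toy=$(mktemp)
{
  printf '@HD\tVN:1.6\n'
  printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
} > "$toy"
sam=$toy

# Print each mandatory field of the first alignment record next to its name
fields=$(grep -v '^@' "$sam" | head -n 1 | awk -F '\t' '
  BEGIN { split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", name, " ") }
  { for (i = 1; i <= 11; i++) printf "%-6s %s\n", name[i], $i }')
printf '%s\n' "$fields"

# Remove only the toy file, never the file "sam" may point to
rm -f "$toy"
```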


Exercise 5b: Spliced sequences

How is a spliced sequence represented in the BAM/SAM file?

BWA output
The 6th field of each BAM/SAM record is the CIGAR string, which describes how the query sequence (the read) aligned to the reference.
grep -v '^@' bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam | head | cut -f 6
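Ten lines give only a glimpse. A sketch that tallies how often each CIGAR string occurs across a whole file, most frequent first, is shown here. It runs on a hypothetical toy file (the "tx1"/"tx2" transcript names and the records are made up); set "sam" to the BWA file to tally real alignments. A string such as "60M15S" means 60 aligned bases followed by 15 soft-clipped (unaligned) bases.

```shell
# Hypothetical toy records; to use real data:
# sam=bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam
toy=$(mktemp)
{
  printf '@HD\tVN:1.6\n'
  printf 'r1\t0\ttx1\t10\t60\t75M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\ttx1\t40\t60\t75M\t*\t0\t0\t*\t*\n'
  printf 'r3\t16\ttx2\t5\t60\t60M15S\t*\t0\t0\t*\t*\n'
} > "$toy"
sam=$toy

# Skip headers, keep the CIGAR column, count identical strings, most frequent first
cigar_counts=$(grep -v '^@' "$sam" | cut -f 6 | sort | uniq -c | sort -rn)
printf '%s\n' "$cigar_counts"

# Remove only the toy file
rm -f "$toy"
```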


Examine the CIGAR strings of the HISAT2 results

HISAT2 output
grep -v '^@' hisat_exercise/results/GSM794483_C1.sam | head | cut -f 6
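To check which CIGAR operations occur in a file at all, you can split every CIGAR string into its length-plus-operation pieces and total the bases per operation. The sketch below does this with "grep -o" and "awk" on a hypothetical three-record toy file (one spliced, one unspliced, one unmapped record); on real data, an "N" row appears only if the file contains spliced alignments.

```shell
# Hypothetical toy records; to use real data:
# sam=hisat_exercise/results/GSM794483_C1.sam
toy=$(mktemp)
{
  printf '@HD\tVN:1.6\n'
  printf 'r1\t0\tchr1\t100\t60\t58M76N17M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\tchr1\t900\t60\t75M\t*\t0\t0\t*\t*\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$toy"
sam=$toy

# One <length><op> token per line, then sum the lengths for each operation letter;
# unmapped records have CIGAR "*" and contribute nothing
op_totals=$(grep -v '^@' "$sam" | cut -f 6 | grep -oE '[0-9]+[MIDNSHP=X]' |
  awk '{ op = substr($0, length($0), 1); total[op] += substr($0, 1, length($0) - 1) }
       END { for (op in total) print op, total[op] }' | sort)
printf '%s\n' "$op_totals"

# Remove only the toy file
rm -f "$toy"
```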

The CIGAR string "58M76N17M" represents a spliced sequence. The codes mean:
58M - the first 58 bases of the read align to the reference (M covers both matches and mismatches)
76N - the next 76 bases of the reference are skipped, with no corresponding bases in the read (an intron)
17M - the last 17 bases of the read align to the reference
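The same bookkeeping works for any CIGAR string. Per the SAM specification, M, = and X consume both read and reference bases, I and S consume read bases only, D and N consume reference bases only, and H and P consume neither. The hypothetical helper "cigar_lengths" below applies those rules in awk and prints the read length and the reference span; for "58M76N17M" that is 58 + 17 = 75 read bases spread over 58 + 76 + 17 = 151 reference bases.

```shell
# Hypothetical helper: print "<read length> <reference span>" for one CIGAR string
cigar_lengths() {
  printf '%s\n' "$1" | awk '{
    s = $0; q = 0; r = 0
    # Consume one <length><op> token at a time from the front of the string
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      n = substr(s, 1, RLENGTH - 1) + 0
      op = substr(s, RLENGTH, 1)
      if (op ~ /[M=X]/)     { q += n; r += n }  # aligned bases: read and reference
      else if (op ~ /[IS]/) { q += n }          # insertion / soft clip: read only
      else if (op ~ /[DN]/) { r += n }          # deletion / intron: reference only
      # H and P consume neither the read nor the reference
      s = substr(s, RLENGTH + 1)
    }
    print q, r
  }'
}

cigar_lengths 58M76N17M
```

Adding the reference span to the POS field and subtracting one gives the last reference base covered by the alignment.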


Exercise 5c: Count spliced sequences
How many spliced alignments are there in the HISAT2 alignment file for sample C1?

HISAT2 output
grep -v '^@' hisat_exercise/results/GSM794483_C1.sam | cut -f 6 | grep 'N' | wc -l
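This command counts alignment records, not reads: a read reported with secondary (FLAG 0x100) or supplementary (FLAG 0x800) alignments can be counted more than once. The sketch below counts only primary, mapped, spliced alignments by testing the FLAG bits in awk. It uses a hypothetical toy file in which "read1" has both a spliced primary and a spliced secondary alignment. If samtools is installed, "samtools view -F 0x904 hisat_exercise/results/GSM794483_C1.sam | awk '$6 ~ /N/' | wc -l" should apply the same filter (0x904 = unmapped + secondary + supplementary).

```shell
# Hypothetical toy records; to use real data:
# sam=hisat_exercise/results/GSM794483_C1.sam
toy=$(mktemp)
{
  printf '@HD\tVN:1.6\n'
  printf 'read1\t0\tchr1\t100\t60\t58M76N17M\t*\t0\t0\t*\t*\n'
  printf 'read1\t256\tchr1\t5000\t1\t40M200N35M\t*\t0\t0\t*\t*\n'
  printf 'read2\t16\tchr1\t900\t60\t75M\t*\t0\t0\t*\t*\n'
  printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$toy"
sam=$toy

# Same count as the exercise command: every record whose CIGAR contains N
naive=$(grep -v '^@' "$sam" | cut -f 6 | grep -c 'N')

# Primary, mapped alignments only; POSIX awk has no bitwise AND, so test bits arithmetically
primary=$(awk -F '\t' '
  /^@/                    { next }  # header lines
  int($2 / 4) % 2 == 1    { next }  # 0x4   read unmapped
  int($2 / 256) % 2 == 1  { next }  # 0x100 secondary alignment
  int($2 / 2048) % 2 == 1 { next }  # 0x800 supplementary alignment
  $6 ~ /N/                { n++ }
  END                     { print n + 0 }
' "$sam")

echo "all spliced records: $naive"
echo "primary spliced alignments: $primary"

# Remove only the toy file
rm -f "$toy"
```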

