Assessing Mapping Results II
Let's compare the results of BWA (unspliced) and HISAT2 (spliced) by looking at the respective BAM/SAM files.
Exercise 5a: Examine a BAM/SAM file
Examine a few lines of the BWA alignment file.
BWA output
cds
cd my_rnaseq_course/day_2
head bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam
# a lot of these are header lines that start with @
grep -v '^@' bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam | head
Exercise 5b: Spliced sequences
How is a spliced sequence represented in the BAM/SAM file?
BWA output
The 6th BAM/SAM file field is the CIGAR string, which tells you how your query sequence mapped to the reference.
grep -v '^@' bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam | head | cut -f 6
Examine the CIGAR strings of the HISAT2 results.
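To see the CIGAR string in context, it can help to print it next to the read name and mapping position. The following is a minimal sketch, not part of the original exercise: the function name show_cigar is hypothetical, and the sample record piped into it is made up purely for illustration. It relies on the standard SAM column order, in which field 1 is the read name, field 3 the reference name, field 4 the position, and field 6 the CIGAR string.

```shell
# Hypothetical helper: read SAM text on stdin, skip header lines (starting with @),
# and print read name, reference name, position, and CIGAR string, tab-separated.
show_cigar() {
    awk -F '\t' -v OFS='\t' '!/^@/ { print $1, $3, $4, $6 }'
}

# Made-up example record (not taken from the course data).
printf '@SQ\tSN:chr1\tLN:1000\nr1\t0\tchr1\t100\t60\t58M\t*\t0\t0\tACGT\tIIII\n' | show_cigar
```

On the course files, something like `show_cigar < bwa_exercise/bwa_mem_results_transcriptome/C1_R1.mem.sam | head` should give a similar view.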
HISAT2 output
grep -v '^@' hisat_exercise/results/GSM794483_C1.sam | head | cut -f 6
The CIGAR string "58M76N17M" represents a spliced sequence. The codes mean:
58M - the first 58 bases align to the reference
76N - there are then 76 bases on the reference with no corresponding bases in the sequence (an intron)
17M - the last 17 bases align to the reference
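The breakdown of "58M76N17M" can be reproduced on the command line. The sketch below is illustrative only: the function names cigar_ops and cigar_ref_span are hypothetical, and it assumes a grep that supports the -o and -E options (GNU and BSD grep both do). The first function splits a CIGAR string into its operations; the second adds up the operations that consume reference bases (M, D, N, =, X) to give the length the read spans on the reference, intron included.

```shell
# Hypothetical helper: split a CIGAR string into one operation per line.
cigar_ops() {
    printf '%s\n' "$1" | grep -oE '[0-9]+[MIDNSHP=X]'
}

# Hypothetical helper: sum the lengths of reference-consuming operations (M, D, N, =, X).
cigar_ref_span() {
    printf '%s\n' "$1" | grep -oE '[0-9]+[MDN=X]' | tr -d 'MDN=X' |
        awk '{ s += $1 } END { print s + 0 }'
}

cigar_ops 58M76N17M       # prints 58M, 76N and 17M on separate lines
cigar_ref_span 58M76N17M  # prints 151 (58 + 76 + 17)
```

The read itself contributes only 58 + 17 = 75 bases; the remaining 76 reference bases are the skipped intron.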
Exercise 5c: Count spliced sequences
How many spliced sequences are there in the C1_R1 alignment file?
HISAT2 output
grep -v '^@' hisat_exercise/results/GSM794483_C1.sam|cut -f 6|grep 'N'|wc -l
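The same count can be done in a single awk pass, which skips the header and tests only the CIGAR column. This is a sketch under stated assumptions: count_spliced is a hypothetical name, and the two records piped into it are made up for illustration. Like the grep pipeline above, it counts alignment records rather than unique reads, so a read with several reported spliced alignments is counted once per alignment.

```shell
# Hypothetical helper: read SAM text on stdin and count non-header records
# whose CIGAR string (field 6) contains an N operation.
count_spliced() {
    awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }'
}

# Made-up records: one spliced (58M76N17M) and one unspliced (75M).
printf '@HD\tVN:1.0\nr1\t0\tchr1\t100\t60\t58M76N17M\t*\t0\t0\t*\t*\nr2\t0\tchr1\t300\t60\t75M\t*\t0\t0\t*\t*\n' | count_spliced   # prints 1
```

On the course data this would be run as `count_spliced < hisat_exercise/results/GSM794483_C1.sam`.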