Introduction to Samtools - manipulating and filtering bam files

As we showed you yesterday, the main type of output from aligning reads to a database is a binary alignment file, or BAM file. These files are compressed, so they can't be viewed using standard unix file viewers such as more, less and head. Samtools allows you to manipulate .bam files: they can be converted into a non-binary format (SAM format specification here), and they can also be sorted, indexed, and filtered, for example by alignment quality or by position. Filtering is a good way to remove low quality reads, or to make a BAM file restricted to a single chromosome.
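As a rough sketch of the filtering described above, the helper below prints a samtools command that keeps only alignments above a mapping-quality threshold on one chromosome. The function name `subset_cmd`, the default threshold of 20, and the file names are assumptions made for illustration, not part of the course materials. It only prints the command, so you can inspect it before running anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a samtools command that keeps alignments with
# MAPQ >= min_mapq in one region. Restricting by region requires the input
# BAM to be coordinate-sorted and indexed.
subset_cmd() {
    local bam="$1" region="$2" min_mapq="${3:-20}"
    # -b: write BAM output; -q: skip alignments with MAPQ below the threshold
    printf 'samtools view -b -q %s %s %s\n' "$min_mapq" "$bam" "$region"
}

# Print the command for an assumed sorted, indexed file and chromosome.
subset_cmd my_sorted.bam chrIII 20
```

Once the printed command looks right, you would run it yourself and redirect its output to a new .bam file.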

...

Code Block
languagebash
titleget a copy of the yeast_pairedend.bam file from yesterday in a new directory "samtools"
ssh user@login8.stampede.tacc.utexas.edu
cds
cd $SCRATCH/core_ngs
mkdir samtools
cd samtools
cp $SCRATCH/core_ngs/alignment/bam/yeast_pairedend.bam .

Sorting and Indexing a bam file: samtools index, sort

...

Expand
titleExercise 2 solution

I've put my output for each line in the comment area.

Code Block
languagebash
titlesolution
module load samtools                                     # if needed
samtools view -c yeast_pairedend_sort.bam chrIII         # total number of reads on this chromosome: 15503
samtools view -c -F 0x04 yeast_pairedend_sort.bam chrIII # reads on this chromosome that are mapped (unmapped flag 0x4 not set): 13973

The -F 0x04 filter excludes reads that have the unmapped flag set, so the second count is the number of mapped reads. The proportion of reads on chromosome III that were mapped is therefore 13973/15503, or 90.1%, and only ~10% of the reads assigned to this chromosome are unmapped. To count the unmapped reads directly, use -f 0x04 instead.
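To make the flag test and the percentage concrete, here is a minimal sketch in plain bash that does not need samtools. It checks the 0x4 ("read unmapped") bit on a few example flag values and recomputes the percentage from the two counts. The helper names `is_unmapped` and `percent` are assumptions made for illustration; the flag values 99, 77 and 4 are just examples.

```bash
#!/usr/bin/env bash
# 0x4 is the SAM flag bit meaning "read unmapped".
UNMAPPED=0x4

# Succeeds if the given SAM flag value has the unmapped bit set.
is_unmapped() {
    (( ($1 & UNMAPPED) != 0 ))
}

# Percentage of part out of total, to one decimal place.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# -f 0x4 keeps reads with the bit set; -F 0x4 keeps reads without it.
for flag in 99 77 4; do
    if is_unmapped "$flag"; then
        echo "flag $flag: unmapped (kept by -f 0x4)"
    else
        echo "flag $flag: mapped (kept by -F 0x4)"
    fi
done

# Share of chrIII reads kept by -F 0x04, using the counts from the solution.
percent 13973 15503   # prints 90.1
```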

...