Evaluating capture metrics

...

To run the program on Lonestar, you need three things: 1) a BAM file, 2) a list of the genomic intervals that were targeted for capture, and 3) the reference sequence (.fa). As you might guess, the BAM file and the interval list must both be based on exactly the same genomic reference file.

For this tutorial, use any one of these BAM files:

Code Block
titleBAM files for exome capture evaluation tutorial
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam  
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12892.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12891.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam

I started with one of Illumina's target capture definitions (the vendor of your capture kit will provide this). Because the BAM files contain only chr20 data, I also created a target definitions file restricted to chr20. Here they are:

Code Block
titleTwo relevant target list definitions
/corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.chr20.reduced.withhead.intervallist
/corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.reduced.withhead.intervallist
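Suppose your vendor's interval list covers the whole genome but your BAM covers only one chromosome. You can build a reduced list like the chr20 one above by keeping the header lines (which start with @) plus the intervals on that contig. This is a minimal sketch, not necessarily how the provided file was made. It assumes a tab-separated Picard-style interval list whose first column is the contig name, with hs37d5-style contig names ("20" rather than "chr20"). The function name subset_intervals_to_contig is hypothetical.

```bash
# Hypothetical helper (not part of Picard): keep the @-header lines of an
# interval list plus only the intervals whose first column equals the contig.
# Assumes tab-separated input and hs37d5-style contig names such as "20".
subset_intervals_to_contig() {
  # $1 = contig name; reads the interval list on stdin, writes to stdout
  awk -F'\t' -v contig="$1" '/^@/ || $1 == contig'
}
```

For example: `subset_intervals_to_contig 20 < target_intervals.reduced.withhead.intervallist > my_targets.chr20.intervallist`. The output name is hypothetical and is chosen so that it does not overwrite the provided chr20 file.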

And the relevant reference is:

Code Block
titleReference for exome metrics
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa.fai
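The BAM file, the interval list and the reference must all use the same genome build. Before a long run, it can be worth checking that the @SQ lines in the interval list header agree with the reference index. The sketch below compares sequence names and lengths, in order, against the first two columns of the .fai. It assumes the interval list carries a SAM-style header whose @SQ lines have SN: and LN: tags. The function names are hypothetical.

```bash
# Hypothetical helpers: print "name<TAB>length" for each @SQ header line,
# then compare that list (in order) with the first two columns of a .fai.
sq_names_lengths() {
  awk -F'\t' '/^@SQ/ {
    sn = ""; ln = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) sn = substr($i, 4)   # strip the "SN:" prefix
      if ($i ~ /^LN:/) ln = substr($i, 4)   # strip the "LN:" prefix
    }
    print sn "\t" ln
  }' "$1"
}

check_same_reference() {
  # $1 = interval list (or a saved SAM header), $2 = .fai index
  # Returns 0 if names and lengths match in the same order, 1 otherwise.
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  sq_names_lengths "$1" > "$tmp_a"
  cut -f1,2 "$2" > "$tmp_b"
  if cmp -s "$tmp_a" "$tmp_b"; then
    echo "OK: $1 and $2 list the same sequences"
    rc=0
  else
    echo "MISMATCH between $1 and $2 (first differences):"
    diff "$tmp_a" "$tmp_b" | head -n 10
    rc=1
  fi
  rm -f "$tmp_a" "$tmp_b"
  return $rc
}
```

For example: `check_same_reference target_intervals.chr20.reduced.withhead.intervallist hs37d5.fa.fai`. If samtools is available, you can run the same check against the BAM by saving its header first with `samtools view -H NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam > bam_header.sam` (the output file name is hypothetical).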


If you'd like to try this on your own, copy the interval list, a BAM file, and the reference (.fa and .fai) to a temporary directory on your $SCRATCH, and don't forget to run "module load picard" first!

...

titleIf you're in a hurry...

...

We already copied all of these files to $SCRATCH/BDIB_Human_tutorial/raw_files yesterday, so let's copy them into a new folder in our scratch directory to begin our analysis.

Code Block
languagebash
titleHopefully by now you can do this without having to paste the commands
collapsetrue
cds    # TACC shortcut for: cd $SCRATCH
mkdir BDIB_Exome_Capture
cd BDIB_Exome_Capture
cp $SCRATCH/BDIB_Human_tutorial/raw_files/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam .
cp $SCRATCH/BDIB_Human_tutorial/raw_files/target_intervals.chr20.reduced.withhead.intervallist .
cp $SCRATCH/BDIB_Human_tutorial/raw_files/ref/hs37d5.fa .
cp $SCRATCH/BDIB_Human_tutorial/raw_files/ref/hs37d5.fa.fai .

 

The run command looks long, but like most Java command lines it isn't that complicated:

Code Block
titleHow to run exactly these files on Lonestar
module load picard
# BI/TI = bait/target intervals (the same list here), I = input BAM, R = reference (copied into this directory), O = summary metrics file
java -Xmx4g -Djava.io.tmpdir=/tmp -jar $TACC_PICARD_DIR/CalculateHsMetrics.jar BI=target_intervals.chr20.reduced.withhead.intervallist TI=target_intervals.chr20.reduced.withhead.intervallist I=NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam R=hs37d5.fa O=exome.picard.stats PER_TARGET_COVERAGE=exome.pertarget.stats
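The PER_TARGET_COVERAGE file has one row per target. It is where you can find the poorly covered targets behind summary numbers such as ZERO_CVG_TARGETS_PCT. The minimal sketch below counts targets whose mean coverage falls below a threshold. It assumes a tab-separated file whose first line is a header containing a column named mean_coverage. Column names may differ between Picard versions, so check `head -n 1 exome.pertarget.stats` first. The function name is hypothetical.

```bash
# Hypothetical helper: count rows of a per-target coverage table whose
# "mean_coverage" column is below a threshold. Assumes a tab-separated file
# whose first line is the header; exits with status 2 if the column is absent.
count_low_coverage_targets() {
  # $1 = per-target coverage file, $2 = coverage threshold (e.g. 10)
  awk -F'\t' -v min="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "mean_coverage") col = i
      if (!col) { print "no mean_coverage column in header" > "/dev/stderr"; exit 2 }
      next
    }
    { total++; if ($col + 0 < min + 0) low++ }
    END { if (col) printf "%d of %d targets below %sx\n", low, total, min }
  ' "$1"
}
```

For example: `count_low_coverage_targets exome.pertarget.stats 10`.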

...

Code Block
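# grep keeps the metrics header line (it starts with BAIT_SET) plus the data line after it;
# awk then transposes those two rows into two columns: metric name and value
grep -A 1 '^BAIT' exome.picard.stats | awk 'BEGIN {FS="\t"} {for (i=1;i<=NF;i++) {a[NR"_"i]=$i}} END {for (i=1;i<=NF;i++) {print a[1"_"i]"\t"a[2"_"i]}}'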
 
## below here is the output of the above command, do not paste this into the command line
BAIT_SET	target_intervals
GENOME_SIZE	3137454505
BAIT_TERRITORY	1843371
TARGET_TERRITORY	1843371
BAIT_DESIGN_EFFICIENCY	1
TOTAL_READS	4579959
PF_READS	4579959
PF_UNIQUE_READS	4208881
PCT_PF_READS	1
PCT_PF_UQ_READS	0.918978
PF_UQ_READS_ALIGNED	4114249
PCT_PF_UQ_READS_ALIGNED	0.977516
PF_UQ_BASES_ALIGNED	283708397
ON_BAIT_BASES	85464280
NEAR_BAIT_BASES	49788346
OFF_BAIT_BASES	148455771
ON_TARGET_BASES	85464280
PCT_SELECTED_BASES	0.476731
PCT_OFF_BAIT	0.523269
ON_BAIT_VS_SELECTED	0.631886
MEAN_BAIT_COVERAGE	46.363038
MEAN_TARGET_COVERAGE	46.76568
PCT_USABLE_BASES_ON_BAIT	0.245533
PCT_USABLE_BASES_ON_TARGET	0.245533
FOLD_ENRICHMENT	512.716312
ZERO_CVG_TARGETS_PCT	0.009438
FOLD_80_BASE_PENALTY	23.38284
PCT_TARGET_BASES_2X	0.849372
PCT_TARGET_BASES_10X	0.484824
PCT_TARGET_BASES_20X	0.435911
PCT_TARGET_BASES_30X	0.401622
PCT_TARGET_BASES_40X	0.36876
PCT_TARGET_BASES_50X	0.335459
PCT_TARGET_BASES_100X	0.173683
HS_LIBRARY_SIZE	5325189
HS_PENALTY_10X	232.05224
HS_PENALTY_20X	-1
HS_PENALTY_30X	-1
HS_PENALTY_40X	-1
HS_PENALTY_50X	-1
HS_PENALTY_100X	-1
AT_DROPOUT	2.143632
GC_DROPOUT	10.000011
SAMPLE	
LIBRARY	
READ_GROUP
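Several of these values are simple ratios of the base counts, which makes a useful sanity check when reading the table. In this output, ON_BAIT_BASES + NEAR_BAIT_BASES + OFF_BAIT_BASES equals PF_UQ_BASES_ALIGNED (283708397). PCT_SELECTED_BASES is (ON_BAIT_BASES + NEAR_BAIT_BASES) / PF_UQ_BASES_ALIGNED = 135252626 / 283708397, about 0.4767.

The sketch below recomputes five such ratios and prints each one next to the value Picard reported:

- The formulas are our reading of Picard's HsMetrics definitions. They reproduce the numbers above, but Picard's own documentation is authoritative.
- It assumes the raw exome.picard.stats layout that the grep command relies on: a tab-separated header line starting with BAIT_SET, followed by one data line.
- It looks columns up by name rather than by position.
- The function name check_hs_ratios is hypothetical.

```bash
# Hypothetical helper: recompute selected HsMetrics ratios from base counts.
# Output columns: metric name, recomputed value, value reported by Picard.
check_hs_ratios() {
  awk -F'\t' '
    /^BAIT_SET/ {
      n = split($0, names, "\t")   # header row: metric names
      getline                       # data row: metric values
      for (i = 1; i <= n; i++) m[names[i]] = $i
      aligned  = m["PF_UQ_BASES_ALIGNED"]
      on_bait  = m["ON_BAIT_BASES"]
      selected = on_bait + m["NEAR_BAIT_BASES"]   # on or near a bait
      if (aligned == 0 || selected == 0 || m["BAIT_TERRITORY"] == 0 || m["GENOME_SIZE"] == 0) {
        print "required columns missing or zero" > "/dev/stderr"; exit 2
      }
      printf "PCT_SELECTED_BASES\t%.6f\t%s\n",  selected / aligned, m["PCT_SELECTED_BASES"]
      printf "PCT_OFF_BAIT\t%.6f\t%s\n",        m["OFF_BAIT_BASES"] / aligned, m["PCT_OFF_BAIT"]
      printf "ON_BAIT_VS_SELECTED\t%.6f\t%s\n", on_bait / selected, m["ON_BAIT_VS_SELECTED"]
      printf "MEAN_BAIT_COVERAGE\t%.6f\t%s\n",  on_bait / m["BAIT_TERRITORY"], m["MEAN_BAIT_COVERAGE"]
      # fraction of aligned bases on bait, divided by the bait fraction of the genome
      printf "FOLD_ENRICHMENT\t%.6f\t%s\n", (on_bait / aligned) / (m["BAIT_TERRITORY"] / m["GENOME_SIZE"]), m["FOLD_ENRICHMENT"]
      exit
    }
  ' "$1"
}
```

For example, `check_hs_ratios exome.picard.stats` should print two nearly identical numbers on each line for the output shown above. A large disagreement would suggest that the file layout or the Picard version differs from what this sketch assumes.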

...