Note |
---|
title | Data used in this tutorial |
---|
|
Recommended but not required that you first complete the trios tutorial and use the data you generated here. Alternatively, canned data provided. |
...
There are many ways to measure sequence capture. You might care more about minimizing off-target capture, to make your sequencing dollars go as far as possible. Or you might care more about maximizing on-target capture, to make sure you get data from every region of interest. These two are usually negatively correlated.
Using Picard's "CollectHsMetrics" function to evaluate capture
...
To run CollectHsMetrics on Lonestar, there are three prerequisites: 1) A bam file and 2) a list of the genomic intervals that were to be captured and 3) the reference (.fa). As you would guess, the BAM and interval list both have to be based on exactly the same genomic reference file.
For our tutorial, the bam files are one of these:
Code Block |
---|
title | BAM files for exome capture evaluation tutorial |
---|
|
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12892.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12891.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam |
I've started with one of Illumina's target capture definitions (the vendor of your capture kit will provide this) but since the bam files only represent chr21 data I've created a target definitions file from chr21 only as well. Here they are:
Code Block |
---|
title | Two relevant target list definitions |
---|
|
/corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.chr20.reduced.withhead.intervallist
/corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.reduced.withhead.intervallist |
And the relevant reference is:
Code Block |
---|
title | Reference for exome metrics |
---|
|
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa.fai |
...
Code Block |
---|
language | bash |
---|
title | This block will work if you have not completed the human trios tutorial |
---|
collapse | true |
---|
|
mkdir $SCRATCH/GVA_Exome_Capture
cd $SCRATCH/GVA_Exome_Capture
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12892.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12891.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.chr20.reduced.withhead.intervallist .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.reduced.withhead.intervallist .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa .
cp /corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa.fai . |
The run command looks long but isn't that complicated (like most java programs):
Code Block |
---|
title | How to run exactly these files on Lonestar |
---|
|
java -Xmx4g -Djava.io.tmpdir=/tmp -jar /corral-repl/utexas/BioITeam/bin/picard.jar CollectHsMetrics BAIT_INTERVALS=target_intervals.chr20.reduced.withhead.intervallist TARGET_INTERVALS=target_intervals.chr20.reduced.withhead.intervallist INPUT=NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam REFERENCE_SEQUENCE=hs37d5.fa OUTPUT=exome.picard.stats PER_TARGET_COVERAGE=exome.pertarget.stats |
...