Page Comparison

...

Create a directory for your run. Let's call our directory Sample, so just run
Code Block
mkdir Sample
Run fastExon-mpileup to map reads and generate a VCF file. fastExon splits the data into chunks whose size you determine, runs BWA against the reference, and generates a VCF using SamTools mpileup. Right now the code needs to be edited to specify the reference.
Code Block
fastExon-mpileup Sample_R1.fastq Sample_R2.fastq 8000000 3
The first number after the FASTQ files specifies how many lines should be in each file chunk (for a FASTQ file, it must be a multiple of 4), the second number determines how many subdirectories the file chunks will be grouped in.

Clean up raw SNPs.

Code Block
cat Test.snps.vcf \| java -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) ! (REF = 'N') ! (ALT = 'N')" > mapping_clean_snps.vcf

If a memory error is thrown, try loading java64 before running SnpSift and increase the memory size:

Code Block
module load java64 cat Test.snps.vcf \| java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) ! (REF = 'N') ! (ALT = 'N')" > mapping_clean_snps.vcf

Clean up raw SNPs for mapping.

Code Block

cat variants.snps.vcf | java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 5 ) & ( DP < 32 ) & ( QUAL >= 10 ) & (DP4[2] > 0) & (DP4[3] > 0) ! (REF = 'N') ! (ALT = 'N')" > mapping_snps.vcf

Remove VCF header.

Code Block
sed '1,27d' mapping_snps.vcf > mapping_snps_headless.vcf

Extract coverage on mapping SNPs.

Code Block
python covextract.py -f mapping_snps_headless.vcf > mapping_snps.tab

Fix formatting error in first column. The HMFseq R script expects the first column to be pure numbers.
Code Block
gawk -F=$'\t' 'sub(/variants.*bcf:/, "", $1)' mapping_snps.tab > mapping_snps_nums.tab

Run HMFseq R script.

Code Block
Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/HMFseq_Rscript mapping_snps_nums.tab HMFseq_output.tab HMFseq_genome_plot.pdf HMFseq_chromosome_scan.pdf "<Run_Name>"

(Here <Run_Name> is whatever you want to call this run.)

Run SnpEff on clean SNPs

Code Block

java -jar /work/01863/benni/snpEff_3_1/snpEff.jar eff -c /work/01863/benni/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 10000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats all_SNPeff_output.tab Zv9.68 clean_snps_nums.vcf > snpEff_output.txt

Load bedtools
Code Block
module load bedtools
SNP Subtractor to substract dbSNP and WT (wild-type?) SNPs from clean SNPs This step uses a dbSNP library in my own directory, which should probably be moved somewhere more central.
Code Block
intersectBed -v -a clean_snps_nums.vcf -b ../dbSNP/vcf_chr_1-25.vcf > remaining_snps1.vcf intersectBed -v -a remaining_snps1.vcf -b WT_SNP_set.vcf > remaining_snps2.vcf

Reattach a VCF header

Code Block
python /corral-repl/utexas/BioITeam/bin/MegaMapper/VCFheader_mod.py remaining_snps3.vcf remaining_snps2.vcf

SnpEff again on unique SNPs

Code Block

java -jar /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.jar eff -c /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 5000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats unfiltered_stats.tab Zv9.68 remaining_snps3.vcf > unfiltered_candidate_SNPs.tab

Find SNPs in predicted critical interval

Code Block
Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/candidator.R unfiltered_candidate_SNPs.tab HMFseq_output.tab candidate_list.tab

The candidate_list.tab file is your output. Congratulations, you're done!

Versions Compared

Old Version 2

New Version Current

Key