...
- Create a directory for your run. Let's call our directory
Sample
, so just runCode Block mkdir Sample
- Run fastExon-mpileup to map reads and generate a VCF file. fastExon splits the data into chunks whose size you determine, runs BWA against the reference, and generates a VCF using SamTools mpileup. Right now the code needs to be edited to specify the reference.
The first number after the FASTQ files specifies how many lines should be in each file chunk (for a FASTQ file, it must be a multiple of 4), the second number determines how many subdirectories the file chunks will be grouped in.Code Block fastExon-mpileup Sample_R1.fastq Sample_R2.fastq 8000000 3
- Clean up raw SNPs.
If a memory error is thrown, try loading java64 before running SnpSift and increase the memory size:Code Block cat Test.snps.vcf | java -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) ! (REF = 'N') ! (ALT = 'N')" > mapping_clean_snps.vcf
Code Block module load java64 cat Test.snps.vcf | java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) ! (REF = 'N') ! (ALT = 'N')" > mapping_clean_snps.vcf
- Clean up raw SNPs for mapping.
Code Block cat variants.snps.vcf | java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 5 ) & ( DP < 32 ) & ( QUAL >= 10 ) & (DP4[2] > 0) & (DP4[3] > 0) ! (REF = 'N') ! (ALT = 'N')" > mapping_snps.vcf
- Remove VCF header.
Code Block sed '1,27d' mapping_snps.vcf > mapping_snps_headless.vcf
- Extract coverage on mapping SNPs.
Code Block python covextract.py -f mapping_snps_headless.vcf > mapping_snps.tab
- Fix formatting error in first column. The HMFseq R script expects the first column to be pure numbers.
Code Block gawk -F=$'\t' 'sub(/variants.*bcf:/, "", $1)' mapping_snps.tab > mapping_snps_nums.tab
- Run HMFseq R script.
(HereCode Block Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/HMFseq_Rscript mapping_snps_nums.tab HMFseq_output.tab HMFseq_genome_plot.pdf HMFseq_chromosome_scan.pdf "<Run_Name>"
<Run_Name>
is whatever you want to call this run.) - Run SnpEff on clean SNPs
Code Block java -jar /work/01863/benni/snpEff_3_1/snpEff.jar eff -c /work/01863/benni/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 10000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats all_SNPeff_output.tab Zv9.68 clean_snps_nums.vcf > snpEff_output.txt
- Load bedtools
Code Block module load bedtools
- SNP Subtractor to substract dbSNP and WT (wild-type?) SNPs from clean SNPs This step uses a dbSNP library in my own directory, which should probably be moved somewhere more central.
Code Block intersectBed -v -a clean_snps_nums.vcf -b ../dbSNP/vcf_chr_1-25.vcf > remaining_snps1.vcf intersectBed -v -a remaining_snps1.vcf -b WT_SNP_set.vcf > remaining_snps2.vcf
- Reattach a VCF header
Code Block python /corral-repl/utexas/BioITeam/bin/MegaMapper/VCFheader_mod.py remaining_snps3.vcf remaining_snps2.vcf
- SnpEff again on unique SNPs
Code Block java -jar /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.jar eff -c /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 5000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats unfiltered_stats.tab Zv9.68 remaining_snps3.vcf > unfiltered_candidate_SNPs.tab
- Find SNPs in predicted critical interval
TheCode Block Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/candidator.R unfiltered_candidate_SNPs.tab HMFseq_output.tab candidate_list.tab
candidate_list.tab
file is your output. Congratulations, you're done!