
...

  1. Create a directory for your run. Let's call our directory Sample, so just run
    Code Block
    mkdir Sample
  2. Run fastExon-mpileup to map reads and generate a VCF file. fastExon splits the data into chunks whose size you determine, runs BWA against the reference, and generates a VCF using SamTools mpileup. Right now the code needs to be edited to specify the reference.
    Code Block
    fastExon-mpileup Sample_R1.fastq Sample_R2.fastq 8000000 3
    The first number after the FASTQ files sets how many lines go into each file chunk (for a FASTQ file this must be a multiple of 4, since each read spans four lines); the second number sets how many subdirectories the chunks are grouped into.
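The chunking step can be sketched with coreutils split; this is a hypothetical illustration of what fastExon does internally (the demo file and tiny chunk size are made up — a real run would use 8000000 lines, i.e. 2 million reads, per chunk):

```shell
# Build a tiny demo FASTQ: 3 reads, 4 lines each (12 lines total).
for i in 1 2 3; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > demo_R1.fastq
# Split into chunks of 8 lines (2 reads) apiece -> chunk_00, chunk_01.
# The line count must be a multiple of 4 so no read is split across chunks.
split -l 8 -d demo_R1.fastq chunk_
wc -l chunk_*   # chunk_00 holds 8 lines, chunk_01 the remaining 4
```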
  3. Clean up raw SNPs.
    Code Block
    cat Test.snps.vcf | java -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) & !(REF = 'N') & !(ALT = 'N')" > mapping_clean_snps.vcf
    If a memory error is thrown, load the java64 module before running SnpSift and raise the Java heap limit with -Xmx:
    Code Block
    module load java64
    cat Test.snps.vcf | java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 2 ) & ( DP < 32 ) & ( QUAL >= 10 ) & !(REF = 'N') & !(ALT = 'N')" > mapping_clean_snps.vcf
  4. Clean up raw SNPs for mapping.
    Code Block
    cat variants.snps.vcf | java -Xmx6G -jar /work/01863/benni/snpEff_3_1/SnpSift.jar filter "( DP > 5 ) & ( DP < 32 ) & ( QUAL >= 10 ) & (DP4[2] > 0) & (DP4[3] > 0) & !(REF = 'N') & !(ALT = 'N')" > mapping_snps.vcf
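In mpileup output, the INFO field's DP4 value lists reference-forward, reference-reverse, alternate-forward, and alternate-reverse read counts, so the DP4[2] > 0 and DP4[3] > 0 clauses demand at least one alternate-allele read on each strand. A quick way to inspect those counts (the INFO string below is made up):

```shell
# Hypothetical INFO column from one VCF record (all values invented).
info='DP=20;DP4=8,7,3,2;MQ=50'
# DP4 = ref-fwd,ref-rev,alt-fwd,alt-rev; pull out the two alt-strand counts
# (DP4[2] and DP4[3] in SnpSift's zero-based indexing).
echo "$info" | awk -F'DP4=' '{ split($2, a, "[,;]"); print a[3], a[4] }'
# -> 3 2
```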
  5. Remove VCF header.
    Code Block
    sed '1,27d' mapping_snps.vcf > mapping_snps_headless.vcf
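The 1,27d range assumes the header is exactly 27 lines, which can vary between runs. A more robust alternative with the same effect, assuming every header line begins with #:

```shell
# Drop all header lines (VCF header lines start with '#'),
# regardless of how many there happen to be.
grep -v '^#' mapping_snps.vcf > mapping_snps_headless.vcf
```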
  6. Extract coverage on mapping SNPs.
    Code Block
    python covextract.py -f mapping_snps_headless.vcf > mapping_snps.tab
  7. Fix formatting error in first column. The HMFseq R script expects the first column to be pure numbers.
    Code Block
    gawk 'BEGIN { FS = OFS = "\t" } sub(/variants.*bcf:/, "", $1)' mapping_snps.tab > mapping_snps_nums.tab
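The effect on a single row can be checked with a made-up line (the variants.chr1.bcf: prefix and the second column are illustrative; standard awk behaves the same as gawk here):

```shell
# A made-up two-column row whose first field carries the bcf-file prefix.
# sub() strips the prefix from $1 and returns 1, so the rewritten line prints;
# setting OFS to tab keeps the columns tab-separated after $1 is modified.
printf 'variants.chr1.bcf:12345\t0.83\n' |
awk 'BEGIN { FS = OFS = "\t" } sub(/variants.*bcf:/, "", $1)'
# -> 12345	0.83
```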
  8. Run HMFseq R script.
    Code Block
    Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/HMFseq_Rscript mapping_snps_nums.tab HMFseq_output.tab HMFseq_genome_plot.pdf HMFseq_chromosome_scan.pdf "<Run_Name>"
    (Here <Run_Name> is whatever you want to call this run.)
  9. Run SnpEff on clean SNPs.
    Code Block
    java -jar /work/01863/benni/snpEff_3_1/snpEff.jar eff -c /work/01863/benni/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 10000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats all_SNPeff_output.tab Zv9.68 clean_snps_nums.vcf > snpEff_output.txt
  10. Load bedtools.
    Code Block
    module load bedtools
  11. SNP subtractor: subtract dbSNP and wild-type (WT) SNPs from the clean SNPs. This step uses a dbSNP library in my own directory, which should probably be moved somewhere more central.
    Code Block
    intersectBed -v -a clean_snps_nums.vcf -b ../dbSNP/vcf_chr_1-25.vcf > remaining_snps1.vcf
    intersectBed -v -a remaining_snps1.vcf -b WT_SNP_set.vcf > remaining_snps2.vcf
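Conceptually, intersectBed -v -a A -b B keeps the records of A that overlap nothing in B. With exact-position keys the same subtraction can be sketched in plain shell (the files and positions below are invented; real intersectBed compares genomic intervals, not whole lines):

```shell
# Made-up SNP position lists: keep the lines of A.txt that are absent from B.txt.
printf 'chr1\t100\nchr1\t200\nchr2\t300\n' > A.txt
printf 'chr1\t200\n' > B.txt
grep -vxFf B.txt A.txt   # -x whole-line match, -F fixed strings, -v invert
# -> chr1 100 and chr2 300 remain
```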
  12. Reattach a VCF header.
    Code Block
    python /corral-repl/utexas/BioITeam/bin/MegaMapper/VCFheader_mod.py remaining_snps3.vcf remaining_snps2.vcf
  13. Run SnpEff again on the unique SNPs.
    Code Block
    java -jar /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.jar eff -c /corral-repl/utexas/BioITeam/bin/snpEff_3_1/snpEff.config -i vcf -o txt -upDownStreamLen 5000 -hom -no None,downstream,intergenic,intron,upstream,utr -stats unfiltered_stats.tab Zv9.68 remaining_snps3.vcf > unfiltered_candidate_SNPs.tab
  14. Find SNPs in the predicted critical interval.
    Code Block
    Rscript --vanilla /corral-repl/utexas/BioITeam/bin/MegaMapper/candidator.R unfiltered_candidate_SNPs.tab HMFseq_output.tab candidate_list.tab
    The candidate_list.tab file is your output. Congratulations, you're done!