...
Principal Component Analysis
Use On personal computer, use the R script snp_pca.R to do PCA on the data. What's going on with chromosome 12?
...
Code Block | ||||
---|---|---|---|---|
| ||||
#(back on TACC)
#First choose a window size and step size (here 100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000
#Run for chromosome 1
vcftools --vcf chrI.vcf \
--weir-fst-pop Pun_males.txt \
--weir-fst-pop Pun_females.txt \
--fst-window-size $WINDOW_SIZE \
--fst-window-step $WINDOW_STEP \
--out chrI_female_v_male
#chromosome 2
vcftools --vcf chrII.vcf \
--weir-fst-pop Pun_males.txt \
--weir-fst-pop Pun_females.txt \
--fst-window-size $WINDOW_SIZE \
--fst-window-step $WINDOW_STEP \
--out chrII_female_v_male
#chromosome12
vcftools --vcf chrXII.vcf \
--weir-fst-pop Pun_males.txt \
--weir-fst-pop Pun_females.txt \
--fst-window-size $WINDOW_SIZE \
--fst-window-step $WINDOW_STEP \
--out chrXII_female_v_male
#look at result files
ls -l *windowed.weir.fst
#send these to pungitius_exercise/ on personal computer to plot with R |
...
Code Block | ||||
---|---|---|---|---|
| ||||
#change to directory with species-separated vcfs
cd species_separated
#select window size and step (100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000
#loop through vcfs and build commands for running --window-pi
>piCommands
for vcfFile in *.recode.vcf
do echo "vcftools --vcf $vcfFile --window-pi $WINDOW_SIZE --window-pi-step $WINDOW_STEP --out ${vcfFile/.recode.vcf} &" >> piCommands
done
#execute them
#look at results files
ls -l *windowed.pi
#send to pungitius_exercise/ on personal computer to plot with |
Plot the resulting *weir.fst and *.pi results with plot_window_stats.R
A pseudoautosomal region (PAR) is a region of a sex chromosome that still undergoes recombination.
...
Code Block | ||||
---|---|---|---|---|
| ||||
probably at about 3 Mb |
Phylogenetics
Code Block |
---|
cd phylogenetics |
Build gene trees from Admixture
Run admixture on chromosome 1 and also the PAR and from the divergent regions region of chr12
(based on the Fst plot, let's use these boundaries: PAR: 0 - 2Mb, divergent: 4 - 17 Mb)
Code Block | ||||
---|---|---|---|---|
| ||||
#first switch to integers, because plink doesn't like roman numerals
sed 's/chrI/1/' chrI.vcf > chr1.vcf
sed 's/chrXII/12/' chrXII.vcf > chr12.vcf
#subset for the PAR and divergent region
vcftools --vcf chr12.vcf --chr 12 --from-bp 0 --to-bp 2000000 --recode --out PAR
vcftools --vcf chr12.vcf --chr 12 --from-bp 4000000 --to-bp 17000000 --recode --out SDR
#convert to bed files with plink
plink2 --vcf chr1.vcf --make-bed --out chr1 --allow-extra-chr
plink2 --vcf PAR.recode.vcf --make-bed --out PAR --allow-extra-chr
plink2 --vcf SDR.recode.vcf --make-bed --out SDR --allow-extra-chr
#run admixture
admixture chr1.bed 3
admixture PAR.bed 3
admixture SDR.bed 3
#send the *.Q files to personal computer to plot with plot_pungitius_admixture.R |
Phylogenetics
Code Block |
---|
(on TACC) cd phylogenetics |
Build gene trees from the PAR and from the divergent regions of chr12
Where did the Y chromosome come from?
Code Block | ||||
---|---|---|---|---|
| ||||
#subset the vcf for the two regions vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 0 --to-bp 2000000 --recode --out PAR vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 4000000 --to-bp 17000000 --recode --out SDR #use vcf-kit to build trees vk phylo tree upgma PAR.recode.vcf > PAR.newick vk phylo tree upgma SDR.recode.vcf > SDR.newick #look at the trees on TACC with view_newick.py view_newick.py PAR.newick view_newick.py SDR.newick |
For a nicer tree viewer, Dendroscope works well.
Build gene trees in windows along chromosome
Code Block | ||||
---|---|---|---|---|
| ||||
#set up boundaries seq 0 250000 19750000 > lefts seq 250000 250000 20000000 > rights paste lefts rights > bounds.txt #now build commands to subset the vcf with VCFtools, then pipe the subset into VCF-kit for tree building awk '{print "vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp "$1 " --to-bp "$2" --recode -c | vk phylo tree upgma - > window_"$1"-"$2".newick &"}' bounds.txt > treeBuildCommands #check the commands (should have 80 lines) head treeBuildCommands wc -l treeBuildCommands #Double-check you are in an idev session #If you are in an idev session, and had 80 lines, print to screen and execute sh treeBuildCommands #check that you have all trees ls window*.newick | wc -l #compile the resulting trees into a single file #they need to be in order, so we can just simply cat them #read bounds.txt again to cat them in order echo ">allTrees.newick" >catTrees while read bound do left=$(echo $bound | cut -d " " -f 1) right=$(echo $bound | cut -d " " -f 2) echo "cat window_${left}-${right}.newick | tr -d '\n' >> allTrees.newick && echo '' >>allTrees.newick" >> catTrees done<bounds.txt #execute the catting commands sh catTrees #check that the input is correct size (should be 80) wc -l allTrees.newick #now run Twisst on them ~/twisst/twisst.py -t allTrees.newick\ -w topology_weights.tsvtxt\ -g Pun_X -g Pun_Y -g Sin -g Tym\ --groupsFile groups_file.txt\ --method threshold\ --thresholdTable ~/twisst/threshold_tables/binom_threshold_wilson_CI95_0.1.tsv #This takes a bit. #Once it's done, format the output grep "^#" topology_weights.txt > topos.txt grep -v "^#" topology_weights.txt > weights.txt #send the results (all end in *.txt) to personal computer to plot with plot_twisst_results.R |