Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Principal Component Analysis

Use On personal computer, use the R script snp_pca.R to do PCA on the data. What's going on with chromosome 12?

...

Code Block
titlesolution
collapsetrue
#(back on TACC)
#First choose a window size and step size (here 100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000

#Run for chromosome 1
vcftools --vcf chrI.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrI_female_v_male

#chromosome 2
vcftools --vcf chrII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrII_female_v_male

#chromosome12
vcftools --vcf chrXII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrXII_female_v_male


#look at result files
ls -l *windowed.weir.fst


#send these to pungitius_exercise/ on personal computer to plot with R

...

Code Block
titlesolution
collapsetrue
#change to directory with species-separated vcfs
cd species_separated


#select window size and step (100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000


#loop through vcfs and build commands for running --window-pi
>piCommands
for vcfFile in *.recode.vcf
do echo "vcftools --vcf $vcfFile --window-pi $WINDOW_SIZE --window-pi-step $WINDOW_STEP --out ${vcfFile/.recode.vcf} &" >> piCommands
done


#execute them


#look at results files
ls -l *windowed.pi


#send to pungitius_exercise/ on personal computer to plot with 

Plot the resulting *weir.fst and *.pi results with plot_window_stats.R

A pseudoautosomal region (PAR) is a region of a sex chromosome that still undergoes recombination.

...

admixture result

Phylogenetics

Code Block
(on TACC)
cd phylogenetics

Build gene trees from the PAR and from the divergent regions of chr12

...

Code Block
titlesolution
collapsetrue
#subset the vcf for the two regions
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 0 --to-bp 2000000 --recode --out PAR
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 4000000 --to-bp 17000000 --recode --out SDR


#use vcf-kit to build trees
vk phylo tree upgma PAR.recode.vcf > PAR.newick
vk phylo tree upgma SDR.recode.vcf > SDR.newick


#look at the trees on TACC with view_newick.py
view_newick.py PAR.newick
view_newick.py SDR.newick

For a nicer tree viewer, Dendroscope works well.

Build gene trees in windows along chromosome

Code Block
titlesolution
collapsetrue
#set up boundaries
seq 0 250000 19750000 > lefts
seq 250000 250000 20000000 > rights
paste lefts rights > bounds.txt


#now build commands to subset the vcf with VCFtools, then pipe the subset into VCF-kit for tree building
awk '{print "vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp "$1 " --to-bp "$2" --recode -c | vk phylo tree upgma - > window_"$1"-"$2".newick &"}' bounds.txt > treeBuildCommands


#check the commands (should have 80 lines)
head treeBuildCommands
wc -l treeBuildCommands


#Double-check you are in an idev session
#If you are in an idev session, and had 80 lines, print to screen and execute


sh treeBuildCommands




#check that you have all trees
ls window*.newick | wc -l




#compile the resulting trees into a single file
#they need to be in order, so we can just simply cat them
#read bounds.txt again to cat them in order
echo ">allTrees.newick" >catTrees
while read bound
 do left=$(echo $bound | cut -d " " -f 1)
 right=$(echo $bound | cut -d " " -f 2)
 echo "cat window_${left}-${right}.newick | tr -d '\n' >> allTrees.newick && echo '' >>allTrees.newick" >> catTrees
done<bounds.txt


#execute the catting commands
sh catTrees


#check that the input is correct size (should be 80)
wc -l allTrees.newick


#now run Twisst on them
~/twisst/twisst.py -t allTrees.newick\
 -w topology_weights.tsvtxt\
 -g Pun_X -g Pun_Y -g Sin -g Tym\
 --groupsFile groups_file.txt\
 --method threshold\
 --thresholdTable ~/twisst/threshold_tables/binom_threshold_wilson_CI95_0.1.tsv



#This takes a bit.
#Once it's done, format the output
grep "^#" topology_weights.txt > topos.txt
grep -v "^#" topology_weights.txt > weights.txt


#send the results (all end in *.txt) to personal computer to plot with plot_twisst_results.R

...