Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Principal Component Analysis

Use On personal computer, use the R script snp_pca.R to do PCA on the data. What's going on with chromosome 12?

...

Code Block
titlesolution
collapsetrue
#(back on TACC)
#First choose a window size and step size (here 100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000

#Run for chromosome 1
vcftools --vcf chrI.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrI_female_v_male

#chromosome 2
vcftools --vcf chrII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrII_female_v_male

#chromosome12
vcftools --vcf chrXII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrXII_female_v_male


#look at result files
ls -l *windowed.weir.fst


#send these to pungitius_exercise/ on personal computer to plot with R

...

Code Block
titlesolution
collapsetrue
#change to directory with species-separated vcfs
cd species_separated


#select window size and step (100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000


#loop through vcfs and build commands for running --window-pi
>piCommands
for vcfFile in *.recode.vcf
do echo "vcftools --vcf $vcfFile --window-pi $WINDOW_SIZE --window-pi-step $WINDOW_STEP --out ${vcfFile/.recode.vcf} &" >> piCommands
done


#execute them


#look at results files
ls -l *windowed.pi


#send to pungitius_exercise/ on personal computer to plot with 

Plot the resulting *weir.fst and *.pi results with plot_window_stats.R

A pseudoautosomal region (PAR) is a region of a sex chromosome that still undergoes recombination.

...

Code Block
titlesolution
collapsetrue
probably at about 3 Mb

Fst result

Pi result

Phylogenetics

Code Block
cd phylogenetics

Build gene trees from Admixture

Run admixture on chromosome 1 and also the PAR and from the divergent regions region of chr12

(based on the Fst plot, let's use these boundaries: PAR: 0 - 2Mb, divergent: 4 - 17 Mb)

Code Block
titlesolution
collapsetrue
#first switch to integers, because plink doesn't like roman numerals
sed 's/chrI/1/' chrI.vcf > chr1.vcf
sed 's/chrXII/12/' chrXII.vcf > chr12.vcf


#subset for the PAR and divergent region
vcftools --vcf chr12.vcf --chr 12 --from-bp 0 --to-bp 2000000 --recode --out PAR
vcftools --vcf chr12.vcf --chr 12 --from-bp 4000000 --to-bp 17000000 --recode --out SDR


#convert to bed files with plink
plink2 --vcf chr1.vcf --make-bed --out chr1 --allow-extra-chr
plink2 --vcf PAR.recode.vcf --make-bed --out PAR --allow-extra-chr
plink2 --vcf SDR.recode.vcf --make-bed --out SDR --allow-extra-chr


#run admixture
admixture chr1.bed 3
admixture PAR.bed 3
admixture SDR.bed 3


#send the *.Q files to personal computer to plot with plot_pungitius_admixture.R

admixture result

Phylogenetics

Code Block
(on TACC)
cd phylogenetics

Build gene trees from the PAR and from the divergent regions of chr12

Where did the Y chromosome come from?

Code Block
titlesolution
collapsetrue
#subset the vcf for the two regions
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 0 --to-bp 2000000 --recode --out PAR
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 4000000 --to-bp 17000000 --recode --out SDR


#use vcf-kit to build trees
vk phylo tree upgma PAR.recode.vcf > PAR.newick
vk phylo tree upgma SDR.recode.vcf > SDR.newick


#look at the trees on TACC with view_newick.py
view_newick.py PAR.newick
view_newick.py SDR.newick

For a nicer tree viewer, Dendroscope works well.

Build gene trees in windows along chromosome

Code Block
titlesolution
collapsetrue
#set up boundaries
seq 0 250000 19750000 > lefts
seq 250000 250000 20000000 > rights
paste lefts rights > bounds.txt


#now build commands to subset the vcf with VCFtools, then pipe the subset into VCF-kit for tree building
awk '{print "vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp "$1 " --to-bp "$2" --recode -c | vk phylo tree upgma - > window_"$1"-"$2".newick &"}' bounds.txt > treeBuildCommands


#check the commands (should have 80 lines)
head treeBuildCommands
wc -l treeBuildCommands


#Double-check you are in an idev session
#If you are in an idev session, and had 80 lines, print to screen and execute


sh treeBuildCommands




#check that you have all trees
ls window*.newick | wc -l




#compile the resulting trees into a single file
#they need to be in order, so we can just simply cat them
#read bounds.txt again to cat them in order
echo ">allTrees.newick" >catTrees
while read bound
 do left=$(echo $bound | cut -d " " -f 1)
 right=$(echo $bound | cut -d " " -f 2)
 echo "cat window_${left}-${right}.newick | tr -d '\n' >> allTrees.newick && echo '' >>allTrees.newick" >> catTrees
done<bounds.txt


#execute the catting commands
sh catTrees


#check that the input is correct size (should be 80)
wc -l allTrees.newick


#now run Twisst on them
~/twisst/twisst.py -t allTrees.newick\
 -w topology_weights.tsvtxt\
 -g Pun_X -g Pun_Y -g Sin -g Tym\
 --groupsFile groups_file.txt\
 --method threshold\
 --thresholdTable ~/twisst/threshold_tables/binom_threshold_wilson_CI95_0.1.tsv



#This takes a bit.
#Once it's done, format the output
grep "^#" topology_weights.txt > topos.txt
grep -v "^#" topology_weights.txt > weights.txt


#send the results (all end in *.txt) to personal computer to plot with plot_twisst_results.R

result