Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 16 Next »


This exercise uses data generated from three species of nine-spine sticklebacks: Pungitius pungitius, Pungitius sinensis, and Pungitius tymensis.

Check out data

The data are divided by chromosome.

How many individuals do we have and how many variant sites for each chromosome?

solution
#check the number of individuals and sites for each vcf
for file in *.vcf
 do echo "----------------"
 vcftools --vcf $file
done

What are the samples names?

solution
#The names appear in the top of the file
less chrI.vcf

How many individuals from each species are there?

solution
#we can make a list of the sample names by extracting them from one of the VCFs
grep CHROM chrI.vcf | tr "\t" "\n" | grep [1-9] > all_samples.txt


#view the list
less all_samples.txt


#we can then count the different species
grep Pun all_samples.txt | wc -l
grep Sin all_samples.txt | wc -l
grep Tym all_samples.txt | wc -l

Principal Component Analysis

Use the R script snp_pca.R to do PCA on the data. What's going on with chromosome 12?

result


VCFtools to window stats

VCFtools has some nice features to get stats in windows along chromosomes.

Return to TACC to look closer at variation between male and female Pungitius along chromosome 12

First we'll get Fst between males and females. We'll use sample_data.txt to build a single-column table of each.

solution
grep Pun sample_data.txt | grep F | cut -f 1 > Pun_females.txt
grep Pun sample_data.txt | grep M | cut -f 1 > Pun_males.txt

Now run vcftools --weir-fst-pop in sliding window mode (see VCFtools documentation for how to build command).

solution
#First choose a window size and step size (here 100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000

#Run for chromosome 1
vcftools --vcf chrI.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrI_female_v_male

#chromosome 2
vcftools --vcf chrII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrII_female_v_male

#chromosome12
vcftools --vcf chrXII.vcf \
 --weir-fst-pop Pun_males.txt \
 --weir-fst-pop Pun_females.txt \
 --fst-window-size $WINDOW_SIZE \
 --fst-window-step $WINDOW_STEP \
 --out chrXII_female_v_male


#look at result files
ls -l *windowed.weir.fst


#send these to pungitius_exercise/ on personal computer to plot with R

We also want to look at nucleotide diversity along the chromosomes within each species.

We'll need a file listing the samples names for each species.

solution
grep Pun sample_data.txt | cut -f 1 > Pun_samples.txt
grep Sin sample_data.txt | cut -f 1 > Sin_samples.txt
grep Tym sample_data.txt | cut -f 1 > Tym_samples.txt

Separate out the species from each chromosome into its own VCF.

solution
#make a directory to keep them in
mkdir species_separated


#loop through the vcfs and to make commands
>separateCommands
for vcfFile in chr*.vcf
do echo "vcftools --vcf $vcfFile --keep Pun_samples.txt --recode --out species_separated/${vcfFile/.vcf/}_punOnly &
vcftools --vcf $vcfFile --keep Sin_samples.txt --recode --out species_separated/${vcfFile/.vcf/}_sinOnly &
vcftools --vcf $vcfFile --keep Tym_samples.txt --recode --out species_separated/${vcfFile/.vcf/}_tymOnly &" >> separateCommands
done


#look at commands
cat separateCommands


#copy paste to execute
#look at result files
ls species_separated/*.vcf

Now use vcftools --window-pi to look at nucleotide diversity within each species

solution
#change to directory with species-separated vcfs
cd species_separated


#select window size and step (100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000


#loop through vcfs and build commands for running --window-pi
>piCommands
for vcfFile in *.recode.vcf
do echo "vcftools --vcf $vcfFile --window-pi $WINDOW_SIZE --window-pi-step $WINDOW_STEP --out ${vcfFile/.recode.vcf} &" >> piCommands
done


#execute them


#look at results files
ls -l *windowed.pi


#send to pungitius_exercise/ on personal computer to plot with R

A pseudoautosomal region (PAR) is a region of a sex chromosome that still undergoes recombination.

What is the likely boundary for the PAR in Pungitius pungitius? (plot window stats with plot_window_stats.R)

solution
probably at about 3 Mb

Fst result

Pi result

Phylogenetics

Install VCF-kit
 



  • No labels