Using FreeBayes and deepSNV to call variants in mixed populations

In this tutorial we will use two different programs to identify variants in mixed genomic samples, where DNA from many individuals was pooled and sequenced together.

  1. FreeBayes can be used to call variants in genomes of any ploidy, pooled samples, or mixed populations.
  2. deepSNV can be used to call single-nucleotide variants and single-base deletions in ultra-deep sequencing data sets.

Install FreeBayes

This tutorial assumes that you have created the paths $WORK/src and $HOME/local/bin and added $HOME/local/bin to your $PATH. FreeBayes uses a git repository and requires the cmake build system to compile. You can install it with these commands:

Code Block
titleInstalling FreeBayes from source
login1$ module load git
login1$ mkdir -p $WORK/src && cd $WORK/src
login1$ git clone --recursive https://github.com/ekg/freebayes.git
login1$ cd freebayes
login1$ module load cmake
login1$ module load gcc
login1$ make
login1$ mv bin/* $HOME/local/bin

This command from the FreeBayes instructions attempts to install to a system-wide location as super-user:

...

This won't work on Lonestar! (You aren't an admin.) However, the make command has already created the executables in the bin directory of the source tree, so we can move them to our standard $HOME/local/bin directory with the final mv command in the block above.
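To confirm that the executables ended up somewhere your shell will find them, a quick check along these lines may help. This is only a sketch: `path_contains` is a hypothetical helper written for this example (it is not part of FreeBayes or the TACC environment), and the only location it assumes is the $HOME/local/bin directory named above.

```shell
# Hypothetical helper: succeeds if directory $1 is listed in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Warn if the install directory is missing from PATH.
if path_contains "$HOME/local/bin"; then
    echo "$HOME/local/bin is on your PATH"
else
    echo "$HOME/local/bin is NOT on your PATH; add it in your login profile"
fi

# Show which freebayes executable (if any) the shell would run.
command -v freebayes || echo "freebayes not found on PATH"
```

If the last line prints a path inside $HOME/local/bin, the move worked.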

...

Install deepSNV

The newest R module on Lonestar is version 2.14, but deepSNV requires R version 2.15.

You can install your own copy of R 2.15 on TACC using the instructions below, but it takes a while to compile. We have already installed R 2.15 in a shared location, so you can instead add that location to your $PATH by adding this line to your ~/.profile_user file:

Code Block
titleUsing the copy of R 2.15 that we have already installed
export PATH="/corral-repl/utexas/BioITeam/ngs_course/local/bin:$PATH"
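After editing ~/.profile_user and logging in again, it is worth confirming that the R on your $PATH is new enough. The sketch below rests on a few assumptions: that `sort` supports the `-V` (version sort) option, as GNU sort does; that the first line of `R --version` has the usual form "R version X.Y.Z ..."; and that "requires 2.15" means 2.15 or later. `version_at_least` is a hypothetical helper written for this example.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on sort -V (version sort), an assumption about the system.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Pull X.Y.Z out of the first line of `R --version`, which normally
# looks like "R version 2.15.0 (2012-03-30)".
if command -v R >/dev/null 2>&1; then
    r_version=$(R --version | head -n 1 | awk '{print $3}')
    if version_at_least "$r_version" "2.15"; then
        echo "R $r_version should be new enough for deepSNV"
    else
        echo "R $r_version is older than 2.15; check your PATH"
    fi
else
    echo "R not found on PATH"
fi
```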

If you want to go through installing R 2.15 and deepSNV for yourself, here's how:

Code Block
titleInstalling R v2.15 from source in your
login1$ wget http://cran.wustl.edu/src/base/R-2/R-2.15.0.tar.gz
login1$ tar -xvzf R-2.15.0.tar.gz
login1$ cd R-2.15.0
login1$ ./configure --prefix=$HOME/local
login1$ make
login1$ make install

Once you have access to R 2.15, you can install deepSNV using these commands (which work for any Bioconductor package).

Code Block
titleInstalling Bioconductor package deepSNV
login1$ R
...
> source("http://bioconductor.org/biocLite.R")
> biocLite("deepSNV")

...

The read files were downloaded from the ENA SRA study.

So that you can treat all the data as single-end for simplicity, we concatenated the two paired-end FASTQ files for sample SRR030252 using this command:

Code Block
cat SRR030252_1.fastq SRR030252_2.fastq > SRR030252.fastq

Alternatively, you could map that data set as paired-end.
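If you repeat the concatenation yourself, a read count is an easy sanity check: the combined file should hold exactly as many reads as the two inputs together. The sketch below assumes the standard FASTQ layout of four lines per read (no wrapped sequence lines); `count_reads` is a hypothetical helper written for this example.

```shell
# Hypothetical helper: print the number of reads in a FASTQ file,
# assuming exactly four lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Compare the combined file against the sum of its two inputs
# (skipped quietly if the files are not in the current directory).
if [ -f SRR030252.fastq ] && [ -f SRR030252_1.fastq ] && [ -f SRR030252_2.fastq ]; then
    expected=$(( $(count_reads SRR030252_1.fastq) + $(count_reads SRR030252_2.fastq) ))
    observed=$(count_reads SRR030252.fastq)
    echo "expected $expected reads, found $observed"
fi
```

A mismatch between the two numbers usually points to a truncated download or an interrupted `cat`.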

The reference genome file was downloaded from the NCBI Genomes page.

...