Haploid reference genome
Relatively small (<20 Mb) reference genome
Input FASTQ reads can be from any sequencing technology
Average genomic coverage > 30-fold
Less than ~1,000 mutations expected
Detects SNVs and structural variants (SVs) from single-end reads (does not use paired-end distance information)
Produces annotated HTML output

Here is a rough outline of the workflow in breseq with proposed additions.

Optional: Install breseq

We have already installed breseq in $BI/bin, so you should be ready to go and can skip this section. If you want the practice, see if you can install breseq and get it running from the installation instructions.

Info for installing breseq on TACC

You do not need to install a compiler (GCC/ICC), bowtie2, or R on TACC, as they are available via the module system. Note that breseq needs Bowtie version 2.0.0-beta7 or later.

Code Block
module load R
module load GCC
module load bowtie/2.1.0

You could add these commands to your $HOME/.profile_user if you wanted them available by default.
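
A minimal sketch of one way to do that without adding the same line twice. The helper name add_module_load is hypothetical (it is not part of TACC or breseq); it only appends a line of text to the file you give it.

```shell
# Hypothetical helper: append "module load <name>" to a startup file,
# but only if that exact line is not already present.
add_module_load() {
    profile="$1"
    module_name="$2"
    line="module load $module_name"
    touch "$profile"    # make sure the file exists before grep reads it
    grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Example calls (commented out so that pasting this block changes nothing):
#   add_module_load "$HOME/.profile_user" R
#   add_module_load "$HOME/.profile_user" GCC
#   add_module_load "$HOME/.profile_user" bowtie/2.1.0
```

The modules are loaded the next time you log in, not in the current session.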

I need help...

Hint: We have some optional info on Installing Linux tools that should help you get bowtie2 and breseq installed. A suitable version of R is already installed on TACC. Remember that you can load it using the command:

Code Block
module load R

Example 1: Bacteriophage lambda data set

...

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take < 5 minutes. Submit this command to the TACC development queue.

Code Block
breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

...

Need some help?

If you use scp then you will need to run it in a terminal that is on your desktop and not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, you can use the pwd command in your terminal on TACC:

Code Block
login1$ pwd

Then try a command like this on your desktop (if on a Linux machine or MacOS X):

Code Block
desktop1$ scp -r username@lonestar.tacc.utexas.edu:/the/directory/returned/by/pwd/output .
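
Because $HOME, $WORK, and $SCRATCH are only expanded on TACC, one convenient trick is to let the TACC shell print the full scp command for you. The sketch below is only an illustration: scp_hint is a hypothetical helper name, and the default host is simply the one used in the scp example above.

```shell
# Hypothetical helper: run on TACC, in the directory that holds your results.
# It prints an scp command with the full path filled in; paste the printed
# line into a terminal on your desktop.
scp_hint() {
    target="${1:-output}"                       # file or directory to copy
    host="${2:-lonestar.tacc.utexas.edu}"       # assumed login host
    printf 'scp -r %s@%s:%s/%s .\n' "$(id -un)" "$host" "$(pwd)" "$target"
}

# Example call:
#   login1$ scp_hint output
```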

It would be even better practice to archive and gzip the output directory with tar -cvzf before copying it, then copy that single file to your desktop and unarchive it there with tar -xvzf.
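
If you would like to try the tar round trip before doing it for real, here is a small self-contained sketch. It works in a scratch directory with a stand-in output directory, and the copy step is only simulated, so nothing on TACC or your desktop is touched. The archive name output.tar.gz is an arbitrary choice.

```shell
# Work in a throwaway directory so no real files are affected.
scratch=$(mktemp -d)
cd "$scratch"
mkdir output
echo "<html></html>" > output/index.html    # stand-in for breseq's index.html

# On TACC: archive and gzip the directory into a single file.
tar -cvzf output.tar.gz output

# Stand-in for the copy step. In practice you would scp output.tar.gz
# to your desktop here instead of copying the whole directory.
rm -r output

# On your desktop: unpack the archive, which recreates the directory.
tar -xvzf output.tar.gz
ls output
```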

Inside of the output directory is a file called index.html. Open this in a web browser on your desktop and click around to take a look at the mutation predictions and summary information.

Optional Exercise: Running breseq in mixed population mode

The data set you are examining is actually of a mixed population of many different phage lambda genotypes descended from a clonal ancestor. You have run breseq in a mode where it is predicting consensus mutations in what it thinks is one uniform haploid genome. Actually, some individuals in the population have certain mutations and others do not, so you might have noticed when you looked at some of the alignments that there was a mixture of bases at a position.

As an optional exercise, you can use a somewhat experimental feature of breseq to run in a mode where it estimates the frequencies of different mutations in the population. This process is most accurate for single nucleotide variants. Mutations at intermediate frequencies are not (yet) predicted for classes of mutations like large structural variants.

Code Block
login1$ breseq --polymorphism-prediction --polymorphism-no-indels -r lambda.gbk lambda_mixed_population.fastq

The option --polymorphism-prediction turns on these mixed population predictions. The option --polymorphism-no-indels turns off predictions of small insertions and deletions (which don't work as well for reasons too complicated to explain here). You're welcome to try it without this option.

Copy the output back to your computer and examine the HTML output in a web browser. Compare it to the output from before.

If you want to know more about identifying polymorphisms in mixed population or pooled samples see: Genome variation in mixed samples (FreeBayes, deepSNV).

Example 2: E. coli data sets

...

breseq includes a few utility commands that can be used on any BAM/FASTA set of files to draw an HTML read pileup or a plot of the coverage across a region.

Code Block
breseq bam2aln NC_012967:237462-237462
breseq bam2cov NC_012967:2300000-2320000

It's easiest to run these commands from inside the main output directory (e.g., output_20K) of a breseq run. They use information in the data directory.
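
Both commands take a region written as SEQ_ID:START-END. If you start from a single position of interest, a small helper can build the region string for a window around it. This is a sketch only: region_around is a hypothetical name, not a breseq command, and it does not know the length of the reference, so keep the window inside the sequence yourself.

```shell
# Hypothetical helper: print SEQ_ID:START-END for a window of +/- flank
# bases around a center position. START is clamped to 1.
region_around() {
    seq_id="$1"
    center="$2"
    flank="${3:-0}"
    start=$((center - flank))
    end=$((center + flank))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    printf '%s:%d-%d\n' "$seq_id" "$start" "$end"
}

# Example use from inside a breseq output directory:
#   breseq bam2cov "$(region_around NC_012967 2310000 10000)"
```

With the values in the comment, the helper yields the same region as the bam2cov command shown earlier, NC_012967:2300000-2320000.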

Additionally, the files in the data directory can be loaded in IGV if you copy them back to your desktop.
