Breseq Basics
Introduction
On Monday we introduced breseq to you, but we were limited by time and by setting up our .bashrc files and other parts of the environment. Here we present a very quick breseq example so you can familiarize yourself with the breseq command before moving on to more advanced uses. breseq is one of the best and most complete programs for bacterial variant identification, an opinion shared not only by us but also by people we have spoken with at conferences and by those who have used it during this class. More information can be found in the following reference:
- Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151:165-188.
Objectives
- Run a simple breseq analysis on a set of lambda phage sequencing data.
- Transfer output files off TACC to interrogate the output and see mutation visualizations.
Running breseq
First, we'll run breseq on a small data set to be sure that it is installed correctly and to get a taste for what the output looks like. To do that, we'll want to copy our data off of corral and onto scratch. The following 2 files are in the $BI/ngs_course/lambda_mixed_pop/data directory:
File Name | Description | Sample |
---|---|---|
lambda_mixed_population.fastq | Single-end Illumina 36-bp reads | Evolved lambda bacteriophage mixed population genome sequencing |
lambda.gbk | Reference Genome | Bacteriophage lambda |
See if you can figure out how to copy them to a new directory on $SCRATCH called BDIB_breseq_basic.
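If you get stuck, one possible approach is sketched below. The function name copy_breseq_inputs is a hypothetical helper made up for this example (it is not part of breseq or the course materials), and the two file names are taken from the breseq command used later on this page.

```shell
# Hypothetical helper: copy the two lambda input files from a source
# directory into a working directory, creating the working directory if needed.
copy_breseq_inputs() {
  local src="$1"   # directory that holds the input files
  local dest="$2"  # working directory to create and fill
  mkdir -p "$dest" || return 1
  cp "$src/lambda.gbk" "$src/lambda_mixed_population.fastq" "$dest/"
}
```

On TACC this could be called as copy_breseq_inputs "$BI/ngs_course/lambda_mixed_pop/data" "$SCRATCH/BDIB_breseq_basic", followed by cd "$SCRATCH/BDIB_breseq_basic" so that the breseq command can refer to the files by name.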
idev idev idev
By now, hopefully you are guessing that you need to be on an idev node to run a computationally intensive job like breseq. You can check this with the showq -u command. If you are not already on one, start an idev session:
idev -m 120 -r CCBB_Bio_Summer_School_2016_Day3 -A UT-2015-05-18 -N 2 -n 8
module load intel
module load Rstats
breseq -j 48 -r lambda.gbk lambda_mixed_population.fastq
The output text details several steps in a pipeline that combines the steps of mapping (using SSAHA2), variant calling, annotating mutations, etc. This should finish relatively quickly, but while it runs let's look at what we put on the command line:
Part | Purpose |
---|---|
-j 48 | Use 48 processors (the max available on lonestar5 nodes) |
-r lambda.gbk | Use the lambda.gbk file as the reference to identify specific mutations |
lambda_mixed_population.fastq | breseq assumes any argument not preceded by a "-" or "--" option to be an input fastq file to be used for mapping |
Once you see a line that reads "Creating index HTML table...", breseq has finished running.
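If the terminal output has already scrolled past, a rough alternative check is to look for the index.html file that breseq writes into its output directory. The sketch below assumes that the presence of output/index.html is a reasonable sign the run completed; breseq_run_finished is a hypothetical helper name, not a breseq command.

```shell
# Hypothetical helper: report whether a breseq run directory already
# contains output/index.html (assumed here to indicate a completed run).
breseq_run_finished() {
  local run_dir="${1:-.}"  # defaults to the current directory
  if [ -f "$run_dir/output/index.html" ]; then
    echo "finished: $run_dir/output/index.html exists"
  else
    echo "not finished: $run_dir/output/index.html not found"
    return 1
  fi
}
```

Calling breseq_run_finished with no arguments from inside the directory where breseq was run checks the current run.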
Looking at breseq predictions
breseq produced a lot of directories beginning with 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that can be deleted when the run completes, or explored if you are interested in the inner workings of the pipeline. More importantly, breseq also produces two directories called data and output, which contain, respectively, the files used to create the .html output files and the .html output files themselves. The most interesting files are the .html files, which can't be viewed directly on lonestar. Therefore we first need to copy the output directory back to your desktop computer.
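One way to do the transfer is with scp, run from a terminal on your own computer. The sketch below is meant to be run on TACC from the directory where breseq was run; it only prints an scp command that you can then paste into your local terminal. The helper name print_scp_command is hypothetical, and the default host name ls5.tacc.utexas.edu is an assumption, so substitute whatever host you normally log in to.

```shell
# Hypothetical helper: print an scp command for copying a remote directory
# to the current directory of your own computer.
# The default host name below is an assumption; pass another as argument 2.
print_scp_command() {
  local remote_dir="${1:-$PWD/output}"    # directory on TACC to copy
  local host="${2:-ls5.tacc.utexas.edu}"  # assumed login host
  echo "scp -r $(whoami)@$host:$remote_dir ."
}
```

Running print_scp_command with no arguments prints a command that points at the output directory inside your current directory. Paste that command into a terminal on your desktop computer, in the folder where you want the output directory to end up.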
Navigate to the output directory in the Finder and open the file called index.html. This opens the results in a web browser window, where you can click through the different mutations and other information and see the evidence supporting each one. The summary page provides useful information about the percentage of reads mapping to the genome as well as the overall coverage of the genome. The Mutation Predictions page is where most of the analysis time is spent, determining which mutations are important (and, more rarely, which are inaccurate).
While it is a somewhat unfair comparison, consider how long it took to complete this analysis compared to the SNV, SV, and IGV tutorials from yesterday and today. With breseq you get better-formatted data in a shorter amount of time, which is what makes it so powerful.