...
Here is a rough outline of the workflow in breseq with proposed additions.
This tutorial was reformatted from the most recent version found here. Our thanks to the previous instructors.
Objectives:
- Use a self-contained, automated pipeline to identify mutations.
- Explain all of the types of mutations found before moving on to methods better suited for higher-order organisms.
Example 1: Bacteriophage lambda data set
...
Code Block |
---|
$BI/ngs_course/lambda_mixed_pop/data |
Copy the contents of this directory to a new directory called BDIB_breseq in your $SCRATCH space (that is, name it something other than data), and cd into it.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
cds
mkdir BDIB_breseq
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq
cd BDIB_breseq
ls |
If the copy worked correctly you should see the following 2 files:
File Name | Description | Sample |
---|---|---|
lambda_mixed_population.fastq | Single-end Illumina 36-bp reads | Evolved lambda bacteriophage mixed population genome sequencing |
lambda.gbk | Reference Genome | Bacteriophage lambda |
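As a quick check on the copied reads, the sketch below counts the reads and bases in a FASTQ file and divides by the genome length to estimate average coverage. It assumes each read occupies exactly four lines. `estimate_coverage` is a hypothetical helper written for this illustration, not a breseq command, and the genome length is an approximate figure supplied by the user.

```shell
# Sketch: estimate average coverage from a FASTQ file.
# Assumes 4 lines per read (no wrapped sequence lines).
# estimate_coverage is a hypothetical helper, not part of breseq.
estimate_coverage() {
    fastq="$1"          # path to the FASTQ file
    genome_length="$2"  # reference length in bp (supplied by the user)
    awk -v g="$genome_length" '
        NR % 4 == 2 { reads++; bases += length($0) }   # second line of each record is the sequence
        END { printf "reads=%d bases=%d coverage=%.1fx\n", reads, bases, bases / g }
    ' "$fastq"
}
```

Inside BDIB_breseq this could be called as `estimate_coverage lambda_mixed_population.fastq 48000`, using roughly 48,000 bp for the lambda genome.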
...
Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. Submit this command to the TACC development queue or run it on an idev node.
Code Block | ||||
---|---|---|---|---|
| ||||
breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt |
A bunch of progress messages will stream by during the breseq run. They would be lost on the compute node if not for the redirection to the log.txt file. The output text details several steps in a pipeline that combines mapping (using SSAHA2), variant calling, annotating mutations, etc. You can examine these messages by peeking in the log.txt file as your job runs using tail -f. The -f option means to "follow" the file and keep giving you output from it as it gets bigger. You will need to wait for your job to start running before you can tail -f log.txt.
The command that we used contains several parts with the following explanations:
Part | Purpose |
---|---|
-j 12 | Use 12 processors (the max available on lonestar nodes) |
-r lambda.gbk | Use the lambda.gbk file as the reference to identify specific mutations |
lambda_mixed_population.fastq | breseq assumes any argument not preceded by a - option to be an input fastq file to be used for mapping |
> log.txt | Redirect the output to the log.txt file |
Looking at breseq predictions
breseq will produce a lot of directories with names beginning 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that can be deleted when the run completes, or explored if you are interested in the inner guts of what is going on. More importantly, breseq will also produce two directories called data and output, which contain the files used to create the .html output files and the .html output files themselves, respectively. The most interesting files are the .html files, which can't be viewed directly on lonestar. Therefore we first need to copy the output directory back to your desktop computer.
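Before copying anything, it can help to see which directories in the run folder are numbered intermediates and which are the data and output directories worth keeping. A minimal sketch, assuming the intermediate directories all follow the two-digit prefix pattern shown above; `list_intermediate_dirs` is a hypothetical helper, not a breseq command.

```shell
# Sketch: print the numbered intermediate directories in a breseq run directory.
# list_intermediate_dirs is a hypothetical helper; it assumes names like 01_sequence_conversion.
list_intermediate_dirs() {
    run_dir="${1:-.}"                       # default to the current directory
    for d in "$run_dir"/[0-9][0-9]_*/; do
        if [ -d "$d" ]; then                # skip the unexpanded pattern when nothing matches
            basename "$d"
        fi
    done
}
```

Calling `list_intermediate_dirs .` from inside BDIB_breseq prints one directory name per line, so the list can be reviewed before anything is deleted. The data and output directories are never listed because their names do not start with two digits.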
Expand | ||||
---|---|---|---|---|
| ||||
If you use To figure out the full path to your file, you can use the
Then try a command like this on your desktop (if on a Linux machine or MacOS X):
It would be even better practice to archive and gzip the |
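One way the archive-then-copy approach could look is sketched below. It assumes it is run from inside the breseq run directory on the cluster. `archive_dir` is a hypothetical helper, and the user name, host, and path in the commented scp line are placeholders to be replaced with your own values.

```shell
# Sketch: bundle a directory into a single gzipped tar archive before copying it.
# archive_dir is a hypothetical helper, not part of breseq.
archive_dir() {
    dir="${1%/}"                     # strip a trailing slash, if any
    tar -czf "$dir.tar.gz" "$dir"    # -c create, -z gzip, -f archive file name
    echo "$dir.tar.gz"               # report the archive that was written
}

# On the cluster, inside the run directory:
#   pwd                  # prints the full path needed for the copy
#   archive_dir output   # writes output.tar.gz
#
# Then on your desktop (Linux or Mac OS X), with placeholders filled in:
#   scp <username>@<host>:<full path to run directory>/output.tar.gz .
#   tar -xzf output.tar.gz
```

Copying one compressed file is usually quicker than copying a directory of many small files, and unpacking it on the desktop restores the output directory with its .html files.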
...