Here is a rough outline of the workflow in breseq with proposed additions.

This tutorial was reformatted from the most recent version found here. Our thanks to the previous instructors.

Objectives:

Use a self-contained, automated pipeline to identify mutations.
Explain the types of mutations found thoroughly before moving on to methods better suited to higher-order organisms.

Example 1: Bacteriophage lambda data set

...

Code Block
$BI/ngs_course/lambda_mixed_pop/data

Copy this directory to a new directory called BDIB_breseq in your $SCRATCH space (that is, name it something other than data), and then cd into it.

Solution:

Code Block
cds
mkdir BDIB_breseq
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq

cd BDIB_breseq
ls

If the copy worked correctly you should see the following 2 files:

File Name	Description	Sample
`lambda_mixed_population.fastq`	Single-end Illumina 36-bp reads	Evolved lambda bacteriophage mixed population genome sequencing
`lambda.gbk`	Reference Genome	Bacteriophage lambda
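
As a quick sanity check on the copied data, the sketch below counts the reads in the FASTQ file and estimates average coverage from the 36-bp read length listed in the table. It assumes the standard four-lines-per-read FASTQ layout. The function names `count_reads` and `estimate_coverage` are hypothetical helpers (not part of breseq), and the genome length is a value you supply; roughly 48,000 bp is assumed for lambda in the usage comment.

```bash
# Hypothetical helpers for checking the input data; not part of breseq.

# Count reads in an uncompressed FASTQ file, assuming 4 lines per read.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Estimate average coverage as reads * read_length / genome_length
# (integer division, so the result is rounded down).
estimate_coverage() {
    local reads=$1 read_len=$2 genome_len=$3
    echo $(( reads * read_len / genome_len ))
}

# Example usage (run inside BDIB_breseq):
#   reads=$(count_reads lambda_mixed_population.fastq)
#   estimate_coverage "$reads" 36 48000
```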

...

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. Submit this command to the TACC development queue or run it on an idev node.

breseq command for the commands file:

Code Block
breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

A bunch of progress messages will stream by during the breseq run; these would be lost on the compute node if not for the redirection to the log.txt file. The output text details several steps in a pipeline that combines mapping (using SSAHA2), variant calling, annotating mutations, etc. You can examine the messages by peeking in the log.txt file as your job runs using tail -f. The -f option means to "follow" the file and keep giving you output from it as it gets bigger. You will need to wait for your job to start running before you can tail -f log.txt.
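
Two common ways of peeking at the log are sketched below. The helper names `peek_log` and `follow_log` are hypothetical; `tail -n` and `tail -f` are standard options.

```bash
# Print the last N lines of the log once and return (default: 10 lines).
peek_log() {
    tail -n "${2:-10}" "${1:-log.txt}"
}

# Follow the log as it grows; press Ctrl-C to stop following.
follow_log() {
    tail -f "${1:-log.txt}"
}

# Example usage, once the job has started and log.txt exists:
#   peek_log log.txt 5
#   follow_log log.txt
```

Stopping `tail -f` with Ctrl-C only stops the viewer; it does not affect the running job.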

The command that we used contains several parts with the following explanations:

part	purpose
-j 12	Use 12 processors (the max available on lonestar nodes)
-r lambda.gbk	Use the lambda.gbk file as the reference to identify specific mutations
lambda_mixed_population.fastq	breseq assumes any argument not preceded by a - option to be an input fastq file to be used for mapping
> log.txt	redirect the output to the log.txt file
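
A plain `>` redirection captures standard output only. The sketch below uses a stand-in function, `noisy` (hypothetical, not a breseq command), to show the difference between `> file` and `> file 2>&1`. Which stream breseq writes its progress messages to is not stated here, so capturing both is simply a cautious option.

```bash
# Stand-in for a program that writes to both output streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

demo_dir=$(mktemp -d)

# '>' captures standard output only. Standard error would normally go to the
# terminal; it is discarded here just to keep the demo quiet.
noisy > "$demo_dir/stdout_only.txt" 2> /dev/null

# '> file 2>&1' sends standard error to the same file as standard output.
noisy > "$demo_dir/both.txt" 2>&1

wc -l < "$demo_dir/stdout_only.txt"   # one line captured
wc -l < "$demo_dir/both.txt"          # two lines captured
```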

Looking at breseq predictions

breseq will produce a lot of directories with names beginning 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that can be deleted when the run completes, or explored if you are interested in the inner workings of the pipeline. More importantly, breseq will also produce two directories called data and output, which contain the files used to create the .html output files and the .html output files themselves, respectively. The most interesting files are the .html files, which can't be viewed directly on lonestar. Therefore we first need to copy the output directory back to your desktop computer.
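
Before copying anything, it can help to see which directories are intermediates. The sketch below assumes every intermediate directory starts with a two-digit prefix and an underscore, as in the names above; `list_intermediate_dirs` is a hypothetical helper. It only lists directories, and deletion is left as a commented-out step so nothing is removed by accident.

```bash
# List directories directly inside the run directory whose names start with
# two digits and an underscore (e.g. 01_sequence_conversion).
# Assumes that naming pattern holds for all intermediate directories.
list_intermediate_dirs() {
    local run_dir=${1:-.}
    find "$run_dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9][0-9]_*' | sort
}

# Example usage after the run completes:
#   list_intermediate_dirs .
#   list_intermediate_dirs . | xargs rm -r    # only once you no longer need them
```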

Need some help? We have previously covered using scp to transfer files, but here we present another detailed example.

If you use scp then you will need to run it in a terminal that is on your desktop and not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, you can use the pwd command in your terminal on TACC:

Code Block
login1$ pwd

Then try a command like this on your desktop (if on a Linux machine or MacOS X):

Code Block
desktop1$ scp -r username@lonestar.tacc.utexas.edu:<the_directory_returned_by_pwd>/output .

It would be even better practice to archive and gzip the output directory with tar -cvzf before copying it, then copy that single file and unarchive it on your desktop with tar -xvzf.
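
The archive-then-copy approach might look like the sketch below. `archive_dir` and `unpack_archive` are hypothetical helpers, the archive name `output.tar.gz` is an assumption, and the `scp` line keeps the placeholders from the example above.

```bash
# On TACC: bundle a directory into a gzip-compressed tar archive.
# -c create, -z gzip, -f archive file name (add -v to list files as they are added).
archive_dir() {
    local dir=$1 archive=${2:-$1.tar.gz}
    tar -czf "$archive" "$dir"
}

# On your desktop: unpack the archive into the current directory.
unpack_archive() {
    tar -xzf "$1"
}

# Example usage:
#   login1$   archive_dir output        # creates output.tar.gz
#   desktop1$ scp username@lonestar.tacc.utexas.edu:<the_directory_returned_by_pwd>/output.tar.gz .
#   desktop1$ unpack_archive output.tar.gz
```

Copying one compressed file is usually faster than copying many small files, and `scp` no longer needs the `-r` option.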

...

Version	Old Version 4	New Version 5
Changes made by	Deatherage, Daniel E	Deatherage, Daniel E
Saved on	May 22, 2015	May 22, 2015
