First, we'll run breseq on a small data set to be sure that it is installed correctly and to get a taste for what the output looks like. This sample is a mixed population of bacteriophage lambda that was co-evolved in the lab with its E. coli hosts.

Data

The data files for this example are in the path:

Code Block
$BI/ngs_course/lambda_mixed_pop/data

Copy this directory to a new directory called BDIB_breseq in your $SCRATCH space and cd into it.

Code Block

language	bash
title	Click here for the solution
collapse	true

cds
mkdir BDIB_breseq
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq
cd BDIB_breseq
ls

If the copy worked correctly you should see the following 2 files:

File Name	Description	Sample
`lambda_mixed_population.fastq`	Single-end Illumina 36-bp reads	Evolved lambda bacteriophage mixed population genome sequencing
`lambda.gbk`	Reference Genome	Bacteriophage lambda

Running breseq

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. Submit this command to the TACC development queue or run it on an idev node.

Code Block

language	bash
title	breseq commands for commands file

breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

...

Environment

To set up your profile to run breseq, add the line "module load bowtie/2.1.0" to it so that the bowtie module is loaded every time you log in.

Code Block

language	bash
title	Adding bowtie to your profile

cdh  #move to your home directory
echo "module load bowtie/2.1.0" >> .profile  #this command updates your profile to automatically load the bowtie module

After you've completed these commands, log out of Lonestar and log back in so that your profile is run again.
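Running the echo command above more than once adds the same line to the profile each time. A minimal sketch of a guard against that is shown here; add_line_once is a hypothetical helper name introduced for illustration, not something provided by TACC or breseq.

```bash
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper, not a TACC or breseq command.
# $1 = the line to add, $2 = the file to add it to
add_line_once() {
    touch "$2"   # create the file if it does not exist yet
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$1" "$2"; then
        printf '%s\n' "$1" >> "$2"
    fi
}
```

With the function defined, add_line_once "module load bowtie/2.1.0" "$HOME/.profile" can be run repeatedly and the profile will still contain the line only once.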

Data

The data files for this example are in the path:

Code Block
$BI/ngs_course/lambda_mixed_pop/data

Copy this directory to a new directory called BDIB_breseq_lambda in your $SCRATCH space and cd into it.

Code Block

language	bash
title	Click here for the solution
collapse	true

cds
mkdir BDIB_breseq_lambda
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq_lambda
cd BDIB_breseq_lambda
ls

If the copy worked correctly you should see the following 2 files:

File Name	Description	Sample
`lambda_mixed_population.fastq`	Single-end Illumina 36-bp reads	Evolved lambda bacteriophage mixed population genome sequencing
`lambda.gbk`	Reference Genome	Bacteriophage lambda
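As a quick sanity check on the copied reads, the number of reads and the approximate coverage can be worked out in the shell. This is a rough sketch: it assumes an uncompressed FASTQ file with exactly 4 lines per read, the 36-bp read length from the table, and a genome length of about 48,000 bp. The function names count_reads and estimate_coverage are hypothetical.

```bash
# Count reads in an uncompressed FASTQ file, assuming 4 lines per record.
# $1 = FASTQ file name
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Estimate average coverage as reads * read_length / genome_length.
# Shell arithmetic is integer-only, so the result is rounded down.
# $1 = number of reads, $2 = read length (bp), $3 = genome length (bp)
estimate_coverage() {
    echo $(( $1 * $2 / $3 ))
}
```

For this data set the call would look like estimate_coverage "$(count_reads lambda_mixed_population.fastq)" 36 48000.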

Running breseq

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. Submit this command to the TACC development queue or run it on an idev node.

Code Block

language	bash
title	breseq prep and commands

idev  #idev starts an "interactive development" mode which allows you to run computationally intensive tasks
 
breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

A bunch of progress messages will stream by during the breseq run. On a compute node they would be lost if they were not redirected to the log.txt file. The output details the steps of a pipeline that combines read mapping (using bowtie2), variant calling, annotating mutations, etc. You can examine the messages while your job runs by following the log.txt file with tail -f. The -f option means to "follow" the file and keep printing its output as it grows. You will need to wait for your job to start running before you can tail -f log.txt.
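If you only want a one-time look at the most recent messages instead of following the file, tail can print a fixed number of lines and return. A small sketch, where peek_log is a hypothetical helper name:

```bash
# Print the last N lines (default 10) of a log file once, without following it.
# $1 = log file name, $2 = optional number of lines
peek_log() {
    tail -n "${2:-10}" "$1"
}

# To keep watching as new lines arrive instead, use:
#   tail -f log.txt     (press Ctrl-C to stop watching)
```

For example, peek_log log.txt 20 prints the last 20 lines of the breseq log.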

...

We have previously covered using scp to transfer files, but here we present another detailed example.

To use scp you will need to run it in a terminal on your desktop, not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, or $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, use the pwd command on TACC in the terminal window where you ran breseq (the directory should contain an "output" folder). Rather than copying the entire folder, which can be rather large, we will first compress it into a single archive with the tar command so that it is smaller and transfers faster:

Code Block

language	bash
title	Command to type in TACC

tar -czvf output.tar.gz output  # the czvf options in order mean Create, gZip, Verbose, File (the archive name follows)
pwd

You can then copy and paste the directory printed by pwd into the correct position in the scp command on your desktop's command line:

Code Block

language	bash
title	Command to type in the desktop's terminal window

scp -r <username>@lonestar.tacc.utexas.edu:<the_directory_returned_by_pwd>/output.tar.gz .

tar -xvzf output.tar.gz  # the new "x" option at the front means eXtract
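Because the desktop cannot expand $SCRATCH, one way to avoid pasting the path by hand is to have TACC print the complete scp command with the path already filled in. The sketch below assumes it is run on TACC in the directory that contains output.tar.gz; print_scp_command is a hypothetical helper name.

```bash
# Print an scp command for the desktop with the remote path already expanded.
# Run this on TACC, in the directory that holds the file to transfer.
# $1 = your TACC user name, $2 = file name in the current directory
print_scp_command() {
    echo "scp $1@lonestar.tacc.utexas.edu:$(pwd)/$2 ."
}
```

Running print_scp_command "$USER" output.tar.gz on TACC prints a line that can be pasted directly into the desktop's terminal window.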

Navigate to the output directory in your desktop's file browser (for example, the Finder) and open the file called index.html. This will open the results in a web browser window, where you can click through the different mutations and other information and see the evidence supporting each one.

Example 2: E. coli data sets

Now we'll try running breseq on some Escherichia coli genomes from an evolution experiment. These files are larger. You don't want to run them in interactive mode. We'll submit them to the TACC queue all at once.

Data

The data files for this example are in the path:

Code Block
$BI/ngs_course/ecoli_clones/data

...

File Name

...

Description

...

Sample

...

SRR030252_1.fastq SRR030252_2.fastq

...