...

Here is a rough outline of the workflow in breseq with proposed additions.

This tutorial was reformatted from the most recent version found here. Our thanks to the previous instructors.

Objectives:

  • Use a self-contained, automated pipeline to identify mutations.
  • Explain in detail the types of mutations found, before moving on to methods better suited to higher-order organisms.

 

Example 1: Bacteriophage lambda data set

...

Code Block
$BI/ngs_course/lambda_mixed_pop/data

Copy the contents of this directory to a new directory called BDIB_breseq in your $SCRATCH space (i.e., name it something other than data), and cd into it.

Code Block
languagebash
titleClick here for the solution
collapsetrue
cds
mkdir BDIB_breseq
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq

cd BDIB_breseq
ls 

If the copy worked correctly, you should see the following two files:

File Name | Description | Sample
lambda_mixed_population.fastq | Single-end Illumina 36-bp reads | Evolved lambda bacteriophage mixed-population genome sequencing
lambda.gbk | Reference genome | Bacteriophage lambda

...

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take < 5 minutes. Submit this command to the TACC development queue or run it on an idev node.

Code Block
languagebash
titlebreseq commands for commands file
breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

A stream of progress messages appears during a breseq run; these would be lost on the compute node if not for the redirection to the log.txt file. The output details the several steps of a pipeline that combines read mapping (using SSAHA2), variant calling, mutation annotation, etc. You can peek at them as your job runs using tail -f log.txt. The -f option means "follow" the file: tail keeps printing new output as the file grows. You will need to wait for your job to start running before you can tail -f log.txt.
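The tail behavior can be sketched on any growing file; log_demo.txt here is just a stand-in for breseq's log.txt:

```shell
# Minimal sketch of inspecting a log with tail (log_demo.txt is a stand-in
# for breseq's log.txt).
printf 'step 1\nstep 2\nstep 3\n' > log_demo.txt
tail -n 2 log_demo.txt      # print only the last two lines of the file
# tail -f log_demo.txt      # -f keeps printing as the file grows; Ctrl-C to stop
```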

The command that we used contains several parts, explained below:

part | purpose
-j 12 | Use 12 processors (the maximum available on Lonestar nodes)
-r lambda.gbk | Use the lambda.gbk file as the reference genome to identify specific mutations
lambda_mixed_population.fastq | breseq assumes any argument not preceded by a - option is an input fastq file to be used for mapping
> log.txt | Redirect the output to the log.txt file

 

Looking at breseq predictions

breseq will produce a number of directories beginning 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that can be deleted when the run completes, or explored if you are interested in the inner guts of what is going on. More importantly, breseq will also produce two directories called data and output, which contain the files used to create the .html output files and the .html output files themselves, respectively. The most interesting files are the .html files, which can't be viewed directly on Lonestar, so we first need to copy the output directory back to your desktop computer.

Expand
titleWe have previously covered using scp to transfer files, but here we present another detailed example. Click to expand.

If you use scp then you will need to run it in a terminal that is on your desktop and not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, you can use the pwd command in your terminal on TACC:

Code Block
login1$ pwd

Then try a command like this on your desktop (if on a Linux machine or MacOS X):

Code Block
desktop1$ scp -r username@lonestar.tacc.utexas.edu:/the/directory/returned/by/pwd/output .

It would be even better practice to archive and gzip the output directory with tar -cvzf before copying it, then copy that single file and unpack it on your desktop with tar -xvzf.
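The archive-and-unpack round trip looks like this; a dummy directory stands in here for breseq's real output, and the archive name is just an example:

```shell
# Sketch of the tar round trip (dummy output/ directory stands in for breseq's).
mkdir -p output && echo '<html>demo</html>' > output/index.html
tar -cvzf breseq_output.tar.gz output    # c=create, v=verbose, z=gzip, f=archive name
rm -r output                             # pretend we are now on the desktop
tar -xvzf breseq_output.tar.gz           # x=extract; recreates output/
```

Copying one compressed file with scp is generally faster and less error-prone than copying many small files with scp -r.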

...