...

This is the absolute minimal command that breseq can do anything with: a reference file and a fastq file. When you executed the command without any options you saw more options, and if you use breseq --help you will see more still. This run will finish very quickly (less than 1 minute) with a final line of "+++   SUCCESSFULLY COMPLETED". If you instead see something different as the last line before getting your prompt back, get my attention.
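
If you would rather not rely on spotting the final line by eye, one option is to save the screen output of the run to a file and test its last line. The following is only a sketch under stated assumptions: the function name breseq_finished_ok and the log file name breseq.log are hypothetical, and it assumes the success message is the last non-empty line of whatever output you captured.

```bash
# Hypothetical helper: succeed if the last non-empty line of a captured
# breseq log contains the success message quoted in the text.
breseq_finished_ok() {
    log_file="$1"
    # Drop blank lines, keep the final remaining line, then look for the message.
    grep -v '^[[:space:]]*$' "$log_file" | tail -n 1 | grep -q 'SUCCESSFULLY COMPLETED'
}

# Example use (assumes the run's screen output was saved to breseq.log):
#   breseq -r lambda.gbk lambda_mixed_population.fastq > breseq.log 2>&1
#   breseq_finished_ok breseq.log && echo "run completed" || echo "check the log"
```

Redirecting both output streams into the log means the check works whichever stream the message is written to.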

...

Evaluating output

breseq produced a lot of directories beginning 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that are used to 'pick up where it left off' if the run doesn't complete successfully. These can be deleted when the run completes, or explored if you are interested in the inner workings of what is going on. More importantly, breseq also produces two directories called data and output, which contain the files used to create the .html output files and the .html output files themselves, respectively. The most interesting files are the .html files, which can't be viewed directly on lonestar. Therefore we first need to copy the output directory back to your desktop computer. Use scp to transfer the contents of the output directory back to your local computer.
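
Because the output directory holds many small files, it can be convenient to bundle it into a single compressed archive before copying it. The sketch below shows one way to do that; the function name pack_breseq_output and the archive name are hypothetical, and the user name, host name, and paths in the scp comment are placeholders you would need to replace with your own.

```bash
# Hypothetical helper: bundle a breseq output directory into one compressed
# archive so that a single file can be copied with scp.
pack_breseq_output() {
    run_dir="$1"        # directory in which breseq was run
    archive="$2"        # path of the .tar.gz file to create
    # -C changes into the run directory first, so the archive holds "output/..."
    tar -czf "$archive" -C "$run_dir" output
}

# Example use on the remote machine, then from your local computer
# (user name, host name and paths are placeholders, not real values):
#   pack_breseq_output "$SCRATCH/GVA_breseq_lambda_mixed_pop" "$SCRATCH/lambda_output.tar.gz"
#   scp <username>@<remote_host>:<path_to_scratch>/lambda_output.tar.gz .
```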

...

Click around through the different mutations and examine their evidence to see what kinds of mutations you can identify. If you can't understand what type of mutation each line represents, or how the images should help you understand what the mutation is, please don't hesitate to interact over zoom.

E. coli data from Mapping, SNV tutorials:

As a reminder, the read files we were working with in the bowtie2 and SNV tutorials were originally downloaded from the NCBI Sequence Read Archive via the corresponding European Nucleotide Archive record. They are Illumina Genome Analyzer sequencing of a paired-end library from a (haploid) E. coli clone that was isolated from a population of bacteria that had evolved for 20,000 generations in the laboratory as part of a long-term evolution experiment (Barrick et al., 2009). The reference genome is the ancestor of this E. coli population (strain REL606), so we expect the read sample to have differences from this reference that correspond to mutations that arose during the evolution experiment. If that description sounds like what breseq was made for ... breseq was literally developed, at least in part, to analyze this data.

data

As we did yesterday, we'll start by downloading our reads and reference into a new folder on scratch:

Code Block (bash): Remember that copying an entire folder requires the use of the recursive (-r) option.
cds
cp -r $BI/gva_course/mapping/data GVA_breseq_comparison_to_bowtieSamtoolsTutorials
cd GVA_breseq_comparison_to_bowtieSamtoolsTutorials
ls

Running breseq

Code Block (bash): breseq command
breseq -r NC_012967.1.gbk SRR030257_1.fastq SRR030257_2.fastq.gz

As mentioned early in the course, some programs can take compressed fastq files as input, and breseq is one such program. In the above example, it takes 2 fastq files: one uncompressed, the other gzipped. Otherwise this is the same basic command as the first one, using the bare minimum of inputs: a reference file and read file(s).
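
If you want to check that a mixed pair of plain and gzipped files really holds the same number of reads, a small helper can count reads in either kind of file. This is a sketch only: the function name count_fastq_reads is hypothetical, it decides how to read a file purely from the .gz suffix, and it assumes standard 4-line fastq records.

```bash
# Hypothetical helper: count reads in a FASTQ file, compressed or not.
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).
count_fastq_reads() {
    fastq="$1"
    case "$fastq" in
        *.gz) gzip -dc "$fastq" ;;   # decompress to stdout, original file untouched
        *)    cat "$fastq" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}

# Example use with the two read files from the command above:
#   count_fastq_reads SRR030257_1.fastq
#   count_fastq_reads SRR030257_2.fastq.gz
```

For a paired-end library the two counts would normally be expected to match.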

In the advanced breseq tutorials, we'll start working with more complex options, such as storing input reads in 1 directory and breseq output in a separate directory, installing your own version of breseq on TACC so you aren't reliant on the BioITeam version, enabling multiple processors to speed up all the breseq runs, comparing across multiple samples, and more.

Now that you have this command running (estimated to take ~20 minutes) I suggest:

  1. Going back up to the lambda phage run you transferred back to your computer above and interrogating the results further, as they will be fundamentally the same types of mutations and output that you will see for this sample, just on a smaller scale.
  2. Reading below to anticipate what the results here will look like and how they will compare to the list of 40 variants you identified using samtools, and visualized with IGV.
  3. Reading the next section of the tutorial, which deals with rerunning breseq on the lambda phage data in polymorphism mode (and the differences that makes in the results).
  4. Going back to the course home page, deciding what tutorial you'd like to work on next, and beginning to read through that material.

evaluating output

Bacteriophage lambda data set revisited

...

Since we know it was a mixed population we can actually rerun the same fastq files against the same reference and add a flag for polymorphism mode (-p) and see what the difference in results is when we tell breseq that variants may exist at any frequency. 

 Data, and running breseq

Code Block (bash): Commands to copy the input data from the first breseq run to a new folder, and rerun breseq on the same fastq and reference file in polymorphism mode. Since this copy command is between 2 scratch locations I doubt there will be issues with it, but remember to restart an idev node if you experience difficulties.
mkdir $SCRATCH/GVA_breseq_lambda_mixed_pop_polymode
cp $SCRATCH/GVA_breseq_lambda_mixed_pop/lambda* $SCRATCH/GVA_breseq_lambda_mixed_pop_polymode
cd $SCRATCH/GVA_breseq_lambda_mixed_pop_polymode
breseq -p -r lambda.gbk lambda_mixed_population.fastq

Evaluating output

Again you will need to transfer files back to your local computer to visualize the differences. The exact same compression command will work, as the folder name is the same. Because of that, you need to be careful where you transfer the file to on your local computer so that you don't overwrite the previously transferred files. Consider adding _polymode to the name of the directory you are transferring to, as we did in our command above. Again help with SCP can be found here.
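
One way to guard against overwriting the earlier results is to unpack each transferred archive into its own, newly created directory. The sketch below assumes the transferred file is a gzipped tar archive (the compression command itself is not shown in this section); the function name unpack_into and the file and directory names in the example are hypothetical.

```bash
# Hypothetical helper: unpack an archive into its own directory so that a
# second transfer never overwrites an earlier one.
unpack_into() {
    archive="$1"
    dest_dir="$2"
    # Stop rather than mix new files into a directory that already exists.
    if [ -e "$dest_dir" ]; then
        echo "refusing to overwrite existing $dest_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && tar -xzf "$archive" -C "$dest_dir"
}

# Example use on your local computer (names are placeholders):
#   unpack_into output.tar.gz lambda_polymode
```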

...

When you look at the summary statistic page, you will see that none of the output has changed until you get quite far down the page and find that this time it was run in full polymorphism mode. When you look at the mutation predictions page, you now see more total mutations (with most of the new mutations being at frequencies less than 20%), a new column listing the frequency of each mutation (with variants at less than 100% showing up in green), and, if you look closely, some mutations that were previously listed in white and at 100% are now listed as less than 100% (e.g. 82.0%). Hopefully from the discussions we've been having to this point it makes sense that mutations that are real but at low frequency would be mistaken for 0% rather than 100% when those are the only 2 choices. Again feel free to get my attention if you have any questions about the output, such as wondering why there are so many mutations at 100% even in a mixed population sample.
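
The same comparison can be roughed out on the command line by counting mutation lines in each run's GenomeDiff file. Treat this as a sketch resting on assumptions you should check against your own output: that each run wrote a tab-delimited GenomeDiff file (assumed here to be output/output.gd), that mutation lines begin with a three-letter type code, and that a frequency=<number> field is present on lines where a frequency was estimated. The function name summarize_gd is hypothetical.

```bash
# Hypothetical helper: count predicted mutations in a GenomeDiff file and how
# many of them carry a frequency below 1 (that is, below 100%).
# Assumptions: tab-delimited lines whose first field is a three-letter mutation
# type, and an optional "frequency=<number>" field on each mutation line.
summarize_gd() {
    awk -F '\t' '
        $1 ~ /^(SNP|SUB|DEL|INS|MOB|AMP|CON|INV)$/ {
            total++
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^frequency=/) {
                    sub(/^frequency=/, "", $i)
                    if ($i + 0 < 1) below++
                }
            }
        }
        END { printf "total=%d below_100=%d\n", total, below }
    ' "$1"
}

# Example use (paths are assumptions based on the folder names used above):
#   summarize_gd $SCRATCH/GVA_breseq_lambda_mixed_pop/output/output.gd
#   summarize_gd $SCRATCH/GVA_breseq_lambda_mixed_pop_polymode/output/output.gd
```

Evidence lines (such as those beginning RA) are skipped, so only predicted mutations are counted.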

Additional tutorials dealing with breseq


Additional information on analyzing the output

...

  • Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151:165-188. «PubMed»

...