...
As a reminder, the read files we were working with in the bowtie2 and SNV tutorials were originally downloaded from the NCBI Sequence Read Archive via the corresponding European Nucleotide Archive record. They are Illumina Genome Analyzer sequencing of a paired-end library from a (haploid) E. coli clone that was isolated from a population of bacteria that had evolved for 20,000 generations in the laboratory as part of a long-term evolution experiment (Barrick et al, 2009). The reference genome is the ancestor of this E. coli population (strain REL606), so we expect the read sample to have differences from this reference that correspond to mutations that arose during the evolution experiment. If that description sounds like what breseq was made for ... breseq was literally developed at least in part to anlyze this data.
data
Like we did yesterday we'll start by downloading our reads and reference into a new folder on scratch:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
cds
cp -r $BI/gva_course/mapping/data GVA_breseq_comparison_to_bowtieSamtoolsTutorials
cd GVA_breseq_comparison_to_bowtieSamtoolsTutorials
ls |
Running breseq
Code Block | ||||
---|---|---|---|---|
| ||||
36
breseq -r NC_012967.1.gbk SRR030257_1.fastq SRR030257_2.fastq.gz |
As mentioned early in the course, some programs can actually take compressed fastq files in as input and breseq is 1 such program. In the above example, it actually takes 2 fastq files in, 1 as a non-compressed file, the other as a gzipped file. Otherwise this is still the same basic command as we used in the first command that uses the bare minimum of inputs: a reference file, and read file(s).
In the advanced breseq tutorials, we'll start working with more complex options, such as storing input reads in 1 directory and breseq output in a separate directory, installing your own version of breseq on TACC so you aren't reliant on the BioITeam version, enabling multiple processors to speed up all the breseq runs, comparing across multiple samples, and more.
Now that you have this command running (estimated to take ~20 minutes) I suggest:
- Going back up to the lambda phage run you transferred back to your computer above and interrogate the results some as they will be fundamentally the same types of mutations and output that you will see for this sample, just on a smaller scale.
- Reading below to anticipate what the results here will look like and how they will compare to the list of 40 variants you identified using samtools, and visualized with IGV.
- Reading the next section of the tutorial which deals with rerunning breseq on the lambda phage data in polymorphism mode (and the differences that makes in the results)
- Going back to the course home page and deciding what tutorial you'd like to work on next and begin reading through that material.
...
Note | |||||||
---|---|---|---|---|---|---|---|
| |||||||
Running this data set yourself is not actually recommended, or at least not recommended until you have completed some of the advanced breseq tutorials for 3 reasons:
3 can be worked around fairly simply as can be seen in the multiqc tutorial 2 will be handled in breseq installation tutorial after which the data will still take ~1hr. Therefore while the information of how to run this data is below, it is recommended that the majority of you focus on just downloading the results of that run when I did it.
|
data
Like we did yesterday we'll start by downloading our reads and reference into a new folder on scratch:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
cds
cp -r $BI/gva_course/mapping/data GVA_breseq_comparison_to_bowtieSamtoolsTutorials
cd GVA_breseq_comparison_to_bowtieSamtoolsTutorials
ls |
Running breseq
Code Block | ||||
---|---|---|---|---|
| ||||
breseq -j 48 -r NC_012967.1.gbk SRR030257_1.fastq SRR030257_2.fastq.gz |
As mentioned early in the course, some programs can actually take compressed fastq files in as input and breseq is 1 such program. In the above example, it actually takes 2 fastq files in, 1 as a non-compressed file, the other as a gzipped file. The other thing we have introduced is the -j 48 option which should allow breseq to access 48 processors. Unfortunately if you have not completed the breseq installation tutorial, there is a version conflict between the module version of bowtie2 and breseq and will therefore throw an error when breseq tries to call bowtie2.
As noted above, the run will still take ~1 hour even after you have installed your own version of breseq and therefore it is recommended that you use the launcher_creator.py script to send this job to the queue after you have completed the installation tutorial.
Anchor | ||||
---|---|---|---|---|
|
Here are the comments from the IGV tutorial evaluating the data:
Expand | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||
|
Additional tutorials dealing with breseq
...