...
The following DNA sequencing read data files were downloaded from the NCBI Sequence Read Archive via the corresponding European Nucleotide Archive record. They are Illumina Genome Analyzer sequencing of a paired-end library from a (haploid) E. coli clone that was isolated from a population of bacteria that had evolved for 20,000 generations in the laboratory as part of a long-term evolution experiment (Barrick et al, 2009). The reference genome is the ancestor of this E. coli population (strain REL606), so we expect the read sample to have differences from this reference that correspond to mutations that arose during the evolution experiment.
Transferring Data
We have already downloaded data files for this example and put them in the pathRather than having to download these files from the SRA or EUN and NCBI, these data files are available in the following directory:
Code Block |
---|
$BI/gva_course/mapping/data |
...
In this case the bp_seqconvert.pl perl script is included as part of the bioperl module package. Rather than attempt to find it as part of a conda package, or in some other repository we will use the module version. If needing this script in the future outside of TACC, https://metacpan.org/dist/BioPerl/view/bin/bp_seqconvert.
Code Block | language | bash|
---|---|---|
| ||
module load bioperl/1.007002 which -a bp_seqconvert.pl |
...
|
Info | |||||||
---|---|---|---|---|---|---|---|
| |||||||
If you run on an idev node you get 1 result related to the bioperl module, but if you run on the head node (outside idev) you get 2 results. On the head node, 1 points to the BioITeam near where you keep finding your data (/corral-repl/utexas/BioITeam/) which is part of the |
...
BioITeam, specifically the "bin" folder |
...
which is full of binary or (typically small) bash/python/perl/R scripts that someone has written to help the TACC community. The other is in a folder specifically associated with the bioperl module. You can load and unload the bioperl module to see the difference.
If you try to run the BioITeam version of the script |
...
( |
...
We get this error message because because |
...
while perl is installed on stampede2, the required |
...
SeqIO.pm library is not |
...
available by default |
...
but it is easily installed with the bioperl module. As it is likely rare that you will need to convert sequence files between different format, bioperl is actually not listed as one of the modules on your .bashrc file in your $HOME directory that you set up yesterday |
...
After loading the bioperl library to get past the error message, run the script from the BioITeam without any arguments to get the help message:
Code Block | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
module load bioperl
, but if you find yourself using the command `module load bioperl` often, you may want to add it.
How does the computer know which location to use?
Using just the script name by itself, will use which ever is found first, but you can always force the computer to use a given copy by specifying the full path to the copy you want. Thus, the following 2 commands are not equal:
|
...
While the commands are different, both copies can use the same bioperl library SeqIO.pm when the bioperl module is loaded and thus work. |
Convert a gbk reference to a embl reference
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
module load bioperl
bp_seqconvert.pl --from genbank --to embl < NC_012967.1.gbk > NC_012967.1.embl
head -n 100 NC_012967.1.embl
|
...
While you could consult previous year's tutorial for installing bowtie2 via the module system, this year's course will be using the conda system to install it. The bowtie2 home page can be found here, and if you needed to download the program itself, version 2.45.5 could 1 could be downloaded here. Instead, we want to make sure the bowtie2 version 2.45.5 1 is installed via conda Like we did for fastqc and cutadapt. See if you can figure out how to install bowtie2 into a new conda environment named "GVA-bowtie2-mapping". Note that "2" is actually part of the program name, neither a typo nor a comment on the program version.
...
In the bowtie2 example, we mapped in
--local
mode. Try mapping in--end-to-end
mode (aka global mode).- Do the BWA tutorial so you can compare their outputs (note BWA has a conda package making it even easier to try).
- Did bowtie2 or BWA map more reads?
- In our examples, we mapped in paired-end mode. Try to figure out how to map the reads in single-end mode and create this output.
- Which aligner took less time to run? Are there any options you can change that:
- Lead to a larger percentage of the reads being mapped? (increase sensitivity)
- Speed up run time without causing many fewer reads to be mapped? (increase performance)
...