...
So why is it when the module is loaded it finds the 1.3 version rather than the 0.1.18 version? The answer is thats what the $PATH variable is set to find. Consider the following information :
You may have noticed that module list samtools told you that you had version 1.3 loaded already. This raises the question of why samtools points at version 0.1.18 in the BioITeam resources rather than the module version. To understand this, consider the following information about the module system and the $PATH variable:
...
- When you type a command, only locations that are in your PATH variable are searched for an executable command matching that name.
- When the command is found in your PATH, the computer immediately substitutes the location it was found for the short command name you entered, and stops searching.
- This means that things that are early in your path are always searched first. In some extreme cricumstances if you add a bunch of locations with lots of files to the front of your PATH, you can actually slow down your entire computer, so try to limit the path variable to only look in directories containing executables.
- The module system always assumes that when you load a module, you intend to use it, and thus puts the executables for that module at the front of your PATH.
- In your .bashrc file, modules are loaded first (including samtools).
- After modules are loaded, we further manipulate your PATH variable several times. The last section involving breseq has 2 alternative manipulations:
The first which you can see we have commented out:
No Format # export PATH=$BI/breseq/bin:$PATH
- This command says make the variable PATH equal to the variable BI plus /breseq/bin and then add on the existing value of $PATH
The second we actually use.
No Format export PATH=$PATH:$BI/breseq/bin
- This command says make the variable PATH equal to the existing value of $PATH variable and then add on BI plus /breseq/bin
anytimeWarning title One of the most important lessons you can ever learn
(Anytime you manipulate your PATH variable you always want to make sure that you include $PATH on the right side of the equation somewhere separated by : either before it, after it, or on both sides of it if you want it in the middle of 2 different locations. As we are explaining right now, there are reasons and times to put it in different relative places, but if you fail to include it (or include a typo in it by calling it say $PTAH) you can actually remove access to all existing commands
including the most basic things like "ls" "mkdir" "cd".
- Together all this means that the last modification to your PATH was the addition of a different program (breseq) to your PATH, unexpectedly breseq also includes a samtools distribution making it the first thing that is found when you use the command samtools. This is not an uncommon type of thing for programs to do when they rely on a specific version of a program to run themselves.
...
title | click here for instructions on how to test/verify that this is true |
---|
Expand | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Simply reload samtools using the module system, check the version, and where that which version is now being called fromused.
|
...
Warning | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||
Use showq -u to verify you are still on the idev node.
|
...
Expand title What new files were created by these commands? Code Block language bash title List the contents of the output directory ls
Code Block title Expected output NC_012967.1.fasta-1 # This is SRR030257.sorted.bam.bai NC_012967.1.fasta.fai SRR030257.sam SRR030257.bam SRR030257.sorted.bam
Expand title Why didn't we name the output SRR030257.sorted.bam in the samtools sort command? Samtools appends an extra .bam to whatever we put here, so it would have created SRR030257.sorted.bam.bam, and then we would have had to make a joke about the Flintstones. the number 1 not the letter L. Several people were confused about this yesterday, it is not necessary for any reason other than to give a single column of output regardless of window size.
Code Block title Expected output NC_012967.1.fasta NC_012967.1.fasta.fai SRR030257.bam SRR030257.sam SRR030257.sorted.bam SRR030257.sorted.bam.bai
Expand title Can you guess what a *.bai file is? Sure enough, it's the index file for the BAM file.
...
Expand | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
...
Code Block | ||
---|---|---|
| ||
bcftools viewcall -v -c -g SRR030257.bcf > SRR030257.vcf |
...
Expand | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||
|
...
Expand | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||
Try looking at grep --help to see what you can come up with.
|
...
Follow the same directions to call variants in the BWA or Bowtie2 mapped reads with the improved quality. Just be sure you don't write over your old files. Maybe create new directories like BDIB_samtools_bwa
and BDIB_samtools_bowtie_improved
for the output in each case. As another fun version control issue, using the module system you can see that bowtie actually has 2 versions available: 1.1.2 and 2.2.6 ... these version could not be more different to the point where they should really be named different things. bowtie 1.1.2 is included only to keep old scripts that rely on it running, and is surpassed in everyway by bowtie2. If so desired, you could figure out how to run it yourself as a test of learning an unfamiliar program, or you could include any the output from said run from the precanned location below:
Warning | ||
---|---|---|
If you do not have the output from the Mapping tutorial, run the first 4 commands to copy over the output that would have been produced. Then, you can immediately start this tutorial!
These precanned results will be used in the optional upcoming bedtools tutorial as well, or you can simply compare the output .vcf files for more simple answers |
...
- Which mapper finds more variants?
- Can you figure out how to filter the VCF files on various criteria, like coverage, quality, ... ?
- How many high quality mutations are there in these E. coli samples relative to the reference genome?
- Look at how the reads supporting these variants were aligned to the reference genome in the Integrative Genomics Viewer (IGV). Look into more sophisticated variant calling with GATK. We recommend starting from the GATK best practice pageThis will be a separate tutorial for tomorrow.
Module version of samtools
...