Table of Contents |
---|
...
SAMtools is a suite of commands for dealing with databases of mapped reads. You'll be using it quite a bit throughout the course. It includes programs for performing variant calling (mpileup-bcftools). This tutorial expects you have already completed the Mapping tutorial.
Learning Objectives
- Gain important insight into version control and $PATH issuesWork with a more complex conda installation, and how to troubleshoot it.
- Familiarize yourself with SAMtools.
- Use SAMtools to identify variants in the E. coli genomes we mapped in the previous tutorial.
...
As we have done with: fastqc, cutadapt, and bowtie2, we want to install samtools. Unlike with the previous tools, there is a difficulty this time. If you use and arrive at https://anaconda.org/bioconda/samtools you will easily assume that the correct command you need is: conda install -c bioconda samtools
as this has been the command that has worked for our other tools so far. Instead the correct command is actually: conda install -c bioconda samtools bcftools openssl=1.0
There are 2 different things going on in this command.
- Forcing the installation of a specific version of openssl. In this case, a lower version than would normally be installed if samtools were installed by itself. According to https://github.com/bioconda/bioconda-recipes/issues/12100 my understanding is that when the conda package was put together there is an error wherein samtools specifically says to get version 1.1 of openssl, but the samtools program specifically requires version 1.0 to be present.
- We are installing both samtools and bcftools at the same time. This can clean up some installation problems when there are conflicts between individual packages and you want to use them in a single environment. An alternative would be to have a samtools environment and a bcftools environment, but that creates unnecessary steps of having to change environments in the middle of your analysis.
Expand | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||
The above command will appear to install correctly as other programs have, but the second command which you would expect to show you the version of samtools instead returns the following error:
Googling the entire error the top results clearly mention conda and several pages list problems associated "fixes" with different conda installation commands. Some of the "fixes" involve adding both bioconda and conda-forge to your channel list and forcing specific access orders. As noted earlier, adding channels to the default search list is something you can consider, but has the drawback of leading you to potentially install something you didn't mean to making me a fan of being explicit in what channels I'm accessing. More on this on Friday when we discuss what tools to use. Other "fixes" involve commands that may work, but at the expense of altering existing packages in our environment Rather than going through all that. My solution is simply to install an older version of samtools deliberately from the start as several webpages (including: https://github.com/bioconda/bioconda-recipes/issues/13958 suggest that the issue is specific to the 1.12 version). In order to do this, I first had to remove the existing (incorrect) samtools version.
which now gives a reasonable output of:
Unfortunately, this would then create a downstream problems with installing bcftools and a different set of conflicts. Therefore we will again remove our (now functioning) samtools package, and install samtools, bcftools, and openssl version 1.0 in a single command.
|
...
Now we use the mpileup
command from samtools
to compile information about the bases mapped to each reference position. The output is a BCF file. This is a binary form of the text Variant Call Format (VCF).
...
For more information about VCF files: https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/
Code Block | ||||
---|---|---|---|---|
| ||||
samtools mpileup -u -f NC_012967.1.fasta SRR030257.sorted.bam > SRR030257.bcf bcftools mpileup -O u -f NC_012967.1.fasta SRR030257.sorted.bam -o bcftools.SRR030257.bcf |
...
Expand | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||
|
The samtools mpileup
command will take a few minutes to run. As practice for a fairly common occurrence when working with the iDEV environment, once the command is running, you could try putting it in the background by pressing control-z
and then typing the command bg
so that you can do some other things in this terminal window at the same time. Remember, there are still many other processors available on this node for you to do other things! Just remember that if you have something running in the background you need to check in with it to see if it is still running with the ps
command or watch the command line every time you execute a new command as you may see information about your background task having finished. Alternatively, you could go back to the beginning of this tutorial and read through some examples of trying to install other versions of samtools and troubleshooting conda installations while you wait for the command to finish.
Convert genome variants to human readable format
Once the mpileup command is complete, convert the BCF file to a "human
Info |
---|
The |
Convert genome variants to human readable format
Once the mpileup command is complete, convert the BCF file to a "human-readable" VCF file using a bcftools command.
...
Take a look at the SRR030257.vcf
file using less
. It has a nice header explaining what the columns mean, including answers to some of your questions from yesterday's presentations. Below . https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/ can be used to figure out the columns are and what types of information they provide. Below this are the rows of data describing potential genetic variants.
...