...

As we have done with fastqc, cutadapt, and bowtie2, we want to install samtools. Unlike with the previous tools, there is a complication this time. If you search for samtools and arrive at https://anaconda.org/bioconda/samtools, you will easily assume that the command you need is: conda install -c bioconda samtools, as this form of command has worked for our other tools so far. Instead, the correct command is actually: conda install -c bioconda samtools bcftools openssl=1.0

There are 2 different things going on in this command.

  1. Forcing the installation of a specific version of openssl; in this case, a lower version than would normally be installed if samtools were installed by itself. Based on https://github.com/bioconda/bioconda-recipes/issues/12100, my understanding is that the conda package was put together with an error: the samtools package asks for version 1.1 of openssl, but the samtools program itself requires version 1.0 to be present.
  2. We are installing both samtools and bcftools at the same time. This can avoid some installation problems when individual packages conflict and you want to use them in a single environment. An alternative would be to have a samtools environment and a bcftools environment, but that creates unnecessary steps of having to change environments in the middle of your analysis.
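
Whichever command you end up using, it is worth confirming that both programs actually start, because (as this tutorial shows) a package can install without complaint and still fail to run. The following is a minimal sketch; the function name check_tools is made up for this example and is not part of conda, samtools, or bcftools. It relies only on the fact that both programs accept --version.

```shell
# Report whether each named program can be started with --version.
# Returns non-zero if any of them is missing or fails to run.
check_tools() {
    failed=0
    for tool in "$@"; do
        if "$tool" --version >/dev/null 2>&1; then
            echo "$tool: OK"
        else
            echo "$tool: missing or not working"
            failed=1
        fi
    done
    return "$failed"
}

# If either tool fails, print the install command from this tutorial as a reminder.
check_tools samtools bcftools || echo "try: conda install -c bioconda samtools bcftools openssl=1.0"
```

Running the program, rather than only checking that it is on your PATH, catches the case where the binary is present but cannot load a library it needs.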


What the outcome of the assumed installation command is, what the problem is, and steps you could take to fix it

Warning: This section contains example commands and outputs showing you something that does NOT work, for educational and diagnostic purposes. If you use the code listed in this section, be sure you use ALL the code or you may run into downstream problems with this tutorial.


Code Block
languagebash
conda install -c bioconda samtools
samtools --version

The first command will appear to install correctly, as the other programs have, but the second command, which you would expect to show you the version of samtools, instead returns the following error:

No Format
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
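
To see for yourself which library is missing, one option on Linux is ldd, which lists the shared libraries a program needs and marks unresolved ones as "not found". This is only a sketch: the helper name missing_libs is made up for this example, and it assumes ldd is available on your system.

```shell
# Print the shared libraries that a binary needs but the loader cannot find.
# ldd marks each unresolved library with "not found".
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Only inspect samtools if it is actually on the PATH.
if command -v samtools >/dev/null 2>&1; then
    missing_libs "$(command -v samtools)"
fi
```

For the broken installation described here, you would expect libcrypto.so.1.0.0 to be among the names listed.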

If you Google the entire error, the top results clearly mention conda, and several pages list "fixes" that use different conda installation commands.

Some of the "fixes" involve adding both bioconda and conda-forge to your channel list and forcing a specific channel order. As noted earlier, adding channels to the default search list is something you can consider, but it has the drawback that you may install something you didn't mean to, which makes me a fan of being explicit about which channels I'm accessing. More on this on Friday when we discuss what tools to use.

Other "fixes" involve commands that may work, but at the expense of altering existing packages in our environment (conda install -c bioconda samtools=1.9 --force-reinstall). While this command may work based on community feedback, the number of programs that would be downgraded is concerning to me. If I were going to go this route, I would instead copy my existing conda environment to a new "test" environment (conda create --name GVA2021-samtools-test --clone GVA2021), run the --force-reinstall command listed above in that copy, and make sure it works as expected before weighing exactly which programs were being downgraded and what the immediate effects would be on other analyses. Most likely I would ultimately rename the environment to something more permanent and keep samtools as a separate step (conda create --name GVA2021-samtools --clone GVA2021-samtools-test; conda env remove --name GVA2021-samtools-test).
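
The clone-and-test idea can be sketched as a short script. Several things here are assumptions added for illustration and are not from the original tutorial: the helper names run and test_force_reinstall, the DRY_RUN switch, the --yes flag, and the use of --name with conda install and conda run so that the clone is targeted without activating it. By default the script only prints the commands; nothing is changed unless you set DRY_RUN=0.

```shell
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_force_reinstall() {
    # 1. Clone the working environment so the original stays untouched.
    run conda create --yes --name GVA2021-samtools-test --clone GVA2021
    # 2. Try the community "fix" inside the clone only.
    run conda install --yes --name GVA2021-samtools-test -c bioconda samtools=1.9 --force-reinstall
    # 3. Check whether samtools now starts inside the clone.
    run conda run --name GVA2021-samtools-test samtools --version
}

test_force_reinstall
```

Reading the printed commands before running them for real gives you a chance to confirm the environment names, and conda's own prompt (if you drop --yes) will list which packages would be downgraded.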

Rather than going through all that, my solution is simply to install an older version of samtools deliberately from the start, as several webpages (including https://github.com/bioconda/bioconda-recipes/issues/13958) suggest that the issue is specific to the 1.12 version. In order to do this, I first had to remove the existing (non-working) samtools version.

Code Block
languagebash
conda remove samtools
conda install -c bioconda samtools==1.11
samtools --version

which now gives a reasonable output of:

No Format
samtools 1.11
Using htslib 1.11
Copyright (C) 2020 Genome Research Ltd.
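
If you want to check the installed version in a script rather than by eye, the version number can be pulled from the first line of that output. This is a small sketch; the function name parse_samtools_version is made up for this example, and it assumes the output keeps the "samtools <version>" layout shown above.

```shell
# Extract the version number from the first line of `samtools --version` output.
parse_samtools_version() {
    awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

# Demonstration on the output shown above; with samtools installed you could
# instead run: samtools --version | parse_samtools_version
printf 'samtools 1.11\nUsing htslib 1.11\n' | parse_samtools_version
```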

Unfortunately, installing samtools 1.11 on its own would then create downstream problems when installing bcftools, along with a different set of conflicts. Therefore we will again remove our (now functioning) samtools package and install samtools, bcftools, and openssl version 1.0 in a single command.

Code Block
languagebash
conda remove samtools
conda install -c bioconda samtools bcftools openssl=1.0

...

Now we use the mpileup command (from either samtools or bcftools) to compile information about the bases mapped to each reference position. The output is a BCF file, which is a binary form of the text Variant Call Format (VCF). For more information about VCF files: https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/

Code Block
languagebash
titleYou should only execute *one* of these commands
samtools mpileup -u -f NC_012967.1.fasta SRR030257.sorted.bam > SRR030257.bcf
bcftools mpileup -O u -f NC_012967.1.fasta SRR030257.sorted.bam -o bcftools.SRR030257.bcf

...

Take a look at the SRR030257.vcf file using less. It has a nice header explaining what the columns mean, including answers to some of your questions from yesterday's presentations. https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/ can be used to figure out what the columns are and what types of information they provide. Below the header are the rows of data describing potential genetic variants.
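
As a complement to paging through the file with less, the header and data rows can be separated with grep, since meta-information lines in a VCF start with "##", the column-name line starts with "#CHROM", and variant rows do not start with "#". The function name vcf_summary is made up for this sketch.

```shell
# Summarize a VCF file: count header and data lines, and list the column names.
vcf_summary() {
    vcf="$1"
    # grep -c exits non-zero when the count is 0, so "|| true" keeps going.
    meta=$(grep -c '^##' "$vcf" || true)
    records=$(grep -vc '^#' "$vcf" || true)
    echo "meta-information lines: $meta"
    echo "variant records: $records"
    echo "column names:"
    # The column-name line is tab-separated; print one name per line.
    grep '^#CHROM' "$vcf" | tr '\t' '\n' | sed 's/^#//'
}

# Only run on the tutorial file if it exists in the current directory.
if [ -f SRR030257.vcf ]; then
    vcf_summary SRR030257.vcf
fi
```

To look at just the first few variant rows, grep -v '^#' SRR030257.vcf | head does the same filtering without the summary.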

...