SAMtools is a suite of commands for dealing with databases of mapped reads. You'll be using it quite a bit throughout the course. It includes programs for performing variant calling (mpileup-bcftools). This tutorial expects you have already completed the Mapping tutorial.

Learning Objectives

  1. Work with a more complex conda installation, and learn how to troubleshoot it.
  2. Familiarize yourself with SAMtools.
  3. Use SAMtools to identify variants in the E. coli genomes we mapped in the previous tutorial.

...

As we did with fastqc, cutadapt, and bowtie2, we want to install samtools. Unlike with the previous tools, there is a difficulty this time. If you search for it and arrive at https://anaconda.org/bioconda/samtools, you could easily assume that the command you need is conda install -c bioconda samtools, as this form has worked for our other tools so far. Instead, the correct command is actually: conda install -c bioconda samtools bcftools openssl=1.0

There are 2 different things going on in this command.

  1. Forcing the installation of a specific version of openssl, in this case a lower version than would normally be installed if samtools were installed by itself. According to https://github.com/bioconda/bioconda-recipes/issues/12100, my understanding is that the conda package was put together with an error: the package tells conda to get version 1.1 of openssl, but the samtools program itself requires version 1.0 to be present.
  2. Installing both samtools and bcftools at the same time. This can avoid some installation problems when individual packages conflict and you want to use them in a single environment. An alternative would be to have a samtools environment and a separate bcftools environment, but that creates the unnecessary step of having to change environments in the middle of your analysis.
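
After an installation like this, a quick sanity check is to see where each program resolves on your $PATH, since both should come from the same conda environment. The following is a minimal sketch, not part of the original tutorial; the helper name report_tool is made up for illustration.

```shell
#!/bin/sh
# report_tool NAME (hypothetical helper)
# Prints where NAME resolves on $PATH, or a note that it is missing.
report_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "$1: $tool_path"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Check both programs installed by the single conda command.
for tool in samtools bcftools; do
    report_tool "$tool" || true
done
```

If both paths point inside the same environment directory, the two programs were installed together as intended. Running conda list openssl should additionally show which openssl version ended up in the environment.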


What the outcome of the assumed installation command is, what the problem is, and steps you could take to fix it

Warning: This section contains example commands and outputs showing you something that does NOT work, for educational and diagnostic purposes. If you use the code listed in this section, be sure you use ALL the code or you may run into downstream problems with this tutorial.


Code Block
languagebash
conda install -c bioconda samtools
samtools --version

The first command will appear to install correctly, as it has for other programs, but the second command, which you would expect to show you the version of samtools, instead returns the following error:

No Format
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
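
An error like this means the dynamic loader cannot find a library the program was linked against. One way to see this directly on Linux is to ask ldd which shared libraries a binary needs and filter for the ones it cannot resolve. This is a sketch under the assumption that ldd is available on your system; the helper name missing_libs is made up for illustration.

```shell
#!/bin/sh
# missing_libs BINARY (hypothetical helper)
# Prints the shared libraries that the dynamic loader cannot resolve
# for BINARY. Prints nothing when every library is found.
missing_libs() {
    ldd "$1" 2>/dev/null | grep "not found" || true
}

# Only inspect samtools if it is actually installed.
if samtools_path=$(command -v samtools); then
    missing_libs "$samtools_path"
fi
```

For the broken installation described here, I would expect the output to include a line mentioning libcrypto.so.1.0.0; an empty output means every library was found.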

If you Google the entire error, the top results clearly mention conda, and several pages list "fixes" that involve different conda installation commands.

Some of the "fixes" involve adding both bioconda and conda-forge to your channel list and forcing a specific access order. As noted earlier, adding channels to the default search list is something you can consider, but it has the drawback that you may install something you didn't mean to, which is why I am a fan of being explicit about which channels I'm accessing. More on this on Friday when we discuss what tools to use.

Other "fixes" involve commands that may work, but at the expense of altering existing packages in our environment, for example: conda install -c bioconda samtools=1.9 --force-reinstall. While this command may work based on community feedback, the number of programs that would be downgraded is concerning to me. If I were going to go this route, I would instead copy my existing conda environment to a new "test" environment (conda create --name GVA2021-samtools-test --clone GVA2021), run the force-reinstall command listed above in it, and make sure it works as expected before weighing exactly which programs were being downgraded and what the immediate effects would be on other analyses. Most likely I would ultimately rename the environment to something more permanent and keep samtools as a separate step (conda create --name GVA2021-samtools --clone GVA2021-samtools-test; conda env remove --name GVA2021-samtools-test).
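
The clone-and-test route described above can be sketched as a small script. This is only an illustration under a few assumptions: the run helper and the DRY_RUN variable are made up for this example, and by default the script only prints the conda commands so that you can review them before anything is changed.

```shell
#!/bin/sh
# Environment names taken from the text above; adjust for your own setup.
BASE_ENV="GVA2021"
TEST_ENV="GVA2021-samtools-test"

# run (hypothetical helper): print a command, and execute it only
# when DRY_RUN is set to 0. DRY_RUN defaults to 1 (print only).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

try_forced_reinstall() {
    # 1. Copy the working environment so the original stays untouched.
    run conda create --yes --name "$TEST_ENV" --clone "$BASE_ENV"
    # 2. Apply the community "fix" inside the copy only.
    run conda install --yes --name "$TEST_ENV" -c bioconda samtools=1.9 --force-reinstall
    # 3. Check that samtools now starts inside the copy.
    run conda run --name "$TEST_ENV" samtools --version
}

try_forced_reinstall
```

Setting DRY_RUN=0 before running the script would execute the commands for real. Because everything happens in the cloned environment, the original GVA2021 environment is not modified either way.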

Rather than going through all that, my solution is simply to install an older version of samtools deliberately from the start, as several webpages (including https://github.com/bioconda/bioconda-recipes/issues/13958) suggest that the issue is specific to the 1.12 version. To do this, I first had to remove the existing (incorrect) samtools version.

Code Block
languagebash
conda remove samtools
conda install -c bioconda samtools==1.11
samtools --version

which now gives a reasonable output of:

No Format
samtools 1.11
Using htslib 1.11
Copyright (C) 2020 Genome Research Ltd.

Unfortunately, this would then create downstream problems with installing bcftools, along with a different set of conflicts. Therefore we will again remove our (now functioning) samtools package and install samtools, bcftools, and openssl version 1.0 in a single command.

Code Block
languagebash
conda remove samtools
conda install -c bioconda samtools bcftools openssl=1.0

...

What are all the options doing? Try calling samtools mpileup without any options to see if you can figure it out before reading the answer.

Option                  Purpose
-u                      generates uncompressed BCF output
-f NC_012967.1.fasta    reference sequence file that has a corresponding faidx index .fai file
SRR030257.sorted.bam    BAM input file to calculate pileups from
> SRR030257.bcf         direct output to SRR030257.bcf file, rather than printing to the screen
-O u                    like -u in the samtools command, generates uncompressed BCF output
-o SRR030257.bcf        output file SRR030257.bcf
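
Since the pileup step depends on several input files being in place, it can help to check for them before running anything. The sketch below only assumes the file names listed in the options above and the usual convention that the faidx index sits next to the reference with an added .fai extension; the helper name check_pileup_inputs is made up for illustration.

```shell
#!/bin/sh
# check_pileup_inputs REF BAM (hypothetical helper)
# Prints one line per missing input file and returns non-zero
# if anything is missing.
check_pileup_inputs() {
    ref=$1
    bam=$2
    rc=0
    # The reference, its faidx index, and the sorted BAM are all expected.
    for f in "$ref" "$ref.fai" "$bam"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}

# Example with the file names used in this tutorial.
check_pileup_inputs NC_012967.1.fasta SRR030257.sorted.bam \
    || echo "fix the inputs listed above before running mpileup"
```

If only the .fai file is reported as missing, samtools faidx NC_012967.1.fasta should create it.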

...