Breseq - installation GVA2020
Overview
As we have seen at several points in the course, controlling for different versions of different programs can cause headaches. In this tutorial you will install your own copy of the breseq analysis pipeline. Additionally (and more importantly), you will install an updated version of bowtie2, which fixes a bug in bowtie2 that prevents breseq from running on multiple threads.
Learning objectives
- Check the versions of bowtie2 and breseq to verify this tutorial is necessary
- Upgrade bowtie2
- Clone your own copy of breseq
Checking installations and versions
which -a bowtie2
bowtie2 --version
If the which command gives 1 line of output listing '/opt/apps/intel18/bowtie/2.3.4/bin/bowtie2' and the second command gives several lines of output that include 'version 2.3.4' in the first line, you will need to upgrade bowtie2 to a different version. This tutorial covers installing bowtie2 2.3.5.1, because this version is known to fix the error, while more recent versions have not been checked.
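If you would rather not read the version line by eye, the same check can be sketched as a small bash function. The name check_bowtie2_version is hypothetical (it is not part of the course scripts), and it assumes, as described above, that the version number sits at the end of the first line printed by bowtie2 --version.

```shell
# Hypothetical helper, not part of the course scripts.
# Reads the output of `bowtie2 --version` on stdin and inspects only the
# first line, which (as described above) ends with the version number.
check_bowtie2_version() {
  local first_line
  IFS= read -r first_line || return 2   # no output at all: is bowtie2 on your PATH?
  case "$first_line" in
    *"version 2.3.4")   echo "upgrade needed"; return 1 ;;
    *"version 2.3.5.1") echo "ok"; return 0 ;;
    *)                  echo "unrecognized: $first_line"; return 2 ;;
  esac
}

# Usage:
#   bowtie2 --version | check_bowtie2_version
```

The same one-liner can be rerun after the upgrade below; it should then print "ok".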
which -a breseq
breseq --version
If the which command gives 1 line of output listing '/corral-repl/utexas/BioITeam/breseq/bin/breseq' and the second command gives 'breseq 0.35.1', you do not have to clone your own copy of breseq. Doing so is still encouraged, particularly if you envision using breseq in your own work outside of this course.
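To get a yes/no answer on whether a breseq copy is at least version 0.35.1, you can use GNU sort -V (version sort, available on typical Linux systems). The helper name version_at_least is hypothetical. The awk step in the usage comment assumes breseq --version prints the version as the second word of its first line, as in the 'breseq 0.35.1' output above.

```shell
# Hypothetical helper: succeeds if version $1 is >= version $2.
# GNU `sort -V` orders version strings numerically (0.35.1 < 0.35.10).
version_at_least() {
  local have=$1 want=$2
  # The smaller version sorts first; if that is $want, then have >= want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Usage (assumes output like "breseq 0.35.1"):
#   v=$(breseq --version | head -n 1 | awk '{print $2}')
#   version_at_least "$v" 0.35.1 && echo "breseq $v is at least 0.35.1"
```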
Warning against an idev node
Unlike most warnings you get about idev nodes during this class, this one is a warning against being on an idev node, because downloads are typically slower in idev sessions.
Upgrading bowtie2
mkdir -p $WORK/src
cd $WORK/src
wget https://github.com/BenLangmead/bowtie2/releases/download/v2.3.5.1/bowtie2-2.3.5.1-linux-x86_64.zip
unzip bowtie2-2.3.5.1-linux-x86_64.zip
Next we need to make sure that this version of bowtie2 is in your PATH variable. Do one of the following:
Make 2 changes to your .bashrc file using nano
Modify the .bashrc file in your $HOME directory (for example with nano $HOME/.bashrc):
# comment out the line in the module system section that loads bowtie 2.3.4, so that it reads:
#module load bowtie/2.3.4
# add the following line in the section dealing with the PATH variable:
export PATH=$PATH:$WORK/src/bowtie2-2.3.5.1-linux-x86_64  # temp bowtie2 executables
Copy the updated version of the .bashrc file from $BI/scripts
Preferred solution:
cp /corral-repl/utexas/BioITeam/scripts/GVA2020.bashrc.updated_bowtie2 $HOME/.bashrc
chmod 700 $HOME/.bashrc
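In the first option, the line export PATH=$PATH:$WORK/src/bowtie2-2.3.5.1-linux-x86_64 adds the new directory to the end of $PATH. Directories earlier in $PATH are searched first, so a bowtie2 from a loaded bowtie/2.3.4 module would still win, and that is why the module load line has to be commented out. One side effect of the plain export line is that each time your .bashrc is read again in the same session (for example after running source ~/.bashrc), the same directory is appended once more. A duplicate-safe alternative is sketched below as a hypothetical path_append function. It is an idea for your own .bashrc and is not taken from the course's .bashrc file.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                   # already present: do nothing
    *) PATH="${PATH:+$PATH:}$1" ;; # add to the end (searched last)
  esac
  export PATH
}

# Equivalent of the export line above, but safe to run repeatedly:
#   path_append "$WORK/src/bowtie2-2.3.5.1-linux-x86_64"
```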
Finally, log out of TACC and log back in using ssh.
which -a bowtie2
bowtie2 --version
The which command should now return 1 line similar to '/work/01821/ded/lonestar/src/bowtie2-2.3.5.1-linux-x86_64/bowtie2' (with your own $WORK path), and the first line of output from the second command should end with "version 2.3.5.1". If not, get my attention on Zoom.
Testing what happens if you load the bowtie/2.3.4 module
As mentioned in an earlier tutorial, when you load a module, TACC assumes you are about to use it and therefore adds the directories associated with the module to the front of your $PATH. Consider the output of the following block of code to figure out how you could rerun the mapping tutorial with bowtie2 version 2.3.4.
which -a bowtie2
module load bowtie/2.3.4
which -a bowtie2
module unload bowtie/2.3.4
which -a bowtie2
Remember, when you use the -a option on a which command and have multiple lines of output, it is always the top line that is executed.
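The "top line wins" rule can be demonstrated without touching bowtie2 at all. The sketch below creates two throwaway directories in a temporary location, each holding a fake program with the made-up name demo_tool. It then puts both on $PATH in the same order you would get from appending one copy (like the .bashrc line) and prepending the other (like module load). Everything here is hypothetical, and the PATH change happens in a subshell so your real PATH is untouched.

```shell
# Build two fake copies of a program with the made-up name demo_tool,
# each reporting a different "version".
demo=$(mktemp -d)
mkdir -p "$demo/old" "$demo/new"
printf '#!/bin/sh\necho "demo_tool version 1"\n' > "$demo/old/demo_tool"
printf '#!/bin/sh\necho "demo_tool version 2"\n' > "$demo/new/demo_tool"
chmod +x "$demo/old/demo_tool" "$demo/new/demo_tool"

# Subshell: "new" is appended to the end of PATH (like the .bashrc line)
# and "old" is prepended to the front (like module load).
(
  PATH="$demo/old:$PATH:$demo/new"
  which -a demo_tool   # both copies are listed, the "old" one on the top line
  demo_tool            # the top line wins: prints "demo_tool version 1"
)
```

Swapping the two directories in the PATH line flips which copy runs, which is the same thing module load and module unload do to bowtie2 above.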
Cloning breseq
As mentioned above, this is not required to complete the other breseq-based tutorials in the course; however, it is highly recommended for anyone who anticipates using breseq in their own work. At first glance, cloning a GitHub repository is very similar to using the wget command to download bowtie2 above: it involves typing 'git clone' followed by the web address where the repository is stored. As we did when installing bowtie2 with wget, we'll clone the repository into a 'src' directory inside of $WORK.
In a web browser, navigate to GitHub and search for 'breseq' using the search box in the top right corner of the page. The top result will be for barricklab/breseq; click the green 'clone or download' button and either copy the address listed (control/command + C) or click the clipboard icon to copy the repository address. This image may be helpful if you are having trouble locating the green box.
You will see several download indicators increase to 100%, and when you get your command prompt back, the ls command will show a new folder named 'breseq' containing a set of files. Alternatively, you may get an error message saying "fatal: destination path 'breseq' already exists and is not an empty directory." This means you have previously cloned the repository. Don't worry: it just means that when you use the 'git pull' command in the Compiling breseq section, you will likely see a message listing a number of changes being made rather than the 'Already up to date' message that a fresh clone shows.
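The clone step (run from $WORK/src) and the "destination path already exists" situation can be handled by one small function. The sketch below is a hypothetical clone_or_pull helper, not part of breseq or the course scripts. It runs git clone if the destination is not yet a git repository and runs git pull inside it otherwise. The address https://github.com/barricklab/breseq.git is the usual GitHub clone address for the barricklab/breseq repository; use whatever address you copied if it differs.

```shell
# Hypothetical helper: clone URL into DEST, or update DEST if it is already
# a git clone (the "destination path already exists" case above).
clone_or_pull() {
  local url=$1 dest=$2
  if [ -d "$dest/.git" ]; then
    git -C "$dest" pull          # existing clone: fetch and merge new commits
  else
    git clone "$url" "$dest"     # first time: make a fresh copy
  fi
}

# Usage:
#   cd $WORK/src
#   clone_or_pull https://github.com/barricklab/breseq.git breseq
#   ls breseq
```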
If you don't see this directory, or can't cd into it, let the instructor know.
You may be thinking that this really does seem remarkably similar to the wget download you did for bowtie2, and wondering why you don't just do that.
The answer is that in the future, if you want to upgrade to the latest version of breseq, you won't have to navigate to the GitHub page, check whether the version is different, and copy the download link; instead, you can use the 'git pull' command to check whether a new version is available and automatically download it. Further, git's version control lets you quickly roll back to an older version if you need to (say, you want to add another small set of samples to an analysis you did a year ago without having to rerun all the old samples).
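As a sketch of the roll-back idea, the hypothetical helper below fetches any new tags and then checks out a given tag or commit in an existing clone. It assumes you already know which tag or commit you want (git tag --list and git log --oneline show what is available); which tags the breseq repository actually uses is not covered here. After switching, the code must be recompiled with the steps in the Compiling breseq section before the older version is the one that runs.

```shell
# Hypothetical helper: move an existing clone to a chosen release.
# REF is a tag or commit ID; list candidates with
#   git -C "$WORK/src/breseq" tag --list
#   git -C "$WORK/src/breseq" log --oneline
switch_breseq_version() {
  local repo=$1 ref=$2
  git -C "$repo" fetch --tags --quiet      # learn about any newer tags/commits first
  git -C "$repo" checkout --quiet "$ref"   # a tag leaves you on a "detached HEAD"
}

# Usage (REF is a placeholder), then recompile:
#   switch_breseq_version "$WORK/src/breseq" REF
```

While a tag is checked out, git pull will refuse to run because you are not on a branch, so check out your original branch again (shown by git branch before switching) before pulling.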
Compiling breseq
Much like how downloading bowtie2 was not enough to make it usable, the same will be true of breseq. Unlike bowtie2, however, rather than just editing our PATH or moving executable files around, we have to compile the code.
mkdir -p $HOME/local/bin
module unload bowtie/2.3.4
cd $WORK/src/breseq
git pull
make clean
./bootstrap.sh
./configure --prefix=$HOME/local
make
make install
The above will take several minutes to complete, but should always print something new to the screen at least every 30 seconds.
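With --prefix=$HOME/local, the usual autoconf convention is that make install places the breseq executable in $HOME/local/bin, the directory created by the mkdir -p line above. A quick sanity check can be sketched with the hypothetical function below. It confirms that the executable is there and reports whether that directory is on your $PATH; if it is not, which -a breseq will not list your new copy.

```shell
# Hypothetical helper: check that `make install` put breseq under PREFIX/bin
# and report whether that directory is on PATH.
check_breseq_install() {
  local prefix=${1:-$HOME/local}
  if [ ! -x "$prefix/bin/breseq" ]; then
    echo "no breseq executable in $prefix/bin"
    return 1
  fi
  case ":$PATH:" in
    *":$prefix/bin:"*) echo "installed and on PATH" ;;
    *)                 echo "installed, but $prefix/bin is not on PATH" ;;
  esac
}

# Usage (defaults to the --prefix used above):
#   check_breseq_install
```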
Testing breseq
make test
This command is expected to take a total of ~13 minutes, with no single step taking more than 30-60 seconds. Unfortunately, with all the text scrolling by on the screen, it is difficult to notice that a number of different tests are actually being conducted, each of which can pass or fail as a way of telling you what exactly is going wrong. Just before the final report of how long the command took, you should see a block of text that reads:
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO Passed check OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
It is a good idea to use your terminal's find option (like pressing command/control + F in a web browser to bring up a find box) and step back through the matches several times, looking for the word ' check', which IS preceded by a space. I expect this will show that all tests passed except 1, which is in an area currently under active development relating to applying mutations to 'gbk' files. This should not cause any issues for your work in the course or outside of it.
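Scrolling back through the terminal works, but saving the output to a file makes the search repeatable. The sketch below assumes you run make test 2>&1 | tee make_test.log (the log file name is arbitrary). It also assumes, as the search tip above suggests, that each test's outcome appears on a line containing ' check' with a leading space. summarize_checks is a hypothetical helper, not part of breseq.

```shell
# Run the tests while keeping a copy of everything printed (file name is arbitrary):
#   make test 2>&1 | tee make_test.log

# Hypothetical helper: count lines mentioning " check" and show any that
# do not say "Passed check".
summarize_checks() {
  local log=$1 total passed
  total=$(grep -c ' check' "$log" || true)        # grep -c prints 0 but exits 1 on no match
  passed=$(grep -c 'Passed check' "$log" || true)
  echo "$passed of $total checks passed"
  grep ' check' "$log" | grep -v 'Passed check' || true   # non-passing lines, if any
}

# Usage:
#   summarize_checks make_test.log
```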
Your installation has likely failed and you need to get my attention if:
You see multiple tests have failed, or the make test command takes less than 5 minutes
Next steps
Now that you have your own copy of breseq, you can:
- Go back to the intro breseq tutorial and map the same set of data you worked with in the mapping and SNV discovery tutorials, so you can compare it directly to the results you saw in the IGV tutorial.
- You may also move on to the advanced breseq tutorial.
- While it doesn't seem to be directly applicable to anyone's work this year, there is also a tutorial that uses breseq to deal with molecular indexes and lower error rates.
- You could combine all the different parts of the required tutorials: go back to the read processing tutorial and trim both read1 and read2, then run the improved samples through breseq, and compare the results to running those same files through the bowtie2 and SNV tutorials separately.
- If you aren't sure what you should be working on, as always, just ask and I'll give some recommendations.