...
After explaining the module system, which we will use extensively throughout the course, we'll install 3 separate programs that we may use later in the class, each by a different method. This is not a complete list of ways to install new programs, but it is meant to be a good working example that you can adapt to install other programs in your future work. If you choose to do one of the optional tutorials that involve the programs installed here, the installation will be covered in more detail at that time.
TACC modules
Modules are programs or sets of programs that have been set up to run on TACC. They make managing your computational environment very easy: all you have to do is load the modules that you need, and much of the advanced wizardry needed to set up the Linux environment has already been done for you. New commands just appear.
...
You can always download such files to your own computer using a web browser and then use the scp command to transfer them to TACC. The wget command lets you skip these intermediate steps and is usually more convenient, unless you want to install the program on both your laptop and TACC and both run the same operating system.
Github
This section is about using the git clone command. Git is a version control program often used for collaborative program development or sharing of files. Some groups also put the programs or scripts associated with a particular paper in a GitHub project and publish the link in their paper or on their lab website.
Here we will clone the GitHub repository for breseq, which is developed by the Barrick lab here at UT and is used to comprehensively analyze haploid microbial genomes to identify all variants present. In some of the initial tutorials everyone will use a version of breseq that is available through the BioITeam. In the optional tutorials you may compile your own copy of breseq from this GitHub project, either to underscore why binary files are typically preferred or as a way of easily staying up to date on new developments with the program itself.
Initially cloning a GitHub repository is very similar to using the wget command to download the repository: it involves typing 'git clone' followed by the web address where the repository is stored. As we did when installing Trimmomatic with wget, we'll clone the repository into a 'src' directory inside of $WORK.
```shell
cd $WORK
mkdir src
cd src
```
If you already have a src directory, you'll get a harmless error message stating that the folder already exists and thus cannot be created.
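If you would rather not see that error at all, the -p option of mkdir creates the directory only when it is missing. The lines below are a minimal sketch of the same three steps; the SRC_DIR variable name and the fallback to the current directory when $WORK is not set are assumptions added here so the snippet also runs off TACC.

```shell
# Assumption: use $WORK when it is set (as on TACC), otherwise the current directory.
SRC_DIR="${WORK:-$PWD}/src"

# -p: create the directory if needed, and do nothing (no error) if it already exists.
mkdir -p "$SRC_DIR"

# Move into the directory and print its location to confirm where we are.
cd "$SRC_DIR"
pwd
```

Running it a second time should behave exactly the same, which makes it safe to paste into notes or scripts that you reuse.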
In a web browser, navigate to GitHub and search for 'breseq' in the top right corner of the page. The top result will be for barricklab/breseq; click the green box for 'clone or download' and either press control/command + C on the address listed, or click the clipboard icon to copy the repository address. This image may be helpful if you are having trouble locating the green box.
```shell
git clone https://github.com/barricklab/breseq.git
```
You will see several download indicators increase to 100%, and when you get your command prompt back the ls command will show a new folder named 'breseq' containing a set of files. As with Trimmomatic, these files will require additional work that is specific to the program and therefore beyond the scope of this tutorial. A link to the advanced tutorials for getting your own copy of breseq up and running will be added later in the week.
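One reason to clone rather than download is that a clone can later be brought up to date without starting over. The sketch below shows one way to do that, under the assumption that you have not edited the files in your copy; clone_or_update is a hypothetical helper name chosen for this example and is not part of git.

```shell
# clone_or_update URL DEST
# Hypothetical helper: clone URL into DEST the first time,
# and on later runs pull the newest commits into the existing copy.
clone_or_update() {
    repo_url="$1"
    dest="$2"
    if [ -d "$dest/.git" ]; then
        # Already cloned: fetch new commits and fast-forward to them.
        # --ff-only refuses to merge if the local copy has diverged.
        git -C "$dest" pull --ff-only
    else
        # First run: make a fresh copy of the repository.
        git clone "$repo_url" "$dest"
    fi
}

# Example call (not run here):
#   clone_or_update https://github.com/barricklab/breseq.git "$WORK/src/breseq"
```

Updating the files this way does not rebuild the program; any compile steps covered in the advanced tutorial would need to be repeated afterwards.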
pip
This section is about using the pip3 install command. pip is the standard package manager for the common programming language Python. When labs put together new analysis programs/packages, increasingly they try to make these programs available for others to use via pip. The name pip3, rather than just pip, refers to the pip that belongs to Python 3, the current version of Python.
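Because pip and pip3 can point at different Python installations, it can be worth checking what each one is before installing anything. This is a small sketch of such a check; show_pip_versions is a hypothetical helper name, and what it prints will depend on the system and on which modules are loaded.

```shell
# show_pip_versions is a hypothetical helper name used only for this sketch.
show_pip_versions() {
    for cmd in pip pip3; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # The version line also names the Python installation the command belongs to.
            echo "$cmd -> $("$cmd" --version 2>&1 | head -n 1)"
        else
            echo "$cmd -> not found on PATH"
        fi
    done
}

show_pip_versions
```

If the two lines report different Python versions, packages installed with one command will not be visible to the other.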
...