Installing Linux tools
OK. So you just read the latest issue of Bioinformatics (or did a Google search) and have discovered some new pieces of software that promise to slice and dice your data in new, interesting, and useful ways. Most often, these tools will be designed to run in a Linux environment. Unfortunately, the helpful support staff at TACC may not have had time to test these tools and make a proper module out of them (or maybe they didn't want to make 1,000+ modules for every piece of bioinformatics software out there). Perhaps there is a TACC module, but it was made a month or two back when the software was at version 1.01 and now it's at version 1.03, which has a bug fix or some nifty new bell and whistle.
The bottom line is that you are going to find yourself in a situation where module spider will come up empty and you're on your own to get the piece of software that you are dying to run on TACC.
Unfortunately, there is no double-click installer for TACC. Fortunately, a majority of the better and more mature programs out there (but by no means all bioinformatics software) can be readily installed in a few similar ways.
The overall steps for installing a program on a Linux system are:
- Download the executable or source code
- Compile or make the project (if installing from source code)
- Set up your $PATH to find the new executable
Note: Most Linux installs will work similarly on MacOSX, with just a few additional preliminaries (install XCode, maybe some extra libraries, etc). With considerably more work, it is possible to set up a Linux-like environment in Windows as well. Both of these topics are outside the scope of what we are going to cover here.
Case 1: Installing a precompiled binary (executable)
For programs that are already compiled (converted from high-level source code in a language like C into machine-specific code), you are often given several choices. Download the version that matches the CPU architecture of your machine.
You can get your CPU architecture with this command:
login1$ arch
Output might be something like i386 (for my MacBook) or x86_64 (for Lonestar).
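On systems where arch is not installed, uname -m usually reports the same value. As a minimal sketch, the result can be mapped to a rough category before you pick a download; the category labels here are illustrative and are not the names any particular website uses.

```shell
# Report the CPU architecture; uname -m is a POSIX option.
cpu_arch="$(uname -m)"

# Map the raw value to a rough category (labels are illustrative).
case "$cpu_arch" in
  x86_64|amd64)        arch_label="64-bit x86" ;;
  i386|i486|i586|i686) arch_label="32-bit x86" ;;
  aarch64|arm64)       arch_label="64-bit ARM" ;;
  *)                   arch_label="other" ;;
esac

echo "CPU architecture: $cpu_arch ($arch_label)"
```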
Example: Install SSAHA2 precompiled binary
The website for the SSAHA2 read mapper has links to download several different architectures. Using commands that you have learned in earlier lessons, download the correct one to Lonestar and place it under the directory $HOME/local/bin.
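Once the file is downloaded, placing it comes down to three steps: create the directory, copy the file in, and mark it executable. The sketch below wraps those steps in a helper; the function name install_binary is hypothetical, and the usage line assumes the downloaded file is called ssaha2 and sits in the current directory.

```shell
# install_binary FILE [DIR]: copy FILE into DIR (default: $HOME/local/bin)
# and mark the copy executable. The function name is hypothetical.
install_binary() {
  src="$1"
  dest_dir="${2:-$HOME/local/bin}"
  mkdir -p "$dest_dir" || return 1            # create the directory if needed
  cp "$src" "$dest_dir/" || return 1          # copy the binary into place
  chmod u+x "$dest_dir/$(basename "$src")"    # make sure you can run it
}

# Usage (assumes a downloaded file named ssaha2 in the current directory):
#   install_binary ./ssaha2
```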
How the shell finds executables: $PATH
Now, you might want to tell your login shell that it should look for executable files in this new directory $HOME/local/bin. This will allow you to use the executable as a one-word command like you are used to:
login1$ ssaha2
Instead of writing out the entire path to the executable to run it, like this:
login1$ /home1/01502/jbarrick/local/bin/ssaha2
Assuming you are using the bash shell, you can do this by editing your $HOME/.profile_user configuration file. This configuration file is basically just a bash script that is run whenever you log in. You want to add a line that looks like this:
export PATH="$HOME/local/bin:$PATH"
This sets the environment variable PATH to its old value with your new directory prepended (the : separates multiple paths). This means the shell will look for executables in this new location first, then it will look in all of the standard locations after that. For more information on environment variables see the Bash Beginner's Guide.
Important! In order to have this change take effect, you must log out and log in again to force the shell to re-read the ~/.profile_user file. (Alternatively, you can use the command source ~/.profile_user to re-read it at any time.)
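One wrinkle with re-reading the file using source is that every pass adds another copy of the directory to PATH. That is harmless but clutters the value. A minimal sketch of a guard that adds the directory only when it is missing (this is one common idiom, not something TACC requires):

```shell
# Prepend a directory to PATH only if it is not already listed, so that
# re-reading the file does not pile up duplicate entries.
new_dir="$HOME/local/bin"
case ":$PATH:" in
  *":$new_dir:"*) ;;              # already present: do nothing
  *) PATH="$new_dir:$PATH" ;;     # otherwise put it at the front
esac
export PATH
```

Wrapping PATH in colons on both sides lets the pattern match the directory whether it is first, last, or in the middle of the list.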
If your path is not working or you're curious about where else your shell is looking for commands and the order, then you might want to see the value of your $PATH.
login1$ echo $PATH
login1$ env
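A long PATH is hard to read on a single line. As a small convenience, the entries can be printed one per line, numbered in the order the shell searches them:

```shell
# Split PATH at each colon and number the entries in search order.
path_listing="$(printf '%s\n' "$PATH" | tr ':' '\n' | nl -ba)"
echo "$path_listing"
```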
Warning! If you forget to include $PATH in the above example, then you will tell your shell to not look in the usual places for executables any more. This means that ls, cd, and other common commands will no longer work without typing out their whole paths, e.g. /bin/ls. This can be extremely confusing.
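To see the failure safely, the sketch below reproduces the mistake inside a command substitution, which runs in a subshell, so the PATH of your login shell is left untouched. The empty scratch directory is a stand-in for $HOME/local/bin and is an assumption made only so that the demonstration behaves the same everywhere.

```shell
# Stand-in for $HOME/local/bin: an empty scratch directory (assumption).
demo_dir="$(mktemp -d)"

# Inside the substitution PATH is overwritten without the old value,
# which is exactly the mistake described in the warning.
result="$(
  PATH="$demo_dir"
  if command -v ls >/dev/null 2>&1; then
    echo "ls is still found"
  else
    echo "ls is no longer found"
  fi
)"
echo "$result"

# Clean up the scratch directory; the outer PATH is intact, so rmdir works.
rmdir "$demo_dir"
```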
Handling multiple versions
If you install for yourself a newer version of a command that is already available on TACC, then you might get confused about which version you are running when you type the command. You can see the whole path to the executable that will be run when you type a command using the which command.
login1$ which ssaha2
Many tools will also have a -v or --version flag, or output their version information in a header when they are run. This can help you be sure that you are running the version that you think you are.
login1$ ssaha2 -v
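which reports only the first match. When two versions are installed it can help to see every copy on your PATH in the order the shell would consider them. The helper below is a minimal sketch that walks PATH by hand; the function name list_all_on_path is hypothetical.

```shell
# list_all_on_path CMD: print every executable file named CMD found in the
# directories of PATH, in search order. The first line printed is the copy
# the shell would actually run. The function name is hypothetical.
list_all_on_path() {
  cmd="$1"
  old_ifs="$IFS"
  IFS=":"                          # split PATH at each colon
  for dir in $PATH; do
    candidate="${dir:-.}/$cmd"     # an empty entry means the current directory
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
  IFS="$old_ifs"                   # restore normal word splitting
}

# Usage:
#   list_all_on_path ssaha2
```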
Case 2: Install from the source code
Example: Install breseq from a source code archive
breseq uses the common GNU build system install sequence. Most tools built with the GNU build system are installed with the same ./configure; make; make install sequence.
login1$ wget http://breseq.googlecode.com/files/breseq-0.17d.tar.gz
login1$ tar -xvzf breseq-0.17d.tar.gz
login1$ cd breseq-0.17d
login1$ ./configure --prefix=$HOME/local
login1$ make
login1$ make install
The extra option --prefix sets where the executable and any other files associated with the program will be installed. If you leave off this flag, then it will try to install them in a system-wide location. You must have administrator privileges to do this and would generally have to substitute sudo make install for the last step to get this to work. That won't work on TACC.
For some other tools you may skip straight to make, follow other instructions, or first have to install additional tools that the one you want depends on. Generally, you can find this information in the online documentation or an INSTALL file in the root of the downloaded code.
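As a rough guide to which route a freshly unpacked source tree expects, you can check which build files it contains. The sketch below only prints a suggested sequence and does not run anything; the function name suggest_build_steps and the heuristic itself are hypothetical, and the tool's own documentation always takes precedence.

```shell
# suggest_build_steps DIR [PREFIX]: print a plausible install sequence for
# the source tree in DIR. Nothing is executed. The function name and the
# heuristic are hypothetical.
suggest_build_steps() {
  src_dir="$1"
  prefix="${2:-$HOME/local}"
  if [ -f "$src_dir/INSTALL" ]; then
    echo "# read $src_dir/INSTALL first"
  fi
  if [ -f "$src_dir/configure" ]; then
    # GNU build system: configure with a personal prefix, then build
    echo "./configure --prefix=$prefix"
    echo "make"
    echo "make install"
  elif [ -f "$src_dir/Makefile" ] || [ -f "$src_dir/makefile" ]; then
    # No configure script: skip straight to make
    echo "make"
  else
    echo "# no configure script or Makefile found; check the documentation"
  fi
}

# Usage:
#   suggest_build_steps breseq-0.17d
```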
Finally, remember to
Other Cases
In other lessons we'll cover various deviations and elaborations on these two procedures in order to install specific programs.