...

Tip
titleUsing the version numbers for module commands

While not always strictly necessary, using the version number (in this case "/2.3.2") is a very good habit to get into, as it controls which version is loaded. In this case, because there are 2 very different versions available (2.3.2 and 1.2.1.1), module load bowtie will actually throw an error that tells you to use the module spider command to figure out how to correctly load the module.

While it is tempting to only use "module load name" without the version numbers, using the version numbers can help keep track of what versions were used for referencing in your future publications, and make it easier to identify what went wrong when scripts that have been working for months or years suddenly stop working (i.e., TACC changed the default version of a program you are using).

This is one of the big advantages of the conda system we will describe shortly: it easily keeps track of the versions of all the programs you use.
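The version-pinning habit from this tip can be sketched as a few lines at the top of a script. This is only a minimal sketch, not a TACC-provided recipe: the bowtie/2.3.2 name comes from the tip, the log file name module_versions.log is a hypothetical choice, and the module command is assumed to exist only on systems that provide it.

```bash
# Pin the tool and version in variables so the exact version is written
# down in the script itself. "bowtie" and "2.3.2" come from the tip above.
tool="bowtie"
version="2.3.2"
module_spec="${tool}/${version}"

# Hypothetical log file name, used here only for illustration.
log_file="module_versions.log"

# The module command only exists on systems that provide it (such as TACC),
# so check for it before calling it.
if command -v module >/dev/null 2>&1; then
    module load "${module_spec}"
else
    echo "module command not found; would have run: module load ${module_spec}"
fi

# Keep a dated record of what was requested, for later reference.
echo "$(date +%F) module load ${module_spec}" >> "${log_file}"
```

Keeping the version in one variable means a later change of version is a one-line edit, and the log gives you something to cite when writing up methods.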


Since the module load command doesn't give any output, it is often useful to check what modules you have loaded with either of the following commands:

...

Code Block
languagebash
titleUsing the mkdir command to create a folder named 'src' inside of your $WORK directory
collapsetrue
cd $WORK
mkdir src
cd src


Code Block
languagebash
titleUse the wget command to download the linux installer directly to your current directory
collapsetrue
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

You should see a download bar showing you the file has begun downloading. When it is complete, the ls command will show you a new installer file named 'Miniconda3-latest-Linux-x86_64.sh'.
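If you would rather not rely on reading the ls output by eye, the check can be sketched as a small function. This is an illustrative sketch only: the function name check_download is hypothetical, and it confirms only that the file exists and is not empty, not that the download is complete or uncorrupted.

```bash
# check_download is a hypothetical helper name, not part of wget or conda.
check_download() {
    local file="$1"
    # -s is true only if the file exists and is larger than zero bytes.
    if [ -s "${file}" ]; then
        ls -lh "${file}"
        return 0
    fi
    echo "${file} is missing or empty in $(pwd); rerun the wget command." >&2
    return 1
}

# Run from the directory you downloaded into ('src' in this tutorial).
if check_download "Miniconda3-latest-Linux-x86_64.sh"; then
    echo "Installer is present."
fi
```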

Using scp.

This is not necessary if you followed the wget commands above. Again, in a new browser window or tab you would navigate to https://docs.conda.io/en/latest/miniconda.html, but instead of right clicking on "Miniconda3 Linux 64-bit" in the Linux installers section and choosing copy link address, you would simply left click and allow the file to download directly to your browser's Downloads folder. Using information from the SCP tutorial, you would then transfer the local 'Miniconda3-latest-Linux-x86_64.sh' file to the stampede2 remote location '$WORK/src'. Note that you are downloading a file that will work on TACC, but not on your own computer. Don't get confused thinking you need the Windows or Mac versions.

Given that the wget command doesn't involve MFA or the somewhat cumbersome use of 2 different windows, and is subject to many fewer typos, hopefully you see why wget is preferable, provided left clicking on a link directly downloads a file.

...

Code Block
languagebash
titleThe following command is then used to install miniconda
bash Miniconda3-latest-Linux-x86_64.sh
logout
#log back in using the ssh command. 
conda config --set auto_activate_base false

...

For help with the ssh command please refer back to Windows10 or MacOS tutorials. If you log out and back in 1 more time, what do you notice is different?

...

Code Block
languagebash
titleusing the conda create command, make a new environment named "GVA-fastqc", and activate it
conda create --name GVA-fastqc
# enter 'y' to proceed
conda activate GVA-fastqc

This will once again change your prompt. This time the expected prompt is:

No Format
(GVA-fastqc) tacc:~$

Again, if you see something different, you need to get the instructor's attention. For the rest of the course, it is assumed that your prompt will start with (GVA-program_name). If it does not, remember that you need to use the conda activate GVA-program_name command to enter the environment.
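Besides reading the prompt, the active environment can also be checked from within a script. The sketch below assumes that conda records the name of the active environment in the CONDA_DEFAULT_ENV variable. The function name require_env is hypothetical and used only for illustration.

```bash
# require_env is a hypothetical helper name, not a conda command.
# Assumption: conda sets CONDA_DEFAULT_ENV to the active environment's name.
require_env() {
    local expected="$1"
    local active="${CONDA_DEFAULT_ENV:-none}"
    if [ "${active}" = "${expected}" ]; then
        echo "OK: ${expected} is active"
        return 0
    fi
    echo "Active environment is '${active}'; run: conda activate ${expected}" >&2
    return 1
}

# Example check for the environment created above.
require_env "GVA-fastqc" || echo "Activate the environment before continuing."
```

A check like this at the top of a script catches the common mistake of running commands in the wrong environment before any program is called.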

...

The anaconda and miniconda interfaces to the conda system are becoming increasingly popular for controlling one's environment, streamlining new program installation, and tracking what versions of programs are being used. A comparison of the two different interfaces can be found here. The deciding factor in which interface we will use is hinted at, but not explicitly stated, in the referenced comparison: TACC does not have a GUI and therefore anaconda will not work, which is why we installed miniconda above.

Similar to the module system that TACC uses, the "conda" system allows for simple commands to download required programs/packages and modify environment variables (like $PATH discussed above). Two huge advantages of conda over the module system are: #1 instead of relying on the employees at TACC to take a program and package it for use in the module system, anyone (including the authors publishing a new tool they want the community to use) can create a conda package for a program; #2 rather than being restricted to use on the TACC clusters, conda works on all platforms (including Windows and macOS) and deals with all the required dependency programs in the background for you.
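To see the effect on $PATH for yourself, you can print the start of $PATH and ask the shell where it finds a given program. This is a minimal sketch using only standard shell tools. The function name resolve_tool is hypothetical, and fastqc is used as the example only because it is the program installed later in this tutorial.

```bash
# Print the first few directories on $PATH, one per line. After activating
# a conda environment, that environment's bin directory is expected near the top.
echo "${PATH}" | tr ':' '\n' | head -n 5

# resolve_tool is a hypothetical helper name: it reports which executable a
# command name resolves to, or says so if nothing on $PATH matches.
resolve_tool() {
    command -v "$1" || echo "$1 not found on PATH"
}

resolve_tool fastqc
```

Running this before and after conda activate shows which directories the environment added and which copy of a program the shell will actually run.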

...

Code Block
languagebash
titleattempt to install the fastqc program using conda
conda activate GVA-fastqc
conda install fastqc

If you have already activated your GVA-fastqc environment, the first line will not do anything, but if you have not, you will see your prompt has changed to now say (GVA-fastqc) on the far left of the line. As to the second command, like we saw with the module system above, things aren't quite this simple. In this particular case, we get a very helpful error message that can guide our next steps:

...

More information about "channels" can be found here. By the end of this course you may find that the 'bioconda' channel is full of programs you want to use, and you may choose to permanently add it to your list of channels so that the above command conda install fastqc, and others used in this course, would work without the intermediate step of searching for the specific installation commands or finding which channel the program you want is in. Information about how to do this, as well as more detailed information on why it is bad practice to go around adding large numbers of channels, can be found here. Similarly, when we get to the read mapping tutorial, we will go over the conda-forge channel, which is also very helpful to have.

For now, use the error message you saw above to try to install the fastqc program yourself.

...

This is about using the git clone command. Git is a command often used for collaborative program development or sharing of files. Some groups also put the programs or scripts associated with a particular paper in a github project and publish the link in their paper or on their lab website. Github repositories are a great thing to add to a single location in your $WORK directory.

Here we will clone the github repository for the E. coli Long-Term Evolution Experiment (LTEE) originally started by Dr. Richard Lenski. These files will be used in some of the later tutorials, and are a good source of data for identifying variants in NGS data, as the variants are well documented and emerge in a controlled manner over the course of the evolution experiment. Initially cloning a github repository is exceptionally similar to using the wget command to download the repository: it involves typing 'git clone' followed by a web address where the repository is stored. As we did when installing miniconda with wget, we'll clone the repository into a 'src' directory inside of $WORK.

Code Block
languagebash
titleUsing the mkdir command to create a folder named 'src' inside of your $WORK directory
collapsetrue
cd $WORK
mkdir src
cd src

If you already have a src directory, you'll get a very benign error message stating that the folder already exists and thus cannot be created.
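If you would rather avoid the message entirely, for example in a script that may be run more than once, mkdir has a -p option that succeeds quietly when the directory already exists. The sketch below is a variation on the commands above, not a replacement for them. The work_dir variable and its temporary-directory fallback are assumptions added only so the example also runs on a machine where $WORK is not defined.

```bash
# Assumption for illustration: fall back to a temporary directory when $WORK
# is not defined (off TACC). On TACC, $WORK is already set and is used as-is.
work_dir="${WORK:-$(mktemp -d)}"

# -p creates the directory if needed and does nothing if it already exists.
mkdir -p "${work_dir}/src"
mkdir -p "${work_dir}/src"   # running it a second time is harmless

cd "${work_dir}/src"
pwd
```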

...

This concludes the linux and stampede2 refresher/introduction tutorial.

Genome Variant Analysis Course 2022 home.