...

This portion of the class is devoted to making sure we are all starting from the same point on stampede2. This tutorial was developed as a combined version of multiple other tutorials which were previously given credit here. Anyone wishing to use this tutorial is welcome to do so.

...

So you may be asking yourself what the point of using stampede2 is at all if it is fraught with so many issues. The answer comes in the form of compute nodes. There are nearly 6,000 compute nodes with different configurations, each of which can only be accessed by a single person for a specified amount of time. For the duration of the class, each student will interact with a single compute node using an interactive DEVelopment (iDEV) session, so that you get immediate feedback: you see commands as they run and know when to enter the next command. This is not the typical way you will analyze your own data. Friday's tutorial will deal with the queue system.

While stampede2 is tremendously powerful and will greatly speed up your analysis, it doesn't have much in the way of a GUI (graphical user interface). The lack of a GUI means it can't display graphs or other meaningful representations of our data that we are used to seeing. In order to do these types of things, we have to get our data off of stampede2 and onto our own computers. This course uses scp (the "secure copy" command) exclusively to move files back to your local computer. As mentioned, there are other programs that can be configured to transfer files back and forth more easily as you progress in your analysis.
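
As a rough sketch of what an scp transfer looks like, the lines below build a command of the form 'scp user@host:remote_file local_destination' and print it so you can check it before running it yourself on your local computer. The username and remote file path are placeholders made up for illustration, and the host name is an assumption about the stampede2 login address; substitute the values for your own account.

```bash
# Run this on your LOCAL computer, not on stampede2.
# Placeholder values (assumptions for illustration only): replace with your own.
tacc_user="your_username"
remote_host="stampede2.tacc.utexas.edu"   # assumed login host name
remote_file="/path/on/stampede2/to/results.html"

# General form: scp <user>@<host>:<remote file> <local destination>
# The final "." means "copy into the current local directory".
scp_command="scp ${tacc_user}@${remote_host}:${remote_file} ."

# Print the command for inspection instead of running it.
echo "$scp_command"
```

Once the printed command looks right, it can be pasted into a terminal on your own computer; scp will then prompt for your password before copying the file.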

...

If (or when) you looked at what our edits to the .bashrc file did, you would have seen that section 1 has a series of "module load XXXX" commands, and a promise to talk more about them later. I'm sure you will be thrilled to learn that now is that time... As a "classically trained wet-lab biologist", one of the most difficult things I have experienced in computational analysis has been installing new programs to improve my analysis. Programs and their installation instructions tend to be (or appear to be) written by computational biologists in what at times feels like a foreign language, particularly when things start going wrong. Here we will discuss 3 ways of accessing new commands/programs/scripts and explain their benefits. This is an incomplete list of ways to install new programs, but it is meant to be a good working example that you can adapt to install other programs in your future work.

...

No Format
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    fastqc-0.11.9              |       hdfd78af_1         9.7 MB  bioconda
    font-ttf-dejavu-sans-mono-2.37|       h6964260_0         335 KB
    ------------------------------------------------------------
                                           Total:        10.0 MB

The following NEW packages will be INSTALLED:

  fastqc             bioconda/noarch::fastqc-0.11.9-hdfd78af_1
  font-ttf-dejavu-s~ pkgs/main/noarch::font-ttf-dejavu-sans-mono-2.37-h6964260_0
  openjdk            pkgs/main/linux-64::openjdk-8.0.152-h7b6447c_3


Proceed ([y]/n)? y


Downloading and Extracting Packages
fastqc-0.11.9        | 9.7 MB    | ####################################################################################################################################################################################### | 100% 
font-ttf-dejavu-sans | 335 KB    | ####################################################################################################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

There are three commonly used methods to verify that you have a given program installed. You should try all three, in order, for the fastqc program:

  1. Code Block
    language: bash
    title: The 'which' command can be used to search your $PATH variable for a command with a specific name, and return the location the command is stored in
    which fastqc
  2. Code Block
    language: bash
    title: Many commands accept an option of '--version' to simply access the program and return what version of the program is installed
    fastqc --version
  3. Code Block
    language: bash
    title: Nearly all commands/programs accept "-h" or "--help" options to give you basic information about how the command or program works
    fastqc --help

Throughout the course, you will routinely use the above 3 commands to make sure that you have access to a given program, that it is the correct version, and to get an idea of how to construct commands to perform a given analysis step. For now, if your output does not match any of the following, you can be satisfied that you have correctly installed fastqc. In the next tutorial we will actually use fastqc. Examples of output you do not want to see from the above commands:

  1. /usr/bin/which: no fastqc in (<large list of directories specific to your TACC account>)

  2. -bash: fastqc: command not found

  3. -bash: fastqc: command not found
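
If you find yourself repeating the first check for many programs, it can be wrapped in a small helper. This is only a sketch: the function name check_program is made up for illustration and is not part of the course materials. It relies on the shell builtin 'command -v', which reports where a command is stored in much the same way 'which' does.

```bash
# Hypothetical helper (not part of the course materials):
# report whether a program can be found through $PATH.
check_program() {
    if command -v "$1" > /dev/null 2>&1; then
        # Found: print the location, similar to the output of 'which'.
        echo "$1 found at: $(command -v "$1")"
        return 0
    else
        # Not found: print a message on stderr and signal failure.
        echo "$1 not found in \$PATH" >&2
        return 1
    fi
}

# Example: check for fastqc and print a hint if it is missing.
check_program fastqc || echo "fastqc is not installed, or its environment is not active"
```

The helper only confirms that the program can be found; 'fastqc --version' and 'fastqc --help' are still the way to confirm the version and learn the available options.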

Additional common methods of getting files onto TACC

Github

This section is about using the git clone command. Git is a tool often used for collaborative program development or sharing of files. Some groups also put the programs or scripts associated with a particular paper in a GitHub project and publish the link in their paper or on their lab website. It is a good idea to keep the GitHub repositories you clone in a single location in your $WORK2 directory.

Here we will clone the GitHub repository for the E. coli Long-Term Evolution Experiment (LTEE) originally started by Dr. Richard Lenski. These files will be used in some of the later tutorials, and are a good source of data for identifying variants in NGS data, as the variants are well documented and emerge in a controlled manner over the course of the evolution experiment. Initially cloning a GitHub repository is exceptionally similar to using the wget command to download the repository: it involves typing 'git clone' followed by the web address where the repository is stored. As we did when installing miniconda with wget, we'll clone the repository into a 'src' directory inside of $WORK2.

Code Block
language: bash
title: Using the mkdir command to create a folder named 'src' inside of your $WORK2 directory
collapse: true
cd $WORK2
mkdir src
cd src

If you already have a src directory, you'll get a very benign error message stating that the folder already exists and thus cannot be created.
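
If you would rather not see that message at all, mkdir has a '-p' option that silently does nothing when the directory already exists. The lines below are a minimal sketch of that variant. The fallback to a temporary directory is an assumption added only so the example also runs on a computer where $WORK2 is not defined; on TACC, $WORK2 will be used.

```bash
# Use $WORK2 when it is defined (as on TACC); otherwise fall back to a
# temporary directory (assumption, so the sketch can run anywhere).
base_dir="${WORK2:-$(mktemp -d)}"

# -p: create the directory if needed, and do not complain if it already exists.
mkdir -p "$base_dir/src"

# Move into the src directory and show where we ended up.
cd "$base_dir/src"
pwd
```

Running these lines a second time gives the same result with no error message, which makes them safe to reuse in later tutorials.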

In a web browser, navigate to GitHub and search for 'LTEE-Ecoli' in the top right corner of the page. The only result will be for barricklab/LTEE-Ecoli; click the green box for 'clone or download' and either press control/command + C with the listed address highlighted, or click the clipboard icon to copy the repository address. This image may be helpful if you are having trouble locating the green box.

...