...

  1. Familiarize yourself with the way course material will be presented.
  2. Log into stampede2.
  3. Change your stampede2 profile to the course specific format.
  4. Refresh understanding of basic linux commands with some course organization.
  5. Review use of the nano text editor program, and become familiar with several other text editor programs.

...

Code Block
languagebash
titleCopy the course provided .bashrc and .profile files and change their names and permissions
collapsetrue
cp /corral-repl/utexas/BioITeam/gva_course/GVA2021.bashrc .bashrc
cp /corral-repl/utexas/BioITeam/gva_course/GVA2021.profile .profile
chmod 700 .bashrc
chmod 700 .profile

...

Code Block
languagebash
titleLog back in to stampede2
collapsetrue
ssh <username>@stampede2.tacc.utexas.edu

If everything is working correctly you should now see this as your prompt:  

No Format
tacc:~$

It is also likely or expected that upon logging in you see the following:

No Format
The following have been reloaded with a version change:
  1) impi/18.0.2 => impi/17.0.3     2) intel/18.0.2 => intel/17.0.4     3) python2/2.7.15 => python2/2.7.14

These messages have to do with some of the core compilers and associated tools on TACC. You could use the module spider commands detailed below to find out more about any of these modules and track down why such changes are being made, but the messages are not a cause for concern.


Warning

If you see anything besides "tacc:~$" as your prompt, get my attention and be ready to share your screen rather than continuing forward.



  • Setting up other shortcuts:

In order to make navigating to the different file systems on stampede2 a little easier ($SCRATCH and $WORK), you can set up some shortcuts with these commands that create folders that "link" to those locations. Run these commands when logged into stampede2 with a terminal, from your home directory.

Code Block
titleCreating shortcuts to the main stampede2 working directories
cdh
ln -s $SCRATCH scratch
ln -s $WORK work
ln -s $BI BioITeam

Several people report seeing an error message stating "ln: failed to create symbolic link 'BioITeam/BioITeam': Permission denied." This is being investigated, but is not expected to impact today's tutorial.
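The sketch below uses throwaway directories (all names are hypothetical stand-ins, not real TACC paths) to show what ln -s creates and how to inspect the result with readlink. It also shows one way a nested link of the 'BioITeam/BioITeam' shape can come about: running the same ln -s command a second time. Whether that is the cause of the error reported above is an assumption, not something confirmed here.

```bash
# Throwaway stand-ins for $SCRATCH and your home directory (hypothetical names).
sandbox=$(mktemp -d)
mkdir "$sandbox/fake_scratch" "$sandbox/fake_home"
cd "$sandbox/fake_home"

# Same pattern as: ln -s $SCRATCH scratch
ln -s "$sandbox/fake_scratch" scratch

# readlink prints the location a symbolic link points to.
link_target=$(readlink scratch)
echo "scratch -> $link_target"

# Running the identical command again does not overwrite the link.
# Because "scratch" now resolves to a directory, ln places the new
# link INSIDE that directory instead.
ln -s "$sandbox/fake_scratch" scratch
nested_link="$sandbox/fake_scratch/fake_scratch"
ls -l "$nested_link"
```

In this sandbox the second command succeeds because the target directory is writable. If the target were a shared directory you cannot write to, the same second command would fail with a "Permission denied" message instead.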

  • Understanding what your .bashrc file actually does.

Expand
titleWhile interesting and useful information to have, understanding it is not critical to variant analysis. I suggest you look through this information after you complete the rest of the tutorial, in your free time, or when you need to modify your profile or bashrc files in the future.
Info

Let's look at what your .bashrc file actually does. Use the cat command to print the contents of the .bashrc file to the screen.

Code Block
languagebash
titlePrint the contents of the .bashrc file to the screen
cat .bashrc

This will print several lines of text to the terminal window. Let's look at what some of these lines do with a little more information:

  • lines that start with #

    • Any line that begins with a # symbol is "commented out". Anything after a # symbol will not be executed by any program. Programmers commonly make use of this behavior to leave notes for others, or even themselves at a later date, as to what particular lines of a script are actually doing.
  • Section 1 has multiple lines involving "module load <NAME>"

    • This loads different modules by default. We have included basic ones that will help with basic TACC things. After we review the use of the nano text editor we'll go into more depth with TACC modules. But for now trust us when we say that not having to load a bunch of modules every time you log into TACC is a good thing.

    • In previous years the module system was used more extensively. Here we will attempt to rely more on miniconda installations for increased portability.
  • Section 2 has multiple lines starting with "export"

    • The export lines define shell variables, for example BI and PATH. You've already seen how using $BI can come in handy for accessing our shared course directory. As for PATH, that is a well-known environment variable that defines the set of directories where the shell will look when you type in a program's name. Our shared profile adds the common course directories that we copied at the start of this tutorial and your local ~/local/bin directory (which does not exist yet) to the location list. You can see the entire list of locations by doing this:

      Code Block
      languagebash
      titleHow to see where the bash shell looks for programs
      echo $PATH

      As you can see, there are a lot of locations on the path. That's because when you load modules at TACC (see above), that mechanism makes the programs available to you by putting their installation directories on your $PATH.

  • umask 002

    • The umask command is used to set the default permissions of newly created files and directories, limiting the need to use the chmod command. umask functions as the inverse of chmod, meaning that it subtracts its values from the default permissions (777 for directories, 666 for files). In this case the command umask 002 is the equivalent of the command chmod 775 for directories, and chmod 664 for files. In summary, having this command in your .profile gives read and write access on all new files you create to both you and your group, while giving read-only access to everyone else.
  • PS1='tacc:\w$ '

    • The PS1='tacc:\w$ ' line is a special setting that tells the shell to display the current directory as part of its prompt. It saves you typing pwd all the time to see where you are in the directory hierarchy. Try using the mkdir command to make a new directory called tmp and change into that directory to see what it does to your prompt.

      Code Block
      languagebash
      titleSee how your prompt reflects your current directory
      collapsetrue
      mkdir tmp
      cd tmp
    • Your prompt should have changed from "tacc:~$" to "tacc:~/tmp$". Your prompt now tells you that you are in the tmp subdirectory of your home directory (~). See if you can figure out how to return to your home directory without expanding the code block. Expand the following code block to see the different ways of returning to your home directory.

      Code Block
      languagebash
      titleHow to return to your home directory
      collapsetrue
      cd
      cdh
      cd $HOME
      cd ~
      cd -

      The last example in the above code block will return you to your previous directory. In this case, that means the home directory, but it can be very useful in other situations when you change directories to do something in one place and then need to hop back to where you were, or if you mistakenly leave a directory.
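To see the effect of the umask 002 setting described above, the following minimal sketch creates one file and one directory in a throwaway location and prints their permission strings. The directory and file names are hypothetical, chosen only for this illustration.

```bash
# Work in a throwaway directory so nothing in your home directory is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

umask 002            # the same setting the course profile uses
touch new_file.txt   # new files start from 666, so 002 leaves 664
mkdir new_dir        # new directories start from 777, so 002 leaves 775

# Keep only the 10-character permission string from the ls -ld output.
file_mode=$(ls -ld new_file.txt | cut -c1-10)
dir_mode=$(ls -ld new_dir | cut -c1-10)
echo "file:      $file_mode"
echo "directory: $dir_mode"
```

Reading the strings left to right after the first character, each group of three characters covers you, your group, and everyone else, which is the symbolic form of the 664 and 775 values.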

...

Expand
Komodo Edit for Mac and Windows

Komodo Edit is another free, full-featured text editor with syntax coloring for many programming languages and a remote file editing interface. It has versions for both Macintosh and Windows. Download the appropriate install image here.

Once installed, start Komodo Edit and follow these steps to configure it:

  • Configure the default line separator for Unix
    • On the Edit menu select Preferences
    • Select the New Files Category
    • For Specify the end-of-line (EOL) indicator for newly created files select UNIX (\n)
    • Select OK
  • Configure a connection to TACC
    • On the Edit menu select Preferences
    • Select the Servers Category
    • For Server type select SFTP
    • Give this profile the Name of stampede2
    • For Hostname enter stampede2.tacc.utexas.edu
    • Enter your TACC user ID for Username
    • Leave Port and Default path blank
    • Select OK

When you want to open an existing file at stampede2, do the following:

  • Select the File menu -> Open -> Remote File
    • Select your stampede2 profile from the top Server drop-down menu
    • Once you log in, it should show you all the files and directories in your stampede2 $HOME directory
  • Navigate to the file you want and open it
    • Often you will use the work or scratch directory links to help you here

To create and save a new file, do the following:

  • From the Komodo Edit Start Page, select New File
    • Select the file type (Text is good for commands files)
  • Edit the contents
  • Select the File menu -> Save As Other -> Remote File
    • Select your stampede2 profile from the Server drop-down menu
    • Once you log in, it should show you all the files and directories in your stampede2 $HOME directory
  • Navigate to where you want to put the file and save it
    • Often you will use the work or scratch directory links to help you here

...

Note that this may not be an inclusive list as it requires the name of the program, or its description to contain the word "alignment". Looking through the results you may notice some of the programs you already know and use for aligning 2 sequences to each other such as blast and clustalw. Try broadening your results a little by searching for "align" rather than "alignment" to see how important word choice is. When you compare the two sets of results you will see that one of the new results is:

...

Tip
titleUsing the version numbers for module commands

While not always strictly necessary, using the version number (in this case "/2.3.2") is a very good habit to get into as it controls what version is to be loaded. In this case, because there are 2 very different versions available (2.3.2 and 1.2.1.1), module load bowtie will actually throw an error which tells you to use the module spider command to figure out how to correctly load the module.

While it is tempting to only use "module load name" without the version numbers, using the version numbers can help keep track of what versions were used for referencing in your future publications, and make it easier to identify what went wrong when scripts that have been working for months or years suddenly stop working (i.e. TACC changed the default version of a program you are using).

Since the module load command doesn't give any output, it is often useful to check what modules you have loaded with either of the following commands:

Code Block
module list
module list bowtie

The first example will list all currently loaded modules while the second will only list loaded modules containing bowtie in the name. If you see that you have loaded the wrong version of something, that a module is conflicting with another, or you just don't feel like having it turned on anymore, use the following command:

Code Block
 module unload bowtie

You will notice when you type module list that you have several different modules loaded already. These come from both TACC defaults (TACC, linux, etc.) and several that are used so commonly, both in this class and by biologists, that it becomes cumbersome to type "module load python3" all the time; we therefore have them turned on by default by putting them in our profile to load on startup. As you advance in your own data analysis you may start to find yourself constantly loading modules as well. When you become tired of doing this (or see jobs fail to run because the modules that load on the compute nodes are based on your .bashrc file plus commands given to each node), you may want to add additional modules to your .bashrc file. This can be done using the "nano .bashrc" command from your home directory.
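As an alternative to editing by hand in nano, the sketch below shows one way to append a module load line to a file only if it is not already there. It works on a temporary stand-in file rather than your real .bashrc, and the function name add_module_line is hypothetical. The bowtie/2.3.2 version string is the one used earlier in this tutorial.

```bash
# Stand-in for ~/.bashrc so the sketch cannot damage a real profile.
bashrc_file=$(mktemp)
module_line='module load bowtie/2.3.2'

# Append the line only when an identical whole line is not already present.
# grep flags: -q quiet, -x match the whole line, -F treat the pattern as plain text.
add_module_line() {
    grep -qxF "$module_line" "$bashrc_file" || echo "$module_line" >> "$bashrc_file"
}

add_module_line
add_module_line   # the second call finds the line and changes nothing

# Count how many copies of the line ended up in the file.
line_count=$(grep -cxF "$module_line" "$bashrc_file")
echo "matching lines: $line_count"
```

Checking before appending keeps the file from collecting duplicate module load lines if you run the same setup steps more than once.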

2. Downloading from the web directly to TACC

When files are hosted online as direct downloads, you can use the wget (Web get) command to skip your local computer and download the file directly to TACC. Typically this makes use of the "Copy Link Address" option when you right click on a link in a web browser that would otherwise start a download to your computer. 

Here we will download the installation file for miniconda (which we will use in the next section and throughout the course) using both scp and wget to compare and contrast their functionality. 

Using wget.

In a new browser or tab navigate to https://docs.conda.io/en/latest/miniconda.html and right click on the "Miniconda3 Linux 64-bit" in the linux installers section and choose copy link address.

Code Block
languagebash
titleUsing the mkdir command to create a folder named 'src' inside of your $WORK2 directory
collapsetrue
cd $WORK2
mkdir src
cd src
Code Block
languagebash
titleUse the wget command to download the linux installer directly to your current directory
collapsetrue
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh

You should see a download bar showing you the file has begun downloading. When it is complete, the ls command will show you a new file named 'Miniconda3-py39_4.9.2-Linux-x86_64.sh'.

Using scp.

This is not necessary if you followed the wget commands above. Again, in a new browser or tab, you would navigate to https://docs.conda.io/en/latest/miniconda.html, but instead of right clicking on the "Miniconda3 Linux 64-bit" in the linux installers section and choosing copy link address, you would simply left click and allow the file to download directly to your browser's Downloads folder. Using information from the SCP tutorial you would then transfer the local 'Miniconda3-py39_4.9.2-Linux-x86_64.sh' file to the stampede2 remote location '$WORK2/src'.

Given that the wget command doesn't involve having to use MFA or the somewhat cumbersome use of 2 different windows, and is subject to many fewer typos, hopefully you see how wget is preferable, provided left clicking on a link directly downloads a file.
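For comparison, the scp route described above could look roughly like the sketch below. It only builds and prints the command rather than running it. The username, the local Downloads location, and the remote path are placeholders and assumptions. Because $WORK2 is defined on stampede2 and not on your own computer, the sketch assumes you first run echo $WORK2 on stampede2 and paste the full path it prints.

```bash
# Installer file name used throughout this tutorial.
installer='Miniconda3-py39_4.9.2-Linux-x86_64.sh'

# Placeholders: replace with your own values before using the printed command.
tacc_user='your_tacc_username'
remote_src_dir='/full/path/to/work2/src'   # output of "echo $WORK2" on stampede2, plus /src

# Assumption: the browser saved the installer to the Downloads folder in your home directory.
local_file="$HOME/Downloads/$installer"

# Build the command as text so it can be checked before it is run by hand.
scp_command="scp $local_file $tacc_user@stampede2.tacc.utexas.edu:$remote_src_dir/"
echo "$scp_command"
```

The printed command would be run in a terminal on your own computer, not in the window that is logged in to stampede2.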

Finishing conda installation, and 

Regardless of what method you chose to use, the following set of commands will work to install conda. For later reference, if you are planning to install miniconda on other systems or your local laptop, the 'regular installation' links on this link may be useful.


Code Block
languagebash
titleThe following command is then used to install miniconda
bash Miniconda3-py39_4.9.2-Linux-x86_64.sh

Following the installation prompts you will need to:

  1. hit enter to page through the license agreement
  2. enter 'yes' to agree to the license agreement
  3. hit enter to confirm the default installation location
  4. enter 'yes' when asked whether to initialize Miniconda3 by running conda init


Code Block
languagebash
titleAfter installing, log out, log back in, and set conda to activate only when you explicitly tell it to
# As per the directions at the end of the installation process: log out, log back in, and disable automatic activation of the conda base environment.
logout
# Log back in using the ssh command.
conda config --set auto_activate_base false

...

The first time you logged back in, your prompt should have looked like this:

...

No Format
(base) tacc:~$


The second time you logged back in, your prompt should go back to looking like it did before you installed conda:

No Format
tacc:~$


If your prompt is different, please get the instructor's attention.

...

Code Block
languagebash
titleUsing the conda create command, make a new environment named "GVA2021", and activate it
conda create --name GVA2021
# enter 'y' to proceed
conda activate GVA2021

This will once again change your prompt. This time the expected prompt is:

No Format
(GVA2021) tacc:~$

Again, if you see something different, you need to get the instructor's attention. For the rest of the course, it is assumed that your prompt will start with (GVA2021). If it does not, remember that you need to use the conda activate GVA2021 command to enter the environment.

3. Using miniconda on TACC

...

In previous years, the pip installation program was used to install a few programs. While those programs will be installed through conda this year, the link here is provided to give a detailed walk through of how to use pip on TACC resources. This is particularly helpful for making use of the '--user' flag during the installation process as you do not have the expected permissions to install things in the default directories.

...