Introduction:
Throughout the course you have been running anything of substance (i.e. programs and scripts) on iDev nodes. This was possible in large part thanks to the reservation system, which let you access an iDev node without having to wait. In previous years, tutorials had to be planned around:
- "hurry up and get the job started, it's going to sit in the queue for some amount of time"
- "what can we tell them while they wait? let me tell you about those commands that are sitting around waiting to run"
- "DRAT! there is a typo in their commands file; they have to edit that command and go back to the end of the queue"
...
- "while we talk about the results you can't actually see"
I hope you can see that using iDev nodes has enabled each of you to complete more tutorials than in previous years, while hopefully learning more.
Objectives:
This tutorial aims to:
- Familiarize you with TACC's job submission system.
- Tidy up some other loose ends from the course.
Running jobs on TACC
Understanding "jobs" and compute nodes.
When you log into lonestar using ssh you are connected to what is known as the login node or "head node". There are several different head nodes, but they are shared by everyone logged into lonestar (not just this class, or campus, or even Texas, but everywhere in the world). Anything you type on the command line is executed by the head node, and the longer a command takes to complete, the more it slows down you and everybody else. Get enough people running large jobs on the head node all at once (say, a classroom full of Big Data in Biology summer school students) and lonestar can actually crash, leaving nobody able to execute commands or even log in for minutes, hours, or even days if something goes really wrong. To try to avoid crashes, TACC monitors the head nodes and proactively stops processes before they get too out of hand. If you guess wrong about whether something should be run on the head node, you may eventually see a message like the one pasted below. If you do, it's not the end of the world, but repeated messages can lead to revoked TACC access and emails in which you have to explain to TACC and your PI what you are doing, how you are going to fix it, and how you will avoid it in the future.
...
So you may be asking yourself what the point of using lonestar is at all if it is fraught with so many issues. The answer comes in the form of compute nodes. There are 1,252 compute nodes that can only be accessed by a single person for a specified amount of time. These compute nodes are divided into different queues: normal, development, largemem, etc. Access to nodes (regardless of what queue they are in) is controlled by a "Queue Manager" program. You can personify the Queue Manager as Heimdall in Thor, a more polite version of Gandalf in The Lord of the Rings dealing with the balrog, the troll from the Billy Goats Gruff tale, or any other "gatekeeper" type. Regardless of how nerdy your personification of choice is, the Queue Manager has an interesting caveat: you can only interact with it using the sbatch command. "sbatch <filename.slurm>" tells the Queue Manager to run a job based on the information in filename.slurm (i.e. how many nodes you need, how long you need them for, how to charge your allocation, etc.). The Queue Manager doesn't care WHAT you are running, only HOW to find what you are running (which is specified by a "setenv CONTROL_FILE commands" line in your filename.slurm file). The WHAT is then handled by the file "commands", which contains what you would normally type into the command line to make things happen.
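To make the HOW/WHAT split concrete, here is a minimal sketch of what a .slurm file looks like. The specific values (job name, node count, allocation) are placeholders, and the exact directives launcher_creator.py writes may differ slightly; the point is that the #SBATCH lines describe the HOW and the CONTROL_FILE line points at the WHAT:

```shell
#!/bin/csh
#SBATCH -J my_job_name     # job name (what showq will display)
#SBATCH -p development     # queue to submit to
#SBATCH -N 1               # number of nodes requested
#SBATCH -t 00:15:00        # maximum run time, hh:mm:ss
#SBATCH -A MY-ALLOCATION   # allocation to charge (placeholder)

# the HOW points at the WHAT: the launcher reads commands from this file
setenv CONTROL_FILE commands
```

You never need to write this by hand in this course; launcher_creator.py (described below) generates it for you.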
Further sbatch reading
The full set of options available in an sbatch command file is described in TACC's Lonestar user guide; in this course you only need the handful that launcher_creator.py sets for you.
Using launcher_creator.py
To make things easier on all of us, there is a script called launcher_creator.py that you can use to automatically generate a .slurm file. This can all be summarized in the following figure:
...
```
launcher_creator.py -h
```
| Short option | Long option | Required | Description |
|---|---|---|---|
| -n | name | Yes | The name of the job. |
| -t | time | Yes | Time allotment for job; format must be hh:mm:ss. |
| -b | Bash commands | -b OR -j must be used | String of Bash commands to execute. |
| -j | Command list | -b OR -j must be used | Filename of list of commands to be distributed to nodes. |
| -a | allocation | Optional | The allocation you want to charge the run to. If you only have one allocation you don't need this option. |
| -m | modules | Optional | String of module management commands. |
| -q | queue | Default: development | The queue to submit to, like 'normal' or 'largemem'. You will usually want to change this to 'normal'. |
| -w | wayness | Optional | The number of jobs in a job list you want to give to each node. (Default is 12 for Lonestar, 16 for Stampede.) |
| -N | number of nodes | Optional | Specifies a certain number of nodes to use. You probably don't need this option, as the launcher calculates how many nodes you need based on the job list (or Bash command string) you submit. It sometimes comes in handy when writing pipelines. |
| -e | | Optional | Your email address if you want to receive an email from Lonestar when your job starts and ends. If you have set the environment variable EMAIL_ADDRESS, that will be used when you don't put anything after -e. |
| -l | launcher | Optional | Filename of the launcher. (Default is |
| -s | stdout | Optional | Setting this flag outputs the name of the launcher to stdout. |
We should mention that launcher_creator.py does some under-the-hood magic for you and automatically calculates how many cores to request on lonestar, assuming you want one core per process. You may not realize it, but you should be grateful that this saves you from ever having to think about a confusing calculation that even the most seasoned computational biologists routinely got wrong (and hence made a script to avoid having to do it anymore).
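As a rough sketch of that calculation (the exact logic inside launcher_creator.py may differ), the number of nodes is just a ceiling division of the number of commands by the wayness; the numbers below are illustrative, not from the script:

```shell
# Hypothetical sketch: one core per process, 12 processes per Lonestar node
jobs=30                                      # lines in your commands file
wayness=12                                   # jobs handed to each node (-w)
nodes=$(( (jobs + wayness - 1) / wayness ))  # ceiling division: 30/12 rounds up to 3
echo "$nodes nodes requested"
```

Getting this wrong by hand (e.g. flooring instead of rounding up) silently leaves some commands unscheduled, which is exactly the mistake the script was written to avoid.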
Running a job
Now that we have an understanding of what the different parts of running a job are, let's actually run one. The goal of this sample job is to give you something to look back on to remember what you did while you were here. As a safety measure, you cannot submit jobs from inside an idev node (and similarly, you cannot run a commands file that itself submits new jobs on the compute nodes). So check whether you are on an idev node (showq -u), and if so, log out before continuing. Navigate to the $SCRATCH directory before doing the following.
```
# remember that things after the # sign are ignored by bash (and most programming languages)
cds           # move to your scratch directory
nano commands # the following lines should be entered into nano

echo "My name is _____ and todays date is:" > GVA2019.output.txt
date >> GVA2019.output.txt
echo "I have just demonstrated that I know how to redirect output to a new file, and to append things to an already created file. Or at least thats what I think I did" >> GVA2019.output.txt
echo "i'm going to test this by counting the number of lines in the file that I am writing to. So if the next line reads 4 I know I'm on the right track" >> GVA2019.output.txt
wc -l GVA2019.output.txt >> GVA2019.output.txt
echo "I know that normally i would be typing commands on each line of this file, that would be executed on a compute node instead of the head node so that my programs run faster, in parallel, and do not slow down others or risk my tacc account being locked out" >> GVA2019.output.txt
echo "i'm currently in my scratch directory on lonestar. there are 2 main ways of getting here: cds and cd $SCRATCH:" >> GVA2019.output.txt
pwd >> GVA2019.output.txt
echo "over the last week I've conducted multiple different types of analysis on a variety of sample types and under different conditions. Each of the exercises was taken from the website https://wikis.utexas.edu/display/bioiteam/Genome+Variant+Analysis+Course+2019" >> GVA2019.output.txt
echo "using the ls command i'm now going to try to remind you (my future self) of what tutorials I did" >> GVA2019.output.txt
ls -1 >> GVA2019.output.txt
echo "the contents of those directories (representing the data i downloaded and the work i did) are as follows: " >> GVA2019.output.txt
find . >> GVA2019.output.txt
echo "the commands that i have run on the headnode are: " >> GVA2019.output.txt
history >> GVA2019.output.txt
echo "the contents of this, my commands file, which i will use in the launcher_creator.py script are: " >> GVA2019.output.txt
cat commands >> GVA2019.output.txt
echo "finally, I will be generating a job.slurm file using the launcher_creator.py script using the following command:" >> GVA2019.output.txt
echo 'launcher_creator.py -w 1 -N 1 -n "what_i_did_at_GVA2019" -t 00:15:00 -a "UT-2015-05-18"' >> GVA2019.output.txt
echo "and i will send this job to the queue using the command: sbatch what_i_did_at_GVA2019.slurm" >> GVA2019.output.txt

# ctrl o   keyboard command to write your nano output
# ctrl x   keyboard command to close the nano interface

wc -l commands  # use this command to verify the number of lines in your commands file
# expected output: 20 commands
# if you get a much larger number than 20, edit your commands file with nano so each command is a single line as they appear above

launcher_creator.py -w 1 -N 1 -n "what_i_did_at_GVA2019" -t 00:15:00 -a "UT-2015-05-18"
# this creates a what_i_did_at_GVA2019.slurm file describing a job that can run for up to 15 minutes
sbatch what_i_did_at_GVA2019.slurm
# this actually submits the job to the Queue Manager; if everything has gone right, it will be added to the development queue
```
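The `>` versus `>>` distinction that the commands file leans on can be tried safely in any directory; the filename below is just a throwaway example:

```shell
echo "first line"  > demo.output.txt   # > creates the file (or overwrites it if it exists)
echo "second line" >> demo.output.txt  # >> appends to the existing file
wc -l < demo.output.txt                # reports 2: both lines survived
rm demo.output.txt                     # clean up the throwaway file
```

If the first echo had used `>>` and the file already existed from an earlier run, the count would keep growing; if the second had used `>`, the first line would be lost.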
Interrogating the launcher queue
Here are some of the common commands that you can run and what they will do or tell you:
| Command | Purpose | Output(s) |
|---|---|---|
| showq -u | Shows only your jobs | Shows all of your currently submitted jobs; a state of "qw" means it is still queued and has not run yet, while "r" means it is currently running |
| scancel <job-ID> | Delete a submitted job before it is finished running (note: you can only get the job-ID by using showq -u) | There is no confirmation here, so be sure you are deleting the correct job. There is nothing worse than accidentally deleting a job that has sat a long time in the queue because you forgot something on a job you just submitted. |
| showq | You are a nosy person and want to see everyone who has submitted a job | Typically a huge list of jobs, and not actually informative |
If the queue is moving very quickly you may not see much output, but don't worry, there will be plenty of opportunity once you are working on your own data.
Evaluating your first job submission
Based on our example you may have expected 1 new file to have been created during the job submission (GVA2019.output.txt), but instead you will find 2 extra files: what_i_did.e(job-ID) and what_i_did.o(job-ID). When things have worked well, these files are typically ignored; when your job fails, they offer insight into why, so you can fix things and resubmit.
...
```
# remember that things after the # sign are ignored by bash
cat GVA2019.output.txt > end_of_class_job_submission.final.output
mkdir $WORK/GVA2019
mkdir $WORK/GVA2019/end_of_course_summary/  # each directory must be made in order to avoid a "no such file or directory" error
cp end_of_class_job_submission.final.output $WORK/GVA2019/end_of_course_summary/
cp what_i_did* $WORK/GVA2019/end_of_course_summary/  # note this grabs the 2 output files generated by tacc about your job run, as well as the .slurm file you created to tell it how to run your commands file
cp commands $WORK/GVA2019/end_of_course_summary/
```
Return to GVA2019 to work on any additional tutorials you are interested in.