...

What are the tasks we want to do? Each task corresponds to one line in the simple.cmds commands file, so let's take a look at it using the cat (concatenate) command that simply reads a file and writes each line of content to standard output (here, your Terminal):

Code Block
languagebash
titleView simple commands
cat simple.cmds

The tasks we want to perform look like this:

Code Block
languagebash
titlesimple.cmds commands file
sleep 5; echo "Command 1 on `hostname` - `date`" > cmd1.log 2>&1
sleep 5; echo "Command 2 on `hostname` - `date`" > cmd2.log 2>&1
sleep 5; echo "Command 3 on `hostname` - `date`" > cmd3.log 2>&1
sleep 5; echo "Command 4 on `hostname` - `date`" > cmd4.log 2>&1
sleep 5; echo "Command 5 on `hostname` - `date`" > cmd5.log 2>&1
sleep 5; echo "Command 6 on `hostname` - `date`" > cmd6.log 2>&1
sleep 5; echo "Command 7 on `hostname` - `date`" > cmd7.log 2>&1
sleep 5; echo "Command 8 on `hostname` - `date`" > cmd8.log 2>&1

There are 8 tasks. Each one sleeps for 5 seconds, then uses the echo command to output a string containing the task number and date to a log file named for the task number. Notice that we can put two commands on one line if they are separated by a semicolon ( ; ).
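As a quick illustration (not part of the simple.cmds file), the semicolon simply runs commands one after another, left to right, whether or not the earlier one succeeds:

Code Block
languagebash
titleTwo commands on one line (illustration)
# the semicolon runs commands in sequence, regardless of success
sleep 2; echo "done waiting"

# equivalent to running them on separate lines:
sleep 2
echo "done waiting"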

Use the handy launcher_creator.py program to create the batch submission script.

Code Block
languagebash
titleCreate batch submission script for simple commands
launcher_creator.py -j simple.cmds -n simple -t 00:01:00 -a OTH21164 -q normal

...

Code Block
Project simple.
Using job file simple.cmds.
Using normal queue.
For 00:01:00 time.
Using OTH21164 allocation.
Not sending start/stop email.
Launcher successfully created. Type "sbatch simple.slurm" to queue your job.

Submit your batch job like this (with or without the --reservation), then check the batch queue to see the job's status.

Code Block
languagebash
titleSubmit simple job to batch queue
sbatch --reservation CoreNGS-Tue simple.slurm  # or: sbatch simple.slurm
showq -u

# Output looks something like this:
-------------------------------------------------------------
          Welcome to the Lonestar6 Supercomputer
-------------------------------------------------------------
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Checking available allocation (OTH21164)...OK
Submitted batch job 924965

The queue status will show your job as ACTIVE while it's running, or WAITING if it has not yet started.

Code Block
SUMMARY OF JOBS FOR USER: <abattenh>

ACTIVE JOBS--------------------
JOBID     JOBNAME    USERNAME      STATE   NODES REMAINING STARTTIME
================================================================================
924965    simple     abattenh      Running 1      0:00:42  Sat Jun  3 21:33:31

WAITING JOBS------------------------
JOBID     JOBNAME    USERNAME      STATE   NODES WCLIMIT   QUEUETIME
================================================================================

Total Jobs: 1     Active Jobs: 1     Idle Jobs: 0     Blocked Jobs: 0

If you don't see your simple job in either the ACTIVE or WAITING sections of your queue, it probably already finished – it should only run for a few seconds!

...

Every job, no matter how few tasks it requests, will be assigned at least one node. Each Lonestar6 node has 128 physical cores, so each of the 8 tasks can be assigned to a different core.
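If you're curious, you can verify the core count yourself from a shell on a compute node (a login node will report its own hardware); this is just an illustrative check, not part of the exercise:

Code Block
languagebash
titleCheck a node's core count (illustration)
# nproc reports the number of processing units available on this node
nproc

# lscpu shows more detail, including sockets and cores per socket
lscpu | head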

Exercise: What files were created by your job?

Expand
titleAnswer

ls should show you something like this:

Code Block
cmd1.log  cmd3.log  cmd5.log  cmd7.log  simple.cmds     simple.o924965
cmd2.log  cmd4.log  cmd6.log  cmd8.log  simple.e924965  simple.slurm

The newly created files are the .log files, as well as the error and output logs simple.e924965 and simple.o924965.

filename wildcarding

You can look at one of the output log files like this:

Code Block
languagebash
titleView one log file
cat cmd1.log

But here's a cute trick for viewing the contents of all your output files at once, using the cat command and filename wildcarding.

...

The cat command actually takes a list of one or more files (if you're giving it files rather than standard input – more on this shortly) and outputs the concatenation of them to standard output. The asterisk ( * ) in cmd*.log is a multi-character wildcard that matches any filename starting with cmd then ending with .log. So it would match cmd_hello_world.log.

You can also specify single-character matches inside brackets ( [ ] ) in either of the ways below, this time using the ls command so you can better see what is matching:
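For example, here is a sketch of the two bracket forms, a character range and an explicit character list (the filenames matched depend on what your job actually created):

Code Block
languagebash
titleSingle-character bracket matching (illustration)
# a character range: matches cmd1.log, cmd2.log, cmd3.log and cmd4.log
ls cmd[1-4].log

# an explicit character list: matches cmd2.log, cmd4.log, cmd6.log and cmd8.log
ls cmd[2468].log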

...

This technique is sometimes called filename globbing, and the pattern a glob. Don't ask me why – it's a Unix thing. Globbing – translating a glob pattern into a list of files – is one of the handy things the bash shell does for you. (Read more about Wildcards and special filenames.)

Exercise: How would you list all files starting with simple?

...

Here's what my cat output looks like. Notice the times are all nearly the same because all the tasks ran in parallel. That's the power of cluster computing!

Code Block
Command 1 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:50 CDT 2023
Command 2 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:44 CDT 2023
Command 3 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:46 CDT 2023
Command 4 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:47 CDT 2023
Command 5 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:51 CDT 2023
Command 6 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:47 CDT 2023
Command 7 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:51 CDT 2023
Command 8 on c304-005.ls6.tacc.utexas.edu - Sat Jun  3 21:33:49 CDT 2023

echo

Let's take a closer look at a typical task in the simple.cmds file.

...

The echo command is like a print statement in the bash shell. echo takes its arguments and writes them to one line of standard output. While not always required, it is a good idea to put echo's output string in double quotes.
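Here's a small illustration (not from simple.cmds) of one reason the quotes matter: without them, the shell splits the text into separate words and collapses the spacing before echo ever sees it.

Code Block
languagebash
titleWhy quote echo's argument (illustration)
# unquoted, the shell splits on whitespace and collapses the extra spaces
echo Command     1

# quoted, the spacing inside the string is preserved exactly
echo "Command     1"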

...

So what is this funny looking `date` bit doing? Well, date is just another Linux command (try typing it in) that displays the current date and time. Here we don't want the shell to put the string "date" in the output; we want it to execute the date command and put the resulting text into the output. The backquotes ( ` ` also called backticks) around the date command tell the shell we want that command executed and its standard output substituted into the string. (Read more about Quoting in the shell.)

Code Block
languagebash
titleBacktick evaluation
# These are equivalent:
date
echo `date`

# But different from this:
echo date

...

Code Block
languagebash
sleep 5; echo "Command 3 `date`" > cmd3.log 2>&1

Every command and Unix program has three "built-in" streams: standard input, standard output, and standard error.

[Diagram: the standard input, standard output, and standard error streams]


Normally echo writes its string to standard output. If you invoke echo in an interactive shell like Terminal, standard output is displayed to the Terminal window.
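As a quick illustration of where the streams go (the file names here are just placeholders), the > operator redirects only standard output, and adding 2>&1 also sends standard error to the same destination:

Code Block
languagebash
titleRedirecting standard output and standard error (illustration)
# > captures only standard output; the error message about the missing
# file still appears on your Terminal
ls cmd1.log no_such_file > out.log

# adding 2>&1 sends standard error to the same place as standard output,
# so both messages end up in out.log
ls cmd1.log no_such_file > out.log 2>&1
cat out.log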


All output generated by the tasks in your batch job is directed to one output file and one error file per job. Here they have names like simple.o924965 and simple.e924965; simple.o924965 contains all standard output and simple.e924965 contains all standard error generated by your tasks that was not redirected elsewhere, as well as information relating to running your job and its tasks. For large jobs with complex tasks, it is not easy to troubleshoot execution problems using these files.

So a best practice is to separate the outputs of all our tasks into individual log files, one per task. Why is this important? Suppose we run a job with 100 commands, each one a whole pipeline (alignment, for example). 88 finish fine but 12 do not. Just try figuring out which ones had errors, and where the errors occurred, if all the standard output is in one intermingled file and all the standard error is in another intermingled file!
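Here is a sketch of what that per-task pattern might look like in a commands file. The align_sample script and the sample file names are hypothetical; the point is the one-log-file-per-task redirection.

Code Block
languagebash
titleOne log file per task (sketch)
# hypothetical commands-file entries -- align_sample and the sample names are
# made up; each task sends its standard output and standard error to its own log
align_sample sample01.fastq.gz > sample01.align.log 2>&1
align_sample sample02.fastq.gz > sample02.align.log 2>&1
align_sample sample03.fastq.gz > sample03.align.log 2>&1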

...