...
If you do try to run a long job in interactive mode, it will be killed after 10-15 minutes and you may see a message like this:
Code Block |
---|
Message from root@login1.ls4.tacc.utexas.edu on pts/127 at 09:16 ...
Please do not run scripts or programs that require more than a few minutes of
CPU time on the login nodes. Your current running process below has been
killed and must be submitted to the queues, for usage policy see
http://www.tacc.utexas.edu/user-services/usage-policies/
If you have any questions regarding this, please submit a consulting ticket.
|
...
First, let's make a very simple job to run. All we need to do is create a text file, which we will simply call commands. Each line in this file is a command, exactly as you would type it into the terminal yourself to have it run.
Code Block |
---|
nano commands
|
Code Block |
---|
date > date.out
ls > ls.out
|
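If you would rather not open an editor, the same file can be written with a here-document and tried out locally before it is queued. This is a minimal sketch that assumes a Bash shell; the scratch directory and the `workdir` variable are illustrative and not part of the course material.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in your current directory is overwritten.
# (workdir is a hypothetical name used only for this sketch.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the two-line commands file without opening nano.
cat > commands <<'EOF'
date > date.out
ls > ls.out
EOF

# Run each line exactly as the terminal would, as a quick sanity check.
# Both commands finish instantly, so this is safe on a login node.
bash commands

# Show what was produced.
cat date.out ls.out
```

Only try this kind of local run with commands that finish in seconds; anything longer belongs in the queue.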
...
TACC has supplied a sample launcher script, which we will modify to queue and execute our job. First, load the launcher module:
Code Block |
---|
module load launcher
|
Now let's copy the example launcher file.
Code Block |
---|
cp $TACC_LAUNCHER_DIR/launcher.sge ./
|
There are a few things we should change inside this file. Open the file using nano like so:
Code Block |
---|
nano launcher.sge
|
First, let's change the name of the job.
...
Under the -V line, add two new lines like so:
Code Block |
---|
#$ -M my_email@something.com
#$ -m be
|
...
Change the line that says "setenv CONTROL_FILE" to say:
Code Block |
---|
setenv CONTROL_FILE job.csh
|
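Before queueing, it can help to confirm that the edits actually made it into the file. The sketch below defines a hypothetical helper, `check_launcher`, which is not a TACC tool. It assumes the launcher uses `#$` directive lines and a csh-style `setenv CONTROL_FILE` line as shown above, and that the control file sits in the same directory as the launcher.

```shell
#!/bin/bash
# check_launcher FILE: report whether the edits described above are present.
# Hypothetical helper for this sketch; it is not a TACC tool.
check_launcher() {
    local launcher=$1 status=0 control

    # The two email directives that go under -V.
    grep -q '^#\$ -M ' "$launcher" || { echo "missing: #\$ -M <email>"; status=1; }
    grep -q '^#\$ -m ' "$launcher" || { echo "missing: #\$ -m be"; status=1; }

    # The file named on the CONTROL_FILE line must exist next to the launcher.
    control=$(awk '$1 == "setenv" && $2 == "CONTROL_FILE" { print $3 }' "$launcher")
    if [ -z "$control" ]; then
        echo "missing: setenv CONTROL_FILE <file>"; status=1
    elif [ ! -f "$(dirname "$launcher")/$control" ]; then
        echo "CONTROL_FILE points to '$control', which does not exist"; status=1
    fi

    [ "$status" -eq 0 ] && echo "launcher looks ready"
    return "$status"
}
```

Calling `check_launcher launcher.sge` in the directory you have been working in prints one line per problem found, or a single confirmation line if none are found.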
...
Now that we have our job file and our launcher, we need to queue the launcher. Type:
Code Block |
---|
qsub launcher.sge
|
Lonestar will check that everything you've specified is correct, and if it is, your job will be queued.
You can check the status of your job like so:
Code Block |
---|
qstat
|
This will tell you the job's priority and what state it is in.
...
If you notice that your job is going to run incorrectly, you can delete it like so:
Code Block |
---|
qdel job-ID
|
You can obtain the job-ID from the output of qstat.
If you are nosy and want to see all of the jobs queued and running on Lonestar, then use this command:
Code Block |
---|
showq
|
You can also see just your jobs in this format:
Code Block |
---|
showq -u
|
You can make a job depend on another job, so that it only starts after the first job has completed, using this command:
Code Block |
---|
qsub -hold_jid job-ID launcher.sge
|
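When chaining jobs in a script, the job-ID of the first submission has to be captured rather than read off the screen. The following sketch assumes that qsub prints the usual Grid Engine confirmation line (`Your job <ID> ("<name>") has been submitted`), possibly among other lines. The helper name `job_id_from_qsub` and the launcher names `first.sge` and `second.sge` are hypothetical.

```shell
#!/bin/bash
# job_id_from_qsub: read qsub's output on stdin and print the numeric job-ID.
# Assumes the confirmation line has the usual Grid Engine form:
#   Your job 123456 ("job_name") has been submitted
# Hypothetical helper for this sketch; adjust the pattern if your output differs.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p' | head -n 1
}

# On Lonestar the two submissions could then be chained like this
# (first.sge and second.sge are placeholder launcher names):
#   first_id=$(qsub first.sge | job_id_from_qsub)
#   qsub -hold_jid "$first_id" second.sge

# Offline demonstration with a sample confirmation line:
echo 'Your job 123456 ("my_job") has been submitted' | job_id_from_qsub
```

If the pattern does not match, the helper prints nothing, so it is worth checking that the captured ID is non-empty before submitting the dependent job.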
Further reading
...
While your job is running, TACC creates 3 different files with names based on the -o field in the launcher. These files are named like so:
Code Block |
---|
(job_name).e(job-ID)
(job_name).pe(job-ID)
(job_name).o(job-ID)
|
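Given a job name and job-ID, the three file names follow directly from the pattern above. This sketch prints them and reports which ones exist so far; the function names and the example values `my_job` and `123456` are hypothetical, and it assumes the files are written to the current directory.

```shell
#!/bin/bash
# job_output_files NAME ID: print the three file names for a job.
job_output_files() {
    local name=$1 id=$2 ext
    for ext in e pe o; do
        echo "${name}.${ext}${id}"
    done
}

# report_job_files NAME ID: say which of the files exist and whether they are empty.
report_job_files() {
    local f
    for f in $(job_output_files "$1" "$2"); do
        if [ ! -e "$f" ]; then echo "$f: not created yet"
        elif [ -s "$f" ]; then echo "$f: $(wc -l < "$f") lines"
        else echo "$f: empty"
        fi
    done
}

# Example with placeholder values:
job_output_files my_job 123456
```

Once a job has started, `report_job_files my_job 123456` shows at a glance which of its files exist and whether anything has been written to them.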
...
We have created a Python script called launcher_creator.py that makes creating a launcher.sge script a breeze. You will probably want to use this for the rest of the course.
First, let's copy the script to the directory we've been doing everything in:
...
.
...
Now run the script with the -h option to show the help message:
Code Block |
---|
module load python
./launcher_creator.py -h
|
Option | Name | Description |
---|---|---|
-n | name | The name of the job. |
-a | allocation | The allocation you want to charge the run to. |
-q | queue | The queue to submit to, like 'normal' or 'largemem', etc. |
-w | wayness | Optional. The number of jobs in a job list you want to give to each node. (Default is 12 for Lonestar, 16 for Stampede.) |
-N | number of nodes | Optional. Specifies a certain number of nodes to use. You probably don't need this option, as the launcher calculates how many nodes you need based on the job list (or Bash command string) you submit. It sometimes comes in handy when writing pipelines. |
-t | time | Time allotment for job, format must be hh:mm:ss. |
-e | email | Optional. Your email address if you want to receive an email from Lonestar when your job starts and ends. |
-l | launcher | Optional. Filename of the launcher. (Default is ...) |
-m | modules | Optional. String of module management commands. |
-b | Bash commands | Optional. String of Bash commands to execute. |
-j | Command list | Optional. Filename of list of commands to be distributed to nodes. |
-s | stdout | Optional. Setting this flag outputs the name of the launcher to stdout. |
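Because the -t option only accepts hh:mm:ss, a quick format check before building a launcher can catch a mistyped value. This is a minimal Bash sketch; `valid_time` is a hypothetical helper, and it assumes a two-digit hour field, which may be stricter than what the script itself accepts.

```shell
#!/bin/bash
# valid_time STRING: succeed only if STRING looks like hh:mm:ss
# (two-digit hours, minutes and seconds each in the range 00-59).
# Hypothetical helper; launcher_creator.py may apply different rules.
valid_time() {
    [[ $1 =~ ^[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
}

valid_time 01:30:00 && echo "01:30:00 accepted"
valid_time 90:00 || echo "90:00 rejected"
```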
We should mention that launcher_creator.py does some under-the-hood magic for you: it automatically calculates how many cores to request on Lonestar, assuming you want one core per process. You may not notice it, but this saves you from ever having to think about a confusing calculation.
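For the curious, the arithmetic can be sketched as follows. This is an assumed reconstruction, not code taken from launcher_creator.py: it supposes that whole nodes are requested, that the number of nodes is the number of commands divided by the wayness and rounded up, and that a Lonestar node has 12 cores (matching the default wayness in the table above). The function name `cores_to_request` is hypothetical.

```shell
#!/bin/bash
# cores_to_request JOBS WAYNESS CORES_PER_NODE
# nodes = ceil(JOBS / WAYNESS); whole nodes are requested, so
# cores = nodes * CORES_PER_NODE.
# This is an assumed reconstruction, not code taken from launcher_creator.py.
cores_to_request() {
    local jobs=$1 wayness=$2 cores_per_node=$3
    local nodes=$(( (jobs + wayness - 1) / wayness ))
    echo $(( nodes * cores_per_node ))
}

cores_to_request 2 12 12    # 2 commands at the default wayness fit on 1 node
cores_to_request 25 12 12   # 25 commands need 3 nodes
cores_to_request 4 1 12     # one command per node needs 4 nodes
```

Lowering the wayness gives each command more of a node's memory, at the price of requesting more nodes for the same command list.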
...
Now let's go back to the course outline