Overview

The main point of using Lonestar is that it is a massive computer cluster. If we run a command when logged into Lonestar, we are running it on one of the two low-memory, low-power "head" or "login" nodes at TACC. When we do serious computations that are going to take more than a few minutes or use a lot of RAM, we need to submit them to the other 1,888 compute nodes (22,656 cores in total) on Lonestar.

In this section we are going to learn how to submit a job to the Lonestar cluster.

Diagram of how a job gets run on Lonestar

...

The launcher_creator.py script just helps you by creating jobs.sge easily, which saves you some time editing a file (and potentially messing it up).

Launcher

In the examples we tend to say that a job can be "interactive" or should be "submitted to the TACC queue". The first means that you can type it and run it directly; it should be short enough that it does not tie up the TACC head node. The second means that you should go through the launcher submission process described here.

...

A launcher file tells Lonestar which executables to run with your desired options and for how long. It requests a certain amount of resources (cores and time) so that Lonestar's scheduling program can figure out where to fit your job in.

First, let's make a very simple job to run. All we need to do is create a text file. Each line in this text file, which we will call simply commands, is a command exactly as you would type it into the terminal yourself to have it run.

...

Code Block
Add these lines to your "commands" file:
nano commands
    date > date.out
    ls > ls.out

Optional exercise

  • The minimum number of processors that you can request on Lonestar is 12, so you might as well add up to 10 more lines to this file that are different shell commands that will give some sort of output. Each will be run on a different core in parallel.
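As one possible way to do the optional exercise, the sketch below writes a 12-line commands file in one step instead of typing each line in nano. The ten extra commands and their output file names are only illustrative choices, not part of the course material, and running it overwrites any existing file called commands in the current directory.

```shell
#!/bin/bash
# Sketch: create a "commands" file with 12 independent shell commands,
# one per line, so that each core of a 12-core request has something to do.
# Every command after the first two is an illustrative assumption.
cat > commands <<'EOF'
date > date.out
ls > ls.out
hostname > hostname.out
whoami > whoami.out
pwd > pwd.out
uname -a > uname.out
uptime > uptime.out
id > id.out
df -h . > df.out
env > env.out
echo "hello from Lonestar" > hello.out
wc -l commands > wc.out
EOF

# Report how many commands were written.
wc -l < commands
```

Each line writes to its own output file, so the commands do not overwrite each other's results when they run in parallel.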

Launcher script

Two ways to skin this cat

1. TACC has supplied a sample launcher script which we can modify to queue and execute our job. Here's how:

Code Block
 module load launcher

Now let's copy the example launcher file.

Code Block
 cp $TACC_LAUNCHER_DIR/launcher.sge ./

There are a few things we should change inside of this file. Open the file using nano like so:

Code Block
 nano launcher.sge

Typically, we would change:

The -N line specifies the name of the job. Let's change it to (what).

The -o line specifies the names of the output files that Lonestar makes. We would change that to the name of this job.

The -l line specifies the length of time given to the job. The more time we give our job, the longer it will wait in the queue before it runs. When the time is up, Lonestar will terminate our job whether or not it's finished. So it's best to give our job slightly more time than it'll take.

We can also, optionally, add a few lines to have Lonestar send an email to your email address when the job starts and finishes.

To do that, under -V, we would add 2 new lines like so:

Code Block
 #$ -M my_email@something.com
 #$ -m be
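To see how these settings sit together, here is a rough sketch of what the edited header lines of launcher.sge might look like. It is not the actual contents of TACC's sample file: the job name my_first_job, the ten-minute limit, and the exact form of the -o and -l values are assumptions based on common SGE usage, so compare against the file you copied before changing anything.

```shell
# Sketch of the scheduler directives discussed above (assumed values).
# Lines starting with "#$" are read by the scheduler, not by the shell.

# Job name (hypothetical).
#$ -N my_first_job

# Output file name pattern; $JOB_NAME and $JOB_ID are filled in by SGE.
#$ -o $JOB_NAME.o$JOB_ID

# Maximum run time as hours:minutes:seconds (assumed value).
#$ -l h_rt=00:10:00

# Pass the current environment to the job.
#$ -V

# Email address, and when to send mail: b = job begins, e = job ends.
#$ -M my_email@something.com
#$ -m be
```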

...

Also, if we are part of multiple allocations, we'll need to specify which allocation to use (case sensitive).

Code Block
 #$ -A DNAdenovo  #(or CCBB)

Lastly, we need to specify the job file.

Code Block
 setenv CONTROL_FILE commands

Now let's save our changes and quit.

2. We can use launcher_creator.py to edit the launcher file without even opening it.

We have created a Python script called launcher_creator.py that makes creating a launcher.sge script a breeze. You will probably want to use this for the rest of the course.

Now run the script with the -h option to show the help message:

Code Block
 module load python
 launcher_creator.py -h

...

We should mention that launcher_creator.py does some under-the-hood magic for you and automatically calculates how many cores to request on Lonestar, assuming you want one core per process. You don't know it, but you should be grateful that this saves you from ever having to think about a confusing calculation.
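For the curious, the calculation can be sketched roughly as follows. This is not the code inside launcher_creator.py. It assumes that cores are requested in whole nodes of 12 (22,656 cores divided by 1,888 nodes) and that each non-empty line of the commands file is one process; the function name cores_to_request is made up for this illustration.

```shell
#!/bin/bash
# Sketch: round the number of commands up to a multiple of 12 cores.
# Assumption: one core per command, requests made in whole 12-core nodes.
cores_to_request() {
    local commands_file="$1"
    local cores_per_node=12
    local n

    # Count non-empty lines; grep exits non-zero when there are none.
    n=$(grep -c . "$commands_file" || true)

    # Even an empty file still needs the minimum request of one node.
    if [ "$n" -lt 1 ]; then
        n=1
    fi

    # Integer arithmetic: ceiling(n / 12) * 12.
    echo $(( (n + cores_per_node - 1) / cores_per_node * cores_per_node ))
}

# Demo with a temporary two-line commands file.
tmpfile=$(mktemp)
printf 'date > date.out\nls > ls.out\n' > "$tmpfile"
cores_to_request "$tmpfile"   # prints 12
rm -f "$tmpfile"
```

With 13 commands the same arithmetic gives 24, which is why adding a single extra line can double the request.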

Lonestar Queue

The next step is to submit the job to the queue using the launcher file.

Code Block
 qsub launcher.sge

Lonestar will make sure that everything specified in the launcher file is correct and if it is, the job will be queued.

To check the status of the job, the command is:

Code Block
 qstat

This will tell you its job priority and what state it is in.

A state of "qw" means "queued."

A state of "r" means "running."

Exercise

  • Take it for a test drive: use launcher_creator.py to create a launcher.sge script for your previous commands file and run it again.
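If you find the qstat state codes hard to remember, a tiny helper like the one below can spell them out. It only knows the two codes described on this page; the function name describe_state is made up for this sketch, and any other code is reported as not covered here rather than guessed at.

```shell
#!/bin/bash
# Sketch: translate the qstat state codes mentioned above into words.
describe_state() {
    case "$1" in
        qw) echo "queued" ;;
        r)  echo "running" ;;
        *)  echo "state '$1' is not covered here" ;;
    esac
}

describe_state qw
describe_state r
```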

In case we notice something wrong with the job, we can delete it like so:

Code Block
 qdel job-ID

To obtain the job-ID, look at the "qstat" output.

For those who are nosy and want to see all of the jobs queued and running on Lonestar, this command is handy:

Code Block
 showq

To see just your jobs in this format:

Code Block
 showq -u

You can make a job start only after another job has completed using this command:

Code Block
 qsub -hold_jid job-ID launcher.sge
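A two-step pipeline could be chained along these lines. This is only a sketch: step1.sge and step2.sge are hypothetical launcher files, and it assumes that the scheduler's qsub accepts the SGE -terse option and prints nothing but the job-ID. If that does not hold on your system, read the job-ID from qstat instead. By default the script only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: submit two launchers so that the second waits for the first.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

submit_chain() {
    local first="$1"
    local second="$2"
    local first_id

    if [ "$DRY_RUN" = "1" ]; then
        echo "qsub -terse $first"
        echo "qsub -hold_jid <job-ID of $first> $second"
        return 0
    fi

    # Assumption: -terse makes qsub print only the job-ID.
    first_id=$(qsub -terse "$first") || return 1

    # The second job is held until the first one has finished.
    qsub -hold_jid "$first_id" "$second"
}

submit_chain step1.sge step2.sge
```

Setting DRY_RUN=0 before calling the script would perform the real submissions.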

Further reading

TACC Output Files

While your job is running, TACC creates 3 different files with names based on the -o field in the launcher. These files are named like so:

Code Block
 (job_name).e(job-ID)
 (job_name).pe(job-ID)
 (job_name).o(job-ID)

These files have the output of your job that would have been sent to standard output or standard error and messages from TACC about your job. These files will be useful if your job fails.
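When a job fails, it can be handy to look at all three files at once. The helper below is a sketch that builds the file names from the pattern shown above and prints the tail of whichever ones exist in the current directory. The function names, the job name my_first_job, and the job-ID 123456 are made up for illustration.

```shell
#!/bin/bash
# Sketch: list the three output file names for a given job name and job-ID.
job_output_files() {
    local job_name="$1"
    local job_id="$2"
    local suffix
    for suffix in e pe o; do
        echo "${job_name}.${suffix}${job_id}"
    done
}

# Show the last 20 lines of each of those files that actually exists.
# Assumes the job name contains no spaces.
show_job_output() {
    local f
    for f in $(job_output_files "$1" "$2"); do
        if [ -f "$f" ]; then
            echo "== $f =="
            tail -n 20 "$f"
        fi
    done
}

job_output_files my_first_job 123456
```

Calling show_job_output with the same two arguments from the directory where the job ran prints the end of each file that is present.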

Now let's go try out all these new skills with a simple exercise...