SGE Multiprocessing

Programs can use multiprocessing to speed up the processing of data sets by splitting the work among the processors the program is able to use. There are two main ways to do this. First, parallel jobs can run on one computer that has multiple processors sharing memory. This is called Symmetric Multiprocessing, or SMP. It is provided via threading libraries that allow your program to be split into tasks, which then run in parallel on the processors of the system. Many programming and scripting languages already support this. You may also see threads referred to as lightweight processes. For C, C++, and Fortran programmers, the currently suggested way to use threads is OpenMP, which handles much of the overhead of threads for you.

The other paradigm for parallel processing is to split the job across multiple identical computers. This method of programming is called distributed computing, and the current way to do this is via the Message Passing Interface (MPI). Specifically, the clusters at CCBB support the use of OpenMPI.

Note that fully using either MPI or SMP from scratch requires learning new programming techniques and tools. The bulk of this document is geared towards running programs that already support them. If you are interested in learning to program with either, many books have been written on MPI and OpenMP. There are also periodic short courses held by TACC, and credit courses held in the SSC department.

Using Multiprocessing Capable Programs

This section of the tutorial assumes that you already have an MP-capable program and that you wish to run it on the cluster. If you have access to multiple queues, be sure to use -q together with a queue name. Otherwise, you may end up submitting jobs into different queues, and in some cases those queues may point to the same host. If this happens, you will overload the node, and some of your processes won't run.
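As a minimal sketch of the -q advice above, the helper below only prints the qsub command line it would use, so you can check it before submitting. The function name build_qsub and the queue name all.q are placeholders chosen for illustration; substitute the queue, parallel environment, and script names that apply to you.

```shell
#!/bin/bash

# Hypothetical helper: print (do not run) a qsub command pinned to one queue.
# Arguments: QUEUE  PARALLEL_ENVIRONMENT  SLOTS  JOB_SCRIPT
build_qsub() {
    printf 'qsub -q %s -pe %s %s %s\n' "$1" "$2" "$3" "$4"
}

# all.q is a placeholder queue name; use one you actually have access to.
build_qsub all.q orte 16 mrbayes.qsub
```

Once the printed line looks right, run it as an ordinary command. Naming the queue every time keeps all of your jobs in one queue rather than spread over queues that may share a host.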

Using OpenMPI

For this section we'll concentrate on MrBayes, which added MPI support in version 3.1.2. On the clusters it is compiled to use the OpenMPI version of MPI. This requires that you first load the openmpi module with the command

module load openmpi

After this you can load the mrbayes module. You then start processing with mpirun, the OpenMPI command that starts the processes on the various nodes and sets them to work on the data. A sample qsub script to do this is

#!/bin/bash

#$ -S /bin/bash

# make the module command available
. /etc/profile

# load our modules we need
module load openmpi
module load mrbayes

mpirun mb < mb-command-file

This script can then be submitted using

qsub -pe orte N mrbayes.qsub

where N is the number of processors you want and mrbayes.qsub is the name of the batch file given above. This prepares the OpenMPI environment for your job and launches the N processes on as many nodes as are needed to fulfill the request. For example, using N=1 (i.e., the normal single-processor mode of running for a particular job) took 1 hour, 37 minutes, and 42 seconds, while with N=16 the job took 11 minutes and 53 seconds. Note that if you read a book on MPI, it will say that you need to provide a file of hostnames for the job to use, and that you need to pass "-np N" to the mpirun command. Neither is needed on the cluster because the SGE system sets those up for you.
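To put the two timings above in perspective, the following sketch works out the speedup and the parallel efficiency (speedup divided by N). The function names are made up for this example, and the only inputs are the two run times quoted above.

```shell
#!/bin/bash

# Convert hours, minutes, seconds to a number of seconds.
to_seconds() {
    echo $(( $1 * 3600 + $2 * 60 + $3 ))
}

# Speedup = serial time / parallel time.
speedup() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

# Efficiency = speedup / number of processors, as a percentage.
efficiency() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.0f%%\n", 100 * s / (p * n) }'
}

serial=$(to_seconds 1 37 42)     # N=1 run:  5862 seconds
parallel=$(to_seconds 0 11 53)   # N=16 run: 713 seconds

echo "speedup:    $(speedup "$serial" "$parallel")"          # prints 8.22
echo "efficiency: $(efficiency "$serial" "$parallel" 16)"    # prints 51%
```

So for this particular job, 16 processors gave roughly an 8x speedup rather than 16x. Timing a short test run at a few values of N is a reasonable way to decide how many processors are worth requesting.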

Using SMP Programs

If you are using a program that uses some sort of threaded model, then you should submit your job script using the serial parallel environment like so


qsub -pe serial N my_job_script.qsub

Here N is the number of processors that you wish to use. Unlike the OpenMPI case, there is no standard way for the program to know how many processors have been made available. However, SGE does set a variable called NSLOTS equal to the number of requested processors, which you can use in your script file. For example, with an OpenMP program you must set an environment variable called OMP_NUM_THREADS by adding this to your job script:


export OMP_NUM_THREADS=$NSLOTS

For programs that have been compiled in other ways, the program must provide some way for you, the user, to specify this information, and you will need to read its documentation to see how. Also, notice that N should never exceed the maximum number of processors in the nodes that you have access to. Otherwise, your job will stay in the queue forever.
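Putting these pieces together, a job script for a threaded program might look like the sketch below. The thread_count helper, the program names, and the --threads option are hypothetical and only stand in for whatever your program actually provides. The fallback to 1 is an addition that lets the script also be tested outside of SGE, where NSLOTS is not set.

```shell
#!/bin/bash

#$ -S /bin/bash

# Use the slot count SGE granted. Fall back to 1 when NSLOTS is unset,
# for example when trying the script outside of SGE.
thread_count() {
    echo "${NSLOTS:-1}"
}

# OpenMP programs read OMP_NUM_THREADS by themselves.
export OMP_NUM_THREADS="$(thread_count)"
echo "Running with $OMP_NUM_THREADS thread(s)"

# An OpenMP program then needs no extra arguments (hypothetical name):
#   ./my_openmp_program
#
# Other threaded programs need their own option. The program name and
# the --threads flag here are hypothetical; check the documentation:
#   ./my_threaded_program --threads "$(thread_count)"
```

Submitted with qsub -pe serial 8 my_job_script.qsub, the script would report 8 threads, so the number of processors only has to be chosen in one place.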

Compiling Multiprocessing Programs

If you wish to compile your own OpenMP or OpenMPI program, a good first step is to read the documentation of the package you are trying to build and follow any instructions. If you are working on your own code, then to compile an OpenMPI program you should first load the openmpi module. This provides the mpicc (C), mpiCC (C++), mpif77 (Fortran 77), and mpif90 (Fortran 90) commands that you can use to compile your program. If you are using OpenMP, load the gcc module instead. This provides access to gcc and gfortran. Both accept the option -fopenmp to enable OpenMP compilation.
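As a rough illustration of the OpenMP case, the sketch below writes a tiny test program to a scratch directory, compiles it with -fopenmp, and runs it with four threads. The file names and the test program are invented for this example. The checks around module and gcc are there only so the sketch also behaves on a machine without them.

```shell
#!/bin/bash

# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
src="$workdir/hello_omp.c"

# A tiny OpenMP test program (illustrative only).
cat > "$src" <<'EOF'
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
EOF

# On the cluster, load the compiler module first.
if command -v module >/dev/null 2>&1; then
    module load gcc
fi

# -fopenmp enables OpenMP when compiling and linking.
if command -v gcc >/dev/null 2>&1; then
    gcc -fopenmp -o "$workdir/hello_omp" "$src" &&
        OMP_NUM_THREADS=4 "$workdir/hello_omp"
fi

# The OpenMPI equivalent for an MPI source file (hypothetical file name):
#   module load openmpi
#   mpicc -o hello_mpi hello_mpi.c
```

The lines printed by the threads may appear in any order. For Fortran sources, gfortran takes the same -fopenmp option.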

Writing Multiprocessing Programs

To be added.