Table of Contents

Available PODs

The table below describes the available BRCF PODs, servers and currently available groups. Unless otherwise noted, PODs authenticate using BRCF account credentials initialized by the user in the BRCF account management application (https://rctf-account-request.icmb.utexas.edu).

...

POD nameDescriptionBRCF delegatesCompute serversStorage serverUnix Groups
AMD GPU POD

POD with GPU resources available for instructional and research use.

Note: This POD uses UT EID authentication


Anna Battenhouse
  • amdgcomp01.ccbb.utexas.edu, amdgcomp02.ccbb.utexas.edu, amdgcomp03.ccbb.utexas.edu
    • Dual 64-core EPYC 7V13 CPUs
    • 512 GB RAM
    • 8 AMD Radeon Instinct MI-100 GPUs w/32GB onboard RAM each

amdbstor01.ccbb.utexas.edu

  • 12 6-TB disks
  • 72 TB raw, 42 TB usable

Per course and research project. See

CBRS POD

Shared POD for CBRS core facilities

Anna Battenhouse
  • cbrscomp01.ccbb.utexas.edu,
    cbrscomp02.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 26-core/52-thread CPUs
    • 768 GB RAM
    • 960 GB SATA SSD for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)

cbrsstor01.ccbb.utexas.edu

  • 24 16-TB disks
  • 384 TB raw, 220 TB usable
BCG, CBRS_BIC, CBRS_CryoEM, CBRS_microscopy, CBRS_org, CBRS_proteomics
Chen/Wallingford/Raccah POD

Shared POD for members of the Jeffrey Chen, John Wallingford and Doran Raccah labs

  • Qingxin Song (Chen lab)
  • Jaime Hibbard (Wallingford lab)
  • chencomp01.ccbb.utexas.edu (a.k.a. chencomp02.ccbb.utexas.edu)
    • Dell PowerEdge R410
    • dual 4-core/8-thread CPUs
    • 64 GB RAM
  • chencomp03.ccbb.utexas.edu
    • Dell AMD node
    • dual 64-core/128-thread AMD EPYC CPUs
    • 768 GB RAM
    • 1.9 TB NVMe for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)

chenstor01.ccbb.utexas.edu

  • 24 8-TB disks
  • 192 TB raw, 106 TB usable


Chen, Raccah, Wallingford
Dickinson/Cambronne POD

Shared POD for members of the Dan Dickinson and Lulu Cambronne labs

  • Dan Dickinson
  • Lulu Cambronne
  • djdicomp01.ccbb.utexas.edu
    • Dell PowerEdge R410
    • dual 4-core/8-thread CPUs
    • 64 GB RAM

djdistor01.ccbb.utexas.edu

  • 24 8-TB disks
  • 192 TB raw, 106 TB usable


Dickinson, Cambronne
Educational (EDU) POD

Dedicated instructional POD

Note: This POD uses UT EID authentication

Course instructors.

See The Educational PODs

  • edupod.cns.utexas.edu
    • virtual host for pool of 3 physical servers listed below
  • educcomp01.ccbb.utexas.edu
  • educcomp02.ccbb.utexas.edu
  • educcomp04.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 28-core/52-thread CPUs
    • 1 TB RAM

educstor01.ccbb.utexas.edu

  • 24 4-TB disks
  • 96 TB raw, 53 TB usable


Per course. See The Educational PODs
Georgiou/WCAAR POD

Shared POD for members of the Georgiou lab and the Waggoner Center for Alcoholism & Addiction Research (WCAAR)

  • Russ Durrett (Georgiou lab)
  • Dayne Mayfield (WCAAR)
  • wcarcomp01.ccbb.utexas.edu
    • Dell PowerEdge R430
    • dual 16-core/32-thread CPUs
    • 256 GB RAM
  • wcarcomp02.ccbb.utexas.edu
    • Dell PowerEdge R430
    • dual 18-core/36-thread CPUs
    • 384 GB RAM
  • wcarcomp03.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 26-core/52-thread CPUs
    • 1 TB RAM
    • 1.8 TB SATA SSD for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)

georstor01.ccbb.utexas.edu

  • 12 8-TB disks + 12 14-TB disks
  • 264 TB raw, 158 TB usable


Georgiou, WCAAR

GSAF POD

Shared POD for use by GSAF customers. 2TB Work area allocation available for participating groups.

Contact Anna Battenhouse for more information.

  • Anna Battenhouse
  • Dhivya Arasappan
  • gsafcomp01.ccbb.utexas.edu
  • gsafcomp02.ccbb.utexas.edu
    • Dell PowerEdge R410
    • dual 4-core/8-thread CPUs
    • 64 GB RAM
  • gsafcbig01.ccbb.utexas.edu
    • Dell PowerEdge R720
    • dual  6-core/12-thread CPUs
    • 192 GB RAM

gsafstor01.ccbb.utexas.edu

  • 24 6-TB disks
  • 144 TB raw, 90 TB usable

GSAF customer groups:
Alper, Atkinson, Baker, Barrick, Bolnick, Bray,  Browning, Cannatella, Contrearas, Crews, Drew, Dudley, Eberhart, Ellington, GSAFGuest, Hawkes, HoWinson, HyunJunKim, Kirisits, Leahy, Leibold, LiuHw, Lloyd, Manning, Matz, Mueller, Paull, Press, SSung, ZhangYJ

GSAF internal & instructional groups:
GSAF, BioComputing2017, CCBB_Workshops_1, FRI-BigDataBio

Hopefog (Ellington) POD

Shared POD for Ellington & Marcotte lab special projects

  • Anna Battenhouse
  • hfogcomp01.ccbb.utexas.edu
    • Dell PowerEdge R730xd
    • dual 10-core/20-thread CPUs
    • 250 GB RAM
    • 37 TB local RAID storage,  mounted as /raid (not backed up)
  • hfogcomp02.ccbb.utexas.edu,
    hfogcomp03.ccbb.utexas.edu
    • AMD GPU servers
    • 48-core/96-hyperthread EPYC CPU
    • 512 GB RAM
    • 8 AMD Radeon Instinct MI-50 GPUs w/32GB onboard RAM each
  • hfogcomp04.ccbb.utexas.edu
    • Dell PowerEdge R750XA
    • dual 24-core/48-thread CPUs
    • 512 GB RAM
    • 2 NVIDIA Ampere A100 GPUs w/80GB onboard RAM each
  • hfogcomp05.ccbb.utexas.edu
    • GIGABYTE MC62-G40-00
    • 32-core/64-thread AMD Ryzen CPU
    • 512 GB RAM
    • 4 NVIDIA RTX 6000 Ada GPUs, 48G RAM each

hfogstor01.ccbb.utexas.edu

  • 24 6-TB disks
  • 144 TB raw, 90 TB usable
Ellington, Marcotte, Wilke
Iyer/Kim POD

Shared POD for members of the Vishy Iyer and Jonghwan Kim labs

  • Anna Battenhouse
  • iyercomp02.ccbb.utexas.edu (aka dragonfly.icmb.utexas.edu)
    • Dell PowerEdge R410
    • dual 4-core/8-thread CPUs
    • 64GB RAM
  • iyercomp03.ccbb.utexas.edu (aka adler3.icmb.utexas.edu)
    • Dell PowerEdge R720
    • dual  6-core/12-thread CPUs
    • 192 GB RAM

iyerstor01.ccbb.utexas.edu

  • 24 6-TB disks
  • 144 TB raw, 90 TB usable


Iyer, JKim
Kirkpatrick POD

Shared POD for members of Kirkpatrick and Harpak labs

TBD
  • kirkcomp01.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 26-core/52-thread CPUs
    • 768 GB RAM
    • 1.9 TB SSD for high-speed local I/O, mounted as /ssd1 (not backed up)

kirkstor01.ccbb.utexas.edu

  • 12 18-TB disks
  • 216 TB raw, 124 TB usable
Kirkpatrick, Harpak
Lambowitz/CCBB POD

Shared POD for use by CCBB affiliates and the Alan Lambowitz lab.


  • Hans Hofmann, Rebecca Young Brim (Hofmann lab & CCBB affiliates)
  • Jun Yao (Lambowitz lab)
  • lambcomp02.ccbb.utexas.edu
    • Dell PowerEdge R660xs
    • dual 28-core/56-thread CPUs
    • 512 GB RAM
  • ccbbcomp02.ccbb.utexas.edu
    • Dell PowerEdge R720
    • dual  6-core/12-thread CPUs
    • 192 GB RAM

lambstor01.ccbb.utexas.edu

  • 18 16-TB disks
  • 288 TB raw, 170 TB usable


Lambowitz groups:
Lambowitz, LambGuest

CCBB groups:
Cannatella, Hawkes, Hillis, Hofmann, Jansen

 

Instructional groups:
FRI-BigDataBio


LiveStrong DT POD

POD for members of Dell Medical School's LiveStrong Diagnostic Therapeutics group.

Note: This POD uses UT EID authentication

  • Jeanne KowalskiSong (Stephen) Yi
  • livecomp01.ccbb.utexas.edu
    • Dell PowerEdge R440
    • dual 14-core/28-thread CPUs
    • 192 GB RAM
    • 480 GB SATA SSD for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)
  • livecomp02.ccbb.utexas.edu, livecomp03.ccbb.utexas.edu
    • AMD GPU server
    • 48-core/96-hyperthread EPYC CPU
    • 512 GB RAM
    • 8 AMD Radeon Instinct MI-50 GPUs with 32GB onboard RAM each
  • livecomp04.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 26-core/52-hyperthread CPUs
    • 768 GB RAM
    • 1.9 TB SSD for high-speed local I/O, mounted as /ssd1 (not backed up)

livestor01.ccbb.utexas.edu

  • 24 10-TB disks
  • 240 TB raw, 132 TB usable

Jeanne Kowalski groups:
CancerClinicalGenomics, ColoradoData, MultipleMyeloma
Stephen Yi groups:
SongYi

Lauren Ehrlich groups:
Ehrlich_COVID19, Ehrlich

Other groups:
Kim, Matsui, Melamed_COVID

Instructional groups:
FRI-BigDataBio

Marcotte POD

Single-lab POD for members of the Edward Marcotte lab

  • Anna Battenhouse
  • marccomp01.ccbb.utexas.edu (aka hopper.icmb.utexas.edu)
    • Dell PowerEdge R730
    • dual 18-core/36-thread CPUs
    • 768 GB RAM
  • marccomp02.ccbb.utexas.edu (aka ada.icmb.utexas.edu)
    • Dell PowerEdge R610
    • dual 4-core/8-thread CPUs
    • 96 GB RAM
  • marccomp03.ccbb.utexas.edu (aka perutz.ccbb.utexas.edu)
    • Dell PowerEdge R610
    • dual 4-core/8-thread CPUs
    • 96 GB RAM

marcstor02.ccbb.utexas.edu

  • 24 12-TB disks
  • 288 TB raw, 160 TB usable


Marcotte
Ochman/Moran POD

Shared POD for members of the Howard Ochman and Nancy Moran labs

  • Howard Ochman
  • ochmcomp01.ccbb.utexas.edu
    • Dell PowerEdge R430
    • dual 18-core/36-thread CPUs
    • 384 GB RAM
  • ochmcomp02.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 26-core/52-hyperthread CPUs
    • 1024 GB RAM
    • 1.9 TB SSD for high-speed local I/O, mounted as /ssd1 (not backed up)

ochmstor01.ccbb.utexas.edu

  • 24 8-TB disks
  • 192 TB raw, 106 TB usable


Ochman, Moran
Rental POD

Shared POD for POD rental customers

  • Anna Battenhouse (overall)
  • Daylin Morgan (Brock)
  • rentcomp01.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 18-core/36-thread CPUs
    • 768 GB RAM
    • 900 GB SATA SSD for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)
  • rentcomp02.ccbb.utexas.edu
    • Dell PowerEdge R640
    • dual 18-core/36-thread CPUs
    • 256 GB RAM
    • 450 GB SATA SSD for ultra-high-speed local I/O, mounted as /ssd1 (not backed up)

rentstor01.ccbb.utexas.edu

  • 12 12-TB disks
  • 144 TB raw, 90 TB usable
Brock, Calder, Champagne, Curley, Fleet, Gaydosh (AddHealth, FragileFamilies, VUSNAPS), GrayGore, Gross, Hillis, Nguyen, Raccah, Seidlits, Sullivan, YiLu, Zamudio
Wilke POD

For use by members of the Claus Wilke lab and the AG3C collaboration

  • Adam HockenberryAaron Feller
  • Alexis Hill
  • wilkcomp01.ccbb.utexas.edu
  • wilkcomp02.ccbb.utexas.edu
    • Dell PowerEdge R930
    • quad 14-core/28-thread CPUs
    • 1 TB RAM
  • wilkcomp03.ccbb.utexas.edu
    • GIGABYTE MC62-G40
    • 48-core AMD Ryzen 5975 CPU
    • 500 G system RAM
    • 4 NVIDIA RTX 6000 Ada GPUs, 48G RAM each
    • 2 TB SSD for fast local I/O, mounted as /ssd1 (not backed up)

wilkstor01.ccbb.utexas.edu

  • 18 16-TB disks
  • 288 TB raw, 170 TB usable


Wilke

...

ResourceDescriptionNetwork availabilityFor details
SSH

Remote access to the bash shell's command line, and remote file transfer commands such as scp and rsync.


  • Standard ssh command unrestricted from the UT campus network (excluding Dell Medical School)
  • Off-campus ssh access:
    • UT VPN service active, or
    • Public key installed in ~/.ssh/authorized_keys
  • Notes:
    • Direct storage server access for file transfers is only available from the UT campus network or with the UT VPN service active.
Samba

Allows mounting of shared POD storage as a remote file system that can be browsed from your Windows or Mac desktop/laptop computer

  • Unrestricted from the UT campus network (excluding Dell Medical School)
  • Off-campus access requires the UT VPN service to be active
HTTPS

Access to web-based RStudio server and JupyterHub applications

  • Unrestricted for BRCF-managed accounts
    • For PODs using EID authentication (e.g. Livestrong), an active UT EID is required

...

Detailed instructions for Windows

To connect to your Group's Work area as a network drive in Windows:

  • Bring up Windows Explorer (Windows key-E)
  • On Windows 10:
    • Select "This PC" icon
    • You'll see "Computer" in the menu
    • Select "Computer" menu item
    • You'll see "Map network drive" in the sub-menu
  • On Windows 7:
    • Select "Computer" icon
    • You'll see "Map network drive" in the menu
  • Click "Map network drive". 
  • In the "Map Network Drive" dialog
    • Select a drive letter
    • In the "Folder" text box, enter your Group area URL
      • e.g. for the Sullivan group on the GSAF pod:
        \\gsafstor01.ccbb.utexas.edu\Sullivan
    • Check the "Connect using different credentials" checkbox
      • Enter your BRCF account name and password
      • If your computer is on the UT Austin Active Directory Domain you need to add "./\" before your BRCF account name
        • e.g. ./\mybrcfaccount
      • Click "Finish". This will bring up the "Enter Network Password" dialog
  • In the "Enter Network Password" dialog 
    • Select "Use another account"
    • Enter your BRCF user name and password
    • Check the "Remember my credentials" checkbox if desired
    • Click "OK"
    • A new Windows Explorer will appear with your Work area in focus

...

Code Block
languagebash
# change to your home directory where the symlinks will be created
cd 
ln -sf /stor/work/BCG bcg_work
ln -sf /stor/scratch/BCG bcg_scratch

# Then, use the symbolic link when copying data from TACC
rsync -avrW $SCRATCH/analysis/ abattenh@cbrsstor01.ccbb.utexas.edu:~/bcg_scratch/analysis/

...

Shared Work areas are backed up weekly. Scratch areas are not backed up. Both Work and Scratch areas may have quotas, depending on the POD (e.g. on the Rental or GSAF pod); such quotas are generally in the multi-terabyte range.

Because it has a large quota and is regularly backed up and archived, your group's Work area is where large research artifacts that need to be preserved should be located.

Scratch, on the other hand, can be used for artifacts that are transient or can easily be re-created (such as downloads from public databases).

See Manage storage areas by project activity for important guidelines for Work and Scratch area contents.

...

Note that any directory in any file system tree named tmp, temp, or backups is not backed up. Directories with these names are intended for temporary files, especially large numbers of small temporary files. See "Cannot create tempfile" error and Avoid having too many small files.

Periodic and long-term archiving

...

What is too many? Ten million or more.

If the files are small, they don't take up much storage space. But the fact that there are so many causes the backup or archiving to run for a really long time. For weekly backups, this can mean that the previous week's backup is not done by the time the next one starts. For archiving, it means it can take weeks on end to archive a single directory that has many millions of small files.

Backing up gets even worse when a directory with many files is just moved or renamed. In this case the files need to be deleted from the old location and added to the new one – and both of these operations can be extremely long-running.








To see how many files (termed "inodes" in Unix) there are under a directory tree, use the df -i command. For example:

Code Block
languagebash
df -i /stor/work/MyGroup/my_dir
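Keep in mind that df -i reports inode usage for the whole file system that contains the path you give it. If you want a count for one particular directory tree, something like the following sketch can help. The count_files function name and the demo directory are made up for illustration, and on a tree with millions of files the find itself can take a long time.

```bash
# Count regular files under a directory tree (hypothetical helper name).
# Note: a file name containing a newline would be counted more than once.
count_files() {
    find "$1" -type f -print | wc -l
}

# Small self-contained demo using a throwaway directory
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"
count_files "$demo_dir"
rm -r "$demo_dir"
```

On a real Work area you would call it as, for example, count_files /stor/work/MyGroup/my_dir.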

...

1) Move the files to a temporary directory.
The backup process excludes any sub-directory anywhere in the file system directory tree named tmp, temp, or backups. So if there are files you don't care about, just rename the directory to, for example, tmp. There will be a one-time deletion of the directory under its previous name, but that would be it. 

...

3) Zip or Tar the directory
If these are important files you need to have backed up, zipping or tarring the directory is the way to go. This converts a directory and all its contents into a single, larger file that can be backed up or archived efficiently. Please Contact Us if you would like us to help with this, since with our direct access to the storage server we can perform zip and tar operations much more efficiently than you can from a compute server.

If your analysis pipeline creates many small files as a matter of course, you should consider modifying the processing to create small files in a tmp directory, then zipping or tarring them as a final step.
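As a rough sketch of that final step, the example below packs a directory of small files into a single compressed archive and checks the archive before removing the originals. The directory and file names (many_small_files, part_N.txt) are invented for the demo, and the whole thing runs in a throwaway temporary directory.

```bash
# Build a small stand-in for a directory full of pipeline output files
work=$(mktemp -d)
mkdir "$work/many_small_files"
for i in 1 2 3; do
    echo "result $i" > "$work/many_small_files/part_$i.txt"
done

# Pack the directory into one compressed archive
# (-C makes the paths stored in the archive relative to $work)
tar -czf "$work/many_small_files.tar.gz" -C "$work" many_small_files

# List the archive contents and count the data files it holds,
# then remove the original directory
tar -tzf "$work/many_small_files.tar.gz" | grep -c 'part_'
rm -r "$work/many_small_files"
```

Checking the archive listing before deleting anything is a cheap safeguard; for real data you may prefer to keep the originals until the next backup has completed.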

Memory usage considerations

...

And in a pathological (but unfortunately not uncommon) pattern, a program (or programs) that needs more memory than is available can cause "thrashing", where swapping in and out of RAM is happening continuously. This will bring a computer to its knees, making it virtually impossible to do anything on it (slow logins, or logins timing out; any simple command just "hanging" for a long time or never returning). Note that this situation can also happen when each process a user starts itself spawns multiple threads, as described at Do not run too many processes. We monitor system usage, and will intervene when we see this happen, by terminating the offending process(es) if possible, or by rebooting the compute server if not.

...

  • Know the memory configuration of the compute server you're using
    • free -g will show you total RAM and swap in Gigabytes
  • Before starting a memory intensive job, check the system's current memory status
    • free -g also shows used and available for both main memory and swap
  • Know the memory requirements of your program.
    • Monitor the memory usage of one typical process while it is running using top (see https://www.booleanworld.com/guide-linux-top-command/) or htop
    • This is particularly important if you plan to run multiple instances of a program, since it will guide you in knowing how many such instances you should run.
  • Run memory intensive processes when system load is otherwise light (e.g. overnight)
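
The checks above can be sketched from the command line as follows. This is only an illustration: rss_mb is a made-up helper name, and it reports the resident memory of a single process as seen by ps, which is a snapshot rather than a peak value.

```bash
# Total, used and available RAM and swap, in GB
free -g

# Resident memory (RSS) of one process, converted from KB to MB
# (hypothetical helper; pass it the PID of one typical process)
rss_mb() {
    ps -o rss= -p "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

# Demo: $$ is the PID of the current shell
rss_mb $$
```

Multiplying the per-process figure by the number of instances you plan to run gives a rough idea of whether they will fit in the memory that free reports as available.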

...

  • Use ulimit -H -m <max_ram> to limit the memory a given process can use.
  • Run memory intensive processes when system load is otherwise light (e.g. overnight)
  • No single user should run programs that use excessive RAM
    • Less than 75% of total RAM if running when system load is otherwise light (e.g. overnight), and the programs are not expected to run for more than a few hours
    • Less than 25% of total RAM otherwise
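
One caveat about the ulimit suggestion: the bash documentation notes that many systems do not honor the -m (resident set size) limit. A commonly used alternative, offered here as an assumption rather than as BRCF guidance, is to cap virtual memory with -v inside a subshell so that the limit applies only to the job started there. The 4 GB value is arbitrary.

```bash
(
    # Limit is given in KB: 4194304 KB = 4 GB (arbitrary example value)
    ulimit -v 4194304
    # Print the limit now in effect inside this subshell
    ulimit -v
    # A memory-hungry program would be started here, e.g.
    #   ./my_program ...      (hypothetical command)
)
# Outside the subshell the original limit is unchanged
ulimit -v
```

A program that tries to allocate beyond the cap will fail with an out-of-memory error instead of pushing the server into swap, so test with a generous value first.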

Computational considerations

This section describes a number of computation-related considerations.

Multi-processing: cores vs hyperthreads

Many programs offer an option to divide their work among multiple processes, which can reduce the total clock time the program will run. The option may refer to "processes", "cores" or "threads", but it actually targets the available computing units on a server. Examples include: samtools sort --threads option; bowtie2 -p/--threads option; in R, library(doParallel); registerDoParallel(cores = NN); and the OMP_NUM_THREADS environment variable for OpenMP programs.

A "computing unit" is a server's cores and hyperthreads, and it is important to keep in mind the difference between the two. Cores are physical computing units, while hyperthreads are virtual computing units – kernel objects that "split" each core into two hyperthreads so that the single compute unit can be used by two processes.

The POD Resources and Access: AvailablePODs table describes the compute servers that are associated with each BRCF pod, along with their available cores and (hyper)threads. (Note that most servers are dual-CPU, meaning that total core count is double the per-CPU core count, so a dual 4-core CPU machine would have 8 cores.) You can also see the hyperthread and core counts on any server via:

Code Block
languagebash
cat /proc/cpuinfo | grep -c 'core id'           # actually the number of hyperthreads!
cat /proc/cpuinfo | grep 'siblings' | head -1   # the real number of physical cores

(Yes, the fact that 'core id' gives hyperthreads and 'siblings' the number of cores is confusing. But what do you expect -- this is Unix!)
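
Part of the confusion is that 'siblings' in /proc/cpuinfo is the number of hyperthreads per CPU socket. On a dual-socket server with two hyperthreads per core, that number happens to equal the total physical core count. The sketch below counts physical cores directly instead, by counting unique (physical id, core id) pairs. It assumes an x86 Linux server whose /proc/cpuinfo has both fields; the variable names are only for illustration.

```bash
# Logical CPUs (hyperthreads) visible to the OS
logical=$(grep -c '^processor' /proc/cpuinfo)

# Physical cores: number of unique (physical id, core id) pairs
physical=$(awk -F': *' '
    /^physical id/ { pkg = $2 }
    /^core id/     { seen[pkg ":" $2] = 1 }
    END            { n = 0; for (k in seen) n++; print n }
' /proc/cpuinfo)

echo "logical CPUs: $logical, physical cores: $physical"
```

The lscpu command, where installed, summarizes the same information in its "Socket(s)", "Core(s) per socket" and "Thread(s) per core" lines.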

Since hyperthreads look like available computing units ("CPUs" in OS displays), parallel processing options that detect "cores" usually really detect hyperthreads. Why does this matter? 

The bottom line:

  • Virtual hyperthreads are useful if the work a process is doing periodically "yields", typically to perform input/output operations, since waiting for I/O allows the core to be used for other work. Many NGS tools fall into this category since they read/write sequencing files.
  • Physical cores are best used when a program's work is compute-bound. When processing is compute-bound -- as is typical of matrix-intensive machine learning algorithms -- hyperthreads actually degrade performance, because two compute-bound hyperthreads are competing for the same physical core, and there is OS-level overhead involved in process switching between the two.

So before you select a process/core/thread count for your program, consider whether it will perform significant I/O. If so, you can specify a higher count. If it is compute bound (e.g. machine learning), be sure to specify a count low enough to leave free hyperthreads for others to use.
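
To make that concrete, here is a minimal sketch of deriving a thread count from the server's size rather than hard-coding one. The one-quarter and one-eighth fractions are placeholders chosen purely for illustration, not BRCF policy; pick values that suit your job and the current load.

```bash
# Logical CPUs (hyperthreads) seen by the OS
logical=$(nproc)

# Placeholder shares: a larger one for I/O-heavy work,
# a smaller one for compute-bound work
io_threads=$(( logical / 4 ))
cpu_threads=$(( logical / 8 ))

# Never go below one thread on small machines
if [ "$io_threads" -lt 1 ]; then io_threads=1; fi
if [ "$cpu_threads" -lt 1 ]; then cpu_threads=1; fi

echo "I/O-heavy job: $io_threads threads; compute-bound job: $cpu_threads threads"
# The value would then be passed to the tool's own option, e.g.
#   samtools sort --threads "$io_threads" ...   (other arguments omitted)
```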

Note that the issue with machine learning (ML) workflows being incredibly compute bound is the main reason ML processing is best run on GPU-enabled servers, either at TACC or on one of the BRCF pods with GPUs (see BRCF GPU servers).


Do not run too many processes

Having described how to run multiple processes, it is important that you do not run too many processes at a time, because you are just using one compute server, and you're not the only one using the machine!

Note that starting a single instance of a program can sometimes spawn many threads. For example, each instance of an OpenMP program by default will use all available threads. To avoid this with OpenMP, set the OMP_NUM_THREADS environment variable (e.g. export OMP_NUM_THREADS=1). However it is important to check the documentation for the particular program being used, and also use top (press the 1 key to see per-hyperthread load) or htop to see how many threads a single instance of a program uses.
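
Besides top and htop, ps can report the thread count of a running process through its nlwp ("number of lightweight processes") column. The sketch below sets an OpenMP cap and then shows the thread count of the current shell as a stand-in for a real program's PID. The value 4 is an arbitrary example.

```bash
# Cap OpenMP programs started from this shell at 4 threads (arbitrary value)
export OMP_NUM_THREADS=4

# Show PID, thread count (nlwp) and command name for one process;
# $$ (the current shell) stands in for the PID of your program
ps -o pid,nlwp,comm -p $$
```

Running the ps line against one instance of your program shortly after it starts gives a quick check that the cap was actually honored.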

How many is "too many"? That really depends on what kind of job it is, what compute/input-output mix it has, and how much RAM it needs. As a general rule, don't run more simultaneous jobs on a POD compute server than you would run on a single TACC compute node.

Before running multiple jobs, you should check RAM usage (free -g will show usage in GB) and see what is already running using the top program (press the 1 key to see per-hyperthread load), or using the who command, or with a command like this:

Code Block
languagebash
ps -ef | grep -v root | grep -v bash | grep -v sshd | grep -v screen | grep -v tmux | grep -v 'www-data'

Here is a good article on all the aspects of the top command: https://www.booleanworld.com/guide-linux-top-command/

Finally, be sure to lower the priority of your processes using renice as described below (e.g. renice -n 15 -u `whoami`).

Lower priority for large, long-running jobs

If you have one or more jobs that use multiple threads, or do significant I/O, their execution can affect system responsiveness for other users.

To help avoid this, please use the renice tool to manipulate the priority of your tasks (a priority of 15 is a good choice). It's easy to do, and here's a quick tutorial: http://www.thegeekstuff.com/2013/08/nice-renice-command-examples/?utm_source=tuicool

For example, before you start any tasks, you can set the default priority to nice 15 as shown here. Anything you start from then on (from this shell) should inherit the nice 15 value.

Code Block
languagebash
renice +15 $$

Once you have tasks running, their priority can be changed for all of them by specifying your user name:

Code Block
languagebash
renice +15 -u `whoami`

or for a particular process id (PID):

Code Block
languagebash
renice +15 -p <some PID number>
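
As an alternative to changing priority after the fact, a command can be started at lowered priority with nice. This is a small sketch; the sh -c command is only a stand-in for a real long-running job, and the values printed assume the shell itself is running at the default niceness of 0.

```bash
# "nice" with no arguments prints the niceness it is running at,
# so this shows what a command started with "nice -n 15" receives
nice -n 15 nice

# Start a (stand-in) job at niceness 15 and have it report its own value
nice -n 15 sh -c 'sleep 1; echo "finished at niceness $(nice)"'
```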

Running processes unattended

While POD compute servers do not have a batch system, you can still run multiple tasks simultaneously in several different ways. 

For example, you can use terminal multiplexer tools like screen or tmux to create virtual terminal sessions that won't go away when you log off. Then, inside a screen  or  tmux  session you can create multiple sub-shells where you can run different commands.

You can also use the command line utility nohup to start processes in the background, again allowing you to log off and still have the process running.
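
As a quick illustration of the nohup approach, the sketch below starts a background job whose output goes to a log file. The sh -c command stands in for a real analysis command, and the wait is there only so the demo can show the log; normally you would simply log off. The tmux commands are shown as comments because they are interactive, and the session name is made up.

```bash
# Start a job that keeps running after logout; stdout and stderr go to a log
log=$(mktemp)
nohup sh -c 'sleep 1; echo "job finished"' > "$log" 2>&1 &
job_pid=$!
echo "started background job with PID $job_pid"

# Demo only: wait for the job, then show what it wrote
wait "$job_pid"
cat "$log"

# Rough tmux equivalent (interactive, so left as comments):
#   tmux new -s analysis       # create a named session
#   ...run commands, then detach with Ctrl-b d...
#   tmux attach -t analysis    # reattach after logging back in
```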

 Here are some links on how to use these tools:

Input/Output considerations

Avoid heavy I/O load

Please be aware of the potential effects of the input/output (I/O) operations in your workflows.

Many common bioinformatics workflows are largely I/O bound; in other words, they do enough input/output that it is essentially the gating factor in execution time. This is in contrast to simulation or modeling type applications, which are essentially compute bound.

...

For example, as few as three simultaneous invocations of gzip or samtools sort on large files can degrade system responsiveness for other users. If you notice that doing an ls or command completion on the command line seems to be taking forever, this can be a sign of an excessive I/O load (although very high compute loads can occasionally cause similar issues).

To gauge your program's I/O usage:

  1. Run it on smaller datasets first
  2. Check I/O effects by exercising tab-completion from the command line (see below)
    1. tab completion is directly impacted by I/O load, so if it is slow there's too much I/O going on

Code Block
ls /st                   # Typing this + Tab expands to /stor
ls /stor/sy              # Typing this + Tab expands to /stor/system
ls /stor/system/o        # Typing this + Tab expands to /stor/system/opt
ls /stor/system/opt/sam  # Typing this + Tab expands to /stor/system/opt/samtools (not uniquely)

# Typing this + Tab twice will list many possible completions:
ls /stor/system/opt/samtools/bam

Reduce the I/O priority of your processes

Similar to the way renice reduces the CPU priority of your processes (see above), ionice can reduce the I/O priority. This can be done for all your processes or for specific ones:

Code Block
# lower I/O priority for process number <pid>
ionice -c 2 -n 7 -p <pid>

# lower I/O priority for all your processes
ionice -c 2 -n 7 -u <uid>

# and here's how to find your <uid> (user ID)
id -u
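
Both priorities can also be lowered at launch time by chaining ionice and nice in front of the command. In this sketch the sh -c command is a stand-in for a real I/O-heavy job. The -t option tells ionice to run the command even if the I/O priority cannot be set; it is assumed here to be available in the installed ionice.

```bash
# Start a (stand-in) job at low I/O priority (best-effort class, level 7)
# and low CPU priority (niceness 15) in one step
ionice -t -c 2 -n 7 nice -n 15 sh -c 'echo "running at low CPU and I/O priority"'
```

A real invocation would replace the sh -c part with your own command, for example a gzip of a large file.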

Transfer large files directly to the storage server

...

Please see this FAQ for more information: I'm having trouble transferring files to/from TACC.

Other available POD services

...