
Tip
titleReservations

Use our summer school reservation (CoreNGS-Tue) when submitting batch jobs to get higher priority on the ls6 normal queue today.

Code Block
languagebash
titleRequest an interactive (idev) node
# Request a 180 minute interactive node on the normal queue using today's reservation
idev -m 180 -N 1 -A OTH21164 -r CoreNGS-Tue

# Request a 120 minute idev node on the development queue 
idev -m 120 -N 1 -A OTH21164 -p development


Code Block
languagebash
titleSubmit a batch job
# Using today's reservation
sbatch --reservation=CoreNGS-Tue <batch_file>.slurm

Note that the reservation name (CoreNGS-Tue) is different from the TACC allocation/project for this class, which is OTH21164.
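After submitting, you can monitor your job with standard SLURM client commands. A minimal sketch (squeue is part of any SLURM installation; the guard just lets the snippet print a message on systems without SLURM):

```shell
# Show this user's jobs in the queue (JOBID, PARTITION, NAME, STATE, TIME, ...)
# squeue is a standard SLURM client command; use scancel <job_id> to kill a job.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER:-$(id -un)}"
else
    echo "SLURM not available on this system"
fi
```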

...

Here is a comparison of the configurations of ls6 and stampede3. As you can see, stampede3 is the newer (and larger) cluster, launched in 2024; ls6 was launched in 2022.


login nodes

ls6:
  • 3 login nodes
  • 128 cores each
  • 256 GB memory

stampede3:
  • 4 login nodes
  • 96 cores each
  • 250 GB memory

standard compute nodes

ls6: 560 AMD Epyc Milan nodes
  • 128 cores per node
  • 256 GB memory

stampede3:
  • 560 Intel Xeon "Sapphire Rapids" nodes: 112 cores per node, 128 GB memory
  • 1,060 Intel Platinum 8160 "Skylake" nodes: 48 cores per node (96 virtual), 192 GB memory
  • 224 Intel Xeon Platinum 8380 "Ice Lake" nodes: 80 cores per node, 256 GB memory

GPU nodes

ls6: 16 AMD Epyc Milan nodes
  • 128 cores per node
  • 256 GB memory
  • 2x NVIDIA A100 GPUs w/ 40 GB RAM onboard

stampede3: 20 GPU Max 1550 "Ponte Vecchio" nodes
  • 96 cores per node
  • 512 GB memory
  • 4x Intel GPU Max 1550 GPUs w/ 128 GB RAM onboard

batch system

ls6: SLURM
stampede3: SLURM

maximum job run time

ls6:
  • 48 hours, normal queue
  • 2 hours, development queue

stampede3:
  • 48 hours on GPU nodes, normal queue
  • 24 hours on other nodes, normal queue
  • 2 hours, development queue

...

Note the use of the term virtual core on stampede3. Compute cores are standalone processors – mini CPUs, each of which can execute separate sets of instructions. However, modern cores may also have hyperthreading enabled, where a single core can appear as more than one virtual processor to the operating system (see https://en.wikipedia.org/wiki/Hyper-threading). For example, stampede3 Skylake nodes have 2 hyperthreads (HTs) per core, so a node with 48 physical cores presents a total of 96 virtual cores.
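On any Linux machine you can see how many virtual cores the OS presents and compare that to the physical core count. A sketch using standard tools (nproc from coreutils; /proc/cpuinfo is Linux/x86-specific, so the physical count may come out as 0 elsewhere):

```shell
# Number of virtual cores (hyperthreads) the OS presents
echo "virtual cores: $(nproc)"

# Distinct physical cores: unique (physical id, core id) pairs in /proc/cpuinfo.
# With hyperthreading the virtual count is 2x (or 4x) the physical count;
# when the two counts match, hyperthreading is off.
physical=$(grep -E '^(physical id|core id)' /proc/cpuinfo | paste - - | sort -u | wc -l)
echo "physical cores: ${physical}"
```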

Threading is an operating system scheduling mechanism for allowing one CPU/core to execute multiple computations, seemingly in parallel.

The writer of a program that takes advantage of threading first identifies portions of code that can run in parallel because their computations are independent. The programmer assigns some number of threads to that work (usually based on a command-line option) using specific threading and synchronization programming-language constructs. An example is the samtools sort -@ N option, which specifies that up to N threads can be used to sort independent subsets of the input alignments.

If there are multiple cores/CPUs available, the operating system can assign a program thread to each of them for actual parallelism. But only "seeming" (or virtual) parallelism occurs if there are fewer cores than the number of threads specified.

Suppose there's only one core/CPU. The OS assigns program thread A to the core, where it runs until the program performs an I/O operation that causes it to be "suspended" while the I/O completes. During this time, when normally the CPU would be doing nothing but waiting on the I/O, the OS assigns program thread B to the CPU and lets it do some work. This threading allows more efficient use of existing cores, as long as the program threads being scheduled perform some amount of I/O or other operations that cause them to suspend. But trying to run multiple compute-only, no-I/O threads on one CPU just causes "thread thrashing": OS scheduler overhead as threads are suspended on a timer rather than on I/O, with no gain in throughput.
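The benefit of overlapping I/O waits is easy to see with two sleep commands run as background jobs in the shell. This is a toy illustration of concurrency, not a benchmark:

```shell
# Two tasks that each "wait on I/O" for 1 second.
# Run one after the other they would take ~2 s; run concurrently,
# the waits overlap and the total is ~1 s.
t0=$(date +%s%N)
sleep 1 &    # task A, suspended waiting
sleep 1 &    # task B runs while A waits
wait         # block until both background jobs finish
t1=$(date +%s%N)
elapsed_ms=$(( (t1 - t0) / 1000000 ))
echo "two overlapping 1-second waits took ${elapsed_ms} ms"
```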

The analogy is a grocery store where there are 5 customers (threads). If there are 5 checkout lines (cores), each customer (thread) can be serviced in a separate checkout line (core). But if there's only one checkout line (core) open, the customers (threads) will have to wait in line. For a more accurate analogy, any checkout clerk could handle part of one customer's checkout, then, while that customer finds and enters credit card information, handle part of a different customer's checkout.

Hyperthreading is essentially a hardware implementation of this OS scheduling. Each physical core offers some number of "virtual cores" (hyperthreads) that can "almost" act like separate cores, using various hardware tricks. Still, if the work assigned to multiple hyperthreads on a single core does not pause from time to time, thread thrashing will occur.
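You can check whether hyperthreading is enabled on a node from lscpu output (lscpu ships with util-linux and is present on most Linux systems; the guard keeps the snippet harmless elsewhere):

```shell
# "Thread(s) per core: 1" means hyperthreading is off;
# 2 (or more) means each physical core presents multiple virtual cores.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -i 'thread(s) per core'
else
    echo "lscpu not found"
fi
```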

Software at TACC

...