
Tip
titleReservations

Use our summer school reservation (CoreNGS-Tue) when submitting batch jobs to get higher priority on the ls6 normal queue today.

Code Block
languagebash
titleRequest an interactive (idev) node
# Request a 180 minute interactive node on the normal queue using today's reservation
idev -m 180 -N 1 -A OTH21164 -r CoreNGS-Tue

# Request a 120 minute idev node on the development queue 
idev -m 120 -N 1 -A OTH21164 -p development


Code Block
languagebash
titleSubmit a batch job
# Using today's reservation
sbatch --reservation=CoreNGS-Tue <batch_file>.slurm

Note that the reservation name (CoreNGS-Tue) is different from the TACC allocation/project for this class, which is OTH21164.
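After submitting, you can monitor your job with standard SLURM client commands. A minimal sketch (squeue is part of any SLURM installation; the guard just lets the snippet print a message on systems without SLURM):

```shell
# Show this user's jobs in the queue (JOBID, PARTITION, NAME, STATE, TIME, ...)
# squeue is a standard SLURM client command; use scancel <job_id> to kill a job.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER:-$(id -un)}"
else
    echo "SLURM not available on this system"
fi
```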

...

Here is a comparison of the configurations of ls6 and stampede3. As you can see, stampede3 is the newer (and larger) cluster, launched in 2024; ls6 was launched in 2022.


login nodes

ls6:
  • 3 login nodes
  • 128 cores each
  • 256 GB memory

stampede3:
  • 4 login nodes
  • 96 cores each
  • 250 GB memory

standard compute nodes

ls6: 560 AMD Epyc Milan nodes
  • 128 cores per node
  • 256 GB memory

stampede3:
  • 560 Intel Xeon "Sapphire Rapids" nodes: 112 cores per node, 128 GB memory
  • 1,060 Intel Platinum 8160 "Skylake" nodes: 48 cores per node (96 virtual), 192 GB memory
  • 224 Intel Xeon Platinum 8380 "Ice Lake" nodes: 80 cores per node, 256 GB memory

GPU nodes

ls6: 16 AMD Epyc Milan nodes
  • 128 cores per node
  • 256 GB memory
  • 2x NVIDIA A100 GPUs w/ 40 GB RAM onboard

stampede3: 20 GPU Max 1550 "Ponte Vecchio" nodes
  • 96 cores per node
  • 512 GB memory
  • 4x Intel GPU Max 1550 GPUs w/ 128 GB RAM onboard

batch system

ls6: SLURM
stampede3: SLURM

maximum job run time

ls6:
  • 48 hours, normal queue
  • 2 hours, development queue

stampede3:
  • 48 hours on GPU nodes, normal queue
  • 24 hours on other nodes, normal queue
  • 2 hours, development queue

...

Note the use of the term virtual core on stampede3. Compute cores are standalone processors – mini CPUs, each of which can execute separate sets of instructions. However, modern cores may also have hyperthreading enabled, where a single core can appear as more than one virtual processor to the operating system (see https://en.wikipedia.org/wiki/Hyper-threading). For example, stampede3 Skylake nodes have 2 hyperthreads (HTs) per core, so a node with 48 physical cores presents a total of 96 virtual cores.
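On any Linux machine you can see how many virtual cores the OS presents and compare that to the physical core count. A sketch using standard tools (nproc from coreutils; /proc/cpuinfo is Linux/x86-specific, so the physical count may come out as 0 elsewhere):

```shell
# Number of virtual cores (hyperthreads) the OS presents
echo "virtual cores: $(nproc)"

# Distinct physical cores: unique (physical id, core id) pairs in /proc/cpuinfo.
# With hyperthreading the virtual count is 2x (or 4x) the physical count;
# when the two counts match, hyperthreading is off.
physical=$(grep -E '^(physical id|core id)' /proc/cpuinfo | paste - - | sort -u | wc -l)
echo "physical cores: ${physical}"
```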

Threading is an operating system scheduling mechanism for allowing one CPU/core to execute multiple computations, seemingly in parallel.

The writer of a program that takes advantage of threading first identifies portions of code that can run in parallel because their computations are independent. The programmer assigns some number of threads to that work (usually based on a command-line option) using specific threading and synchronization programming-language constructs. An example is the samtools sort -@ N option, which specifies that up to N threads can be used to sort independent subsets of the input alignments.

If there are multiple cores/CPUs available, the operating system can assign a program thread to each of them for actual parallelism. But only "seeming" (or virtual) parallelism occurs if there are fewer cores than the number of threads specified.

Suppose there's only one core/CPU. The OS assigns program thread A to the core, where it runs until the program performs an I/O operation that causes it to be "suspended" while the I/O completes. During this time, when normally the CPU would be doing nothing but waiting on the I/O, the OS assigns program thread B to the CPU and lets it do some work. This threading allows more efficient use of existing cores, as long as the program threads being scheduled perform some amount of I/O or other operations that cause them to suspend. But trying to run multiple compute-only, no-I/O threads on one CPU just causes "thread thrashing": OS scheduler overhead as threads are suspended on a timer rather than on I/O, with no gain in throughput.
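The benefit of overlapping I/O waits is easy to see with two sleep commands run as background jobs in the shell. This is a toy illustration of concurrency, not a benchmark:

```shell
# Two tasks that each "wait on I/O" for 1 second.
# Run one after the other they would take ~2 s; run concurrently,
# the waits overlap and the total is ~1 s.
t0=$(date +%s%N)
sleep 1 &    # task A, suspended waiting
sleep 1 &    # task B runs while A waits
wait         # block until both background jobs finish
t1=$(date +%s%N)
elapsed_ms=$(( (t1 - t0) / 1000000 ))
echo "two overlapping 1-second waits took ${elapsed_ms} ms"
```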

The analogy is a grocery store where there are 5 customers (threads). If there are 5 checkout lines (cores), each customer (thread) can be serviced in a separate checkout line (core). But if there's only one checkout line (core) open, the customers (threads) will have to wait in line. For a more accurate analogy, any checkout clerk could handle part of one customer's checkout, then, while that customer finds and enters credit card information, handle part of a different customer's checkout.

Hyperthreading is essentially a hardware implementation of this OS scheduling. Each physical core offers some number of "virtual cores" (hyperthreads) that can "almost" act like separate cores, using various hardware tricks. Still, if the work assigned to multiple hyperthreads on a single core does not pause from time to time, thread thrashing will occur.
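You can check whether hyperthreading is enabled on a node from lscpu output (lscpu ships with util-linux and is present on most Linux systems; the guard keeps the snippet harmless elsewhere):

```shell
# "Thread(s) per core: 1" means hyperthreading is off;
# 2 (or more) means each physical core presents multiple virtual cores.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -i 'thread(s) per core'
else
    echo "lscpu not found"
fi
```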

Software at TACC

...