About X-Windows applications

Draft content...

Since this is a recurring issue for some BRCF participants, we should beef up our documentation on how to run X11 apps installed on our compute servers.

Specifically, we need to detail what exact steps are needed to get X11 forwarding working:

on a Mac from Terminal
on a Mac with XQuartz
on Windows with PuTTY
on Windows with WSL

There are different use cases:

Where X11 forwarding just has to be configured because a command line application requires it (e.g. megacc)
Where X11 forwarding is needed to visualize the GUI of a basic X-Windows app installed on the pod (e.g. test with firefox)
Where X11 forwarding is needed to visualize the GUI of a complex X-Windows app, e.g. 3D Slicer, which requires a GLX extension
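In all three cases, a quick first check is whether the DISPLAY environment variable is set in the remote session, since ssh sets it when X11 forwarding is established. The following is a minimal Python sketch of that check; the function name and messages are illustrative only and are not an existing tool on the pod.

```python
import os

def x11_forwarding_status(env=None):
    """Return a short message describing whether DISPLAY is set.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try without a real ssh session.
    """
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "")
    if not display:
        return "DISPLAY is not set: X11 forwarding is probably not active"
    return f"DISPLAY={display}: X11 forwarding appears to be configured"

if __name__ == "__main__":
    # Run this after logging in to the compute server.
    print(x11_forwarding_status())
```

A set DISPLAY only shows that forwarding was negotiated; it does not confirm that a complex app needing the GLX extension will render correctly.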

Maorong has provided this guidance for running the CLC genomics X11 application GUI:

On Macs:

Make sure you have XQuartz installed.
Run XQuartz and start an xterm.
In the xterm, run:
ssh -Y your_user_name@wcarcomp01.ccbb.utexas.edu
- It will ask you to log in.
Once logged in, run:
/stor/opt/bin/clcgenomicswb24
- A GUI will pop up on your desktop, and you should be able to interact with it.

On Windows:

Xming is a standalone X server for Windows that can be used along with PuTTY for remote display.
- E.g. https://it.engineering.oregonstate.edu/run-x11-application-windows

Hyperthreads

About cores and hyperthreads

Note the use of the term virtual core on stampede3. Compute cores are standalone processors (mini CPUs), each of which can execute a separate set of instructions. However, modern cores may also have hyperthreading enabled, where a single core appears to the operating system as more than one virtual processor (see https://en.wikipedia.org/wiki/Hyper-threading). For example, stampede3 nodes have 2 or 4 hyperthreads (HTs) per core, so KNL nodes with 4 HTs for each of their 68 physical cores have a total of 272 virtual cores.
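The virtual core arithmetic above can be sketched in a few lines of Python. The helper name is hypothetical; the 68 and 4 are the figures quoted in the text.

```python
import os

def virtual_cores(physical_cores, hyperthreads_per_core):
    """Total virtual cores the OS sees for one node."""
    return physical_cores * hyperthreads_per_core

# Figures from the text: 68 physical cores, 4 hyperthreads each.
print(virtual_cores(68, 4))

# os.cpu_count() reports the logical (virtual) CPUs on the machine
# where this runs, or None if the count cannot be determined.
print(os.cpu_count())
```

Note that os.cpu_count() counts hyperthreads, not physical cores, so on a hyperthreaded node it returns the virtual core total.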

Threading is an operating system scheduling mechanism for allowing one CPU/core to execute multiple computations, seemingly in parallel.

The writer of a program that takes advantage of threading first identifies portions of code that can run in parallel because the computations are independent. The programmer then assigns some number of threads to that work (usually based on a command-line option) using thread and synchronization constructs provided by the programming language. An example is the samtools sort -@ N option, which specifies that N threads can be used to sort independent sets of the input alignments.
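As an illustration of that pattern, the sketch below splits a list into independent chunks, sorts each chunk in its own thread, and merges the results. This is not how samtools implements its sort; the function name is hypothetical, and in CPython the global interpreter lock limits the real speedup for pure-Python work, so treat it as a picture of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def threaded_sort(values, n_threads):
    """Sort `values` by sorting independent chunks in threads, then merging."""
    n_threads = max(1, n_threads)
    # Ceiling division so that at most n_threads chunks are produced.
    chunk_size = max(1, -(-len(values) // n_threads))
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each chunk is independent, so each can be handed to a separate thread.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # The merge step is the synchronization point that combines the results.
    return list(heapq.merge(*sorted_chunks))

if __name__ == "__main__":
    print(threaded_sort([5, 3, 9, 1, 7, 2, 8], n_threads=3))
```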

If there are multiple cores/CPUs available, the operating system can assign a program thread to each of them for actual parallelism. But only "seeming" (or virtual) parallelism occurs if there are fewer cores than the number of threads specified.
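One practical consequence is that asking for more threads than there are cores buys no extra parallelism. A minimal sketch of clamping a requested thread count to the reported core count follows; the helper names and the BAM file names are hypothetical, while -@ and -o are the samtools sort options for thread count and output file.

```python
import os

def usable_threads(requested, available=None):
    """Clamp a requested thread count to the cores the OS reports."""
    if available is None:
        # os.cpu_count() counts virtual cores and may return None.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))

def samtools_sort_command(requested, available=None,
                          in_bam="in.bam", out_bam="sorted.bam"):
    """Build a samtools sort command line with a clamped thread count."""
    n = usable_threads(requested, available)
    return f"samtools sort -@ {n} -o {out_bam} {in_bam}"

if __name__ == "__main__":
    print(samtools_sort_command(16))
```

On a shared compute server other users' jobs also occupy cores, so the reported count is an upper bound rather than a guarantee.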

Suppose there's only one core/CPU. The OS assigns program thread A to the core, where it runs until the program performs an I/O operation that causes the thread to be "suspended" until the I/O completes. During this time, when the CPU would normally be doing nothing but waiting on the I/O, the OS assigns program thread B to the CPU and lets it do some work. This threading allows more efficient use of existing cores as long as the program threads being assigned do some amount of I/O or other operations that cause them to suspend. But trying to run multiple compute-only, no-I/O programs using multiple threads on one CPU just causes "thread thrashing": OS scheduler overhead incurred when threads are suspended because their time slice has expired, not because they are waiting on I/O.
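The benefit for I/O-heavy threads can be seen in a small Python sketch where time.sleep stands in for an I/O wait (an assumption made for illustration; real I/O timings will vary). The function names are hypothetical. While one thread is waiting, the others get to run, so the threaded version finishes in roughly the time of a single wait.

```python
import threading
import time

def fake_io_task(seconds):
    """Stand-in for a task that mostly waits on I/O; sleeping frees the CPU."""
    time.sleep(seconds)

def run_serial(n_tasks, seconds):
    """Run the tasks one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        fake_io_task(seconds)
    return time.perf_counter() - start

def run_threaded(n_tasks, seconds):
    """Run each task in its own thread and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task, args=(seconds,))
               for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    serial = run_serial(4, 0.2)
    threaded = run_threaded(4, 0.2)
    print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```

This sketch only shows the overlap of waits; it does not reproduce thread thrashing, which arises with compute-only work.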

The analogy is a grocery store where there are 5 customers (threads). If there are 5 checkout lines (cores), each customer (thread) can be serviced in a separate checkout line (core). But if there's only one checkout line (core) open, the customers (threads) will have to wait in line. For a more accurate analogy, the checkout clerk would handle part of the checkout for one customer and then, while waiting for that customer to find and enter credit card information, handle part of a different customer's checkout.

Hyperthreading is just a hardware implementation of OS scheduling. Each CPU offers some number of "virtual cores" (hyperthreads) that can "almost" act like separate cores using various hardware tricks. Still, if the work assigned to multiple hyperthreads on a single core does not pause from time to time, thread thrashing will occur.