Two BRCF research pods have NVIDIA GPU servers; however, their use is restricted to the groups who own those pods.

Servers

Hopefog pod

hfogcomp04.ccbb.utexas.edu compute server on the Hopefog pod (Ellington/Marcotte):

Dell PowerEdge R750XA
dual 24-core/48-thread CPUs (48 cores, 96 hyperthreads total)
512 GB system RAM
2 NVIDIA Ampere A100 GPUs with 80 GB onboard RAM each

hfogcomp05.ccbb.utexas.edu

GIGABYTE MC62-G40-00
32-core/64-thread AMD Ryzen CPU
512 GB RAM
4 NVIDIA RTX 6000 Ada GPUs, 48 GB RAM each

Wilke pod

wilkcomp03.ccbb.utexas.edu compute server on the Wilke pod:

...

The AlphaFold protein structure solving software is available on all AMD GPU servers. The /stor/scratch/AlphaFold directory holds the large required database under the data.3 subdirectory. It also contains an AMD example script, /stor/scratch/AlphaFold/alphafold_example_amd.sh, and an alphafold_example_nvidia.sh script if the pod also has NVIDIA GPUs (e.g. the Hopefog pod). Interestingly, our timing tests indicate that AlphaFold performance is quite similar on all the AMD and NVIDIA GPU servers.

TensorFlow and PyTorch

...

Two Python example scripts are located in /stor/scratch/GPU_info that can be used to ensure you have access to the server's GPUs from TensorFlow or PyTorch. Run them from the command line using time to compare the run times.

...

If GPUs are available and accessible, the output generated will indicate they are being used.
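
As a quick additional check, each framework can be asked directly whether it sees a GPU. The following is a minimal sketch, not one of the scripts in /stor/scratch/GPU_info. The function name check_gpu_frameworks is hypothetical, and the sketch assumes the CUDA-enabled packages live in the default python3; a framework that cannot be imported is reported rather than treated as an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether PyTorch and TensorFlow can see a GPU.
# Assumption: the interpreter passed in (default python3) is the CUDA-enabled one.
check_gpu_frameworks() {
    local py="${1:-python3}"
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "python: $py not found"
        return 0
    fi
    # torch.cuda.is_available() is True when PyTorch can use a CUDA device
    "$py" -c 'import torch; print("pytorch: GPU available =", torch.cuda.is_available())' 2>/dev/null \
        || echo "pytorch: not importable in $py"
    # tf.config.list_physical_devices("GPU") lists the GPUs TensorFlow detects
    "$py" -c 'import tensorflow as tf; print("tensorflow: GPUs detected =", len(tf.config.list_physical_devices("GPU")))' 2>/dev/null \
        || echo "tensorflow: not importable in $py"
}

check_gpu_frameworks
```

Passing a different interpreter, for example check_gpu_frameworks python3.8, checks that environment instead.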

Resources

Command-line diagnostics

Use nvidia-smi to verify access to the server's GPUs.

Note that our system-wide CUDA-enabled TensorFlow and PyTorch versions are only available in the default Python 3 command-line environment (e.g. python3 or python3.8 on the command line). They are not yet available in the global JupyterHub environment that uses the Python 3.9 kernel. If you need a different combination of Python and TensorFlow/PyTorch versions, you'll need to construct an appropriate custom Conda environment (e.g. miniconda3 or anaconda).

GROMACS

A GPU-enabled build of GROMACS, the molecular dynamics (MD) program, is available on all NVIDIA GPU servers; a CPU-only version is also installed.

The /stor/scratch/GROMACS directory has several useful resources:

benchmarks/ - a set of MD benchmark files from https://www.mpinat.mpg.de/grubmueller/bench
gromacs_nvidia_example.sh - a simple GROMACS example script taking advantage of the GPU, running the benchMEM.tpr benchmark by default.
gromacs_cpu_example.sh - a GROMACS example script using the CPUs only.
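
One way to compare the GPU and CPU example scripts is to time each run. The sketch below defines a hypothetical time_run helper (it is not part of the GROMACS directory) that reports whole-second wall-clock time. The commented usage lines assume the example scripts can be run with no arguments from your own working directory, which is not confirmed here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command and print its wall-clock time in seconds.
time_run() {
    local start end rc
    start=$(date +%s)
    "$@"                      # run the command exactly as given
    rc=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s (exit status $rc): $*"
    return "$rc"
}

# Assumed usage on a pod where these scripts exist:
# time_run bash /stor/scratch/GROMACS/gromacs_nvidia_example.sh
# time_run bash /stor/scratch/GROMACS/gromacs_cpu_example.sh
```

The shell's built-in time keyword gives finer-grained numbers; the helper is only convenient when the result should land in a log alongside the command that produced it.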

Resources

CUDA

Both hfogcomp04 and wilkcomp03 have CUDA 11.8 and CUDA 12.x installed, under version-specific subdirectories of /usr/local.

To ensure CUDA 11 is made active:

...

To make CUDA 12 active:

export CUDA_HOME=/usr/local/cuda-12
export PATH=$CUDA_HOME/bin:$PATH

Neither version is specified by default, and some (but not all) programs rely on these environment variables. So you should activate one or the other before running software that uses GPUs.

After setting these environment variables, type nvcc --version to ensure you have access to the desired version.
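
The activation steps can be wrapped in a small shell function so that switching versions is a single command. This is a sketch only: use_cuda is a hypothetical name, and the install directory is passed in explicitly because the exact subdirectory names under /usr/local may differ between servers (ls -d /usr/local/cuda* shows what is present).

```bash
#!/usr/bin/env bash
# Hypothetical helper: point CUDA_HOME and PATH at one CUDA install directory.
use_cuda() {
    local cuda_dir="$1"
    if [ ! -d "$cuda_dir/bin" ]; then
        echo "use_cuda: no bin/ directory under '$cuda_dir'" >&2
        return 1
    fi
    export CUDA_HOME="$cuda_dir"
    export PATH="$CUDA_HOME/bin:$PATH"
    # Show which compiler is now first on the PATH, if nvcc is present
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version | tail -n 2
    else
        echo "use_cuda: nvcc not found on PATH" >&2
    fi
}

# Assumed usage: use_cuda /usr/local/cuda-12
```

Because the function exports variables, it has to be defined in (or sourced into) the current shell; running it from a separate script would not change your session's environment.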

CUDA drivers are installed under /usr/lib/x86_64-linux-gnu/. To see which version is currently installed:

ls /usr/lib/x86_64-linux-gnu/libnvidia-gl*

See also https://saturncloud.io/blog/where-did-cuda-get-installed-in-my-computer/.

Command-line diagnostics

Use nvidia-smi to verify access to the server's GPUs and to monitor GPU usage.
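
Beyond the default nvidia-smi display, a few query options give compact output that is easier to scan or log. The sketch below wraps them in a hypothetical gpu_status function and falls back to a message when nvidia-smi is not on the PATH; the choice of fields is only an illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compact GPU summary using nvidia-smi query options.
gpu_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: this host may not have NVIDIA drivers installed"
        return 0
    fi
    # One line per GPU with its index, model name, and UUID
    nvidia-smi -L
    # Selected per-GPU fields; "nvidia-smi --help-query-gpu" lists all of them
    nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv
    # Processes currently using the GPUs (PIDs can be matched to users with ps)
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
}

gpu_status
```

For continuous monitoring, nvidia-smi -l 5 repeats the default display every 5 seconds until interrupted.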

Sharing resources

Since there's no batch system on BRCF POD compute servers, it is important for users to monitor their resource usage and that of other users in order to share resources appropriately.
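
A per-user summary of CPU and memory use is one simple way to see how a server is currently being shared. The following is a sketch with a hypothetical helper name, summarize_by_user; it totals the %CPU and %MEM columns that ps reports, so the numbers are a rough snapshot rather than an accounting record.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total CPU and memory percentages per user.
# Reads "user %cpu %mem" lines on stdin; prints "user cpu_total mem_total",
# sorted with the heaviest CPU user first.
summarize_by_user() {
    awk '{ cpu[$1] += $2; mem[$1] += $3 }
         END { for (u in cpu) printf "%s %.1f %.1f\n", u, cpu[u], mem[u] }' \
        | sort -k2,2nr
}

# Snapshot of every process on this server; the empty "=" headers suppress
# the header line. Long user names may be abbreviated by ps.
ps -e -o user= -o pcpu= -o pmem= | summarize_by_user
```

GPU usage is not included here; nvidia-smi reports that separately.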

...
