Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Overview

Austin's own Advanced Micro Devices (AMD) has most generously donated a number of GPU-enabled servers to UT.

While it is still true that AMD GPUs do not support as many 3rd party applications as NVIDIA, they do support many popular Machine Learning (ML) applications such as TensorFlow, PyTorch, and AlphaFold, and Molecular Dynamics (MD) applications such as GROMACS, all of which are installed and ready for use.

Our recently announced AMD GPU pod is available for instructional use and for both research and instructional use, for any qualifying UT-Austin affiliated PIs. To request an allocation, ask your PI to contact us at rctf-support@utexas.edu, and provide the UT EIDs of those who should be granted access.

...

The AlphaFold protein structure solving software is available on all AMD GPU servers. The /stor/scratch/AlphaFold directory has the large required database, under the data.4 sub-directory. There is also an AMD example script /stor/scratch/AlphaFold/alphafold_example_amd.shand an alphafold_example_nvidia.sh script if the POD also has NVIDIA GPUs, (e.g. the Hopefog pod). Interestingly, our timing tests indicate that AlphaFold performance is quite similar on all the AMD and NVIDIA GPU servers.

On AMD GPU servers, AlphaFold is implemented by a run_alphafold.py Python script inside a Docker image, See the run_alphafold_rocm.sh and run_multimer_rocm.sh scripts under /stor/scratch/AlphaFold for a complete list of options to that script.

Pytorch and TensorFlow

Two Python scripts are located in /stor/scratch/GPU_info that can be used to ensure you have access to the server's GPUs from TensorFlow or PyTorch. Run them from the command line using time to compare the run times.

...

We are working hard to get AMD-GPU-enabled versions of TensorFlow and PyTorch working in all three environments. Current status is as follows:

PODAMD-GPU-enabled PyTorchAMD-GPU-enabled TensorFlow
AMD GPU
Hopefog
  • command-line python3, python3.8
  • command-line python3, python3.8
Livestrong
  • command-line python3, python3.8
  • command-line python3, python3.8

If you need a different combination of Python and  TensorFlow/PyTorch versions, you'll need to construct an appropriate custom Conda environment (e.g. miniconda3 or anaconda) as well as your own Jupyter Notebook environment if needed.

...

  • benchmarks/ - a set of MD benchmark files from https://www.mpinat.mpg.de/grubmueller/bench
  • gromacs_amd_example.sh - a simple GROMACS example script taking advantage of the GPU, running the benchMEM.tpr benchmark by default.
  • gromacs_cpu_example.sh - a GROMACS example script using the CPUs only.

You'll see warnings like these when you run the GPU-enabled examples script; they can be ignored:

Code Block
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
...

libibverbs: Warning: couldn't load driver 'libefa-rdmav34.so': libefa-rdmav34.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'libbnxt_re-rdmav34.so': libbnxt_re-rdmav34.so: cannot open shared object file: No such file or directory
...

Resources

ROCm environment

...

  • Use top to monitor running tasks (or top -i to exclude idle processes)
    • commands while top is running include:
    • M - sort task list by memory usage
    • P - sort task list by processor usage
    • N - sort task list by process ID (PID)
    • T - sort task list by run time
    • 1 - show usage of each individual hyperthread
      • they're called "CPUs" but are really hyperthreads
      • this list can be long; non-interactive mpstat may be preferred
  • htop is another popular program for monitoring running processes
  • Use mpstat to monitor overall CPU usage
    • mpstat -P ALL to see usage for all hyperthreads
    • mpstat -P 0 to see specific hyperthread usage
  • Use free -g to monitor overall RAM memory and swap space usage (in GB)
  • Use rocm-smi to see GPU usage

...

...