Quick-Start Guide

Welcome to the AMD University Program (AUP) AI & HPC Cluster! This guide will help you get up and running quickly. For comprehensive documentation, please explore the full site.


1. Cluster Hardware

Compute Nodes

Multi-GPU Nodes

| Nodes | CPUs                    | GPUs               | DRAM         |
|-------|-------------------------|--------------------|--------------|
| 4     | [2x] 128-core EPYC 9755 | [8x] MI350X 288 GB | 3072 GB DDR5 |
| 1     | [2x] 128-core EPYC 9755 | [8x] MI325X 256 GB | 3072 GB DDR5 |
| 2     | [2x] 96-core EPYC 9684X | [8x] MI300X 192 GB | 2304 GB DDR5 |
| 10    | [2x] 64-core EPYC 7763  | [4x] MI250 128 GB  | 1536 GB DDR4 |
| 21    | [2x] 64-core EPYC 7V13  | [4x] MI210 64 GB   | 512 GB DDR4  |

Single-GPU Nodes (Virtual)

| Nodes | CPUs                   | GPU                | DRAM        |
|-------|------------------------|--------------------|-------------|
| 8     | 16 cores of EPYC 9755  | [1x] MI350X 288 GB | 334 GB DDR5 |
| 8     | 16 cores of EPYC 9684X | [1x] MI300X 192 GB | 238 GB DDR5 |
| 28    | 16 cores of EPYC 7V13  | [1x] MI210 64 GB   | 64 GB DDR4  |

Note

MI350X will be deployed starting in Q2 2026. The mi3508x and mi3501x partitions will have charge factors of 1.4 and 0.175, respectively.

Login Node

| Nodes | CPUs                   | GPUs             | DRAM        |
|-------|------------------------|------------------|-------------|
| 2     | [2x] 64-core EPYC 7V13 | [2x] MI210 64 GB | 512 GB DDR4 |

Warning

The login node is shared by all users. Do not run compute-intensive workloads on it. Use Slurm to submit jobs to compute nodes instead.


2. Logging In

Connect via SSH, replacing <username> with your assigned username:

```bash
ssh <username>@hpcfund.amd.com
```
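If you connect frequently, an entry in your SSH config saves typing. A minimal sketch (the `hpcfund` host alias is just a suggested name):

```
# ~/.ssh/config
Host hpcfund
    HostName hpcfund.amd.com
    User <username>
```

With this in place, `ssh hpcfund` is equivalent to the full command above.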

3. Storage Areas

You have two storage areas available:

| Variable | Path                         | Description                                     | Capacity                                     |
|----------|------------------------------|-------------------------------------------------|----------------------------------------------|
| $HOME    | /home1/<username>            | Your personal home directory                    | 25 GB                                        |
| $WORK    | /work/<projectid>/<username> | Your directory within your project’s workspace  | 2 TB default (shared across project members) |
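Since `$WORK` is shared across your project, it is worth keeping an eye on usage. A minimal sketch using standard tools (no cluster-specific quota command is assumed here):

```bash
cd $WORK          # Move into your project workspace
du -sh .          # Total space used by your directory
df -h $WORK       # Overall usage of the underlying filesystem
```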


4. Software & Programming Environment

See also

Software Section

We use Lmod to manage software modules. Key commands:

```bash
module avail          # List all available packages
module list           # Show currently loaded packages
module load <pkg>     # Load a package (e.g., module load hdf5)
module unload <pkg>   # Unload a package
```
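As a concrete example, a typical session might look like the following. The `rocm` module name is an assumption; run `module avail` to see what is actually installed:

```bash
module avail rocm   # List available ROCm versions (assumes a rocm module exists)
module load rocm    # Load the default ROCm toolchain
module list         # Confirm which modules are now loaded
```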

5. Running Jobs

See also

Running Jobs

We use Slurm for job scheduling. There are two primary modes: interactive jobs (requested with `salloc`), which give you a shell with resources on a compute node, and batch jobs (submitted with `sbatch`), which run a job script unattended. A minimal example of each is sketched below.
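The following batch script requests a single node for ten minutes and reports the GPUs visible to the job. The partition name is a placeholder (run `sinfo` to list the partitions available to you), and this is a sketch rather than the cluster's canonical template:

```bash
#!/bin/bash
#SBATCH --job-name=hello-gpu      # Name shown in the queue
#SBATCH --partition=<partition>   # Target partition; see `sinfo` for the list
#SBATCH --nodes=1                 # Number of nodes to allocate
#SBATCH --time=00:10:00           # Wall-clock limit (HH:MM:SS)
#SBATCH --output=job-%j.out       # Output file (%j expands to the job ID)

rocm-smi                          # Report the GPUs allocated to this job
```

Submit it with `sbatch job.sh` and monitor it with `squeue -u $USER`. For an interactive session, use `salloc -N 1 -p <partition> -t 00:30:00`, which blocks until the resources are granted.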


6. Jupyter

See also

Jupyter Section

We provide a helper script to launch Jupyter sessions on a compute node and tunnel them to your local browser.

Tip

You can replace jupyter notebook with jupyter lab in the script if you prefer the full JupyterLab interface.
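If you would rather set up the tunnel by hand, the idea behind the helper script looks roughly like this; the port and node name are placeholders, and the actual script may differ:

```bash
# On a compute node (inside a Slurm job): start Jupyter without a browser
jupyter notebook --no-browser --port=8888   # or `jupyter lab`, per the tip above

# On your local machine: forward the port through the login node
ssh -L 8888:<compute-node>:8888 <username>@hpcfund.amd.com

# Then open http://localhost:8888 in your local browser
```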


7. ROCm Profiling & Debugging Tools

If you’re coming from the NVIDIA ecosystem, here’s a mapping of equivalent AMD ROCm tools:

| AMD Tool               | NVIDIA Tool    | Reference      |
|------------------------|----------------|----------------|
| ROCm Compute Profiler  | Nsight Compute | Documentation  |
| ROCm Systems Profiler  | Nsight Systems | Documentation  |
| rocprof                | nvprof         | Run rocprof -h |
| rocm-smi / amd-smi     | nvidia-smi     | Run amd-smi -h |
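As a quick first step, the SMI tools are the easiest to try. `rocm-smi` and `amd-smi` ship with ROCm, though available subcommands can vary by version, and `./my_app` below is a placeholder for your own binary:

```bash
amd-smi list               # Enumerate the GPUs on the node
rocm-smi                   # Show per-GPU utilization, temperature, and memory
rocprof --stats ./my_app   # Profile a run and print a kernel time summary
```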


8. Getting Help

GitHub Issues

The primary support channel for help requests and technical issues is to submit a GitHub issue on our companion GitHub site: github.com/AMDResearch/hpcfund

Tip

If you would like to receive announcements and notifications related to the cluster (e.g., system downtimes), go to the GitHub site, click the Watch button at the top-right, and select All Activity (or Custom → Discussions). Make sure your GitHub notification settings have email delivery enabled.

Email

For general questions about the AUP AI & HPC Cluster program or your project, please email hpc.fund@amd.com.


9. Additional Resources

Hardware

Software & Documentation

Learning & Tutorials