Perlmutter (NERSC)

Build and run guidance for ERF on NERSC Perlmutter. For shared build concepts (machine profiles, Cray detection, script reference), see Machine Profiles, Cray Detection, Build Scripts, and Workstation Builds.

Building with GNU Make

A simple build using the GNU compilers and CUDA:

# Load environment
module load PrgEnv-gnu cudatoolkit cray-mpich cray-netcdf-hdf5parallel

# Navigate to Exec
cd ${ERF_HOME}/Exec

# Build
make -j4 COMP=gnu USE_MPI=TRUE USE_CUDA=TRUE

This produces an executable like ERF3d.gnu.MPI.CUDA.ex.

Building with CMake

Using the provided build script:

# Load environment
source $ERF_HOME/Build/machines/perlmutter_erf.profile

# Configure and build (out-of-source), starting from the top of the ERF tree
cd $ERF_HOME
mkdir build && cd build
../Build/cmake_with_kokkos_many_cuda.sh

The executable is written to build/Exec/erf_exec (or install/bin/erf_exec if installed).

Alternatively, configure manually from inside the build directory:

cmake -DCMAKE_BUILD_TYPE=Release \
      -DERF_ENABLE_MPI=ON \
      -DERF_ENABLE_CUDA=ON \
      -DERF_ENABLE_NETCDF=ON \
      -DERF_ENABLE_RRTMGP=ON \
      ..
make -j4

Basic GPU Job

This example runs ERF on 4 nodes with 4 MPI ranks per node, one rank per GPU (16 ranks in total), with GPU-aware MPI enabled.

Before submitting, load the environment:

source $ERF_HOME/Build/machines/perlmutter_erf.profile

# Run from scratch filesystem with executable and inputs in same directory
mkdir -p $PSCRATCH/ERF/rundir
cd $PSCRATCH/ERF/rundir

# Verify paths before launching
ls -lh ./ERF3d.*.ex inputs
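
The path check above can also be scripted so that a submission stops early when something is missing. The following is a minimal sketch, not part of ERF: the function name check_rundir is hypothetical, and it assumes the executable matches ERF3d.*.ex and the input file is named inputs, as in the listing above.

```shell
#!/bin/bash
# Hypothetical helper (not part of ERF): verify that a run directory holds
# an ERF executable and an inputs file before a job is submitted.
check_rundir() {
  local dir="${1:-.}"
  local exe found=""
  # Look for any executable file matching the pattern ERF3d.*.ex.
  # If the glob matches nothing, the literal pattern fails the -x test.
  for exe in "$dir"/ERF3d.*.ex; do
    if [ -x "$exe" ]; then
      found="$exe"
      break
    fi
  done
  if [ -z "$found" ]; then
    echo "no executable ERF3d.*.ex in $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/inputs" ]; then
    echo "no inputs file in $dir" >&2
    return 1
  fi
  # Print the executable that was found so the caller can reuse the path
  echo "$found"
}
```

One possible use is to gate the submission on the check, for example check_rundir "$PSCRATCH/ERF/rundir" && sbatch job_script.sh.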

Job submission script:

#!/bin/bash
#SBATCH --account=m4106_g
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4
#SBATCH --gpu-bind=none
#SBATCH --time=00:30:00
#SBATCH --constraint=gpu&hbm40g
#SBATCH --job-name=ERF
#SBATCH --output=erf_%j.out

# GPU-aware MPI optimizations
export MPICH_OFI_NIC_POLICY=GPU
export MPICH_GPU_SUPPORT_ENABLED=1
export SLURM_CPU_BIND="cores"

# Launch 16 ranks (4 nodes x 4 per node); each rank sees one GPU, in reverse local-rank order
srun -n 16 --cpus-per-task=4 --cpu-bind=cores bash -c "
  export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID));
  ./ERF3d.gnu.MPI.CUDA.ex inputs amrex.use_gpu_aware_mpi=1"

Submit with: sbatch job_script.sh
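
To make the device ordering in the launch line concrete, the sketch below evaluates the same expression, 3 - SLURM_LOCALID, for the four local ranks on a node, and recomputes the total rank count passed to srun. It runs outside Slurm by setting the local ID by hand; the function name gpu_for_local_rank and the variables nodes and tasks_per_node are hypothetical and only mirror the values in the job script.

```shell
#!/bin/bash
# Hypothetical helper: reproduce the mapping used in the srun line,
# CUDA_VISIBLE_DEVICES = 3 - SLURM_LOCALID, for a node with 4 GPUs.
gpu_for_local_rank() {
  local SLURM_LOCALID="$1"
  echo $((3 - SLURM_LOCALID))
}

# Values copied from the #SBATCH directives in the job script
nodes=4
tasks_per_node=4
echo "total ranks: $((nodes * tasks_per_node))"

# Print the device each local rank would see
for local_id in 0 1 2 3; do
  echo "local rank ${local_id} -> GPU $(gpu_for_local_rank "${local_id}")"
done
```

Local rank 0 maps to GPU 3 and local rank 3 maps to GPU 0, so the device order is the reverse of the local rank order, and the rank total matches the -n 16 given to srun.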

80GB GPU Nodes

To run on the 256 nodes that have 80GB of HBM per GPU, replace:

#SBATCH --constraint=gpu&hbm40g

with:

#SBATCH --constraint=gpu&hbm80g
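
As an alternative to editing the script, the constraint can be supplied at submission time, since options given on the sbatch command line take precedence over #SBATCH directives in the script. The following is a minimal sketch, assuming the job script is saved as job_script.sh; the helper erf_constraint is hypothetical. On the command line the value should be quoted, because an unquoted & is interpreted by the shell.

```shell
#!/bin/bash
# Hypothetical helper: build the Slurm constraint string for a GPU memory size.
# Accepts 40 or 80 (GB of HBM per GPU), the two variants named in this section.
erf_constraint() {
  case "$1" in
    40|80) echo "gpu&hbm${1}g" ;;
    *) echo "unsupported HBM size: $1" >&2; return 1 ;;
  esac
}

# Example submission (commented out so the sketch can run off-cluster):
# sbatch --constraint="$(erf_constraint 80)" job_script.sh
```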