Aurora (ALCF): Build and Run with SYCL
Copy-paste instructions for building and running ERF on ALCF Aurora using Intel GPUs (SYCL backend).
For general HPC build concepts, see Machine Profiles, Cray Detection, Build Scripts, and Workstation Builds. For library configuration details, see Library Configuration.
Prerequisites
- Aurora compute allocation (PBSPro scheduler)
- Access to required modules (loaded via Build/machines/aurora_erf.profile)
- Environment variables set by the modules after sourcing the profile:
  - NETCDF_C_ROOT: path to NetCDF-C installation (set by the netcdf-c module)
  - NETCDF_FORTRAN_ROOT: path to NetCDF-Fortran installation (set by the netcdf-fortran module)
  - HDF5_ROOT: path to HDF5 installation (set by the hdf5 module)
NetCDF Requirements
Aurora provides system NetCDF modules (netcdf-c, netcdf-fortran, netcdf-cxx4) that are
loaded automatically by Build/machines/aurora_erf.profile. No user-built NetCDF installation
is required.
Quick checks of the system NetCDF and HDF5 installations after sourcing the profile:
# C headers and library (required)
ls $NETCDF_C_ROOT/include/netcdf.h
ls $NETCDF_C_ROOT/lib64/libnetcdf*
# Fortran module and library (required for Noah-MP)
ls $NETCDF_FORTRAN_ROOT/include/netcdf.mod
ls $NETCDF_FORTRAN_ROOT/lib64/libnetcdff*
# HDF5 library
ls $HDF5_ROOT/lib/libhdf5*
Quick Checks
Verify your environment before building:
module list
which cmake mpicc mpicxx mpifort icpx
echo $NETCDF_C_ROOT
echo $NETCDF_FORTRAN_ROOT
echo $HDF5_ROOT
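The echo checks above can be wrapped in a small helper that fails loudly when a variable is missing. This is only a sketch: check_env_vars is a hypothetical name, not part of ERF or the Aurora profile. It relies on printenv, so it only sees exported variables, which are the ones cmake inherits.

```shell
# Hypothetical helper (not part of ERF): report which named variables are
# missing from the exported environment. Returns 0 if all are set, 1 otherwise.
check_env_vars() {
    local missing name value
    missing=0
    for name in "$@"; do
        # printenv exits non-zero for unset variables; "|| true" keeps the
        # helper usable in scripts that run under "set -e".
        value=$(printenv "$name" || true)
        if [ -z "$value" ]; then
            echo "MISSING: $name"
            missing=1
        else
            echo "ok: $name=$value"
        fi
    done
    return "$missing"
}
```

After sourcing the profile, a call such as `check_env_vars NETCDF_C_ROOT NETCDF_FORTRAN_ROOT HDF5_ROOT || echo "re-source the profile"` gives a one-line verdict per variable.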
Interactive Build and Run
1) Get an interactive allocation
qsub -I -A <PROJECT> -q <QUEUE> -l select=<NODES> -l walltime=<HH:MM:SS> -l filesystems=<FILESYSTEMS>
Example with typical debug settings:
qsub -I -A <PROJECT> -q debug -l select=1 -l walltime=1:00:00 -l filesystems=flare
2) Load software environment
Source the Aurora machine profile (recommended):
export ERF_HOME=<PATH_TO_ERF>
source $ERF_HOME/Build/machines/aurora_erf.profile
Alternative: Manual module loads
If the profile is outdated or unavailable, load modules manually:
module load mpich/opt/4.2.3-intel
module load hdf5/1.14.6
module load netcdf-cxx4
module load netcdf-c
module load netcdf-fortran
module load python/3.10.14
module load cmake
# Intel compilers for MPICH wrappers
export MPICH_CC=icx
export MPICH_CXX=icpx
export MPICH_FC=ifx
export MPICH_F90=ifx
# Derive NETCDF_FORTRAN_ROOT if not set by the module
if [[ -z "${NETCDF_FORTRAN_ROOT:-}" ]]; then
    export NETCDF_FORTRAN_ROOT=$(module show netcdf-fortran 2>&1 | \
        sed -n 's/.*setenv("NETCDF_FORTRAN_ROOT","\([^"]*\)").*/\1/p')
fi
3) Set search paths
export CPPFLAGS="-I${NETCDF_C_ROOT}/include -I${NETCDF_FORTRAN_ROOT}/include -I${HDF5_ROOT}/include"
export LDFLAGS="-L${NETCDF_C_ROOT}/lib64 -L${NETCDF_FORTRAN_ROOT}/lib64 -L${HDF5_ROOT}/lib"
export LD_LIBRARY_PATH="${NETCDF_C_ROOT}/lib64:${NETCDF_FORTRAN_ROOT}/lib64:${HDF5_ROOT}/lib:${LD_LIBRARY_PATH:-}"
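The exports above hard-code lib64 for NetCDF and lib for HDF5. If a module update changes that layout, the link step breaks with no hint as to why. The sketch below shows one way to resolve the directory at run time. find_libdir is a hypothetical helper, and the lib64-before-lib preference is an assumption rather than documented Aurora behavior.

```shell
# Hypothetical helper: print the library directory under an install root,
# preferring lib64 over lib. Returns 1 if neither directory exists.
find_libdir() {
    local root sub
    root=$1
    for sub in lib64 lib; do
        if [ -d "$root/$sub" ]; then
            echo "$root/$sub"
            return 0
        fi
    done
    echo "no lib64/ or lib/ under $root" >&2
    return 1
}
```

It could then feed the flags, for example `-L$(find_libdir "$HDF5_ROOT")` inside LDFLAGS, so a wrong root fails with a clear message instead of a missing-library link error.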
4) Configure and build
cd $ERF_HOME
git submodule update --init --recursive
cmake -S . -B build_aurora \
-DCMAKE_INSTALL_PREFIX="$(pwd)/install_aurora" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_Fortran_COMPILER=mpifort \
-DCMAKE_PREFIX_PATH="${NETCDF_C_ROOT};${NETCDF_FORTRAN_ROOT};${HDF5_ROOT}" \
-DCMAKE_CXX_FLAGS="-fsycl-max-parallel-link-jobs=8 --offload-compress -flink-huge-device-code" \
-DERF_ENABLE_MPI=ON \
-DERF_ENABLE_NETCDF=ON \
-DERF_ENABLE_HDF5=ON \
-DERF_ENABLE_NOAHMP=ON \
-DERF_ENABLE_RRTMGP=ON \
-DERF_ENABLE_SYCL=ON \
-DAMReX_GPU_BACKEND=SYCL \
-DAMReX_INTEL_ARCH=pvc \
-DAMReX_SYCL_AOT=ON \
-DAMReX_SYCL_SPLIT_KERNEL=NO \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_SYCL=ON \
-DKokkos_ENABLE_SYCL_RELOCATABLE_DEVICE_CODE=ON \
-DKokkos_ARCH_INTEL_PVC=ON
cmake --build build_aurora -j 10
Note
The first full build takes significant time — SYCL AOT compilation for Intel PVC is
expensive. If compilation runs out of memory on the login node, submit the build
as an interactive PBS job (qsub -I -l select=1 -q debug ...) or reduce parallelism:
cmake --build build_aurora -j 4.
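Rather than guessing a -j value, the job count can be derived from available memory and core count. This is a rough sketch: pick_jobs is a hypothetical helper, and the 8 GB per compile job used in the example is an assumed budget, not a measured figure for ERF's SYCL AOT build.

```shell
# Hypothetical helper: choose a parallel job count limited by both the core
# count and a per-job memory budget. Always returns at least 1.
# Arguments: available memory (GB), core count, memory budget per job (GB).
pick_jobs() {
    local mem_gb cores gb_per_job by_mem jobs
    mem_gb=$1
    cores=$2
    gb_per_job=$3
    by_mem=$((mem_gb / gb_per_job))
    jobs=$((by_mem < cores ? by_mem : cores))
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# Example with made-up numbers: 64 GB free, 16 cores, 8 GB per job.
pick_jobs 64 16 8   # prints 8
```

On a Linux node the inputs can come from `awk '/MemAvailable/ {print int($2/1048576)}' /proc/meminfo` and `nproc`, with the result passed to `cmake --build build_aurora -j`.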
CMake flag reference:
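| Flag | Purpose |
|---|---|
| -DERF_ENABLE_MPI=ON | MPI parallelism |
| -DERF_ENABLE_NETCDF=ON | NetCDF I/O (needed for real cases / WPS initialization) |
| -DERF_ENABLE_HDF5=ON | Parallel HDF5 I/O |
| -DERF_ENABLE_NOAHMP=ON | Noah-MP land-surface model |
| -DERF_ENABLE_RRTMGP=ON | RRTMGP radiation package |
| -DERF_ENABLE_SYCL=ON | GPU offload via SYCL |
| -DAMReX_GPU_BACKEND=SYCL | AMReX SYCL backend |
| -DAMReX_INTEL_ARCH=pvc | Target Intel Data Center GPU Max (PVC) |
| -DAMReX_SYCL_AOT=ON | Ahead-of-time compilation for PVC (slower build, faster runtime) |
| -DAMReX_SYCL_SPLIT_KERNEL=NO | Disable kernel splitting (required for large kernels on PVC) |
| -DKokkos_ENABLE_SERIAL=ON | Kokkos serial backend (always required alongside SYCL) |
| -DKokkos_ENABLE_SYCL=ON | Kokkos SYCL backend (used by RRTMGP and Noah-MP) |
| -DKokkos_ENABLE_SYCL_RELOCATABLE_DEVICE_CODE=ON | Required for linking Kokkos SYCL device code across translation units |
| -DKokkos_ARCH_INTEL_PVC=ON | Kokkos target architecture for Intel PVC |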
5) Run
From within a PBS allocation ($PBS_NODEFILE must be set):
cd $ERF_HOME
# Compute node count from PBS
NNODES=$(wc -l < $PBS_NODEFILE)
# MPI layout (Aurora: 6 GPUs x 2 tiles = 12 GPU tiles per node)
NRANKS=12 # MPI ranks per node (one per GPU tile)
NDEPTH=8 # Hardware threads per rank
NTHREADS=1 # OpenMP threads per rank
NTOTRANKS=$((NNODES * NRANKS))
echo "NNODES=$NNODES NTOTRANKS=$NTOTRANKS NRANKS=$NRANKS NTHREADS=$NTHREADS"
mpiexec --np ${NTOTRANKS} --ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind=depth \
-env OMP_NUM_THREADS=${NTHREADS} \
build_aurora/Exec/<case>/erf_<case> <PATH_TO_INPUTS_FILE>
MPI layout parameters:
| Parameter | Description | Default |
|---|---|---|
| NRANKS | MPI ranks per node (Aurora has 12 GPU tiles per node) | 12 |
| NDEPTH | Hardware threads per rank (controls rank spacing) | 8 |
| NTHREADS | OpenMP threads per rank (sets OMP_NUM_THREADS) | 1 |
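The layout arithmetic is easy to sanity-check before launching. The sketch below computes the total rank count and the CPUs each node's ranks will occupy, and refuses layouts that oversubscribe a node. layout_check is a hypothetical helper, and the 104 CPUs per node in the example is an assumption; check the ALCF documentation for the actual figure.

```shell
# Hypothetical helper: summarize an MPI layout and fail if ranks-per-node
# times depth exceeds the CPUs available on one node.
# Arguments: node count, ranks per node, depth, CPUs per node.
layout_check() {
    local nnodes nranks ndepth cpus_per_node used
    nnodes=$1
    nranks=$2
    ndepth=$3
    cpus_per_node=$4
    used=$((nranks * ndepth))
    echo "total_ranks=$((nnodes * nranks)) cpus_used_per_node=$used"
    if [ "$used" -gt "$cpus_per_node" ]; then
        echo "oversubscribed: $used > $cpus_per_node" >&2
        return 1
    fi
}

# Two nodes with the defaults from the table; 104 is an assumed CPU count.
layout_check 2 12 8 104   # prints total_ranks=24 cpus_used_per_node=96
```

With the defaults, 12 ranks at depth 8 occupy 96 CPUs per node. Raising NDEPTH to 16 would need 192 and be rejected under the assumed limit.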
Batch Submission
Save the following as submit_erf_aurora.pbs (edit placeholders):
#!/bin/bash -l
#PBS -A <PROJECT>
#PBS -q <QUEUE>
#PBS -l select=<NODES>
#PBS -l walltime=<HH:MM:SS>
#PBS -l filesystems=home:flare
#PBS -N erf_aurora
#PBS -j oe
#PBS -o erf_${PBS_JOBID}.out
set -euo pipefail
# -------------------------------------------------------------------
# User configuration (edit these)
# -------------------------------------------------------------------
export ERF_HOME=<PATH_TO_ERF>
CASE_DIR=<PATH_TO_RUN_DIR>
EXE=$ERF_HOME/build_aurora/Exec/<case>/erf_<case>
INPUTS=<inputs_filename>
# MPI layout (Aurora: 12 GPU tiles per node)
NRANKS=12 # MPI ranks per node
NDEPTH=8 # Hardware threads per rank
NTHREADS=1 # OpenMP threads per rank
# -------------------------------------------------------------------
# Load software environment
# -------------------------------------------------------------------
source $ERF_HOME/Build/machines/aurora_erf.profile
export CPPFLAGS="-I${NETCDF_C_ROOT}/include -I${NETCDF_FORTRAN_ROOT}/include -I${HDF5_ROOT}/include"
export LDFLAGS="-L${NETCDF_C_ROOT}/lib64 -L${NETCDF_FORTRAN_ROOT}/lib64 -L${HDF5_ROOT}/lib"
export LD_LIBRARY_PATH="${NETCDF_C_ROOT}/lib64:${NETCDF_FORTRAN_ROOT}/lib64:${HDF5_ROOT}/lib:${LD_LIBRARY_PATH:-}"
export OMP_NUM_THREADS=${NTHREADS}
# -------------------------------------------------------------------
# Validate
# -------------------------------------------------------------------
if [[ ! -x "$EXE" ]]; then
    echo "ERROR: executable not found: $EXE"
    exit 1
fi
if [[ ! -f "$CASE_DIR/$INPUTS" ]]; then
    echo "ERROR: input file not found: $CASE_DIR/$INPUTS"
    exit 1
fi
# -------------------------------------------------------------------
# Run
# -------------------------------------------------------------------
cd "$CASE_DIR"
NNODES=$(wc -l < $PBS_NODEFILE)
NTOTRANKS=$((NNODES * NRANKS))
echo "NNODES=$NNODES NTOTRANKS=$NTOTRANKS NRANKS=$NRANKS NTHREADS=$NTHREADS"
mpiexec --np ${NTOTRANKS} --ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind=depth \
"$EXE" "$INPUTS"
Submit the job:
qsub submit_erf_aurora.pbs
Monitor:
qstat -u $USER
qstat -f <jobid>
Tip
Build once, run many: Build ERF once (interactively or in a dedicated build job), then reuse the executable in subsequent batch scripts. SYCL AOT compilation is expensive — rebuilding per job wastes allocation time.
Example Inputs File
The ABL case (inputs_ml_most) demonstrates a typical configuration:
# Problem setup
erf.prob_name = "ABL"
max_step = 4000
# Domain: 1024^3 m on 64^3 grid
geometry.prob_extent = 1024 1024 1024
amr.n_cell = 64 64 64
geometry.is_periodic = 1 1 0
# Boundary conditions
zlo.type = "surface_layer"
zhi.type = "SlipWall"
# MOST surface layer
erf.most.z0 = 0.1
erf.most.zref = 8.0
erf.most.surf_temp = 1.1
# Time stepping
erf.fixed_dt = 1.0
# Refinement
amr.max_level = 1
amr.ref_ratio_vect = 20 20 1
erf.coupling_type = "TwoWay"
# Output
erf.plot_file_1 = plt
erf.plot_int_1 = 1
erf.check_file = chk
erf.check_int = 100
# Physics
erf.les_type = "Smagorinsky"
erf.Cs = 0.1
See Running for complete input file documentation.
Expected Output
After a successful run, the working directory will contain:
- plt*: plotfiles (AMReX format) at intervals set by erf.plot_int_1
- chk*: checkpoint files at intervals set by erf.check_int
- Backtrace.*: stack traces (if errors occurred)
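A post-run check can confirm that output was written and flag any backtraces. The helpers below are a sketch, and both function names are hypothetical. They assume output names are the prefix followed only by digits (for example plt00010), and they use GNU sort -V to order by step number.

```shell
# Hypothetical helper: print the output with the highest step number for a
# given prefix (e.g. plt or chk). Names with extra suffixes are ignored.
latest_output() {
    local dir prefix
    dir=$1
    prefix=$2
    find "$dir" -maxdepth 1 -name "${prefix}[0-9]*" \
        | grep -E "/${prefix}[0-9]+\$" \
        | sort -V \
        | tail -n 1
}

# Hypothetical helper: count Backtrace.* files in the run directory.
count_backtraces() {
    find "$1" -maxdepth 1 -name 'Backtrace.*' | wc -l | tr -d ' '
}
```

Run from the case directory, `latest_output . chk` names the checkpoint to restart from, and a non-zero `count_backtraces .` indicates that at least one rank aborted.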
Troubleshooting
CMake configuration fails
Check environment:
module list
which cmake mpicc mpicxx mpifort icpx
echo $NETCDF_C_ROOT
echo $NETCDF_FORTRAN_ROOT
echo $HDF5_ROOT
ls $NETCDF_C_ROOT/include/netcdf.h
ls $NETCDF_FORTRAN_ROOT/include/netcdf.mod
Common causes:
- NETCDF_C_ROOT or NETCDF_FORTRAN_ROOT not set: re-source Build/machines/aurora_erf.profile
- Modules not loaded (run module list to verify netcdf-c, netcdf-fortran, hdf5 are present)
- Stale CMake cache: remove build_aurora/ and reconfigure
Compilation runs out of memory
Reduce parallel compilation:
cmake --build build_aurora -j 4
Or submit the build as an interactive PBS job to get a full compute node:
qsub -I -A <PROJECT> -q debug -l select=1 -l walltime=1:00:00 -l filesystems=flare
mpiexec fails with PBS errors
Ensure you are inside a PBS allocation:
echo $PBS_NODEFILE
cat $PBS_NODEFILE
If the variable is unset or the file is empty, you are not in a PBS job context. Use qsub -I ... for an interactive session, or submit a batch script.
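The same check can be scripted so that a run script stops early with a clear message instead of failing inside mpiexec. pbs_node_count is a hypothetical helper. Like the run step, it assumes the node file lists one line per node.

```shell
# Hypothetical helper: print the number of nodes in the current PBS job,
# or fail with a message when not running inside a PBS allocation.
pbs_node_count() {
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -s "$PBS_NODEFILE" ]; then
        echo "ERROR: not inside a PBS job (PBS_NODEFILE unset or empty)" >&2
        return 1
    fi
    # Assumes one line per node in the node file.
    wc -l < "$PBS_NODEFILE" | tr -d ' '
}
```

A run script could start with `NNODES=$(pbs_node_count) || exit 1`.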
Runtime errors or crashes
Quick diagnostic run:
mpiexec --np 12 --ppn 12 --depth=8 --cpu-bind=depth \
-env OMP_NUM_THREADS=1 \
build_aurora/Exec/<case>/erf_<case> <inputs_file> max_step=10
Check Backtrace.* files for stack traces.
NETCDF_FORTRAN_ROOT not set after sourcing profile
If the netcdf-fortran module does not export NETCDF_FORTRAN_ROOT, derive it manually:
export NETCDF_FORTRAN_ROOT=$(module show netcdf-fortran 2>&1 | \
sed -n 's/.*setenv("NETCDF_FORTRAN_ROOT","\([^"]*\)").*/\1/p')
echo $NETCDF_FORTRAN_ROOT # should be non-empty
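To see what the sed expression extracts without needing the module system, it can be run against a sample line. The sketch below generalizes the pattern to any variable name. extract_setenv is a hypothetical helper and the sample path is made up. If module show prints a different format on your system, the pattern matches nothing and the result is empty.

```shell
# Hypothetical helper: read `module show` output on stdin and print the value
# from a line of the form setenv("NAME","value").
extract_setenv() {
    local name
    name=$1
    sed -n "s/.*setenv(\"$name\",\"\([^\"]*\)\").*/\1/p"
}

# Sample line in the format the pattern expects; the path is illustrative only.
sample='setenv("NETCDF_FORTRAN_ROOT","/opt/example/netcdf-fortran")'
printf '%s\n' "$sample" | extract_setenv NETCDF_FORTRAN_ROOT
# prints /opt/example/netcdf-fortran
```

In practice the input would be `module show netcdf-fortran 2>&1`, since the module command writes to stderr.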
For additional troubleshooting, see Build Troubleshooting.