Build Troubleshooting

This guide helps diagnose and resolve ERF build issues. For library configuration problems, see Library Configuration. For HPC-specific issues, see Machine Profiles, Cray Detection, Build Scripts, and Workstation Builds.

Quick Diagnostic

Where’s the problem?

Build Process Issues

Missing craype-accel Module

Symptom: CMake error during GPU build on Cray systems.

Error:

CMake Error: CRAY_ACCEL_TARGET not set for GPU build

Cause: GPU builds on Cray systems require a craype-accel-* module, which sets $CRAY_ACCEL_TARGET.

Solution:

Load the module for your hardware:

# NVIDIA A100 (Perlmutter, Polaris)
module load craype-accel-nvidia80

# AMD MI250X (Frontier)
module load craype-accel-amd-gfx90a

# Intel GPUs (Aurora)
module load craype-accel-intel-gpu

Best practice: source the machine profile before configuring:

source Build/machines/perlmutter_erf.profile
cmake -DERF_ENABLE_CUDA=ON ..
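
A pre-flight check can catch the missing module before CMake does. The following is a minimal sketch and not part of ERF: the function name check_accel_target is hypothetical, and it only tests the $CRAY_ACCEL_TARGET variable described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ERF): fail early when no craype-accel-*
# module has set CRAY_ACCEL_TARGET.
check_accel_target() {
    if [ -z "${CRAY_ACCEL_TARGET:-}" ]; then
        echo "CRAY_ACCEL_TARGET is not set; load a craype-accel-* module first" >&2
        return 1
    fi
    echo "CRAY_ACCEL_TARGET=${CRAY_ACCEL_TARGET}"
}
```

One way to use it is to chain it in front of the configure step, for example check_accel_target && cmake -DERF_ENABLE_CUDA=ON .. so that CMake only runs when the variable is present.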

Out of Memory During Compilation

Symptom: Compilation killed with memory errors.

Error:

nvcc fatal: Memory allocation failure
c++: fatal error: Killed signal terminated program

Cause: GPU compilation can need more memory than the default allocation provides on systems that hand out partial nodes.

Solution:

Exclusive Node

# SLURM script
#SBATCH --exclusive

# Interactive
salloc --exclusive -N 1

Specific Memory

#SBATCH --mem=240G
# or
#SBATCH --mem-per-cpu=4G

Limit Parallel Jobs

make -j4  # Instead of make -j

Note

This is common on Kestrel, where partial-node allocations are the default. Always use --exclusive or an explicit memory request.
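
Rather than hard-coding -j4, the job count can be derived from the memory actually granted. The sketch below is hypothetical (pick_jobs is not an ERF tool), and the per-job memory figure is an assumption you must supply; GPU translation units may need considerably more than CPU ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a make -j value that fits a memory budget.
# Arguments: available memory in GB, assumed GB per compile job, core count.
pick_jobs() {
    local mem_gb="$1"
    local per_job_gb="$2"
    local cores="$3"
    local jobs=$(( mem_gb / per_job_gb ))      # jobs that fit in memory
    if [ "$jobs" -gt "$cores" ]; then
        jobs="$cores"                          # never exceed the core count
    fi
    if [ "$jobs" -lt 1 ]; then
        jobs=1                                 # always run at least one job
    fi
    echo "$jobs"
}
```

For instance, with 16 GB available and an assumed 4 GB per job, make -j"$(pick_jobs 16 4 "$(nproc)")" runs at most four jobs.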

Stale CMake Cache

Symptom: Unexpected failures after changing modules or compilers.

Cause: CMake caches compiler and library locations, which become invalid when the environment changes.

Solution:

make distclean
cmake ..
make

Or manually:

rm -rf CMakeCache.txt CMakeFiles/
cmake ..
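
To judge whether a cache is stale before deleting it, it can help to see which compiler it recorded. This is a small sketch with a hypothetical function name; it relies only on the NAME:TYPE=VALUE layout of CMakeCache.txt and the standard CMAKE_CXX_COMPILER entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the C++ compiler recorded in a CMake cache file.
# Cache entries have the form NAME:TYPE=VALUE.
cached_cxx_compiler() {
    local cache="${1:-CMakeCache.txt}"
    # Strip the "CMAKE_CXX_COMPILER:<TYPE>=" prefix and print what remains.
    sed -n 's/^CMAKE_CXX_COMPILER:[A-Z]*=//p' "$cache"
}
```

If the printed path differs from the compiler your current modules provide (for example the output of command -v CC on a Cray system), the cache predates the environment change and should be removed as shown above.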

Debugging Tools

CMake Debugging

# Verbose output
cmake --log-level=VERBOSE ..

# With context (shows hierarchy)
cmake --log-context --log-level=VERBOSE ..

Example output:

[ERF.Cray] Detected Cray Programming Environment
[ERF.Cray] Setting Cray compiler wrappers...
[ERF.NetCDF] Found NetCDF: /opt/cray/pe/netcdf/4.9.0.9

Inspect cache:

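# Run from the build directory; -N loads the cache without reconfiguring
cmake -N -LAH . | less
# Case-insensitive, since variable names may be spelled NetCDF or NETCDF
grep -i netcdf CMakeCache.txt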

GNU Make Debugging

# Print variable values
make print-CXXFLAGS
make print-LIBRARIES

# Verbose build
make VERBOSE=1

Library Dependencies

# Check linked libraries
ldd ./ERF3d.*.ex | grep netcdf

# Check for symbols
nm ERF3d.*.ex | grep nc_
nm ERF3d.*.ex | grep MPI_
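
Filtering for a single library name can hide other unresolved dependencies. As a hedged sketch (missing_libs is a hypothetical name), the filter below reads ldd output on standard input and lists every shared library the loader reports as "not found".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ldd output on stdin and print the names of
# shared libraries the dynamic loader could not resolve.
missing_libs() {
    # Unresolved entries look like: "libfoo.so.1 => not found"
    awk '/=> not found/ { print $1 }'
}
```

Used as ldd ./ERF3d.*.ex | missing_libs, empty output means every dependency resolved in the current environment.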

Verifying Successful Builds

Quick Test

# Run short simulation
cd build/install/bin
mpiexec -n 4 ./ERF3d.*.ex inputs max_step=10
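
The ERF3d.*.ex glob assumes exactly one executable is present. If a directory holds several builds, the shell expands the pattern to all of them and the extra names are passed as arguments. The following sketch, with a hypothetical function name, resolves the glob and refuses to continue unless it matches a single file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the one ERF3d.*.ex file in a directory,
# or fail if the glob matches zero or several files.
find_erf_exe() {
    local dir="${1:-.}"
    local count=0
    local found=""
    local f
    for f in "$dir"/ERF3d.*.ex; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
            found="$f"
        fi
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one ERF3d.*.ex in $dir, found $count" >&2
        return 1
    fi
    echo "$found"
}
```

A possible invocation is exe=$(find_erf_exe .) && mpiexec -n 4 "$exe" inputs max_step=10.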

Regression Tests

# Configure with tests
cmake -DERF_ENABLE_TESTS=ON ..
make

# Run tests
ctest -L regression -VV

Check Build Info

./ERF3d.*.ex --describe

Shows compiler versions, enabled features, and GPU architecture.

Getting Help

Before submitting an issue:

Search existing issues
Check this guide and Library Configuration
Run the diagnostic commands above

Creating a bug report:

Include this information in your GitHub issue:

**System:**
- OS: [e.g., Perlmutter/CrayOS, Ubuntu 22.04]
- Compiler: [gcc --version or CC --version]
- MPI: [mpirun --version]
- Modules: [module list]

**Build command:**
[Complete cmake command or script]

**Error:**
[Complete, unedited terminal output]

Attach files:

CMakeCache.txt
Build log: make 2>&1 | tee build.log

Diagnostic output:

cmake --log-level=VERBOSE --log-context .. 2>&1 | tee cmake_verbose.log
echo $CRAY_ACCEL_TARGET
echo $NETCDF_DIR
module list
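
The environment checks above can be gathered into one attachable file. This is only a sketch: collect_diagnostics and the default file name erf_diagnostics.txt are hypothetical, and it covers the two environment variables and the module list, not the verbose CMake log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the environment details requested above into
# one file that can be attached to an issue.
collect_diagnostics() {
    local out="${1:-erf_diagnostics.txt}"
    {
        echo "CRAY_ACCEL_TARGET=${CRAY_ACCEL_TARGET:-<unset>}"
        echo "NETCDF_DIR=${NETCDF_DIR:-<unset>}"
        # 'module' is usually a shell function and may be absent on workstations.
        if command -v module >/dev/null 2>&1; then
            module list 2>&1
        else
            echo "module command not available"
        fi
    } > "$out"
}
```

Calling collect_diagnostics with no argument writes to erf_diagnostics.txt in the current directory; unset variables are recorded as <unset> so their absence is visible in the report.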

Contributing Fixes

If you solve a build problem, contribute your solution!

Contributions welcome:

Machine profiles (Build/machines/*.profile)
Build system improvements
Documentation enhancements
Troubleshooting examples

How to contribute:

Fork the ERF repository
Create a feature branch
Make your changes
Submit a pull request

See contribution guidelines in the repository.

Note

Community contributions are essential. Your solutions help other users and improve ERF for everyone.