Compiling Kokkos for GPUs
Overview
Kokkos provides a unified programming model that works across different GPU architectures by abstracting the underlying programming models. NVIDIA GPUs use CUDA, while AMD GPUs use HIP (Heterogeneous-compute Interface for Portability). This chapter explains how to compile Kokkos code to target these different GPU backends.
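As a rough sketch of how the GPU vendor determines the backend, the shell functions below map a vendor to the Kokkos CMake option that enables the matching backend. The names `kokkos_backend_option` and `detect_gpu_vendor` are hypothetical helpers written for this illustration, and detecting the vendor by looking for `nvidia-smi` or `rocm-smi` on the PATH is an assumption that only holds where the vendor tools are installed.

```shell
#!/bin/sh
# Hypothetical helper: map a GPU vendor name to the Kokkos CMake option
# that enables the matching backend.
kokkos_backend_option() {
  case "$1" in
    nvidia) echo "Kokkos_ENABLE_CUDA" ;;
    amd)    echo "Kokkos_ENABLE_HIP" ;;
    *)      echo "Kokkos_ENABLE_SERIAL" ;;  # no GPU found: stay on the host
  esac
}

# Hypothetical helper: guess the vendor from the tools found on the PATH.
detect_gpu_vendor() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo amd
  else
    echo none
  fi
}

echo "Suggested option: -D$(kokkos_backend_option "$(detect_gpu_vendor)")=ON"
```

The application source stays the same in either case; only the configure options change.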
Hardware Architectures
Before compiling for a specific GPU, you need to know what hardware you’re targeting. Kokkos passes the architecture on to the compiler, so a flag that does not match the device can produce code that runs slowly or fails to launch.
NVIDIA GPU Architectures
NVIDIA GPUs have evolved through several architecture generations. Some common ones:
| Architecture | Kokkos Flag | GPU Examples | Notes |
|---|---|---|---|
| Ampere | Kokkos_ARCH_AMPERE80 | A100, A30 | Data center Ampere |
| Ampere | Kokkos_ARCH_AMPERE86 | RTX 30 series, A10 | Consumer and workstation Ampere |
| Turing | Kokkos_ARCH_TURING75 | RTX 20 series, T4 | Older consumer and data center GPUs |
| Volta | Kokkos_ARCH_VOLTA70 | V100, Titan V | Previous generation data center |
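The lookup in the table can be sketched as a small shell function. `kokkos_nvidia_arch_flag` is a hypothetical helper, and its name patterns are illustrative: they cover only a few models from the table and assume product names of the kind `nvidia-smi --query-gpu=name` typically prints.

```shell
#!/bin/sh
# Hypothetical helper: translate a GPU product name into a Kokkos
# architecture flag. The patterns cover only a few models from the table.
kokkos_nvidia_arch_flag() {
  case "$1" in
    *A100*)                          echo "Kokkos_ARCH_AMPERE80" ;;
    *"RTX 30"*)                      echo "Kokkos_ARCH_AMPERE86" ;;
    *"RTX 20"*|*T4*)                 echo "Kokkos_ARCH_TURING75" ;;
    *V100*|*"TITAN V"*|*"Titan V"*)  echo "Kokkos_ARCH_VOLTA70" ;;
    *)                               echo "unknown"; return 1 ;;
  esac
}

# Query the installed GPUs only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    echo "$name -> $(kokkos_nvidia_arch_flag "$name")"
  done
fi
```

A result of `unknown` means the model is not in this short list, not that Kokkos lacks support for it.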
To check your GPU, use:
nvidia-smi
AMD GPU Architectures
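AMD's data center (Instinct) GPUs use the CDNA architecture family, not the consumer RDNA family, and recent Kokkos releases name the flags after the compiler's gfx target. Older Kokkos releases spell these flags differently (for example Kokkos_ARCH_VEGA90A), so check the documentation for your version:
| Architecture | Kokkos Flag | GPU Examples |
|---|---|---|
| CDNA3 | Kokkos_ARCH_AMD_GFX942 | MI300A, MI300X |
| CDNA2 | Kokkos_ARCH_AMD_GFX90A | MI200 series (MI210, MI250, MI250X) |
| CDNA | Kokkos_ARCH_AMD_GFX908 | MI100 |
| Vega 20 (GCN) | Kokkos_ARCH_AMD_GFX906 | MI50, MI60 |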
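Recent Kokkos releases name AMD targets after the compiler's gfx identifier (for example `Kokkos_ARCH_AMD_GFX90A` for gfx90a). Assuming that naming, and assuming ROCm's `rocminfo` tool is installed, the flag can be derived from the identifier the tool reports. `kokkos_amd_arch_flag` is a hypothetical helper, and the list of accepted identifiers is deliberately short.

```shell
#!/bin/sh
# Hypothetical helper: build the Kokkos flag from a gfx identifier.
# Assumes the Kokkos_ARCH_AMD_GFX* naming of recent Kokkos releases.
kokkos_amd_arch_flag() {
  case "$1" in
    gfx906|gfx908|gfx90a|gfx942)
      echo "Kokkos_ARCH_AMD_$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" ;;
    *)
      echo "unknown"; return 1 ;;
  esac
}

# rocminfo lists each agent's gfx name; take the first GPU target found.
if command -v rocminfo >/dev/null 2>&1; then
  gfx=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1)
  echo "$gfx -> $(kokkos_amd_arch_flag "$gfx")"
fi
```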
To check your AMD GPU and ROCm version, use:
rocm-smi
hipcc --version
Compiling with CUDA (NVIDIA GPUs)
Prerequisites
Before compiling, ensure you have:
- NVIDIA CUDA Toolkit installed (version 11.8 or later recommended)
- NVIDIA GPU drivers that support your GPU
- CMake build tool (at least version 3.21)
- A C++ host compiler such as g++ (nvcc, which comes with CUDA, compiles the device code and hands the host code to this compiler)
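The version requirements in this list can be checked with a short script. This is a sketch: `version_ge` and `report` are hypothetical helpers, and the `sed` patterns assume the usual output formats of `cmake --version` ("cmake version X.Y.Z") and `nvcc --version` ("release X.Y,").

```shell
#!/bin/sh
# Return 0 when version $1 >= version $2 (dot-separated numeric fields).
version_ge() {
  a=$1; b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}              # leading field of each version
    if [ "$a" = "$x" ]; then a=""; else a=${a#*.}; fi
    if [ "$b" = "$y" ]; then b=""; else b=${b#*.}; fi
    x=${x:-0}; y=${y:-0}                # missing fields count as 0
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
  done
  return 0
}

# Hypothetical helper: report whether a tool meets its minimum version.
report() {
  if version_ge "$2" "$3"; then echo "$1 $2: ok"; else echo "$1 $2: need >= $3"; fi
}

# Check each tool only when it is installed.
if command -v cmake >/dev/null 2>&1; then
  report cmake "$(cmake --version | sed -n '1s/^cmake version \([0-9.]*\).*/\1/p')" 3.21
fi
if command -v nvcc >/dev/null 2>&1; then
  report nvcc "$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')" 11.8
fi
```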
Check your CUDA installation:
nvcc --version
CMake Configuration for CUDA
First, install Kokkos with CUDA support:
git clone https://github.com/kokkos/kokkos.git
cd kokkos
mkdir build
cd build
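# nvcc_wrapper ships in the Kokkos source tree (bin/) and forwards to nvcc;
# passing plain nvcc as the C++ compiler is not supported
cmake -DCMAKE_BUILD_TYPE=Release \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_THREADS=ON \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ARCH_AMPERE80=ON \
  -DCMAKE_CXX_COMPILER="$(pwd)/../bin/nvcc_wrapper" \
  ..
cmake --build . -j $(nproc)
cmake --install . --prefix ~/kokkos-install
Then use Kokkos in your project’s CMakeLists.txt: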
cmake_minimum_required(VERSION 3.21)
project(WaveEquation LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Kokkos REQUIRED)
add_executable(wave wave.cpp)
target_link_libraries(wave PRIVATE Kokkos::kokkos)
Build Commands
Set up and compile with CUDA:
# Create build directory
mkdir build
cd build
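# Configure with Kokkos location. Use $HOME here: the shell does not expand ~
# after -D...=, and on some systems the directory is lib64 rather than lib
cmake -DKokkos_DIR="$HOME/kokkos-install/lib/cmake/Kokkos" ..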
# Compile
cmake --build . -j $(nproc)
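# Run (kernels use the GPU because Kokkos was built with CUDA)
./wave
Running on GPUs
A program built against a CUDA-enabled Kokkos runs its default-execution-space kernels on the GPU. The backend is fixed at build time, so there is no runtime flag to select it. The flags below only control which device is used, and they take effect when the program passes argc and argv to Kokkos::initialize:
# Run on the default GPU
./wave
# Use a specific GPU (if you have multiple)
./wave --kokkos-device-id=0
# View available options
./wave --kokkos-help
Advanced CUDA Options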
For different NVIDIA architectures, change the Kokkos architecture flag when configuring Kokkos:
# For A100 (data center)
cmake -DKokkos_ARCH_AMPERE80=ON ..
# For RTX 3090 (consumer Ampere)
cmake -DKokkos_ARCH_AMPERE86=ON ..
# For RTX 4090 (Ada Lovelace, not Ampere)
cmake -DKokkos_ARCH_ADA89=ON ..
# For V100
cmake -DKokkos_ARCH_VOLTA70=ON ..
# For T4
cmake -DKokkos_ARCH_TURING75=ON ..
Compiling with HIP (AMD GPUs)
Prerequisites
Before compiling for AMD GPUs, ensure you have:
- AMD ROCm installed (version 5.0 or later recommended)
- AMD GPU drivers for ROCm
- CMake build tool (at least version 3.21)
- hipcc compiler (comes with ROCm)
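The ROCm version requirement can be checked from the compiler's own report. The sketch below assumes that `hipcc --version` prints a line of the form "HIP version: X.Y.Z-..." and that the HIP major version tracks the ROCm release. `hip_major_minor` and `hip_is_recent` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: extract "major.minor" from hipcc --version output
# read on standard input. Assumes a line "HIP version: X.Y.Z-...".
hip_major_minor() {
  sed -n 's/^HIP version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when the major version is at least 5.
hip_is_recent() {
  major=${1%%.*}
  [ -n "$major" ] && [ "$major" -ge 5 ]
}

# Run the check only when hipcc is installed.
if command -v hipcc >/dev/null 2>&1; then
  v=$(hipcc --version | hip_major_minor)
  if hip_is_recent "$v"; then echo "HIP $v: ok"; else echo "HIP $v: older than 5.0"; fi
fi
```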
Check your ROCm installation:
hipcc --version
rocm-smi
CMake Configuration for HIP
First, install Kokkos with HIP support:
git clone https://github.com/kokkos/kokkos.git
cd kokkos
mkdir build
cd build
# The HIP backend needs a HIP-capable compiler such as hipcc
cmake -DCMAKE_BUILD_TYPE=Release \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_THREADS=ON \
  -DKokkos_ENABLE_HIP=ON \
  -DKokkos_ARCH_AMD_GFX90A=ON \
  -DCMAKE_CXX_COMPILER=hipcc \
  ..
cmake --build . -j $(nproc)
cmake --install . --prefix ~/kokkos-install
Then use Kokkos in your project’s CMakeLists.txt:
cmake_minimum_required(VERSION 3.21)
project(WaveEquation LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Kokkos REQUIRED)
add_executable(wave wave.cpp)
target_link_libraries(wave PRIVATE Kokkos::kokkos)
Build Commands
Set up and compile with HIP:
# Create build directory
mkdir build
cd build
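# Configure with Kokkos location and the same compiler used to build Kokkos.
# Use $HOME here: the shell does not expand ~ after -D...=
cmake -DKokkos_DIR="$HOME/kokkos-install/lib/cmake/Kokkos" -DCMAKE_CXX_COMPILER=hipcc ..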
# Compile
cmake --build . -j $(nproc)
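# Run (kernels use the GPU because Kokkos was built with HIP)
./wave
Running on GPUs
A program built against a HIP-enabled Kokkos runs its default-execution-space kernels on the GPU. The backend is fixed at build time, so the runtime flags only control which device is used:
# Run on the default GPU (HIP/ROCm)
./wave
# Use a specific GPU (if you have multiple)
./wave --kokkos-device-id=0
# View available options
./wave --kokkos-help
Advanced HIP Options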
For different AMD architectures, change the Kokkos architecture flag when configuring Kokkos:
# For MI300A / MI300X
cmake -DKokkos_ARCH_AMD_GFX942=ON ..
# For MI200 series (MI210, MI250)
cmake -DKokkos_ARCH_AMD_GFX90A=ON ..
# For MI100
cmake -DKokkos_ARCH_AMD_GFX908=ON ..
Unified CMake Configuration
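You can build Kokkos with several backends enabled at once: Serial, at most one host-parallel backend (Threads or OpenMP, not both), and one device backend (CUDA or HIP). The default execution space is fixed at build time and follows that order of preference: device first, then host-parallel, then Serial.
# Configure Kokkos with multiple backend support
cmake -DCMAKE_BUILD_TYPE=Release \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_THREADS=ON \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ARCH_AMPERE80=ON \
  -DKokkos_ENABLE_HIP=OFF \
  ..
cmake --build . -j $(nproc)
cmake --install . --prefix ~/kokkos-install
Kernels that use the default execution space run on the GPU in this build. Code that should run on the host has to name a host execution space explicitly (for example Kokkos::DefaultHostExecutionSpace). The command line only tunes the run:
# Run with the default execution space (the NVIDIA GPU in this build)
./wave
# Set the number of threads for kernels that run on the host Threads backend
./wave --kokkos-num-threads=4
# To run entirely on the CPU, build a second configuration with -DKokkos_ENABLE_CUDA=OFF
Troubleshooting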
CUDA/NVCC Compilation Errors
Problem: nvcc: not found - Solution: Add CUDA to your PATH. If using the default CUDA installation:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Problem: Architecture mismatch errors - Solution: Verify your GPU architecture with nvidia-smi and set the correct Kokkos_ARCH_* flag when configuring Kokkos with CMake.
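One way to spot an architecture mismatch is to compare the flag recorded in the Kokkos build tree with the compute capability the driver reports. The sketch below assumes it is run from the Kokkos build directory, that CMake stored the flag as `Kokkos_ARCH_<NAME>:BOOL=ON` in CMakeCache.txt, and that the installed driver supports the `compute_cap` query field of `nvidia-smi`. `flag_to_compute_cap` and `enabled_arch_flags` are hypothetical helpers and handle only NVIDIA-style flags that end in two or more digits.

```shell
#!/bin/sh
# Hypothetical helper: turn a flag such as Kokkos_ARCH_AMPERE80 into the
# compute capability it targets ("8.0"), using the trailing digits.
flag_to_compute_cap() {
  digits=$(printf '%s' "$1" | sed 's/.*[^0-9]\([0-9][0-9]*\)$/\1/')
  major=${digits%?}                 # all but the last digit
  minor=${digits#"$major"}          # the last digit
  echo "$major.$minor"
}

# Hypothetical helper: list enabled architecture flags that end in digits.
enabled_arch_flags() {
  sed -n 's/^\(Kokkos_ARCH_[A-Za-z_]*[0-9][0-9]*\):BOOL=ON$/\1/p' "$1"
}

# Compare only when a build tree and nvidia-smi are both present.
if [ -f CMakeCache.txt ] && command -v nvidia-smi >/dev/null 2>&1; then
  gpu_cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  for flag in $(enabled_arch_flags CMakeCache.txt); do
    echo "$flag targets $(flag_to_compute_cap "$flag"), GPU reports $gpu_cap"
  done
fi
```

If the two numbers differ, reconfigure Kokkos with the flag that matches the reported capability.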
HIP/ROCm Compilation Errors
Problem: hipcc: not found - Solution: Load the ROCm module (on HPC clusters):
module load rocm
Or add to PATH if ROCm is installed locally:
export PATH=/opt/rocm/bin:$PATH
Problem: “Could not find HIP” - Solution: Set the HIP path explicitly in your CMake configuration or environment:
# Recent ROCm releases install HIP under the ROCm root (assumed here to be /opt/rocm)
export ROCM_PATH=/opt/rocm
cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
CMake Configuration Issues
Problem: “CMake version too old” - Solution: Upgrade CMake to at least version 3.21. See notes/CMake.qmd for details.
Problem: CMake not found for Kokkos subproject - Solution: Ensure CMake is installed:
# Ubuntu/Debian
apt install cmake
# macOS
brew install cmake
Performance Considerations
Device Selection
- NVIDIA A100: Ampere data center GPU, widely used for general-purpose HPC, uses Kokkos_ARCH_AMPERE80
- NVIDIA H100: Hopper generation, uses Kokkos_ARCH_HOPPER90
- AMD MI250: CDNA2 generation, uses Kokkos_ARCH_AMD_GFX90A
- AMD MI300X: CDNA3 generation with the highest memory bandwidth of the four, uses Kokkos_ARCH_AMD_GFX942
Memory Management
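Both CUDA and HIP support unified memory, but explicit device transfers via Kokkos::deep_copy often provide better performance, because the copy happens once at a point you choose rather than through on-demand page migration:
// Allocate on device (the default memory space in a GPU build)
Kokkos::View<double*> device_view("device", N);
// Create a host mirror with a matching layout
auto host_view = Kokkos::create_mirror_view(device_view);
// Copy from host to device
Kokkos::deep_copy(device_view, host_view);
// Copy from device to host
Kokkos::deep_copy(host_view, device_view);
Occupancy and Block Sizes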
For optimal GPU utilization, consider:
- GPU memory bandwidth vs. computation ratio
- Thread block size (typically multiples of 32 for NVIDIA, 64 for AMD)
- Register pressure and shared memory usage
Kokkos handles many of these automatically, but understanding these concepts helps optimize your kernels.
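As a small worked example of the block-size rule, the function below rounds a requested thread count up to the next multiple of the scheduling width (32 or 64, as given in the list). `round_up_block_size` is a hypothetical helper; whether rounding up is the right choice for a given kernel depends on its register and shared memory use.

```shell
#!/bin/sh
# Round a requested block size up to the next multiple of the hardware
# scheduling width (32 for NVIDIA, 64 for AMD, per the text).
round_up_block_size() {
  requested=$1; width=$2
  echo $(( (requested + width - 1) / width * width ))
}

echo "NVIDIA, 100 threads requested: $(round_up_block_size 100 32)"
echo "AMD,    100 threads requested: $(round_up_block_size 100 64)"
```

Both calls print 128, the smallest multiple of either width that holds 100 threads.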