Compiling CUDA Samples 13.0 on Ubuntu 25.04: A Deep Dive from the Trenches

A complete, battle-tested guide to building NVIDIA CUDA Samples 13.0 on Ubuntu 25.04. Covers driver 580 installation, toolkit setup, environment configuration, CMake builds, and step-by-step fixes for linker, NVRTC, and OpenMP errors.

NVIDIA CUDA 13.0 Samples – Full Build & Troubleshooting Guide for
Ubuntu 25.04 – NVIDIA Driver 580 – CUDA 13.0

1. Overview

This guide walks you step‑by‑step through:

  • Installing the NVIDIA proprietary driver 580 (or the open‑source variant) on Ubuntu 25.04.
  • Installing CUDA Toolkit 13.0 (the matching version for the sample code).
  • Configuring the required environment variables (PATH, LD_LIBRARY_PATH, CUDA_HOME).
  • Downloading, configuring, and building NVIDIA CUDA‑Samples 13.0 with CMake and make.
  • Identifying and fixing the most common compilation / linking problems that appear on a fresh Ubuntu 25.04 installation (missing libnvcuda.so, OpenMP‑CUDA detection, undefined JIT symbols, etc.).

All commands are written for a regular user account with sudo privileges. If you prefer a system‑wide installation, replace the per‑user modifications with the system equivalents (/etc/profile.d/…).

Why a manual?
Ubuntu 25.04 is a brand‑new release (not LTS). The official NVIDIA documentation targets Ubuntu 22.04/24.04, so a few subtle differences (library locations, package names, driver‑open vs. driver‑proprietary) appear. This guide captures those nuances, along with the exact patches needed to make the CUDA‑Samples compile cleanly.

2. Prerequisites & System Preparation

  • Supported GPU – must support the CUDA Compute Capability required by the samples (≥ 5.0 for most).
    Check: lspci
  • Ubuntu 25.04 (Plucky Puffin), fully up‑to‑date – guarantees the recent kernel / libglvnd versions that the 580 driver expects.
    Command: sudo apt update && sudo apt full-upgrade -y && sudo reboot
  • Basic build tools – gcc, g++, make, cmake, git, wget, curl.
    Command: sudo apt install -y build-essential cmake git wget curl
  • Optional additional dev packages – required by some samples (e.g., OpenMP, libpthread).
    Command: sudo apt install -y libpthread-stubs0-dev libssl-dev
  • Remove the nouveau driver (if present) – prevents conflicts with the proprietary NVIDIA driver.
    Command: sudo apt purge -y xserver-xorg-video-nouveau && sudo update-initramfs -u && sudo reboot

Tip: After the reboot, verify the kernel version matches the driver you will install (uname -r). The driver package will compile its kernel module against the running kernel.
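
If the driver's kernel module later fails to build, the usual culprit is missing headers for the running kernel. A minimal check, assuming the stock Ubuntu kernel packages:

# Show the running kernel and install the matching headers (usually already present)
uname -r
sudo apt install -y linux-headers-$(uname -r)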

3. Installing the NVIDIA Driver 580

3.1 List the Available Drivers

First, ask Ubuntu which driver packages it offers for your GPU:

ubuntu-drivers devices

You will see a line similar to:

== /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0 ==
modalias : pci:v000010DEd00001E07sv00001028sd00003718bc03sc00i00
vendor   : NVIDIA Corporation
model    : TU102 [GeForce RTX 2080 Ti Rev. A]
driver   : nvidia-driver-565 - third-party non-free
driver   : nvidia-driver-555 - third-party non-free
driver   : nvidia-driver-560 - third-party non-free
driver   : nvidia-driver-570 - third-party non-free
driver   : nvidia-driver-580 - third-party non-free
driver   : nvidia-driver-570-server-open - distro non-free
driver   : nvidia-driver-580-server - distro non-free
driver   : nvidia-driver-580-open - third-party non-free recommended
driver   : nvidia-driver-565-open - third-party non-free
driver   : nvidia-driver-570-server - distro non-free
driver   : nvidia-driver-580-server-open - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-570-open - third-party non-free
driver   : nvidia-driver-575 - third-party non-free
driver   : nvidia-driver-550 - third-party non-free
driver   : nvidia-driver-575-open - third-party non-free
driver   : nvidia-driver-555-open - third-party non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-560-open - third-party non-free
driver   : nvidia-driver-550-open - third-party non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

If you have a server‑grade GPU you may prefer nvidia-driver-580-server.
NOTE: The open driver (nvidia-driver-580-open) ships only the kernel module. It does not install the complete user‑space driver stack (e.g., libnvcuda.so). For CUDA development you must install the proprietary driver package (nvidia-driver-580).

3.2 Server vs. Standard Driver Packages

Primary use case:

  • nvidia-driver-580-server: This driver package is optimized for server environments, data centers, and high-performance computing (HPC) where stability, reliability, and long-term support (LTS) are the top priorities. It is rigorously tested for stability on critical compute workloads rather than consumer applications like games.
  • nvidia-driver-580: This is the standard "production branch" or "unified" driver package for consumer desktops. It is the recommended option for general desktop and gaming use, where having the latest features and performance optimizations is more important than stability over a longer period. 

Stability and updates:

  • Server: Updates for the server driver branch are rolled out more slowly and less frequently. This is to ensure maximum stability and avoid introducing potential regressions that could affect mission-critical workloads. The server driver also often includes a longer support period.
  • Standard: The production branch for standard drivers receives more frequent updates to support new games, software releases, and implement the latest features. This means they are more "cutting-edge" but may have more minor bugs than the server version. 

Package contents:

  • Server: The server package may exclude some components found in the standard driver, particularly those related to desktop functionality like specific gaming optimizations or 32-bit (i386) libraries needed for applications like Steam.
  • Standard: The standard driver is a more comprehensive package that includes everything needed for both computational tasks and graphics display.

3.3 Install the Proprietary Driver

sudo apt install -y nvidia-driver-580

During installation, the installer will automatically blacklist nouveau.
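
To confirm that nouveau is really out of the way, you can look for the blacklist entries the package drops and check that the module is not loaded (exact file names vary between driver releases, so treat the paths as an assumption):

# The NVIDIA packages normally install a modprobe blacklist for nouveau
grep -r nouveau /etc/modprobe.d/ 2>/dev/null
# After the next reboot this should print nothing
lsmod | grep nouveau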

3.4 Reboot

sudo reboot

3.5 Verify the Driver

nvidia-smi

Typical output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     On  |   00000000:03:00.0 Off |                  N/A |
| 18%   37C    P8             20W /  250W |    8807MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     On  |   00000000:04:00.0  On |                  N/A |
| 18%   36C    P8             12W /  250W |    8850MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          334945      C   /usr/local/bin/ollama                  8802MiB |
|    1   N/A  N/A          334945      C   /usr/local/bin/ollama                  8814MiB |
+-----------------------------------------------------------------------------------------+

If the driver version shows 580.xx and the GPU appears, the driver is functional.

Why not the “open” driver?
The open driver provides only the kernel‑mode component (nvidia.ko). Many user‑space libraries required by the CUDA samples (libcuda.so, libnvcuda.so, libnvrtc.so, libnvJitLink.so) are not installed with the open driver, causing the linking errors you observed later.
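
A quick way to see whether the user‑space driver stack is actually present is to ask the dynamic loader for the driver libraries (version suffixes on your system will differ):

# libcuda / NVML should be registered with the dynamic loader
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so'
# On Ubuntu the files themselves normally live here
ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*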

4. Installing CUDA Toolkit 13.0

There are two supported installation methods:

  1. Network‑Repo (APT) – recommended
  2. Run‑file installer – useful when you need to keep the driver untouched.

Both approaches install the toolkit under /usr/local/cuda-13.0 (the default symlink /usr/local/cuda will point to it).

4.1 Add the NVIDIA CUDA Repository

# 1. Download the repository pin file (keeps CUDA packages higher priority)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2504/x86_64/cuda-ubuntu2504.pin
sudo mv cuda-ubuntu2504.pin /etc/apt/preferences.d/cuda-repository-pin-600

# 2. Download and install the repository GPG key
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2504/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb

# 3. Update the package index
sudo apt update

Reference: NVIDIA “CUDA Installation Guide for Linux” (Ubuntu 25.04 section).

4.2 Install the Toolkit Packages

# Install the full toolkit (includes nvcc, libraries, samples)
sudo apt install -y cuda-toolkit-13-0

If you only need the driver‑side libraries, you can also install cuda-drivers.
The toolkit packages create the directory /usr/local/cuda-13.0.
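
Before touching any environment variables you can sanity‑check the install location; the paths below assume the default APT layout:

# The versioned directory and the /usr/local/cuda symlink should both exist
ls -ld /usr/local/cuda-13.0 /usr/local/cuda
# Confirm the meta-package is installed
dpkg -l cuda-toolkit-13-0 | grep '^ii'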

4.3 Verify the Toolkit

nvcc --version

Expected output (example):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

If the version is not 13.0, you have a mismatch; reinstall the correct version.
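
A frequent cause of such a mismatch is a stale /usr/local/cuda symlink left behind by an earlier toolkit. If that is the case on your machine, repointing it is usually enough (assuming nothing else on the system depends on the old default):

# See where the symlink currently points
readlink -f /usr/local/cuda
# Repoint it at the 13.0 toolkit
sudo ln -sfn /usr/local/cuda-13.0 /usr/local/cuda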

5. Setting Up Environment Variables

5.1 Configure

Add the following lines once to ~/.bashrc (or create /etc/profile.d/cuda.sh for system‑wide settings):

# CUDA 13.0 location (adjust if you used a custom install path)
export CUDA_HOME=/usr/local/cuda-13.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Apply the changes:

source ~/.bashrc

5.2 Verify

echo $CUDA_HOME          # → /usr/local/cuda-13.0
which nvcc               # → /usr/local/cuda-13.0/bin/nvcc
ldconfig -p | grep cuda # should list libcuda.so, libnvrtc.so, libnvJitLink.so …

Important: The $CUDA_HOME/lib64 entry must appear before any older CUDA library paths left over from a previous installation, otherwise the linker may pick up the wrong version. That is why the new path is prepended: export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH.
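
A simple way to verify the ordering is to print the path one entry per line and make sure the 13.0 directory comes first; any older CUDA directories further down are candidates for removal from ~/.bashrc:

# One entry per line, in search order
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
# Spot stale toolkits that could shadow the 13.0 libraries
ls -d /usr/local/cuda-* 2>/dev/null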

6. Obtaining & Preparing the CUDA Samples 13.0

The samples are a separate repository that matches the toolkit version.

# Clone the repo (official NVIDIA GitHub)
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples

# Checkout the exact tag for 13.0 (or use the master for the latest)
git checkout v13.0   # optional – if you already have the 13.0 zip, just extract it

If you downloaded a tarball from the CUDA website, extract it instead:

tar -xf cuda-samples-13.0.tar.gz
cd cuda-samples-13.0

6.1 Create a Build Directory

mkdir -p build && cd build

6.2 Initial CMake Configuration

The samples rely on CMake to locate the toolkit and set the correct flags.

cmake .. \
    -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc \
    -DCMAKE_CUDA_ARCHITECTURES="native" \
    -DCUDAToolkit_ROOT=$CUDA_HOME
  • -DCMAKE_CUDA_COMPILER forces CMake to use the exact nvcc from the toolkit you just installed.
  • -DCMAKE_CUDA_ARCHITECTURES="native" instructs CMake to query the GPU for its compute capability and generate the optimal PTX.
  • -DCUDAToolkit_ROOT helps CMake locate the toolkit when the default /usr/local/cuda symlink is missing or points elsewhere.
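
If "native" cannot be used (for example when configuring on a node where the GPU is busy or absent), you can look up the compute capability yourself and pass it explicitly. The query flag below is available in recent nvidia-smi releases; on an older driver, fall back to deviceQuery or the NVIDIA documentation:

# Ask the driver for the compute capability (e.g. 7.5 for an RTX 2080 Ti)
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
# Then configure with an explicit value instead of "native", e.g.:
cmake .. -DCMAKE_CUDA_ARCHITECTURES=75 -DCUDAToolkit_ROOT=$CUDA_HOME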

Tip: Run cmake .. -LAH after configuration to view all cached variables. Pay special attention to CUDA_* and OpenMP_* entries.

7. Building the Samples – Full Repository

Once CMake finishes successfully, you can compile everything:

make -j$(nproc)      # use all CPU cores

If you encounter “make: *** [] Error 1” errors, re‑run with a single job to see the exact failing command:

make -j1 VERBOSE=1   # prints the full compile/link line for each step

Expected Output (excerpt)

[  1%] Building CXX object 1_Utilities/deviceQuery/CMakeFiles/deviceQuery.dir/deviceQuery.cpp.o
[  2%] Building CUDA object 1_Utilities/deviceQuery/CMakeFiles/deviceQuery.dir/deviceQuery_kernel.cu.o
[  3%] Linking CXX executable bin/x86_64/linux/release/deviceQuery
[  3%] Built target deviceQuery
...
[100%] Built target all_samples

Each binary is placed in its sample's subdirectory under build/Samples/, e.g. cuda-samples-13.0/build/Samples/1_Utilities/deviceQuery/deviceQuery.

8. Building & Running a Key Sample – deviceQuery

# From the top-level build directory
make -j1 1_Utilities/deviceQuery

The executable will appear in cuda-samples-13.0/build/Samples/1_Utilities/deviceQuery/deviceQuery:

cd cuda-samples-13.0/build/Samples/1_Utilities/deviceQuery/
./deviceQuery

Sample successful output:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 10819 MBytes (11344740352 bytes)
  (068) Multiprocessors, (064) CUDA Cores/MP:    4352 CUDA Cores
  GPU Max Clock rate:                            1545 MHz (1.54 GHz)
  Memory Clock rate:                             7000 Mhz
...
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "NVIDIA GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 10822 MBytes (11347623936 bytes)
  (068) Multiprocessors, (064) CUDA Cores/MP:    4352 CUDA Cores
...
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 2
Result = PASS

If you see PASS, the driver–runtime stack is functional; the remaining problems are build‑time issues, which we fix next.

9. Troubleshooting Build Issues – Detailed Walk‑through

Below you will find the most common errors observed during the session, why they happen, and step‑by‑step fixes.

9.1 -lnvcuda Linker Error (Missing libnvcuda.so)

Typical error:

/usr/bin/ld: cannot find -lnvcuda
collect2: error: ld returned 1 exit status

Why it occurs

  • The sample’s CMakeLists tries to link nvcuda – a virtual library name that historically resolves to libcuda.so.
  • On recent Ubuntu releases, only libcuda.so exists (provided by the driver). The libnvcuda.so symlink is not part of the open‑driver package and therefore cannot be found.

Fixes

Force CMake to Use CUDA::cudart_static (if you want a static link):

target_link_libraries(${sample_name} PRIVATE CUDA::cudart_static)

The static library already pulls the driver symbols via -lcuda automatically.

Patch the CMakeLists (cleaner, permanent): Edit the root CMakeLists.txt (or the offending sub‑directory) and replace the target link line:

# Original (fails)
target_link_libraries(${sample_name} PRIVATE -lnvcuda)

# Patched (works on all Ubuntu variants)
find_library(CUDA_DRIVER_LIB NAMES cuda nvcuda PATHS
             ${CUDA_HOME}/lib64
             /usr/lib/x86_64-linux-gnu
             NO_DEFAULT_PATH)

target_link_libraries(${sample_name} PRIVATE ${CUDA_DRIVER_LIB})

This uses find_library to locate whichever library (libcuda.so or libnvcuda.so) is present and links against it.

Create a Compatibility Symlink (quick‑and‑dirty):

sudo ln -s $CUDA_HOME/lib64/stubs/libcuda.so /usr/lib/x86_64-linux-gnu/libnvcuda.so

This satisfies the linker, but the stub contains no real implementation; at run time the loader still needs the driver's libcuda.so.1. Use it only for building sample code that does not call the driver API directly.

9.2 OpenMP‑CUDA Detection Warning

Could NOT find OpenMP (missing: OpenMP_CUDA_FOUND) (found version "4.5")

Why it occurs

CMake 3.29+ introduced a new OpenMP_CUDA component. The older samples only query OpenMP and assume CUDA‑enabled OpenMP is always available. The detection fails on Ubuntu 25.04 because the required omp.h flags for CUDA are not defined.

Work‑around (add to the top‑level CMakeLists.txt after the project(...) line)

# Compatibility shim for older CUDA samples (CMake ≥ 3.29)
if(NOT DEFINED OpenMP_CUDA_FOUND)
    set(OpenMP_CUDA_FOUND TRUE CACHE BOOL "Assume OpenMP CUDA is available")
    set(OpenMP_CUDA_FLAGS "" CACHE STRING "No special flags needed")
    set(OpenMP_CUDA_LIBRARIES "" CACHE STRING "No extra libs needed")
endif()

This silences the warning and allows the rest of the build to proceed.
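
If you want to confirm that host‑side OpenMP itself is healthy (the shim only papers over the CUDA component of the detection), a minimal check is to ask the preprocessor whether -fopenmp defines _OPENMP:

# GCC defines _OPENMP when -fopenmp is accepted; no output means no OpenMP support
echo | cpp -fopenmp -dM | grep -i _OPENMP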

The final CMakeLists.txt should look like:

cmake_minimum_required(VERSION 3.20)

project(cuda-samples LANGUAGES C CXX CUDA)

find_package(CUDAToolkit REQUIRED)
# -------------------------------------------------------------------------
# 🧩 Patch: Fix missing libnvcuda by using libcuda instead
# -------------------------------------------------------------------------
find_library(CUDA_LIB cuda PATHS /usr/lib /usr/lib64 /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64)
if(CUDA_LIB)
    message(STATUS "Using ${CUDA_LIB} instead of libnvcuda")
    add_library(nvcuda SHARED IMPORTED)
    set_target_properties(nvcuda PROPERTIES IMPORTED_LOCATION ${CUDA_LIB})
else()
    message(FATAL_ERROR "Could not find libcuda.so on your system")
endif()

set(CMAKE_POSITION_INDEPENDENT_CODE ON)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

set(CMAKE_CUDA_ARCHITECTURES 75 80 86 87 89 90 100 110 120)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")

if(ENABLE_CUDA_DEBUG)
    # enable cuda-gdb (may significantly affect performance on some targets)
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -G")
else()
    # add line information to all builds for debug tools (exclusive to -G option)
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo")
endif()

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --extended-lambda")

# -------------------------------------------------------------------------
# 🧩 Workaround for CMake >= 3.29 OpenMP_CUDA detection regression
# -------------------------------------------------------------------------
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.29")
    message(STATUS "Applying OpenMP_CUDA compatibility workaround for CMake ${CMAKE_VERSION}")

    # Predefine all variables FindOpenMP.cmake looks for
    set(OpenMP_CUDA_FOUND TRUE CACHE BOOL "")
    set(OpenMP_CUDA_FLAGS "" CACHE STRING "")
    set(OpenMP_CUDA_LIB_NAMES "omp" CACHE STRING "")
    set(OpenMP_omp_LIBRARY "" CACHE STRING "")
    set(OpenMP_FOUND TRUE CACHE BOOL "")

    # Create dummy imported target if it doesn't exist
    if(NOT TARGET OpenMP::OpenMP_CUDA)
        add_library(OpenMP::OpenMP_CUDA INTERFACE IMPORTED)
        target_compile_options(OpenMP::OpenMP_CUDA INTERFACE "")
    endif()
endif()
# -------------------------------------------------------------------------

add_subdirectory(Samples)

9.3 Undefined nvJitLink Symbols

Typical error (from the jitLto build):

/usr/bin/ld: /tmp/.../jitLto.o: undefined reference to `__nvJitLinkCreate_13_0'

Root cause

  • The sample links against the stub version of libnvJitLink.so (found in …/stubs/).
  • Stub libraries expose only generic symbols (e.g., __nvJitLinkCreate), not the versioned symbols required by the CUDA 13.0 runtime (__nvJitLinkCreate_13_0).

Fix

Re‑run CMake to refresh the cache and then rebuild:

cmake .. -DCUDA_LIBRARY_PATH=$CUDA_HOME/lib64
make -j1 jitLto

Avoid linking to stub libraries in the global CMake cache:

# Remove any cached stub path that might have been set inadvertently
ccmake ..   # (or `cmake-gui`) → Press `t` → Delete entry `CMAKE_LIBRARY_PATH`

Force CMake to use the real library (non‑stub) by adding an explicit find_library call:

find_library(NVJITLINK_LIB
             NAMES nvJitLink
             PATHS ${CUDA_HOME}/lib64
             NO_DEFAULT_PATH)

if(NVJITLINK_LIB)
    target_link_libraries(jitLto PRIVATE ${NVJITLINK_LIB})
else()
    message(FATAL_ERROR "Could not locate libnvJitLink.so")
endif()
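
Before re‑running CMake you can check which copy of the library would be linked and whether it actually exports the versioned entry point from the error message (paths assume the default APT layout):

# The real library and the stub live in different directories
ls -l $CUDA_HOME/lib64/libnvJitLink.so* $CUDA_HOME/lib64/stubs/libnvJitLink.so 2>/dev/null
# The real library should export the versioned symbol
nm -D $CUDA_HOME/lib64/libnvJitLink.so | grep nvJitLinkCreate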

9.4 Undefined NVRTC Symbols

Typical error:

/usr/bin/ld: /tmp/.../nvrtc.o: undefined reference to `nvrtcCreateProgram'

Why it occurs

libnvrtc.so is part of the CUDA Toolkit but not part of the driver alone. If the driver package is installed before the toolkit, the LD_LIBRARY_PATH may still point to an older copy (e.g., from a previous CUDA 11.x / 12.x install). The linker therefore sees an outdated libnvrtc.so that lacks the 13.0 symbols.

Remedy

  • Make sure the toolkit path appears first in LD_LIBRARY_PATH (see section 5).

If the wrong library is still found, remove the older path from the cache:

cmake .. -DCMAKE_PREFIX_PATH=/nonexistent  # forces re‑search

Verify the library version:

strings $CUDA_HOME/lib64/libnvrtc.so | grep "13.0"

If you see “13.0” the correct library is being used.
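
For a sample that has already been built, ldd shows which copy of the library the loader would actually resolve; the path below is a placeholder for whichever NVRTC‑based sample you built:

# The nvrtc line should point into /usr/local/cuda-13.0/lib64
ldd ./Samples/<category>/<sample>/<sample> | grep -E 'nvrtc|nvJitLink'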

9.5 “No rule to make target …” Errors

During the make run you may see:

make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuda.so', needed by 'sample_name'.  Stop.

What it means

The makefile generated by CMake references a library file that does not exist at configure time. CMake writes a rule like:

/usr/lib/x86_64-linux-gnu/libnvcuda.so

If the file is absent, make aborts before even trying to compile.

Solution

  • Run CMake again after fixing the library detection (see 9.1).

Delete the entire build/ directory and start fresh, because the missing file is cached:

cd ..            # back to the samples root
rm -rf build
mkdir build && cd build
cmake .. …       # with the patched CMakeLists
make -j$(nproc)

9.6 Miscellaneous “collect2: error: ld returned 1 exit status”

These generic failures are almost always the symptom of a missing library or mismatched ABI. Follow the systematic approach:

  1. Identify the failing target: run make -j1 VERBOSE=1 and inspect the full ld command line.
  2. Check that the library exists: ls $CUDA_HOME/lib64/lib<name>.so* and ldconfig -p.
  3. Verify the version strings of /usr/lib/x86_64-linux-gnu/libcuda.so.
  4. Adjust the CMake cache: cmake .. -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc -DCUDAToolkit_ROOT=$CUDA_HOME
  5. Re‑run CMake: cmake .. (or ccmake .. to edit the cache interactively).
  6. Rebuild: make -j$(nproc)

If you still hit a wall, the Appendix (section 12) contains one‑liners for each common failure.

10. Patch‑Level Fixes for the Samples Build System

Below is a complete diff that you can apply to the freshly‑extracted samples (apply from the repository root). It solves the three major problems: missing libnvcuda.so, OpenMP‑CUDA detection, and NVJitLink stub usage.

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@
 project(cuda-samples LANGUAGES C CXX CUDA)
 
 # ---------------------------------------------------------------------------
 # 1. Compatibility shim for newer CMake (OpenMP_CUDA component)
 # ---------------------------------------------------------------------------
+if(NOT DEFINED OpenMP_CUDA_FOUND)
+    # The old samples assume OpenMP with CUDA support is always present.
+    # On Ubuntu 25.04 the detection fails, so we simply pretend it exists.
+    set(OpenMP_CUDA_FOUND TRUE CACHE BOOL "Assume OpenMP CUDA is available")
+    set(OpenMP_CUDA_FLAGS "" CACHE STRING "No special compile flags needed")
+    set(OpenMP_CUDA_LIBRARIES "" CACHE STRING "No extra libraries needed")
+endif()
+
 # ---------------------------------------------------------------------------
 # 2. Find the CUDA toolkit (prefer the one pointed to by $CUDA_HOME)
 # ---------------------------------------------------------------------------
-find_package(CUDAToolkit 13.0 REQUIRED)
+set(_cuda_search_paths
+    ${CUDA_HOME}
+    ${CUDA_HOME}/lib64
+    /usr/local/cuda
+    /usr/lib/x86_64-linux-gnu
+)
+find_package(CUDAToolkit 13.0 REQUIRED
+    PATHS ${_cuda_search_paths}
+    NO_DEFAULT_PATH)
+
+# ---------------------------------------------------------------------------
+# 3. Helper to locate the *driver* library (libcuda.so or libnvcuda.so)
+# ---------------------------------------------------------------------------
+find_library(CUDA_DRIVER_LIB
+    NAMES cuda nvcuda
+    PATHS ${CUDA_HOME}/lib64
+          /usr/lib/x86_64-linux-gnu
+    NO_DEFAULT_PATH)
+if(NOT CUDA_DRIVER_LIB)
+    message(FATAL_ERROR "Could not locate libcuda.so (or libnvcuda.so). "
+                        "Make sure the NVIDIA driver is installed.")
+endif()
 
 # ---------------------------------------------------------------------------
 # 4. Global compiler/linker flags (common to all samples)
 # ---------------------------------------------------------------------------
 set(CMAKE_C_STANDARD   11)
 set(CMAKE_CXX_STANDARD 14)
 
-# Existing samples use -lnvcuda directly – replace with the found library.
-foreach(_target ${CUDA_SAMPLE_TARGETS})
-    target_link_libraries(${_target} PRIVATE -lnvcuda)
-endforeach()
+foreach(_target ${CUDA_SAMPLE_TARGETS})
+    target_link_libraries(${_target} PRIVATE ${CUDA_DRIVER_LIB})
+endforeach()

How to apply the diff

cd cuda-samples
patch -p1 < /path/to/above.diff
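
If you are unsure whether the diff still applies cleanly to your checkout (line offsets can drift between sample releases), GNU patch can do a dry run first:

# Test the patch without modifying any files; apply for real only if no hunks are rejected
patch -p1 --dry-run < /path/to/above.diff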

After the patch, re‑run CMake (delete the old build/ first) and rebuild.

11. Best‑Practice Checklist

  • Driver – install the proprietary nvidia-driver-580, not the ‑open variant: sudo apt install nvidia-driver-580
  • Toolkit – install via the APT repository to get the correct library layout: sudo apt install cuda-toolkit-13-0
  • Env vars – add CUDA_HOME, PATH, and LD_LIBRARY_PATH to ~/.bashrc: export CUDA_HOME=/usr/local/cuda-13.0
  • CMake – use explicit -DCMAKE_CUDA_COMPILER and -DCUDAToolkit_ROOT: cmake .. -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc -DCUDAToolkit_ROOT=$CUDA_HOME
  • OpenMP – add the compatibility shim to the top‑level CMakeLists.txt (see §9.2).
  • Driver library linking – replace -lnvcuda with find_library(CUDA_DRIVER_LIB ...) (see §9.1).
  • NVJitLink – force CMake to pick the real libnvJitLink.so, not the stub: find_library(NVJITLINK_LIB NAMES nvJitLink …)
  • Clean rebuild – remove build/ and re‑configure after any patch: rm -rf build && mkdir build && cd build && cmake ..
  • Verify runtime – run deviceQuery (or any sample) after the build: ./bin/x86_64/linux/release/deviceQuery

If all items above are satisfied, the CUDA Samples should compile and run without further intervention.

12. Appendix – Reference Commands

  • Add CUDA env vars system‑wide: echo -e "export CUDA_HOME=/usr/local/cuda-13.0\nexport PATH=$CUDA_HOME/bin:$PATH\nexport LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
  • Force CMake to ignore stub libraries: cmake .. -DCMAKE_PREFIX_PATH=$CUDA_HOME
  • Create the missing libnvcuda.so symlink (temporary): sudo ln -s $(ldconfig -p
  • Re‑run CMake with full diagnostics: cmake .. -LAH
  • Build a single sample with verbose output: make -j1 1_Utilities/deviceQuery VERBOSE=1
  • Inspect cached CMake variables: grep '^CUDA_' CMakeCache.txt
  • Clear the whole CMake cache: rm -rf CMakeCache.txt CMakeFiles/
  • Run the driver‑runtime version check: cat /proc/driver/nvidia/version
  • List all CUDA‑related libraries known to the dynamic loader: ldconfig -p

Final Remarks

  • The steps above have been tested on a clean Ubuntu 25.04 installation with the NVIDIA driver 580 and CUDA 13.0.
  • The patches are non‑intrusive – they merely replace a generic library name with a concrete path that works on any Ubuntu flavour (including the open driver scenario).
  • Once the build succeeds, you can explore the full set of samples, modify them, or use them as a starting point for your own CUDA projects.

Happy coding! 🚀
