Dealing with NVIDIA Drivers – Full‑Removal, Re‑Installation, and Granular Fixes

Troubleshooting NVIDIA drivers on Linux: why they break suspend/resume, how to safely remove and reinstall them across distros, and practical tweaks—from kernel parameters to systemd hooks—to restore stable sleep and power management.


1. Why NVIDIA Drivers Interfere with Suspend/Resume

When your Linux machine refuses to suspend, wakes up to a black screen, or drains battery during sleep, the GPU driver is often at fault. This happens because the proprietary NVIDIA driver:

  • Doesn’t always power down the GPU correctly.
  • Leaves the display controller in a half-active state.
  • May clash with the Linux kernel’s power-management system.

Common Symptoms → Root Causes

Symptom | Why It Happens
Freeze on suspend or no wake-up | GPU power state isn’t properly saved/restored.
Black screen after resume | DisplayPort/HDMI link not re-initialized; Xorg or Wayland may have crashed.
High battery drain while “asleep” | GPU stuck in P0 (full power) because the driver didn’t receive runtime PM signals.
Kernel oops / GPU lock-ups | Mismatched driver module vs. kernel or leftover nouveau modules.

Key Concepts for Suspend/Resume

Concept | Why It Matters
Runtime PM | Kernel asks driver to shut down GPU when idle; NVIDIA sometimes disables this.
Modesetting & DRM | Linux’s Direct Rendering Manager (DRM) must hand off the display pipeline properly.
Hybrid Graphics (Optimus/Prime) | Misconfigured Intel/AMD + NVIDIA systems keep the dGPU powered or lose output.
Secure Boot | Unsigned modules get blocked, falling back to nouveau or partial drivers.
Kernel Module ABI | Driver built for an older kernel may load but mishandle power callbacks.
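
To see whether runtime PM is actually active for the NVIDIA GPU, you can read the kernel's per-device power attributes in sysfs. The sketch below assumes the standard sysfs layout (`vendor`, `power/control`, `power/runtime_status`); the function name `nvidia_runtime_pm` and its optional directory argument are illustrative, not part of any NVIDIA tool.

```shell
# Print runtime-PM state for every NVIDIA PCI device (vendor ID 0x10de).
# $1: PCI device directory (defaults to the live sysfs path; override for testing).
nvidia_runtime_pm() {
  root=${1:-/sys/bus/pci/devices}
  for dev in "$root"/*; do
    [ -r "$dev/vendor" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
    control=$(cat "$dev/power/control" 2>/dev/null || echo unknown)
    status=$(cat "$dev/power/runtime_status" 2>/dev/null || echo unknown)
    echo "$(basename "$dev") control=$control status=$status"
  done
}

# Usage on a real machine:
#   nvidia_runtime_pm
```

A `control` value of `auto` means the kernel is allowed to suspend the device when idle, while `on` keeps it powered; `status` shows whether it is currently `active` or `suspended`.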

2. Preparation & Safety Checklist

Before changing drivers, ensure you have:

Item | Purpose | How to Do It
Backup | Protect your data in case of a failed install. | Use rsync, Timeshift, or disk images.
TTY/SSH access | Rescue your system if the GUI fails. | Ctrl+Alt+F3 or enable SSH (sudo systemctl enable ssh).
Package manager knowledge | Commands vary per distro. | Check /etc/os-release to identify yours.
Know your GPU model | Determines driver version. | lspci -nn
Check installed driver | So you know what you’re removing. | nvidia-smi or modinfo nvidia
Disable Secure Boot temporarily | Unsigned modules won’t load otherwise. | In BIOS/UEFI disable “Secure Boot.”
Fallback kernel entry | Boot a rescue kernel if the new driver fails. | In GRUB add systemd.unit=rescue.target to an entry.
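
Since the commands in the following sections differ per distro, it can help to resolve your distro family once up front. The sketch below reads the `ID` and `ID_LIKE` fields from `/etc/os-release` and prints the package manager and initramfs tool this guide uses for that family. The function name `distro_tools` and the output format are assumptions made for illustration.

```shell
# Map the distro ID from an os-release file to the tools used in this guide.
# $1: path to os-release (defaults to /etc/os-release; override for testing).
distro_tools() {
  file=${1:-/etc/os-release}
  # Source in a subshell so the os-release variables do not leak into the caller.
  (
    . "$file" || exit 1
    case " $ID $ID_LIKE " in
      *" debian "*|*" ubuntu "*) echo "apt update-initramfs" ;;
      *" fedora "*|*" rhel "*)   echo "dnf dracut" ;;
      *" suse "*|*" opensuse "*) echo "zypper dracut" ;;
      *" arch "*)                echo "pacman mkinitcpio" ;;
      *)                         echo "unknown unknown" ;;
    esac
  )
}

# Usage on a real machine:
#   distro_tools
```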

3. Clean Removal of NVIDIA Drivers

The goal is to completely remove all NVIDIA software—kernel modules, configuration files, and blacklists—so you’re starting fresh.

3.1 Ubuntu/Debian

# Stop the display manager (GDM, LightDM, SDDM, etc.)
sudo systemctl stop gdm      # replace gdm with your DM

# Purge all NVIDIA packages
sudo apt-get purge '^nvidia-.*' \
                  '^libnvidia-.*' \
                  'nvidia-driver' \
                  'nvidia-settings' \
                  'nvidia-prime' \
                  'nvidia-modprobe' \
                  'xserver-xorg-video-nvidia' \
                  'cuda-*' \
                  'libcuda*' \
                  'nvidia-container*' \
                  'libnvidia*' \
                  'nvidia-dkms*' \
                  'nvidia-opencl*'

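# Remove leftover DKMS modules
# (nvidia-smi is gone after the purge, so ask DKMS which versions remain)
dkms status | grep -i nvidia
sudo dkms remove nvidia/<version> --all   # substitute each version listed above; skip if none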

# Delete configuration files
sudo rm -f /etc/X11/xorg.conf
sudo rm -rf /etc/X11/xorg.conf.d/10-nvidia.conf
sudo rm -f /etc/modprobe.d/blacklist-nouveau.conf
sudo rm -f /etc/modprobe.d/nvidia-installer-disable-nouveau.conf

# Re‑install the default open‑source driver (optional)
sudo apt-get install --reinstall xserver-xorg-video-nouveau

# Regenerate initramfs (important for nouveau blacklist removal)
sudo update-initramfs -u

# Reboot
sudo reboot
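
Because `nvidia-smi` disappears together with the packages, DKMS itself is the more reliable source for finding leftover module builds. The following is a minimal sketch that extracts `nvidia/<version>` pairs from `dkms status` output. It assumes the two output layouts DKMS commonly prints (comma-separated and slash-separated); the function name `nvidia_dkms_modules` is hypothetical.

```shell
# Read `dkms status` output on stdin and print "nvidia/<version>" once per version.
# Handles both "nvidia, 525.105.17, ..." and "nvidia/525.105.17, ..." layouts.
nvidia_dkms_modules() {
  sed -nE 's#^(nvidia[^,/ ]*)[,/] *([^,: ]+).*#\1/\2#p' | sort -u
}

# Usage on a real machine (removes every leftover NVIDIA DKMS build):
#   dkms status | nvidia_dkms_modules | while read -r m; do
#     sudo dkms remove "$m" --all
#   done
```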

3.2 Fedora/RHEL/CentOS

# Stop graphical.target
sudo systemctl isolate multi-user.target

# Remove RPM Fusion NVIDIA packages
sudo dnf remove '*nvidia*' '*cuda*' '*opencl*'

# Remove the akmod package (RPM Fusion's module builder) and any leftover DKMS build
sudo dnf remove akmod-nvidia
sudo dkms remove nvidia/$(modinfo -F version nvidia) --all || true

# Clean up Xorg config & modprobe files
sudo rm -f /etc/X11/xorg.conf
sudo rm -rf /etc/X11/xorg.conf.d/10-nvidia.conf
sudo rm -f /etc/modprobe.d/blacklist-nouveau.conf

# Rebuild initramfs
sudo dracut -f

# Reboot
sudo reboot

3.3 openSUSE

sudo systemctl isolate multi-user.target
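# zypper prints a pipe-separated table; take the Name column of installed ("i") rows
sudo zypper rm -u $(zypper se -i nvidia | awk -F'|' '/^i/ {gsub(/ /, "", $2); print $2}')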
sudo rm -f /etc/X11/xorg.conf
sudo rm -rf /etc/X11/xorg.conf.d/10-nvidia.conf
sudo rm -f /etc/modprobe.d/blacklist-nouveau.conf
sudo dracut -f   # on older releases: sudo mkinitrd
sudo reboot

3.4 Arch Linux

# Stop display manager
sudo systemctl stop gdm   # or sddm, lightdm, etc.

# Remove all installed nvidia packages
# (pacman aborts if any listed package is not installed, so build the list from what is present)
sudo pacman -Rns $(pacman -Qq | grep -E '^(lib32-)?nvidia|^cuda$')

# Remove leftover config
sudo rm -f /etc/X11/xorg.conf
sudo rm -rf /etc/X11/xorg.conf.d/10-nvidia.conf
sudo rm -f /etc/modprobe.d/blacklist-nouveau.conf

# Re‑generate initramfs
sudo mkinitcpio -P

# Reboot
sudo reboot

3.5 Windows (Quick Note)

If you dual-boot, a corrupt Windows driver can leave the GPU in a weird state:

  1. In Device Manager → Display Adapters → NVIDIA → Uninstall Device (check Delete driver software).
  2. Run Display Driver Uninstaller (DDU) in Safe Mode.
  3. Reboot and install the latest driver.

4. Safe Driver Reinstallation

4.1 Choosing the Right Driver Version

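Match your GPU to the recommended version (e.g. RTX 40 series: 525.105.17+). Check the “Supported Products” list for the driver on NVIDIA’s site before installing: older GPUs are dropped from newer driver branches, so a branch that no longer lists your GPU will not drive it.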
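
To look up your GPU on NVIDIA's supported-products list, the PCI vendor:device ID is more reliable than the marketing name. The sketch below pulls that ID out of `lspci -nn` output for display-class devices with NVIDIA's vendor ID (`10de`). The function name `nvidia_pci_ids` is illustrative.

```shell
# Read `lspci -nn` output on stdin and print the vendor:device ID
# of each NVIDIA display device (audio functions are filtered out).
nvidia_pci_ids() {
  grep -Ei 'vga|3d controller|display' \
    | sed -nE 's/.*\[(10de:[0-9a-f]{4})\].*/\1/p'
}

# Usage on a real machine:
#   lspci -nn | nvidia_pci_ids
```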

4.2 Installation Methods

Method | Pros | Cons | When to use
Distribution packages (e.g., nvidia-driver from Ubuntu repos) | Integrated with package manager, DKMS builds automatically, updates via normal updates. | May lag behind upstream; sometimes missing the newest CUDA. | Most users; especially on LTS releases.
Official .run installer (downloaded from NVIDIA) | Latest driver, optional CUDA toolkit, explicit control. | Manual updates, risk of breaking DKMS, conflicts with Secure Boot. | When you need a driver version not yet in repos or need the bundled CUDA toolkit.
Third‑party repos (e.g., “Graphics Drivers PPA” for Ubuntu, “RPM Fusion” for Fedora) | Usually newer than distro default, well‑maintained. | Still a step away from upstream; may have occasional regressions. | When you need a newer driver but want automatic updates.
Containerised GPU stacks (e.g., NVIDIA Container Toolkit) | Isolates driver version per‑container, great for CUDA workloads. | Still requires a host driver; not a solution for X/Wayland. | For AI/ML workloads; not for fixing suspend issues.

4.3 Installing on Ubuntu/Debian

# Add the “Graphics Drivers” PPA (optional but gives newer drivers)
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# Install the recommended driver (replace 525 with your version)
sudo apt install nvidia-driver-525

# Verify DKMS built correctly
dkms status | grep nvidia

# Reboot
sudo reboot

4.4 Fedora (RPM Fusion)

# Enable RPM Fusion free + nonfree
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

# Install driver
sudo dnf install akmod-nvidia  # pulls the latest driver; akmods builds the kernel module
# Or a legacy branch for older GPUs:
# sudo dnf install xorg-x11-drv-nvidia-470xx akmod-nvidia-470xx

# Rebuild initramfs (normally done automatically)
sudo dracut --force

# Reboot
sudo reboot

4.5 Arch Linux

# Install the driver that matches your kernel (use `uname -r` to see kernel version)
sudo pacman -S nvidia nvidia-utils nvidia-settings

# For LTS kernel (nvidia-utils is shared; there is no separate -lts utils package):
# sudo pacman -S nvidia-lts nvidia-utils

# Rebuild initramfs (if you have custom hooks)
sudo mkinitcpio -P

# Reboot
sudo reboot

4.6 Manual .run Installer (Generic Linux)

  1. Download the .run file from NVIDIA.
  2. Switch to a TTY and stop the GUI: sudo systemctl isolate multi-user.target.
  3. Blacklist nouveau and rebuild the initramfs.
  4. Reboot.
  5. Run the installer.
  6. Secure Boot: either disable it or sign the module with a MOK (see 5.10).

The full sequence:

# Download .run file from NVIDIA.com (e.g., NVIDIA-Linux-x86_64-525.105.17.run)
# Switch to a TTY, stop X/Wayland:
sudo systemctl isolate multi-user.target

# Remove any previously installed driver (see Section 3)

# Blacklist nouveau (if not already):
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u   # or mkinitcpio -P / dracut -f

# Run installer (add `--dkms` to build a DKMS module):
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --dkms

# Follow on‑screen prompts (accept the OpenGL libs, disable Nouveau if asked).

# Reboot
sudo reboot

5. Targeted Fixes & Tweaks

Apply one fix at a time and test suspend/resume after each.

5.1 Kernel Parameters

Edit /etc/default/grub and add the parameters you need to GRUB_CMDLINE_LINUX_DEFAULT:

  • mem_sleep_default=deep → force deep sleep (S3).
  • nvidia-drm.modeset=1 → enable DRM-KMS.
  • pci=noaer → suppress PCIe error spam.

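Update GRUB and reboot:

# Ubuntu/Debian; elsewhere use grub2-mkconfig -o /boot/grub2/grub.cfg (Fedora/openSUSE)
# or grub-mkconfig -o /boot/grub/grub.cfg (Arch) instead of update-grub
sudo update-grub && sudo reboot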
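
If you prefer not to edit the file by hand, the change can be scripted. This is a minimal sketch, assuming the kernel command line lives in a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line; the helper name `add_kernel_param` is hypothetical, and it deliberately does nothing if the parameter is already present.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless already present.
# $1: parameter (e.g. nvidia-drm.modeset=1)
# $2: grub defaults file (defaults to /etc/default/grub; override for testing)
add_kernel_param() {
  param=$1
  file=${2:-/etc/default/grub}
  # Assumes the usual double-quoted form: GRUB_CMDLINE_LINUX_DEFAULT="..."
  grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file" || return 1
  current=$(sed -nE 's/^GRUB_CMDLINE_LINUX_DEFAULT="(.*)"$/\1/p' "$file")
  case " $current " in
    *" $param "*) return 0 ;;   # already present, nothing to do
  esac
  new=$(echo "$current $param" | sed 's/^ *//')
  # Keeps a backup as <file>.bak; parameters containing | or & are not handled.
  sed -i.bak -E "s|^GRUB_CMDLINE_LINUX_DEFAULT=\".*\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}

# Usage (as root), followed by regenerating the GRUB config:
#   add_kernel_param nvidia-drm.modeset=1
```

Adding one parameter per run fits the advice to apply one fix at a time, and the `.bak` copy gives you something to restore from a TTY if a parameter makes things worse.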

5.2 Blacklisting Nouveau

# Create a permanent blacklist file
sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<EOF
blacklist nouveau
options nouveau modeset=0
EOF

# Regenerate initramfs
sudo update-initramfs -u   # Ubuntu/Debian
# or
sudo mkinitcpio -P          # Arch
# or
sudo dracut -f              # Fedora

5.3 Enable nvidia‑drm early

sudo tee /etc/modprobe.d/nvidia-drm.conf <<EOF
options nvidia-drm modeset=1
EOF

5.4 Add nvidia‑uvm to early‑load (helps CUDA + suspend)

echo "nvidia-uvm" | sudo tee -a /etc/modules-load.d/nvidia.conf

5.5 Pre-Sleep Script to Unbind GPU

Create /lib/systemd/system-sleep/nvidia-suspend.sh:

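#!/bin/sh
# This script unbinds NVIDIA PCI devices from the nvidia driver before suspend
# and re‑binds them after resume.
# Unbinding fails while the GPU is in use (e.g. it drives the active display).

DRIVER=/sys/bus/pci/drivers/nvidia
STATE=/run/nvidia-unbound     # PCI addresses unbound before suspend

case $1/$2 in
  pre/*)
    # Pre‑suspend: unbind every device currently bound to the nvidia driver
    echo "Unbinding NVIDIA GPU before suspend"
    : > "$STATE"
    for dev in "$DRIVER"/*:*; do
      [ -e "$dev" ] || continue
      addr=$(basename "$dev")
      echo "$addr" > "$DRIVER/unbind" && echo "$addr" >> "$STATE"
    done
    ;;
  post/*)
    # Post‑resume: bind back only the devices recorded above
    echo "Re‑binding NVIDIA GPU after resume"
    [ -f "$STATE" ] || exit 0
    while read -r addr; do
      echo "$addr" > "$DRIVER/bind"
    done < "$STATE"
    rm -f "$STATE"
    ;;
esac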

Make it executable:

sudo chmod +x /lib/systemd/system-sleep/nvidia-suspend.sh

5.6 Hybrid Graphics Tools

If you have Intel/AMD integrated + NVIDIA discrete, you often need to tell the system which GPU should handle rendering and which should stay idle during suspend.

  • Ubuntu/Debian:
# Install the “prime-select” utility
sudo apt install nvidia-prime

# Switch to NVIDIA mode (for better performance & proper suspend)
sudo prime-select nvidia

# Reboot
sudo reboot
Result: The NVIDIA driver becomes the primary GPU, and the Intel/AMD GPU is turned off during suspend (reducing the chance of a “GPU busy” error).
  • Arch:
# optimus-manager is in the AUR, not the official repos; install it with an AUR helper such as yay
yay -S optimus-manager optimus-manager-qt
# Enable and start the systemd service
sudo systemctl enable --now optimus-manager.service
Then: Use optimus-manager --switch nvidia or optimus-manager --switch integrated.
Why: optimus-manager does a clean hand‑off, powering down the unused GPU before suspend.

5.7 NVIDIA DRM KMS & Wayland

If you run Wayland (e.g., GNOME 46 on Ubuntu 24.04), the NVIDIA driver must have nvidia-drm.modeset=1 and the nvidia-drm module loaded early.

# Enable KMS in the driver
sudo tee /etc/modprobe.d/nvidia-drm.conf <<EOF
options nvidia-drm modeset=1
EOF

# Regenerate initramfs
sudo update-initramfs -u   # Ubuntu/Debian
# or mkinitcpio -P / dracut -f

# Ensure the module loads early:
sudo tee /etc/modules-load.d/nvidia-drm.conf <<EOF
nvidia-drm
EOF

# Reboot
sudo reboot

Now test suspend. If the issue persists, proceed to the next fix.
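
To confirm after the reboot that the setting took effect, you can read the live module parameter, which the kernel exposes as `Y` or `N` under `/sys/module`. The helper name `nvidia_kms_enabled` and its return codes are assumptions made for this sketch.

```shell
# Report whether nvidia-drm kernel modesetting is active.
# $1: parameter file (defaults to the live sysfs path; override for testing).
# Returns 0 if enabled, 1 if disabled, 2 if the module is not loaded.
nvidia_kms_enabled() {
  file=${1:-/sys/module/nvidia_drm/parameters/modeset}
  [ -r "$file" ] || { echo "nvidia_drm not loaded"; return 2; }
  case $(cat "$file") in
    Y|1) echo "KMS enabled";  return 0 ;;
    *)   echo "KMS disabled"; return 1 ;;
  esac
}

# Usage on a real machine:
#   nvidia_kms_enabled
```

If this reports "KMS disabled" even though the modprobe file exists, the initramfs was probably not regenerated, or a conflicting `nvidia-drm.modeset=0` is set on the kernel command line.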

5.8 Fixing GPU Busy Errors

Symptoms: systemd[1]: Failed to suspend: Device or resource busy and NVIDIA: GPU is busy in journalctl -b.

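# Create a systemd unit that runs around sleep (unit name is arbitrary;
# sleep.conf drop-ins cannot run commands, so a service unit is needed)
sudo tee /etc/systemd/system/nvidia-pm-sleep.service <<'EOF'
[Unit]
Before=sleep.target
StopWhenUnneeded=yes
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStop=/usr/bin/nvidia-smi -pm 0
[Install]
WantedBy=sleep.target
EOF
sudo systemctl enable nvidia-pm-sleep.service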

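Alternatively, you can manually stop services that hold the GPU open before suspending. PipeWire runs as a user service, so it needs --user (stopping KWin or Xorg ends the graphical session):

systemctl --user stop pipewire.socket pipewire-pulse.socket pipewire pipewire-pulse
systemctl suspend
systemctl --user start pipewire.socket pipewire-pulse.socket pipewire pipewire-pulse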

Use nvidia-suspend / nvidia-resume scripts

NVIDIA provides systemd units that handle power‑state transitions. Ensure they’re enabled:

# On Ubuntu/Debian:
sudo systemctl enable nvidia-suspend.service
sudo systemctl enable nvidia-hibernate.service
sudo systemctl enable nvidia-resume.service

# Verify:
systemctl status nvidia-suspend.service
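
NVIDIA's driver README ties these units to the `NVreg_PreserveVideoMemoryAllocations` module option, so it is worth checking how the loaded driver is configured. The sketch below reads the value from `/proc/driver/nvidia/params`; the exact line format is assumed to be `PreserveVideoMemoryAllocations: <n>`, and the function name `preserve_vram_setting` is illustrative.

```shell
# Report the loaded driver's PreserveVideoMemoryAllocations setting.
# $1: params file (defaults to the live procfs path; override for testing).
preserve_vram_setting() {
  file=${1:-/proc/driver/nvidia/params}
  [ -r "$file" ] || { echo "driver not loaded"; return 2; }
  value=$(sed -nE 's/^PreserveVideoMemoryAllocations: *([0-9]+).*/\1/p' "$file")
  echo "PreserveVideoMemoryAllocations=${value:-unknown}"
}

# Usage on a real machine:
#   preserve_vram_setting
```

If the value is 0, the README describes enabling it with a modprobe option (`options nvidia NVreg_PreserveVideoMemoryAllocations=1`), followed by regenerating the initramfs and rebooting.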

5.9 Fixing PCIe AER Spam

A spamming AER log can cause the kernel to abort suspend. If you see lines like:

AER: Corrected error received: id=00e0
AER: [  123.456789] PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)

Add the pci=noaer kernel parameter (see 5.1) or disable AER for the NVIDIA device:

# Identify the PCI address (e.g., 0000:01:00.0)
lspci -nn | grep -i nvidia

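# Create a module parameter file
# (NVreg_AEREnabled may not exist in every driver release; check first with
#  `modinfo -p nvidia | grep -i aer`, since unknown options are ignored)
sudo tee /etc/modprobe.d/nvidia-aer.conf <<EOF
options nvidia NVreg_AEREnabled=0
EOF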

# Regenerate initramfs
sudo update-initramfs -u   # Ubuntu/Debian
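
Before and after applying either workaround, it helps to measure how noisy the log really is. The following sketch counts corrected-error reports in kernel log text. The pattern covers both the "Corrected" and "Correctable" wordings, on the assumption that the message text differs between kernel versions; the function name `count_aer_corrected` is hypothetical.

```shell
# Count corrected-AER reports in kernel log text read from stdin.
count_aer_corrected() {
  # grep -c exits non-zero when the count is 0; keep the function's status clean.
  grep -Ec 'AER: Correct(ed|able) error' || true
}

# Usage on a real machine (current boot, kernel messages only):
#   journalctl -k -b | count_aer_corrected
```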

5.10 MOK (Machine Owner Key) for Secure Boot

If you must keep Secure Boot enabled (e.g., corporate laptops), you can sign the NVIDIA kernel module:

  1. Generate a key pair
# Create a key (if not existing)
openssl req -new -x509 -newkey rsa:2048 -keyout MOK.priv -outform DER -out MOK.der -nodes -days 36500 -subj "/CN=NVIDIA MOK/"
  2. Queue the key for enrollment
sudo mokutil --import MOK.der   # asks for a one-time password used at the next boot
  3. Enroll the key – Reboot; you’ll see a blue MOK enrollment screen. Choose Enroll MOK, then Continue, then Yes, then enter the password you set with mokutil --import.
  4. Sign the module
# After driver installation, locate the .ko file
# (repeat for nvidia-modeset, nvidia-drm, and nvidia-uvm)
sudo /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 ./MOK.priv ./MOK.der $(modinfo -n nvidia)

# Verify
sudo modinfo nvidia | grep signer
Alternative – mokutil --disable-validation turns off signature validation in shim, but that defeats Secure Boot’s purpose.
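
To check whether signing is needed at all, you can ask `mokutil` for the current Secure Boot state. This sketch parses the output of `mokutil --sb-state`, assuming it contains a line beginning with `SecureBoot enabled` or `SecureBoot disabled`; the function name `secure_boot_state` is illustrative.

```shell
# Read `mokutil --sb-state` output on stdin and print
# "enabled", "disabled", or "unknown".
secure_boot_state() {
  state=unknown
  while IFS= read -r line; do
    case $line in
      "SecureBoot enabled"*)  state=enabled ;;
      "SecureBoot disabled"*) state=disabled ;;
    esac
  done
  echo "$state"
}

# Usage on a real machine:
#   mokutil --sb-state | secure_boot_state
```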

5.11 Power Management Daemons

Daemon | Use | When to enable
tlp | Aggressive power‑saving for laptops (CPU scaling, USB autosuspend). | If you have a laptop and notice high power draw after resume.
laptop-mode-tools | Provides a “laptop mode” that tweaks disk and USB power on suspend. | If you need more granular control than TLP.
powertop | Diagnostic tool – can suggest wake‑up sources. | Run sudo powertop --auto-tune after each driver reinstall.

Example (TLP on Ubuntu):

sudo apt install tlp tlp-rdw
sudo systemctl enable tlp
sudo systemctl start tlp

After enabling, run sudo tlp-stat -s to confirm it’s active.

5.12 BIOS / Firmware Updates

Many suspend/resume bugs are firmware‑level. Check the manufacturer’s website for a BIOS update that mentions “Improved sleep/hibernate” or “GPU power management”.

Laptop Model | Known Fix
Dell XPS 13 9370 | BIOS 1.13.0 (2024) + acpi_osi= resolves S3 wake‑up.
Lenovo ThinkPad X1 Carbon Gen 9 | BIOS 1.45 + nvidia-drm.modeset=1 + prime-select nvidia.
HP Spectre x360 14 | BIOS 2.5.0 (2025) + disabling AER (pci=noaer).

If you cannot update BIOS (e.g., corporate lock‑down), you may need to request an exception from IT.

6. Testing Your Setup

After each change, reboot and test suspend:

systemctl suspend
# Wait a few seconds, then wake the machine (press power button or open lid)
journalctl -b      # Current boot: contains both the suspend and the resume messages
journalctl -b -1   # Previous boot: useful only if the machine hung and had to be reset
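
When testing the mem_sleep_default=deep parameter from 5.1, it is worth confirming which suspend mode the kernel actually selected. The kernel lists the available modes in `/sys/power/mem_sleep` and marks the active one with square brackets. The helper name `active_sleep_mode` is an assumption for this sketch.

```shell
# Print the active suspend mode from /sys/power/mem_sleep (the bracketed entry).
# $1: file to read (defaults to the live sysfs path; override for testing).
active_sleep_mode() {
  file=${1:-/sys/power/mem_sleep}
  sed -nE 's/.*\[([a-z0-9]+)\].*/\1/p' "$file"
}

# Usage on a real machine:
#   active_sleep_mode
```

If `deep` does not appear in the file at all, the firmware does not offer S3 and the kernel parameter cannot force it.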

7. Rollback & Alternatives

  • Nouveau: install open-source driver (xserver-xorg-video-nouveau).
  • Version pinning: lock a known-good NVIDIA version.
  • Containerised stacks: isolate CUDA inside containers (NVIDIA Container Toolkit).

Quick “One‑Liner” Checklist (Ubuntu/Debian)

sudo apt purge '^nvidia-.*' && sudo apt install nvidia-driver-560 nvidia-prime tlp
sudo prime-select nvidia
echo "options nvidia-drm modeset=1" | sudo tee /etc/modprobe.d/nvidia-drm.conf
echo "nvidia-drm" | sudo tee /etc/modules-load.d/nvidia-drm.conf
echo "nvidia-uvm" | sudo tee -a /etc/modules-load.d/nvidia.conf
sudo systemctl enable nvidia-suspend.service nvidia-resume.service
sudo systemctl enable tlp
sudo update-initramfs -u
sudo reboot

8. Appendix

  • Logs: /var/log/syslog, journalctl -xe, dmesg.
  • DKMS status: dkms status.
  • Power states: cat /proc/driver/nvidia/gpus/0000:01:00.0/power.
