Diagnosing Problems – Log‑Reading & Interpretation

Learn how to diagnose Linux power-management problems by reading and interpreting logs. This guide shows where to find suspend, resume, battery and wake-up events, how to filter them, and how to pinpoint the driver or device causing issues.

Table of Contents

Introduction

  1. Why Logs Matter for Power‑Management
  2. Key Log Sources
    • 2.1 Systemd Journal (journalctl)
    • 2.2 Traditional Syslog (/var/log/*)
    • 2.3 Kernel Ring Buffer (dmesg)
    • 2.4 Power‑Specific Daemons (tlp, powertop, upowerd, systemd‑logind, acpid)
  3. Essential Tools & Commands
  4. Common Power‑Management Subsystems & Their Logs
    • 4.1 ACPI & ACPI Events
    • 4.2 CPU Frequency Scaling (cpufreq, intel_pstate, amd_pstate)
    • 4.3 Device Suspend/Resume (PM‑runtime)
    • 4.4 Battery & Power‑Supply (upower, acpi)
    • 4.5 Display & GPU Power (i915, amdgpu, nouveau)
  5. Step‑by‑Step Diagnostic Workflow
  6. Typical Problems & Log‑Based Remedies
  7. Advanced Log‑Parsing Techniques
  8. Building a Personal Knowledge Base

Conclusion


Introduction

Power management in Linux is a collaborative effort between the kernel, firmware (ACPI/UEFI), and userspace daemons. When something goes wrong (excessive wake‑ups, premature sleep, battery drain, or failed suspend/resume), diagnosing the issue usually starts with the logs. This guide walks you through where the relevant information lives, how to extract it, and how to interpret it to resolve common power‑management glitches.

1. Why Logs Matter for Power‑Management

| Reason | Explanation |
|---|---|
| Visibility | The kernel reports hardware events (e.g., ACPI wake sources) that userspace can’t see. |
| Temporal Correlation | Logs contain timestamps that let you link an event (e.g., “resume”) with a preceding cause (e.g., “USB device activity”). |
| Configuration Feedback | Tools such as tlp or powertop report the settings they applied, helping you verify that policies are active. |
| Error Reporting | Critical failures (e.g., “PM: Unable to enter suspend”) are logged with severity levels, making them searchable. |

2. Key Log Sources

2.1 Systemd Journal (journalctl)

Modern distributions use systemd-journald. It aggregates kernel messages, syslog, and daemon output into a binary journal.

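# Show power-management related entries from the kernel and the listed units
# (case-insensitive). "+" ORs the match groups; combining -k with -u would AND
# them, and kernel messages carry no unit field, so nothing would match.
journalctl -b _TRANSPORT=kernel + _SYSTEMD_UNIT=systemd-logind.service + _SYSTEMD_UNIT=upower.service + _SYSTEMD_UNIT=tlp.service | grep -iE 'power|acpi|suspend|resume|battery'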

Flags you’ll use frequently

| Flag | Meaning |
|---|---|
| -b | Restrict to the current boot (or -b -1 for the previous one). |
| -k | Show kernel messages only (the journal's counterpart to dmesg; implies -b). |
| -u <unit> | Filter by systemd unit (e.g., systemd-suspend.service). |
| --since "2024-09-01 10:00" | Time‑range filtering. |
| -p err | Show entries of priority error or higher. |

2.2 Traditional Syslog (/var/log/*)

If your system still runs rsyslog/syslog-ng, look at:

  • /var/log/kern.log – kernel messages as persisted by the syslog daemon (Debian/Ubuntu).
  • /var/log/syslog – general system messages (Debian/Ubuntu).
  • /var/log/messages – general system messages (RHEL/Fedora).

Tip: grep -i acpi /var/log/kern.log isolates ACPI messages.

2.3 Kernel Ring Buffer (dmesg)

dmesg is the live view of kernel logs. Use it for quick checks after a suspend/resume cycle:

dmesg | grep -iE 'acpi|pm|suspend|resume|battery|wake'

You can also dump the entire buffer to a file for offline analysis:

dmesg > /tmp/dmesg-$(date +%F-%T).log
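
A saved dmesg dump can also be used to time each suspend cycle. The sketch below is a minimal example, assuming the default dmesg timestamp format ("[  100.000000]", not dmesg -T) and assuming the kernel prints "PM: suspend entry" and "PM: suspend exit" markers; the function name suspend_cycles is made up for illustration. Because the printk clock generally does not advance while the machine is asleep, the result usually reflects time spent in the suspend and resume paths rather than time asleep.

```shell
# suspend_cycles: read dmesg text on stdin and print, for each
# "PM: suspend entry" ... "PM: suspend exit" pair, the kernel-clock
# seconds elapsed between the two messages.
suspend_cycles() {
    awk '
        # Extract the leading "[  1234.567890]" timestamp as a number.
        function ts(line,    t) {
            t = line
            sub(/^\[ */, "", t)
            sub(/\].*$/, "", t)
            return t + 0
        }
        /PM: suspend entry/ { start = ts($0); armed = 1; next }
        /PM: suspend exit/ && armed {
            printf "cycle %d: %.3f s\n", ++n, ts($0) - start
            armed = 0
        }
    '
}
```

Usage: dmesg | suspend_cycles, or suspend_cycles < a saved dump. A cycle that takes much longer than the others is a good place to start reading.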

2.4 Power‑Specific Daemons

| Daemon | Log Location | Typical Entries |
|---|---|---|
| tlp | journalctl -u tlp; detailed trace output reaches the system log only when TLP_DEBUG is set in the TLP configuration | “TLP: applying settings for AC”, “TLP: runtime PM disabled for USB”. |
| powertop | No persistent log, but a report can be saved via powertop --csv. | “Device wakeup count”, “Power usage per process”. |
| upowerd | journalctl -u upower | “Battery state changed”, “Power source changed”. |
| systemd-logind | journalctl -u systemd-logind | “Suspend request”, “Resume finished”. |
| acpid (if used) | /var/log/acpid (depends on config) | “ACPI event: button/power”. |

3. Essential Tools & Commands

| Tool | Purpose | Example |
|---|---|---|
| journalctl | Unified log browsing | journalctl -b -p warning |
| dmesg | Immediate kernel messages | dmesg -T |
| grep, awk, sed | Text filtering | grep -i acpi /var/log/kern.log |
| acpi | Query battery/thermal info | acpi -V |
| upower | Detailed battery stats | upower -i /org/freedesktop/UPower/devices/battery_BAT0 |
| tlp-stat | Show TLP current config and battery | tlp-stat -s -b |
| powertop | Export PowerTOP data for later analysis | powertop --csv=/tmp/powertop.csv |
| systemd-inhibit | List inhibitor locks that may prevent suspend | systemd-inhibit --list |
| cat /sys/power/* | Inspect current kernel power‑state files | cat /sys/power/state |
| ls | List ACPI devices and their power control files | ls /sys/bus/acpi/devices/ |

4. Common Power‑Management Subsystems & Their Logs

4.1 ACPI & ACPI Events

ACPI is the firmware interface that tells the OS about power states, button presses, thermal zones, etc.

  • Log messages: ACPI:, ACPI button, ACPI: LAPIC, PM: resume from...
  • Typical location: kernel log (dmesg), journal (-k).

Example snippet

Oct 05 14:32:01 hostname kernel: ACPI: button [PWRF] (validating)
Oct 05 14:32:01 hostname kernel: ACPI: Power Button [PWRF] pressed
Oct 05 14:32:01 hostname systemd-logind[1234]: Power key pressed, initiating suspend

Interpretation

  • A PWRF event is being received. If you see this without ever pressing the power button, something else is generating spurious events; a stuck or faulty button and firmware quirks are the usual suspects.
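
Spurious events are easier to judge when they are counted per minute instead of read one by one. The following is a rough sketch, assuming syslog-style lines that begin with "Oct 05 14:32:01" as in the snippet above; the function name events_per_minute is illustrative and not part of any tool.

```shell
# events_per_minute PATTERN: read syslog-style lines on stdin and print how
# many lines matching the extended regex PATTERN fall into each minute,
# in order of first appearance.
events_per_minute() {
    grep -E -- "$1" |
        awk '{
            key = $1 " " $2 " " substr($3, 1, 5)   # e.g. "Oct 05 14:32"
            if (!(key in count)) order[++n] = key
            count[key]++
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " count[order[i]] }'
}
```

Usage: journalctl -k -b | events_per_minute 'Power Button'. Several hits per minute while nobody touches the machine point to hardware or firmware, not to the user.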

4.2 CPU Frequency Scaling

Linux uses the cpufreq subsystem (intel_pstate, amd_pstate, or generic drivers).

  • Relevant files: /sys/devices/system/cpu/cpu*/cpufreq/*
  • Log clues: “cpufreq: transition …”, “intel_pstate: ...”.

Example

Oct 05 14:35:12 hostname kernel: intel_pstate: No hardware limits found, using defaults
Oct 05 14:35:12 hostname kernel: cpufreq: driver intel_pstate: scaling driver registered
Oct 05 14:35:12 hostname kernel: cpufreq: switching to governor 'powersave'

If you see many rapid “cpu0: idle state” messages, it may indicate a CPU governor mismatch causing unnecessary wake‑ups.
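
To confirm what the log lines claim, the driver and governor each CPU actually uses can be read from the sysfs files listed above. This is a small sketch; the function name cpufreq_summary and its optional directory argument are assumptions added so that it can be tried against a test directory.

```shell
# cpufreq_summary [ROOT]: print "cpuN driver governor" for every CPU that has
# a cpufreq directory. ROOT defaults to the real sysfs location.
cpufreq_summary() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_governor" ] || continue
        cpu=${dir%/cpufreq}     # strip the trailing "/cpufreq"
        cpu=${cpu##*/}          # keep only "cpuN"
        printf '%s %s %s\n' "$cpu" \
            "$(cat "$dir/scaling_driver" 2>/dev/null)" \
            "$(cat "$dir/scaling_governor")"
    done
}
```

Usage: cpufreq_summary prints one line per CPU; a governor that differs between CPUs, or differs from what the journal reported at boot, is worth investigating.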

4.3 Device Suspend/Resume (PM‑runtime)

Devices may be runtime‑suspended when idle and resumed on demand.

  • Log source: kernel messages containing PM: runtime or PM: device.
  • Files: /sys/devices/.../power/runtime_status, runtime_enabled.

Example

Oct 05 14:40:18 hostname kernel: usb 1-1:1.0: USB disconnect, device number 5
Oct 05 14:40:18 hostname kernel: PM: runtime suspend of USB device usb1:1.0 failed with error -22

Interpretation: The USB device failed to runtime‑suspend (error -22 is EINVAL, “invalid argument”). A device that stays active instead of suspending may be a culprit for battery drain.
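
Numeric error codes such as -22 are easier to scan once they carry a name. Below is a minimal sketch that annotates "failed with error -N" lines; only a handful of codes are mapped (values as used on x86 and ARM Linux), and both function names are made up for this example.

```shell
# pm_errno_name N: map a few common kernel error numbers to their names.
# Anything not listed is printed as "errno N".
pm_errno_name() {
    case ${1#-} in
        5)   echo EIO ;;
        11)  echo EAGAIN ;;
        12)  echo ENOMEM ;;
        16)  echo EBUSY ;;
        19)  echo ENODEV ;;
        22)  echo EINVAL ;;
        110) echo ETIMEDOUT ;;
        *)   echo "errno ${1#-}" ;;
    esac
}

# annotate_pm_errors: read log lines on stdin and append the symbolic name to
# lines of the form "... failed with error -N"; other lines pass through.
annotate_pm_errors() {
    while IFS= read -r line; do
        case $line in
            *"failed with error -"*)
                num=${line##*failed with error -}   # text after the marker
                num=${num%%[!0-9]*}                 # keep leading digits only
                printf '%s (%s)\n' "$line" "$(pm_errno_name "$num")"
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done
}
```

Usage: dmesg | annotate_pm_errors | grep 'failed with error'.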

4.4 Battery & Power‑Supply

Managed by the power_supply class, exposed via /sys/class/power_supply/.

  • Key logs: battery:, power_supply:, UPower: messages.
  • Utilities: upower -i, acpi -V.

Example

Oct 05 15:00:00 hostname upower:   warning: Battery (BAT0) has a low capacity (95%)
Oct 05 15:00:00 hostname upower:   info: Battery (BAT0) is now discharging

If the battery repeatedly reports “critical” despite a healthy charge, check the battery attributes under /sys/class/power_supply/BAT0/ (for example health, where the driver exposes it, or energy_full compared with energy_full_design).
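
A quick way to judge battery wear is to compare the current full capacity with the design capacity. The sketch below assumes the battery exposes either the energy_full/energy_full_design or the charge_full/charge_full_design attribute pair (which one exists depends on the driver); the function name battery_wear is hypothetical.

```shell
# battery_wear [DIR]: print full-charge capacity as a percentage of design
# capacity. DIR defaults to BAT0 under /sys/class/power_supply.
battery_wear() {
    dir=${1:-/sys/class/power_supply/BAT0}
    for kind in energy charge; do
        if [ -r "$dir/${kind}_full" ] && [ -r "$dir/${kind}_full_design" ]; then
            full=$(cat "$dir/${kind}_full")
            design=$(cat "$dir/${kind}_full_design")
            [ "$design" -gt 0 ] || return 1     # avoid dividing by zero
            echo "$(( full * 100 / design ))%"
            return 0
        fi
    done
    echo "no capacity attributes found in $dir" >&2
    return 1
}
```

A value far below 100% means the pack has aged, which can explain early “low” or “critical” reports even when the charge percentage looks fine.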

4.5 Display & GPU Power

Graphics drivers implement runtime PM and power‑saving states.

  • Intel i915: messages like i915: power domain and i915: wakeup.
  • AMD: amdgpu: power state entries.
  • NVIDIA (nouveau): nouveau: pm runtime logs.

Example

Oct 05 15:12:33 hostname kernel: i915 0000:00:02.0: power domain 0: active
Oct 05 15:12:33 hostname kernel: i915 0000:00:02.0: power domain 0: inactive

Frequent transitions may indicate a mis‑configured screen‑blanking timer or an app preventing idle.

5. Step‑by‑Step Diagnostic Workflow

  1. Reproduce the Symptom
    • Note the exact time, actions (e.g., closing the lid, pressing power), and environment (plugged‑in vs. battery).
  2. Capture the Logs

# Capture the journal for the last 30 minutes (adjust as needed);
# -o short-iso prints ISO 8601 timestamps that are easy to match and sort
journalctl --since "30 minutes ago" -o short-iso > /tmp/journal-$(date +%F-%T).log
# Also dump the kernel ring buffer
dmesg > /tmp/dmesg-$(date +%F-%T).log

  3. Filter for Power‑Related Keywords

# -h suppresses file-name prefixes when several journal dumps match
grep -hiE 'acpi|suspend|resume|pm|runtime|battery|wake|idle' /tmp/journal-*.log > /tmp/power‑log.txt

  4. Correlate Timestamps
    • Use awk or a spreadsheet to line up the moment you pressed the button with the surrounding log entries.
    • Example (prints the matching line and the one after it): awk '/2025-09-29T14:32:01/{print;getline;print}' /tmp/power‑log.txt
  5. Identify the “First” Failure
    • Look for messages with severity error, warning, or failed that appear before the symptom resolves.
  6. Check Device‑Specific Runtime PM

# Walk the whole device tree; runtime_status files sit several levels deep
find /sys/devices -path '*/power/runtime_status' 2>/dev/null | while read -r f; do
    echo "$f: $(cat "$f" 2>/dev/null)"
done | grep ': active$'

  7. Verify Daemon Configurations
    • systemctl status tlp
    • tlp-stat -s (shows the active mode and system info).
    • powertop --auto-tune applies PowerTOP's suggested tunables; it changes settings, so use the interactive Tunables tab if you only want to inspect them.
  8. Test with Minimal Services
    • Boot into a rescue or single‑user environment (e.g., systemd.unit=rescue.target) to rule out third‑party services.
    • Re‑run the suspend/resume test and compare logs.
  9. Apply Fix & Re‑test
    • Edit the offending configuration (e.g., stop a USB device from waking the system: echo disabled | sudo tee /sys/bus/usb/devices/1-1/power/wakeup; the file accepts only enabled or disabled).
    • Reload the module or reboot, then repeat steps 1‑4.
  10. Document the Outcome
    • Store the final log excerpt and a short note in a personal knowledge base for future reference.
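
The “first failure” step of this workflow can be sketched as a small filter. It is a minimal example, not a complete classifier: it assumes that failures contain the words error, warn or fail, and the function name first_failure_before is made up for illustration.

```shell
# first_failure_before SYMPTOM: read log lines on stdin and print the first
# line containing "error", "warn" or "fail" (any case) that occurs before the
# first line matching the extended regex SYMPTOM. Prints nothing if the
# symptom comes first or no such line exists.
first_failure_before() {
    # The pattern is passed through the environment so awk does not
    # reinterpret backslashes in it.
    SYMPTOM=$1 awk '
        $0 ~ ENVIRON["SYMPTOM"] { exit }
        tolower($0) ~ /error|warn|fail/ { print NR ": " $0; exit }
    '
}
```

Usage: first_failure_before 'Failed to suspend' < a filtered log file. The printed line number tells you where to start reading in the full log.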

6. Typical Problems & Log‑Based Remedies

| Problem | Log Signature | Likely Cause | Fix (log‑driven) |
|---|---|---|---|
| Suspend never completes | PM: suspend entry failed: -EBUSY; systemd-logind: Failed to suspend | A device refuses to go into a low‑power state (often a USB or Wi‑Fi adapter). | Identify the device via dmesg. |
| Unexpected wake‑ups | PM: resume from suspend followed quickly by PM: Device xyz woke up the system | A peripheral (mouse, network, Bluetooth) generating wake events. | grep -i wake /var/log/kern.log to see the source. Disable via echo "disabled" > /sys/.../power/wakeup. |
| Battery drains quickly while idle | powertop: Device 0000:00:14.0 shows high wake‑ups, i915 frequent power‑domain switches | GPU or PCIe device not entering runtime PM. | On older kernels, verify i915.enable_rc6=1 in kernel parameters (recent i915 drivers no longer have this option); add pcie_aspm=force if safe. |
| Battery not charging | ACPI: Battery (BAT0) not present or upower: warning: No battery detected | ACPI battery driver missing or BIOS mis‑reports. | Check dmesg. |
| CPU stuck at max frequency | cpufreq: scaling driver not loaded or intel_pstate: No P‑states | intel_pstate not active or governor mis‑set. | intel_pstate is built into the kernel, not loaded with modprobe, so check that intel_pstate=disable is not on the kernel command line; set governor: echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor. |
| System refuses to go to hibernate | systemd-hibernate.service: Failed with result 'exit-code' and PM: hibernate not supported | Swap not large enough or not configured for hibernation. | Verify swap size (free -h), ensure resume=UUID=… kernel param points to swap partition. |
| Random “ACPI: _S3 not supported” | ACPI: \_S3 not supported | Firmware lacks S3 (suspend‑to‑RAM) support; may only support modern S0ix. | Check cat /sys/power/mem_sleep for the available modes. If S3 (deep) is unavailable, rely on s2idle (echo s2idle > /sys/power/mem_sleep), optionally with systemctl suspend-then-hibernate instead of plain systemctl suspend. |
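
The signatures in this table lend themselves to a first-pass triage script. The following is a sketch only: the regular expressions are taken from the signatures listed above, while the function names and the wording of the hints are invented for illustration.

```shell
# _pm_hint LOGTEXT REGEX MESSAGE: print MESSAGE if LOGTEXT matches REGEX.
_pm_hint() {
    if printf '%s\n' "$1" | grep -Eq -- "$2"; then
        printf '%s\n' "$3"
    fi
}

# pm_triage: read log text on stdin and print one hint per known signature.
pm_triage() {
    log=$(cat)
    _pm_hint "$log" 'suspend entry failed|Failed to suspend' \
        'suspend never completes: find the device that refuses to suspend'
    _pm_hint "$log" 'woke up the system' \
        'unexpected wake-up: check the wakeup setting of the named device'
    _pm_hint "$log" 'Battery .* not present|No battery detected' \
        'battery not detected: check the ACPI battery driver and firmware'
    _pm_hint "$log" 'hibernate not supported' \
        'hibernate unavailable: check swap size and the resume= parameter'
    _pm_hint "$log" '_S3 not supported' \
        'no S3: check /sys/power/mem_sleep for s2idle'
}
```

Usage: journalctl -b | pm_triage. An empty result only means none of these particular signatures appeared, not that the system is healthy.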

7. Advanced Log‑Parsing Techniques

7.1 Using journalctl JSON Output

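journalctl -b -o json | jq -r 'select(.MESSAGE | strings | test("PM|ACPI|suspend")) | "\(.__REALTIME_TIMESTAMP) \(.SYSLOG_IDENTIFIER) \(.MESSAGE)"' > pm-events.txt
  • jq keeps only entries whose message contains a power keyword and prints timestamp (microseconds since the epoch), identifier and message on one line, which makes two runs easy to diff. The strings filter skips entries whose MESSAGE is not plain text.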
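
The __REALTIME_TIMESTAMP field is a count of microseconds since the Unix epoch, which is awkward to compare with the time you noted down. A small helper can convert it; this sketch assumes a date command that accepts -d @SECONDS (GNU coreutils does), and the name usec_to_iso is illustrative.

```shell
# usec_to_iso USEC: convert microseconds since the Unix epoch to an
# ISO 8601 timestamp in UTC (sub-second precision is dropped).
usec_to_iso() {
    date -u -d "@$(( $1 / 1000000 ))" +%Y-%m-%dT%H:%M:%SZ
}
```

Usage: pass the first field of any line produced by the jq command, for example usec_to_iso 1727620321000000.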

7.2 Correlating Wake‑Source Stats

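Linux exposes wake‑up information in two places: /proc/acpi/wakeup lists ACPI devices and whether each may wake the system, while /sys/kernel/debug/wakeup_sources (root only, debugfs must be mounted) keeps per‑source counters.

# Save the full table; filtering for "total_time" would keep only the header line
sudo cat /sys/kernel/debug/wakeup_sources > /tmp/ws_before
# Perform suspend/resume
sudo cat /sys/kernel/debug/wakeup_sources > /tmp/ws_after
# Keep changed rows only, dropping the ---/+++ file headers
diff -u /tmp/ws_before /tmp/ws_after | grep '^[+-][^+-]' > new-wakers.txt
  • The diff shows which sources changed their counters (for example event_count, wakeup_count or total_time) during the test.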
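
A plain diff still leaves the counters to be compared by eye. The sketch below prints only the sources whose event count grew between two saved snapshots. It assumes the tab-separated column order name, active_count, event_count shown in the file's header line, so check that header on your kernel first; the function name wakeup_delta is made up.

```shell
# wakeup_delta BEFORE AFTER: compare two saved copies of
# /sys/kernel/debug/wakeup_sources and print "name +N" for each source whose
# event_count (third column) increased by N.
wakeup_delta() {
    awk -F '\t+' '
        FNR == 1 { next }                       # skip each file header line
        { name = $1; sub(/ +$/, "", name) }     # names are space-padded
        NR == FNR { before[name] = $3; next }   # first file: remember counts
        ($3 - before[name]) > 0 { print name, "+" ($3 - before[name]) }
    ' "$1" "$2"
}
```

Usage: wakeup_delta /tmp/ws_before /tmp/ws_after, with both files being unfiltered copies of the wakeup_sources table taken before and after a suspend/resume cycle.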

7.3 Visualizing with gawk Timeline

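gawk '
# Test /resume/ first so that a line such as "resume from suspend" is not
# mistaken for a suspend event
/resume/  {print s "\n" $0 "\n"; next}
/suspend/ {s=$0}' /tmp/power‑log.txt > /tmp/suspend-timeline.txt

Creates a paired view: each resume line is printed directly below the most recent suspend line.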

7.4 Building a Persistent “PM‑Audit” Script

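#!/usr/bin/env bash
# Snapshot power-management state into one file.
# Run as root: it writes to /var/log and reads debugfs.
set -euo pipefail
log=/var/log/pm-audit-$(date +%F-%T).log
{
    echo "=== JOURNAL ==="
    journalctl -b --since "1 hour ago"
    echo "=== DMESG ==="
    dmesg
    echo "=== WAKEUP_SOURCES ==="
    cat /sys/kernel/debug/wakeup_sources
} > "$log"
  • Schedule it with a systemd timer for periodic snapshots. A timer cannot fire just before sleep; for that, call the script from a hook in /usr/lib/systemd/system-sleep/, which systemd runs before suspend and after resume.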
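
For capturing state around a sleep cycle, a system-sleep hook is one option. The sketch below shows what the body of such a hook might look like: systemd runs executables in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument and the sleep action (for example "suspend") as the second. The function name pm_sleep_hook, the PM_AUDIT_LOG variable and the default log path are assumptions made for this example, not systemd settings.

```shell
# pm_sleep_hook PHASE ACTION: append a timestamped marker line for each
# sleep transition so later log excerpts can be lined up against it.
pm_sleep_hook() {
    log=${PM_AUDIT_LOG:-/var/log/pm-audit-hook.log}
    case $1 in
        pre|post)
            printf '%s %s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$1" "$2" >> "$log"
            ;;
        *)
            echo "usage: pm_sleep_hook pre|post ACTION" >&2
            return 2
            ;;
    esac
}
```

An executable hook file would contain this function followed by the line pm_sleep_hook "$@"; the "pre" branch is also a natural place to start the audit script.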

8. Building a Personal Knowledge Base

  • File Naming: pm-<YYYYMMDD>-<symptom>.log
  • Version Control: Store in a private Git repository; commit each new case.
  • Metadata: Include a small YAML header:

---
date: 2025-09-29
system: laptop-x1
symptom: "suspend fails after USB-3 disconnect"
root_cause: USB device 1-1 runtime-suspend failure
fix: disabled wakeup for 1-1; added udev rule
---
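
File names stay consistent more easily when they are generated. Here is a minimal sketch of a helper that follows the naming scheme above; the slug rules (lower-case, hyphens between words) and the name kb_filename are choices made for this example.

```shell
# kb_filename DATE SYMPTOM: build "pm-<YYYYMMDD>-<symptom>.log". DATE is
# given as YYYY-MM-DD; runs of characters other than letters and digits in
# SYMPTOM become single hyphens.
kb_filename() {
    day=$(printf '%s' "$1" | tr -d '-')
    slug=$(printf '%s' "$2" | tr 'A-Z' 'a-z' | tr -cs 'a-z0-9' '-')
    slug=${slug#-}      # drop a leading hyphen, if any
    slug=${slug%-}      # drop a trailing hyphen, if any
    printf 'pm-%s-%s.log\n' "$day" "$slug"
}
```

Usage: cp /tmp/power-excerpt.log "$(kb_filename 2025-09-29 'suspend fails after USB-3 disconnect')", where the source path is just a placeholder.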

Conclusion

Systemd’s journal combined with kernel dmesg covers most power‑management events. By systematically capturing timestamps, filtering for key terms, and tracing the first failure message, you can usually pinpoint the driver or device responsible for suspend, wake, or battery‑drain issues.

Remember:

  • Early logs are king – the first error before the symptom is usually the root cause.
  • Device runtime‑PM status can be inspected directly from /sys.
  • Daemons (TLP, powertop, laptop‑mode-tools) leave clear footprints; verify they are active.

With this workflow, you’ll be able to diagnose and resolve most Linux power‑management woes without endless trial‑and‑error.

Happy debugging!
