[BUG] RX 7900 XTX (gfx1100) stuck in low-power state under compute/ROCm load on linux-cachyos 7.0.11/7.0.12 — works fine on linux-cachyos-lts 6.18.33 #888

Description

@sognix

Pre-flight checklist

  • I have searched existing issues and this is not a duplicate.
  • I have read the Contributing Guidelines.
  • I have verified the issue is reproducible with the latest available CachyOS kernel.
  • I have tried to reproduce the issue on Arch Linux's linux kernel.

Upstream / vanilla kernel check

I have not tested with a vanilla/upstream kernel

Kernel variant

linux-cachyos (EEVDF, Clang)

System information (cachyos-bugreport.sh)

https://paste.cachyos.org/p/85fb1b1.log

Manual system information (if cachyos-bugreport.sh is unavailable)


Bug description

Problem Description

After upgrading from an earlier kernel to linux-cachyos 7.0.11-1 (and confirmed still present on 7.0.12-1), the GPU no longer boosts clocks under compute (ROCm/KFD) workloads, while 3D/gaming workloads boost normally.

Observed behavior

  • Loading a model onto the GPU via PyTorch/ROCm (tested via InvokeAI and a separate HuggingFace transformers fine-tuning script) is dramatically slower than expected — roughly two orders of magnitude. As one data point, InvokeAI's own log reported a single pipeline sub-component (CLIP text encoder, ~234MB) taking 122 seconds to load onto the GPU, where this normally completes in well under a second. The exact total VRAM footprint of the full pipeline was not measured, but the slowdown is consistent and severe across every compute workload tested.
    [2026-06-20 05:28:03,911]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '76120fb2-df02-4247-a12b-9f0124b5a85e:text_encoder' (CLIPTextModel) onto cuda device in 122.34s. Total model size: 234.72MB, VRAM: 234.72MB (100.0%)
    [2026-06-20 05:28:03,953]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '76120fb2-df02-4247-a12b-9f0124b5a85e:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
  • Gaming workloads (tested with a native Vulkan title) boost GPU clocks normally and run at expected performance.
  • rocm-smi --showclocks during compute load shows:
    WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
    GPU[0] : sclk clock level: 1: (227-293Mhz)
    GPU[0] : mclk clock level: 1: (456Mhz)

(Expected boost clocks for this card under load: sclk >2000MHz, mclk ~2500MHz)
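
To make the comparison above repeatable, the rocm-smi --showclocks lines can be checked against the expected boost values with a small script. This is a minimal sketch, not part of the original report: the function names (parse_showclocks, below_expected) are hypothetical, the regular expression only covers the two line shapes quoted above, and the thresholds are the rough figures given in the report.

```python
import re

# Matches lines such as "GPU[0] : sclk clock level: 1: (227-293Mhz)"
# and "GPU[0] : mclk clock level: 1: (456Mhz)" as quoted in the report.
CLOCK_LINE = re.compile(
    r"GPU\[(?P<gpu>\d+)\]\s*:\s*(?P<name>\w+) clock level:\s*(?P<level>\d+):\s*"
    r"\((?:(?P<low>\d+)-)?(?P<high>\d+)Mhz\)",
    re.IGNORECASE,
)


def parse_showclocks(text):
    """Return {(gpu, clock_name): (level, low_mhz, high_mhz)} from rocm-smi text."""
    clocks = {}
    for match in CLOCK_LINE.finditer(text):
        high = int(match["high"])
        # A single value such as "(456Mhz)" is treated as low == high.
        low = int(match["low"]) if match["low"] else high
        key = (int(match["gpu"]), match["name"].lower())
        clocks[key] = (int(match["level"]), low, high)
    return clocks


def below_expected(clocks, expected_mhz):
    """List the (gpu, clock_name) pairs whose upper bound is below the threshold."""
    return sorted(
        key
        for key, (_, _, high) in clocks.items()
        if key[1] in expected_mhz and high < expected_mhz[key[1]]
    )


# Output copied from the report.
SAMPLE = """\
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
GPU[0] : sclk clock level: 1: (227-293Mhz)
GPU[0] : mclk clock level: 1: (456Mhz)
"""

# Rough thresholds from the report (sclk >2000MHz, mclk ~2500MHz); adjust as needed.
EXPECTED = {"sclk": 2000, "mclk": 2500}

if __name__ == "__main__":
    parsed = parse_showclocks(SAMPLE)
    print(parsed)
    print("below expected:", below_expected(parsed, EXPECTED))
```

On a live system, the SAMPLE string would be replaced by the captured output of rocm-smi --showclocks taken while the compute workload is running.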

Troubleshooting already performed (all ineffective)

  1. power_dpm_force_performance_level set to auto (default) — no boost under compute load.
  2. pp_power_profile_mode switched to COMPUTE (index 5) via both LACT GUI and direct sysfs write — confirmed active (* marker correctly shown next to COMPUTE in cat pp_power_profile_mode), but clocks remained low under actual compute workload.
  3. power_dpm_force_performance_level set to manual, then pp_dpm_sclk/pp_dpm_mclk written to force the highest available P-state — write is accepted without error (exit 0), but rocm-smi --showclocks continues to report a low clock level (e.g. level 1) instead of the forced level. The SMU appears to silently reject the requested state, similar to the report below.
  4. LACT "Power States": disabled the lower P-states, leaving only the two highest enabled. The driver accepted the change (confirmed via cat pp_power_profile_mode / power states UI), but measured clocks during compute load remained low. They also disagreed with what KDE System Monitor displayed at the same time, which suggests the forced range is not being honored at the hardware level either.
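
The check in step 3 (write accepted, but the forced level not active) can be scripted by reading pp_dpm_sclk back and looking for the * marker. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the sample text uses made-up illustrative values rather than readings from the affected card, and the sysfs path assumes the GPU is card0, which may differ per system.

```python
import re
from pathlib import Path

# pp_dpm_sclk lists one level per line, e.g. "1: 1000Mhz *",
# where "*" marks the currently active level.
LEVEL_LINE = re.compile(
    r"^[ \t]*(\d+):[ \t]*(\d+)[ \t]*Mhz[ \t]*(\*)?",
    re.IGNORECASE | re.MULTILINE,
)


def parse_dpm_levels(text):
    """Return a list of (level_index, mhz, is_active) tuples."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3) is not None)
        for m in LEVEL_LINE.finditer(text)
    ]


def forced_level_honored(text, requested_level):
    """True if exactly the requested level is marked active."""
    active = [idx for idx, _, is_active in parse_dpm_levels(text) if is_active]
    return active == [requested_level]


# Hypothetical contents for illustration only (not taken from the report).
SAMPLE_PP_DPM_SCLK = "0: 500Mhz \n1: 1000Mhz *\n2: 2500Mhz \n"

# Assumption: the affected GPU is card0; the index can differ on other systems.
SYSFS_SCLK = Path("/sys/class/drm/card0/device/pp_dpm_sclk")

if __name__ == "__main__":
    text = SYSFS_SCLK.read_text() if SYSFS_SCLK.exists() else SAMPLE_PP_DPM_SCLK
    levels = parse_dpm_levels(text)
    print(levels)
    if levels:
        highest = max(idx for idx, _, _ in levels)
        print("highest level active:", forced_level_honored(text, highest))
```

Running this after forcing the highest level in manual mode would show whether the active marker actually moved, independent of what GUI tools report.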

Key observation: GFX-ring vs Compute-ring divergence
Boosting works correctly when load comes through the 3D/GFX ring (gaming), but not when load comes through the compute ring (KFD/ROCm). This points to a regression in the workload-type-to-SMU-hint path for compute submissions specifically, rather than a general DPM/SMU failure (since DPM does respond correctly to GFX-ring workloads).

Related report
A very similar symptom (pp_dpm_sclk state masking silently failing on linux-cachyos 7.0.9-1, SMU returning ret=255, write accepted by kernel but rejected by SMU) was reported for a different GPU generation (Polaris/GFX8, RX 580) on the CachyOS forum about a month ago:
https://discuss.cachyos.org/t/amdgpu-pp-dpm-sclk-state-masking-silently-fails-on-polaris-gfx8-smu-returns-ret-255/30115

This suggests the underlying sysfs DPM-state-masking/enforcement path may be broken generically in the current 7.0.x branch, affecting multiple GPU generations, not just RDNA3.

The ArchWiki AMDGPU page also currently notes that manual powerplay table edits are "bugged on kernel side" for Navi3x/Navi4x, which may be related.

Steps to reproduce

  1. Boot linux-cachyos 7.0.11-1 or 7.0.12-1 with an RX 7900 XTX.
  2. Run any ROCm/PyTorch compute workload (e.g. load a model with transformers/diffusers, or use InvokeAI for image generation).
  3. Monitor rocm-smi --showclocks during the workload.
  4. Observe sclk/mclk remaining at the lowest P-state despite active compute load, and the overall workload taking ~100x longer than expected.
  5. Reboot into linux-cachyos-lts 6.18.33-2, repeat steps 2–4 — clocks boost correctly and performance is normal.
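
For step 2, a smaller stand-in workload may be easier to compare across kernels than a full InvokeAI run. The sketch below is an assumption-laden approximation, not the workload used in this report: it times a single host-to-device copy with PyTorch (ROCm builds expose the GPU through the torch.cuda API, as the "cuda device" wording in the log above indicates), and the helper names and the 256 MB buffer size are arbitrary choices.

```python
import time


def timed(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def time_host_to_device(size_mb=256):
    """Time copying a size_mb byte buffer to the GPU.

    Returns elapsed seconds, or None if PyTorch or a GPU is not available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    # Warm-up so that runtime/context initialization is not counted.
    torch.zeros(1).to("cuda")
    torch.cuda.synchronize()

    host = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8)

    def copy():
        device = host.to("cuda")
        torch.cuda.synchronize()  # wait until the copy has really finished
        return device

    _, elapsed = timed(copy)
    return elapsed


if __name__ == "__main__":
    seconds = time_host_to_device()
    if seconds is None:
        print("PyTorch or a GPU device is not available; nothing measured.")
    else:
        rate = 256 / max(seconds, 1e-9)
        print(f"256 MB host-to-device copy: {seconds:.3f} s ({rate:.1f} MB/s)")
```

Running the same script on linux-cachyos 7.0.x and on linux-cachyos-lts, with rocm-smi --showclocks open in a second terminal, should give two directly comparable numbers.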

Expected behavior

GPU should boost sclk/mclk appropriately under ROCm/compute load, the same way it already does under 3D/gaming load.

Actual behavior

GPU remains stuck at the lowest available P-state (sclk ~227-293MHz, mclk ~456MHz) for the entire duration of the compute workload, regardless of which power_dpm_force_performance_level mode or pp_power_profile_mode is selected. rocm-smi reports a persistent "low-power state" warning throughout. As a result, GPU-bound compute operations (model loading, inference, training) run roughly 100x slower than expected — a ~234MB model load that should take well under a second instead takes ~122 seconds. The GPU never transitions to a higher P-state during the entire workload, even though the same hardware boosts correctly within seconds under a 3D/gaming workload.
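
The figures quoted above can be turned into an effective load rate directly from the InvokeAI log. This is a small sketch, assuming the log line format shown in the excerpt earlier in this report; the function name load_throughputs is hypothetical.

```python
import re

# Matches the "[MODEL CACHE] Loaded model ..." lines quoted in this report.
LOAD_LINE = re.compile(
    r"Loaded model '(?P<key>[^']+)' \((?P<cls>\w+)\) onto (?P<device>\w+) device "
    r"in (?P<seconds>[\d.]+)s\. Total model size: (?P<size_mb>[\d.]+)MB"
)


def load_throughputs(log_text):
    """Return (class_name, size_mb, seconds, mb_per_s) per load line.

    mb_per_s is None when the logged duration is zero.
    """
    results = []
    for m in LOAD_LINE.finditer(log_text):
        seconds = float(m["seconds"])
        size_mb = float(m["size_mb"])
        rate = size_mb / seconds if seconds > 0 else None
        results.append((m["cls"], size_mb, seconds, rate))
    return results


# Log lines copied from the report.
LOG = (
    "[2026-06-20 05:28:03,911]::[ModelManagerService]::INFO --> [MODEL CACHE] "
    "Loaded model '76120fb2-df02-4247-a12b-9f0124b5a85e:text_encoder' (CLIPTextModel) "
    "onto cuda device in 122.34s. Total model size: 234.72MB, VRAM: 234.72MB (100.0%)\n"
    "[2026-06-20 05:28:03,953]::[ModelManagerService]::INFO --> [MODEL CACHE] "
    "Loaded model '76120fb2-df02-4247-a12b-9f0124b5a85e:tokenizer' (CLIPTokenizer) "
    "onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)\n"
)

if __name__ == "__main__":
    for cls, size_mb, seconds, rate in load_throughputs(LOG):
        shown = f"{rate:.2f} MB/s" if rate is not None else "n/a"
        print(f"{cls}: {size_mb} MB in {seconds} s -> {shown}")
```

For the text encoder this works out to roughly 1.9 MB/s. Taking one second as a generous upper bound for the normal load time, that is a slowdown of at least about 122x, which matches the "roughly 100x" estimate.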

Logs / stack traces


Additional system information

  • dmesg shows normal amdgpu/KFD initialization with no errors (KFD device created successfully, kfd kfd: added device 1002:744c).
  • The standard "SMU driver if version not matched" message is present in dmesg but appears to be benign/expected on this hardware generation (it is seen in dmesg logs from other, working RX 7900 XTX systems as well).
  • Happy to provide full dmesg output, rocm-smi -a output, or test additional kernel parameters/patches if it helps narrow this down.
  • I have checked the Arch Linux kernel box in the pre-flight checklist, but only because it is mandatory. I have NOT actually tested Arch's kernel, because setting up a separate vanilla Arch install solely to test this wasn't practical (and I don't have an additional system for it); happy to help narrow this down another way if useful.

Metadata

Labels

bug (Something isn't working)
