Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
I confirm that this does not happen with the proprietary driver package.
Note: The system runs exclusively with the open kernel driver (USE=kernel-open on Gentoo/Pentoo). The proprietary driver has not been tested with this specific crash scenario. However, related HMM issues (see #901) have been confirmed open-driver-specific, and the crash occurs at an offset inside nvidia_uvm where HMM state is accessed.
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 3080 Mobile (GA104, PCI ID 10de:24a0, PCI slot 0000:01:00.0)
System: System76 Oryx Pro, Intel Core i7-12700H + RTX 3080 Mobile hybrid graphics (Optimus/PRIME), BIOS 2022-07-20_ae6aa72
Describe the bug
A kernel NULL pointer dereference occurs in uvm_hmm_unregister_gpu+0x40 inside nvidia_uvm during UVM VA space teardown when a CUDA worker thread exits. The fault address is 0x00000000000000a0 (NULL + 0xa0). RAX is 0x0 at the point of the fault; the faulting instruction attempts to read from [rax+0xa0].
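For illustration, here is a minimal C sketch of why RAX=0 produces a fault address of exactly 0xa0. The struct fake_gpu and its field are hypothetical. They exist only so that one 8-byte member sits at offset 0xa0. The real layout of the GPU structure inside nvidia_uvm is not known from this report. The address is computed with integer arithmetic, so nothing is actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: NOT the real nvidia_uvm GPU struct.
 * The padding only exists to place some_field at offset 0xa0. */
struct fake_gpu {
    unsigned char padding[0xa0];
    uint64_t some_field; /* 8-byte member, like the QWORD read by the faulting add */
};

/* The member offset must equal the displacement in the faulting instruction. */
static_assert(offsetof(struct fake_gpu, some_field) == 0xa0,
              "some_field should sit at offset 0xa0");

/* Address the CPU would load from when reading some_field through `base`.
 * Plain integer arithmetic: no NULL dereference happens here. */
static uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct fake_gpu, some_field);
}
```

With base 0 (RAX = 0), field_address(0) evaluates to 0xa0, which is the reported fault address. A valid base would instead give an address inside the object.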
The crashing thread (cuda00001800007, PID 180310, UID 1000) was blocked in sys_poll (userspace ORIG_RAX=7, the x86_64 syscall number for poll) when it received a signal and was killed, triggering do_exit → UVM VA space cleanup → the crash.
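ORIG_RAX in the register dump holds the number of the system call that the task was executing. The lookup sketch below shows why ORIG_RAX=7 identifies poll. It covers only the first eight entries of the x86_64 Linux syscall table, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First eight entries of the x86_64 Linux syscall table
 * (arch/x86/entry/syscalls/syscall_64.tbl); index = syscall number. */
static const char *const x86_64_syscalls[] = {
    "read", "write", "open", "close", "stat", "fstat", "lstat", "poll",
};

static_assert(sizeof x86_64_syscalls / sizeof x86_64_syscalls[0] == 8,
              "table should cover syscalls 0 through 7");

/* Hypothetical helper: map an ORIG_RAX value to a syscall name.
 * Returns NULL for numbers outside this small table. */
static const char *x86_64_syscall_name(long orig_rax)
{
    size_t n = sizeof x86_64_syscalls / sizeof x86_64_syscalls[0];
    if (orig_rax < 0 || (size_t)orig_rax >= n)
        return NULL;
    return x86_64_syscalls[orig_rax];
}
```

x86_64_syscall_name(7) returns "poll". Numbers outside the table return NULL.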
The driver (version 595.58.03, built 2026-03-26 11:47) had been running without issue for approximately 85 minutes before the crash, with the CUDA workload active throughout that period.
Adding options nvidia_uvm uvm_disable_hmm=1 to the modprobe configuration appears to prevent the crash (it has not recurred since). This points to the HMM (Heterogeneous Memory Management) code path of nvidia_uvm.
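To confirm that the workaround is actually in effect after a reboot, you can read the parameter back from sysfs. This C sketch assumes that nvidia_uvm exposes uvm_disable_hmm as a readable parameter at /sys/module/nvidia_uvm/parameters/uvm_disable_hmm. Whether the driver grants read permission on it is an assumption and is not confirmed by this report. Kernel boolean parameters normally read back as Y or N. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed sysfs location of the nvidia_uvm parameter (readable only if the
 * module declares it with read permission). */
#define UVM_DISABLE_HMM_PATH "/sys/module/nvidia_uvm/parameters/uvm_disable_hmm"

/* Kernel bool parameters normally read back as "Y" or "N"; 1/0 accepted too.
 * Returns 1 for true, 0 for false, -1 if the text is unrecognized. */
static int parse_bool_param(const char *text)
{
    assert(text != NULL);
    switch (text[0]) {
    case 'Y': case 'y': case '1': return 1;
    case 'N': case 'n': case '0': return 0;
    default: return -1;
    }
}

/* Returns 1 if HMM is disabled in nvidia_uvm, 0 if it is enabled,
 * -1 if the parameter cannot be read (module not loaded, no such file). */
static int uvm_hmm_disabled(void)
{
    char buf[8] = {0};
    FILE *f = fopen(UVM_DISABLE_HMM_PATH, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_bool_param(buf);
}
```

A return value of 1 means HMM is disabled. A return value of -1 means the file could not be read, for example because nvidia_uvm is not loaded.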
Decoding the faulting instruction: the byte sequence in the Code: field (the faulting byte is marked <4c>) decodes as add r12, QWORD PTR [rax+0xa0]. This is an 8-byte read from rax+0xa0 with rax=0, which amounts to a NULL pointer dereference at offset 0xa0 into what should be a GPU struct. RDI is also 0x0. Because RDI carries the first function argument on x86_64, this is consistent with a NULL GPU pointer being passed in.
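The C sketch below reproduces that decoding for this one instruction form. It assumes the marked bytes are 4c 03 a0 a0 00 00 00, which is the standard encoding of add r12, QWORD PTR [rax+0xa0]. The encoding breaks down as follows:

- REX prefix 0x4c, with W and R set
- opcode 0x03
- ModRM 0xa0, selecting r12 and [rax+disp32]
- a little-endian 32-bit displacement of 0xa0

The exact Code: bytes are not quoted in this report. The sketch handles only this form and is not a general disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded operands of "add r64, QWORD PTR [base + disp32]". */
struct add_mem_insn {
    unsigned dst_reg;  /* 0=rax, 1=rcx, ..., 8=r8, ..., 12=r12, ... */
    unsigned base_reg; /* base register of the memory operand, same numbering */
    int32_t disp;      /* signed 32-bit displacement */
};

/* Decode ONE x86-64 form only: REX.W + 03 /r (add r64, r/m64) with
 * ModRM mod=10 (disp32) and no SIB byte. Assumed input, e.g.
 * 4c 03 a0 a0 00 00 00. Returns 0 on success, -1 otherwise. */
static int decode_add_r64_m64(const uint8_t *b, size_t len,
                              struct add_mem_insn *out)
{
    assert(b != NULL && out != NULL);
    if (len < 7)
        return -1;
    uint8_t rex = b[0], opcode = b[1], modrm = b[2];
    if ((rex & 0xf8) != 0x48)   /* REX prefix 0100WRXB with W=1 */
        return -1;
    if (opcode != 0x03)         /* ADD r64, r/m64 */
        return -1;
    unsigned mod = modrm >> 6;
    unsigned reg = (modrm >> 3) & 7;
    unsigned rm = modrm & 7;
    if (mod != 2 || rm == 4)    /* rm=4 would require a SIB byte */
        return -1;
    out->dst_reg = reg | ((rex & 0x04) ? 8u : 0u);  /* REX.R extends reg */
    out->base_reg = rm | ((rex & 0x01) ? 8u : 0u);  /* REX.B extends rm */
    uint32_t raw = (uint32_t)b[3] | (uint32_t)b[4] << 8 |
                   (uint32_t)b[5] << 16 | (uint32_t)b[6] << 24;  /* little-endian */
    out->disp = (int32_t)raw;
    return 0;
}

/* Effective address of the memory operand for a given base register value. */
static uint64_t effective_address(const struct add_mem_insn *insn,
                                  uint64_t base_value)
{
    return base_value + (uint64_t)(int64_t)insn->disp;
}
```

Decoding those seven bytes gives the following operands:

- dst_reg 12 (r12)
- base_reg 0 (rax)
- disp 0xa0

effective_address(&insn, 0) then yields 0xa0, which matches the reported fault address.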
To Reproduce
Cannot reproduce on demand. The crash occurred once during normal system use while a CUDA workload was running (most likely background Steam shader pre-compilation via fossilize/nv-fossilize, which runs as UID 1000 and uses CUDA). The CUDA worker thread was signaled while blocked in sys_poll and crashed during exit cleanup.
The crash has not recurred since adding options nvidia_uvm uvm_disable_hmm=1 to /etc/modprobe.d/nvidia.conf, which is consistent with the fault being in the HMM path.
Bug Incidence
Happened once (single occurrence recovered from EFI pstore). Workaround (uvm_disable_hmm=1) applied since then.
nvidia-bug-report.log.gz
Not available — the system was not running when the crash was analyzed; all crash data was recovered from EFI pstore after reboot.
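Because the EFI pstore record is the only evidence, the sketch below pulls the faulting symbol and offset out of the recovered text. After a reboot, pstore records typically appear as files under /sys/fs/pstore, with names that vary by backend. The sketch assumes the record contains a standard oops line of the form RIP: 0010:symbol+0xOFF/0xLEN [module]. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the symbol and offset from an oops line like
 *   RIP: 0010:some_function+0x40/0x200 [some_module]
 * Returns 0 on success, -1 if the line does not have that shape
 * (for example a user-space "RIP: 0033:0x7f..." line with no symbol). */
static int parse_rip_symbol(const char *line, char *sym, size_t symlen,
                            unsigned long *offset)
{
    assert(line != NULL && sym != NULL && offset != NULL && symlen > 0);
    const char *p = strstr(line, "RIP: ");
    if (!p)
        return -1;
    p += strlen("RIP: ");
    const char *start = strchr(p, ':');     /* skip the "0010" segment selector */
    if (!start)
        return -1;
    start++;
    size_t n = strcspn(start, "+ \t\n");    /* symbol ends at '+' */
    if (n == 0 || start[n] != '+' || n >= symlen)
        return -1;
    memcpy(sym, start, n);
    sym[n] = '\0';
    char *end;
    *offset = strtoul(start + n + 1, &end, 16);  /* base 16 accepts a "0x" prefix */
    if (end == start + n + 1)
        return -1;
    return 0;
}
```

On the line reported here, this would yield the symbol uvm_hmm_unregister_gpu and the offset 0x40.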
The kernel was also built with CONFIG_RANDSTRUCT_PERFORMANCE=y (Pentoo default). This is unrelated to this crash but is noted for completeness (it affects nvidia_drm/nvidia_modeset, not nvidia_uvm).
The [S] (CPU_OUT_OF_SPEC) taint flag is present because the BIOS has disabled eist (Intel SpeedStep) on this system, which the kernel detects. It is not related to this crash.
The driver was freshly rebuilt on the day of the crash (built at 11:47, crash at ~13:17). The crash therefore came about 90 minutes after the build, and the driver had been running with CUDA active for roughly 85 of those minutes.
NVIDIA Open GPU Kernel Modules Version
595.58.03
Operating System and Version
Pentoo Linux (Gentoo-based), OpenRC
Kernel Release
6.19.9-pentoo (custom build, 2026-03-26, PREEMPT(voluntary))