Windows CUDA: cuMemAddressReserve failure in VMM pool causes hard abort (GGML_CUDA_NO_VMM workaround) #580

Hey folks. I ran into a hard crash while running some heavy embedding workloads on Windows with the CUDA backend. It looks like it's tied to the CUDA VMM (virtual memory management) pool allocator.

**The Problem**
When running a large indexing job (about 32,000 chunks via `qmd`), the process dies with a CUDA out-of-memory error.

According to the debug logs, the error surfaces at `ggml-cuda.cu:97`. The process aborts inside `ggml_cuda_pool_vmm::alloc` (around line 476) on this call:
`cuMemAddressReserve(&pool_addr, CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0)`

**Why it's failing**
I'm on an RTX 3090 (24 GB). In `ggml-cuda.cu`, `CUDA_POOL_VMM_MAX_SIZE` is hardcoded to 32 GB, so the pool tries to reserve 32 GB of virtual address space. Even with plenty of physical VRAM free, that reservation fails. Instead of gracefully falling back to a non-VMM pool, the whole process hard-aborts.
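For scale, here is the arithmetic. As far as I can tell, the constant is defined as `1ull << 35` in `ggml-cuda.cu`; the snippet below assumes that value and compares it with the 3090's 24 GiB. Reserving more virtual address space than physical VRAM is normal for VMM on its own, so this doesn't explain the failure by itself. It only shows how much address space the driver is being asked for in a single call.

```cpp
#include <cstdint>

// Assumption: mirrors `#define CUDA_POOL_VMM_MAX_SIZE (1ull << 35)` in ggml-cuda.cu.
constexpr std::uint64_t kVmmMaxSize = 1ull << 35;
constexpr std::uint64_t kGiB = 1ull << 30;

// Physical VRAM of the RTX 3090 from the environment section below.
constexpr std::uint64_t kRtx3090Vram = 24 * kGiB;

// Size of the single virtual address reservation, in GiB.
constexpr std::uint64_t kVmmMaxSizeGiB = kVmmMaxSize / kGiB;

// How far the reservation exceeds the card's physical memory, in GiB.
constexpr std::uint64_t kOverVramGiB = (kVmmMaxSize - kRtx3090Vram) / kGiB;

static_assert(kVmmMaxSizeGiB == 32, "the VMM pool reserves 32 GiB of address space");
static_assert(kOverVramGiB == 8, "which is 8 GiB more than the 3090 physically has");
```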

**The Workaround**
I managed to bypass this locally by compiling `node-llama-cpp` from source with VMM disabled via the CMake option:
`GGML_CUDA_NO_VMM=ON`
With that option, the exact same embedding job finishes without errors and memory usage stays stable.

**The Request**
Would it be possible to add a runtime fallback here? If `cuMemAddressReserve` fails (which seems to happen on some Windows/WDDM setups), it would be great if it logged a warning and fell back to the standard allocator instead of crashing. That would make the prebuilt binaries a lot more stable for Windows users hitting this edge case.
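A minimal sketch of the fallback I have in mind, with the CUDA calls stubbed out so it compiles anywhere. `Pool`, `VmmPool`, `LegacyPool`, `ReserveFn` and `make_pool` are hypothetical names, not ggml's real classes; the `reserve` callback stands in for `cuMemAddressReserve`, returning 0 on success and a non-zero error code on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-ins for ggml's VMM and legacy (non-VMM) CUDA pools.
struct Pool {
    virtual ~Pool() = default;
    virtual std::string kind() const = 0;
};

struct VmmPool : Pool {
    std::string kind() const override { return "vmm"; }
};

struct LegacyPool : Pool {
    std::string kind() const override { return "legacy"; }
};

// Hypothetical hook standing in for cuMemAddressReserve:
// returns 0 on success, a non-zero error code on failure.
using ReserveFn = std::function<int(std::uint64_t bytes)>;

// Assumed to match CUDA_POOL_VMM_MAX_SIZE (1ull << 35, i.e. 32 GiB).
constexpr std::uint64_t kVmmMaxSize = 1ull << 35;

// Try the VMM reservation; on failure, warn and return a legacy pool
// instead of aborting the process.
std::unique_ptr<Pool> make_pool(const ReserveFn& reserve) {
    const int err = reserve(kVmmMaxSize);
    if (err == 0) {
        return std::make_unique<VmmPool>();
    }
    std::fprintf(stderr,
                 "warning: VMM address reservation failed (error %d), "
                 "falling back to the non-VMM pool\n",
                 err);
    return std::make_unique<LegacyPool>();
}
```

From what the logs show, the reservation happens lazily inside `ggml_cuda_pool_vmm::alloc`, so a real patch would probably need to either probe the reservation when the pool is created or switch to the non-VMM pool on the first failed reservation. I haven't checked which fits the ggml code better.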

**My Environment**
- OS: Windows 11 Pro N (10.0.22631)
- GPU: RTX 3090 24GB (Driver 591.44)
- CUDA: 13.1
- Node: v24.13.0
- node-llama-cpp: 3.17.1
