Bug: cuMemCreate hook reads uninitialized dev when prop->location.type != CU_MEM_LOCATION_TYPE_DEVICE #187

[repro.c](https://github.com/user-attachments/files/27178572/repro.c)

# Bug: `cuMemCreate` hook reads uninitialized `dev` when `prop->location.type != CU_MEM_LOCATION_TYPE_DEVICE`

## Summary

In `src/cuda/memory.c`, the `cuMemCreate` hook calls `add_chunk_only(*handle, size, dev)` with `dev` left uninitialized whenever `prop->location.type` is not `CU_MEM_LOCATION_TYPE_DEVICE`. The garbage value flows down into `set_current_device_memory_limit`, which logs `Illegal device id: <random>` and writes out-of-bounds into `region_info.shared_region->limit[dev]`.

This blocks any container that uses the CUDA Driver API VMM path (`cuMemCreate` + `cuMemAddressReserve` + `cuMemMap`) with a non-DEVICE allocation location, which is the path `ggml-cuda` takes when staging a virtual address pool before binding it to a specific device. In practice, `llama.cpp`, every `llama-server` build with VMM enabled, and the Lucebox / DFlash speculative-decoding stack all crash on a HAMi-managed pod.

## Steps to reproduce

Hardware: NVIDIA RTX 5090 Laptop GPU (sm_120 Blackwell consumer). The bug isn't hardware-specific: the same pattern should hit any GPU under HAMi as long as the workload uses `cuMemCreate` with a non-DEVICE location.

Stack: HAMi 2.5.1 (deployed as part of Olares 1.12.5), `nvidia/cuda:13.0.0-devel-ubuntu22.04`, `ggml-cuda` built with default `GGML_CUDA=ON` (i.e. with VMM enabled).

Pod logs at startup:

```
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24463 MiB):
  Device 0: NVIDIA GeForce RTX 5090 Laptop GPU, compute capability 12.0,
            VMM: yes, VRAM: 24463 MiB
[HAMI-core ERROR (pid:1 thread=124663410515968 multiprocess_memory_limit.c:846)]:
Illegal device id: -644371744
```

The `-644371744` value changes between runs, since it is whatever an earlier call left in that stack slot. I confirmed it's not a build, env, or driver issue: the same pattern shows up across image rebuilds, `LD_LIBRARY_PATH` permutations, `CUDA_DEVICE_MEMORY_LIMIT_0` settings, and two different driver versions.

Minimal C reproducer:

```c
#include <stdio.h>
#include <cuda.h>

int main(void) {
    cuInit(0);
    CUdevice device; cuDeviceGet(&device, 0);
    CUcontext ctx;   cuCtxCreate(&ctx, 0, device);

    CUmemAllocationProp prop = {0};
    prop.type             = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type    = CU_MEM_LOCATION_TYPE_HOST_NUMA;  // not DEVICE
    prop.location.id      = 0;

    size_t granularity;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size_t size = ((1<<20) + granularity - 1) / granularity * granularity;

    CUmemGenericAllocationHandle handle;
    CUresult res = cuMemCreate(&handle, size, &prop, 0);
    printf("cuMemCreate returned %d\n", res);
    cuCtxDestroy(ctx);
    return 0;
}
```

Build it with `nvcc repro.c -o repro -lcuda` and run it inside a HAMi-managed pod; `[HAMI-core ERROR ...]: Illegal device id: <random>` shows up immediately.
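For readers unfamiliar with the size computation in the reproducer: it rounds the requested 1 MiB up to the next multiple of the allocation granularity, since `cuMemCreate` expects a size aligned to that granularity. A minimal sketch of that arithmetic in isolation follows; the helper name `round_up` is hypothetical and not part of the reproducer, and the real granularity is whatever the driver reports for the given `prop`.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of granularity.
 * Same integer arithmetic as the `size` expression in the reproducer. */
static size_t round_up(size_t n, size_t granularity) {
    assert(granularity != 0);  /* a zero granularity would divide by zero */
    return (n + granularity - 1) / granularity * granularity;
}
```

With an illustrative 2 MiB granularity, a 1 MiB request becomes 2 MiB; a request that is already aligned is returned unchanged.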

## Root cause

`src/cuda/memory.c`, `cuMemCreate` hook around line 1009:

```c
CUresult cuMemCreate(CUmemGenericAllocationHandle* handle, size_t size,
                     const CUmemAllocationProp* prop, unsigned long long flags) {
    LOG_INFO("cuMemCreate:%lld:%d", size, prop->location.id);
    ENSURE_RUNNING();
    CUdevice dev;                                          // (a) not initialised
    int do_oom_check = (prop->location.type == CU_MEM_LOCATION_TYPE_DEVICE);
    if (do_oom_check && cuCtxGetDevice(&dev) != CUDA_SUCCESS) {
        dev = prop->location.id;
    }
    if (do_oom_check && oom_check(dev, size)) {
        return CUDA_ERROR_OUT_OF_MEMORY;
    }
    CUresult res = CUDA_OVERRIDE_CALL(cuda_library_entry,
        cuMemCreate, handle, size, prop, flags);
    if (res == CUDA_SUCCESS) {
        add_chunk_only(*handle, size, dev);                // (b) uses (a) uninitialised
    }
    return res;
}
```

When `prop->location.type != CU_MEM_LOCATION_TYPE_DEVICE`, `do_oom_check` is 0, so both `if (do_oom_check && ...)` conditions short-circuit: `cuCtxGetDevice(&dev)` is never called, the fallback assignment never runs, and `dev` keeps whatever value happened to be on the stack. The unconditional `add_chunk_only(*handle, size, dev)` at the bottom then forwards that garbage to `set_current_device_memory_limit`:

```c
// src/multiprocess/multiprocess_memory_limit.c
int set_current_device_memory_limit(const int dev, size_t newlimit) {
    ensure_initialized();
    if (dev < 0 || dev >= CUDA_DEVICE_MAX_COUNT) {
        LOG_ERROR("Illegal device id: %d", dev);
    }
    LOG_INFO("dev %d new limit set to %ld",dev,newlimit);
    region_info.shared_region->limit[dev]=newlimit;        // OOB write
    return 0;
}
```
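To make the control flow easier to check without a GPU, here is a small self-contained model of the hook's branches. It is only a sketch under stated assumptions: `loc_type`, `DEV_UNSET`, and `dev_seen_by_tracker` are hypothetical names, the driver calls are replaced by plain parameters, and an explicit sentinel stands in for the indeterminate stack value, because reading a truly uninitialized variable would be undefined behavior.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CUDA location types; only the
 * DEVICE / non-DEVICE distinction matters for this model. */
enum loc_type { LOC_DEVICE, LOC_HOST_NUMA };

/* Sentinel marking "never assigned". The real hook has no such marker:
 * there, dev simply holds indeterminate stack contents. */
#define DEV_UNSET (-1)

/* Models the branch structure of the hook. ctx_dev < 0 stands for
 * cuCtxGetDevice failing; otherwise it is the ordinal that call wrote.
 * Returns the dev value the tracking call would receive. */
static int dev_seen_by_tracker(enum loc_type type, int location_id, int ctx_dev) {
    assert(location_id >= 0);  /* model assumption: a valid ordinal */
    int dev = DEV_UNSET;
    int do_oom_check = (type == LOC_DEVICE);
    if (do_oom_check) {
        if (ctx_dev >= 0) {
            dev = ctx_dev;       /* cuCtxGetDevice succeeded and wrote dev */
        } else {
            dev = location_id;   /* fallback assignment */
        }
    }
    /* Allocation assumed to succeed; tracking then runs unconditionally. */
    return dev;
}
```

In this model, both DEVICE cases hand the tracker an assigned ordinal, while `dev_seen_by_tracker(LOC_HOST_NUMA, 0, 1)` returns `DEV_UNSET`: the tracking step is reached with `dev` never written.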

There are actually two issues:

1. `cuMemCreate` reads uninitialized `dev` (the visible symptom).
2. `set_current_device_memory_limit` logs the error and then *proceeds* to `region_info.shared_region->limit[dev] = newlimit`. That is an out-of-bounds write into shared memory, which can silently corrupt state for any other process attached to the same `region_info`.

The real-world path that triggers (1): `ggml-cuda` reserves a virtual address pool via `cuMemCreate` with `CU_MEM_LOCATION_TYPE_HOST_NUMA` for staging before binding to a specific device. This is standard CUDA Driver API VMM usage; see [NVIDIA's intro post](https://developer.nvidia.com/blog/introducing-low-level-gpu-virtual-memory-management/).

## Suggested fix

A minimal two-line fix at the `cuMemCreate` site: initialize `dev` up front, and skip tracking for non-DEVICE allocations, which should not count against per-device memory limits since they don't consume device VRAM:

```c
CUdevice dev = prop->location.id;                          // initialised up-front
int do_oom_check = (prop->location.type == CU_MEM_LOCATION_TYPE_DEVICE);
if (do_oom_check && cuCtxGetDevice(&dev) != CUDA_SUCCESS) {
    dev = prop->location.id;
}
if (do_oom_check && oom_check(dev, size)) {
    return CUDA_ERROR_OUT_OF_MEMORY;
}
CUresult res = CUDA_OVERRIDE_CALL(cuda_library_entry,
    cuMemCreate, handle, size, prop, flags);
if (res == CUDA_SUCCESS && do_oom_check) {                 // ← skip non-DEVICE
    add_chunk_only(*handle, size, dev);
}
return res;
```

While we're at it, `set_current_device_memory_limit` should bail out instead of writing out of bounds:

```c
int set_current_device_memory_limit(const int dev, size_t newlimit) {
    ensure_initialized();
    if (dev < 0 || dev >= CUDA_DEVICE_MAX_COUNT) {
        LOG_ERROR("Illegal device id: %d", dev);
        return -1;                                         // ← was missing
    }
    LOG_INFO("dev %d new limit set to %ld", dev, newlimit);
    region_info.shared_region->limit[dev] = newlimit;
    return 0;
}
```

I'm happy to send a PR if that's the direction you'd take, or to test a different approach if you'd rather track non-DEVICE allocs in a separate accounting path.
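If the separate-accounting direction is preferred, one possible shape is sketched below. Everything in it is hypothetical and not taken from HAMi-core: `struct usage`, `record_alloc`, `loc_kind`, and the `MAX_DEVICES` bound (standing in for `CUDA_DEVICE_MAX_COUNT`) are illustration only. The idea is that non-DEVICE allocations go to their own counter and never index the per-device table, so a meaningless `dev` cannot cause an out-of-bounds access.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 16  /* hypothetical bound, stands in for CUDA_DEVICE_MAX_COUNT */

/* Hypothetical two-way split; the real enum has more location types. */
enum loc_kind { LOC_KIND_DEVICE, LOC_KIND_OTHER };

struct usage {
    size_t device_bytes[MAX_DEVICES];  /* per-device VRAM accounting */
    size_t other_bytes;                /* host / NUMA-backed allocations */
};

/* Record an allocation. Returns 0 on success, -1 for an invalid device id. */
static int record_alloc(struct usage *u, enum loc_kind kind, int dev, size_t size) {
    assert(u != NULL);
    if (kind != LOC_KIND_DEVICE) {
        u->other_bytes += size;        /* dev is ignored on this path */
        return 0;
    }
    if (dev < 0 || dev >= MAX_DEVICES) {
        return -1;                     /* reject instead of indexing out of bounds */
    }
    u->device_bytes[dev] += size;
    return 0;
}
```

A caller would pass the location kind derived from `prop->location.type`; on the non-DEVICE path the `dev` argument can hold any value without affecting the per-device table.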

## Why this matters

VMM has been on by default in `ggml-cuda` since [llama.cpp PR #11446](https://github.com/ggml-org/llama.cpp/pull/11446) (Jan 2026), which means every recent `llama-server` build hits this bug on a HAMi-managed pod. The current workaround is to rebuild with `-DGGML_CUDA_NO_VMM=ON`, which works but loses the VMM benefits (notably ~3.5× KV cache savings via TurboQuant TQ3_0 paths).

A fix in HAMi-core would unblock upstream images (`ggml-org/llama.cpp:server-cuda13-*`, `vllm/vllm-openai` recent builds, the Lucebox `lucebox-hub` consumer-tuned fork) without users having to rebuild anything.

## Environment

- HAMi-core: `master`, commit `ec8979d` (the submodule pin in HAMi `master`)
- HAMi: 2.5.1 (deployed via the `hami-2.5.1` Helm chart in `kube-system`)
- Olares: 1.12.5
- GPU: NVIDIA RTX 5090 Laptop GPU, compute capability 12.0 (sm_120 Blackwell consumer)
- Driver: 580.65.06, CUDA 13.0.48
- Runtime: containerd via k3s 1.31
- Reproducer images on Docker Hub:
  - `aamsellem/lucebox-qwen36-blackwell:1.0.0` — VMM enabled, triggers the bug
  - `aamsellem/lucebox-qwen36-blackwell:1.1.0` — same code, `-DGGML_CUDA_NO_VMM=ON`, no bug

Happy to send extra logs, the full reproducer pod manifest, or a PR.
