fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus entirely) #22533

Open
lucyknada wants to merge 1 commit into ggml-org:master from lucyknada:patch-2

@lucyknada

Overview

I believe this is basically the same issue as #18673 (I only recently updated my NVIDIA drivers, so it might be exactly the same). After updating llama.cpp and the NVIDIA drivers, it started detecting my 4x 3090 as 1x 3090 and OOMing. This change fixed the issue; CUDA itself already returns unique IDs.

Additional information

I tested this on a 4x 3090 system where CUDA reported unique PCI bus IDs via cudaDeviceGetPCIBusId(), but llama.cpp logged the same device ID for all four GPUs, skipped the other three 3090s with a CLI message, and then ran out of memory.
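To illustrate the failure mode described above, here is a minimal sketch of a de-dupe keyed on the formatted PCI bus ID string. It is not llama.cpp's actual code: `PciProps`, `format_pci_bus_id`, and `dedupe_by_pci_id` are hypothetical names, and only the format string is taken from the line this PR replaces. If every device formats to the same string, only the first one survives.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the three PCI fields read from the device properties.
struct PciProps {
    unsigned domain;
    unsigned bus;
    unsigned device;
};

// Builds the ID with the same format string as the snprintf call replaced in this PR.
std::string format_pci_bus_id(const PciProps & p) {
    char buf[32] = {};
    std::snprintf(buf, sizeof(buf), "%04x:%02x:%02x.0", p.domain, p.bus, p.device);
    return buf;
}

// Illustrative de-dupe: returns the indices of devices whose ID string
// has not been seen before. Devices with a repeated ID are dropped.
std::vector<int> dedupe_by_pci_id(const std::vector<std::string> & ids) {
    std::set<std::string> seen;
    std::vector<int> kept;
    for (int i = 0; i < (int) ids.size(); ++i) {
        if (seen.insert(ids[i]).second) {
            kept.push_back(i);
        }
    }
    return kept;
}
```

With four identical ID strings, `dedupe_by_pci_id` keeps a single index, which matches the "4x 3090 detected as 1x 3090" symptom. With four distinct strings, all four indices are kept.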

Requirements

@lucyknada requested a review from a team as a code owner on April 29, 2026, 21:23
@github-actions (bot) added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Apr 29, 2026
- char pci_bus_id[16] = {};
- snprintf(pci_bus_id, sizeof(pci_bus_id), "%04x:%02x:%02x.0", prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
+ char pci_bus_id[32] = {};
+ CUDA_CHECK(cudaDeviceGetPCIBusId(pci_bus_id, sizeof(pci_bus_id), i));
Contributor

I think this will fail to compile on HIP/MUSA. You need to add the corresponding macro to the vendor headers.
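A rough sketch of the kind of alias the comment above refers to, assuming the vendor headers map CUDA names onto vendor names with `#define`. The HIP runtime does provide `hipDeviceGetPCIBusId(char *, int, int)`; the function defined here under that name is only a stub so the sketch compiles without the HIP SDK, and `query_pci_bus_id` is a hypothetical caller. MUSA would need an equivalent alias, whose exact name is not confirmed here.

```cpp
#include <cstdio>
#include <string>

// Stub standing in for the HIP runtime call (assumption: same parameter order
// as the real one). It writes a fake ID derived from the device index.
int hipDeviceGetPCIBusId(char * pciBusId, int len, int device) {
    std::snprintf(pciBusId, (size_t) len, "0000:%02x:00.0", (unsigned) device);
    return 0; // 0 plays the role of a success code in this sketch
}

// The kind of alias a vendor header would carry: CUDA spelling -> vendor spelling.
#define cudaDeviceGetPCIBusId hipDeviceGetPCIBusId

// Shared backend code keeps the CUDA name; the preprocessor redirects the call.
int query_pci_bus_id(char * out, int len, int device) {
    return cudaDeviceGetPCIBusId(out, len, device);
}
```

Without such an alias, a non-CUDA build would see `cudaDeviceGetPCIBusId` as an undeclared identifier, which is the compilation failure the comment warns about.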
