Commit e95e299

JohannesGaessler authored and ArberSephirotheca committed
CUDA: require explicit opt-in for P2P access (ggml-org#21910)
1 parent 2e4a5be commit e95e299

2 files changed

Lines changed: 18 additions & 10 deletions

docs/build.md

Lines changed: 6 additions & 0 deletions
@@ -281,6 +281,12 @@ Use `GGML_CUDA_FORCE_CUBLAS_COMPUTE_16F` environment variable to force use FP16
 
 The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted. In Windows this setting is available in the NVIDIA control panel as `System Memory Fallback`.
 
+### Peer Access
+
+The environment variable `GGML_CUDA_P2P` can be set to enable peer-to-peer access between multiple GPUs, allowing them to transfer data directly rather than to go through system memory.
+Requires driver support (usually restricted to workstation/datacenter GPUs).
+May cause crashes or corrupted outputs for some motherboards and BIOS settings (e.g. IOMMU).
+
 ### Performance Tuning
 
 The following compilation options are also available to tweak performance:
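
As a reading aid, here is a minimal C++ sketch of what "set" means for `GGML_CUDA_P2P`, assuming the variable is tested the way the `ggml-cuda.cu` hunk of this commit tests it (`getenv("GGML_CUDA_P2P") != nullptr`). Under that check only the presence of the variable matters, so a value of `0` or an empty value would still opt in. The helper names `p2p_opted_in` and `p2p_self_check` are hypothetical and not part of ggml, and `setenv`/`unsetenv` are POSIX calls, so the sketch assumes a Linux-like system.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // setenv/unsetenv (POSIX, not standard C++)

// Hypothetical helper mirroring the presence-only check from the diff:
// the value of the variable is never inspected.
static bool p2p_opted_in() {
    return std::getenv("GGML_CUDA_P2P") != nullptr;
}

// Hypothetical self-check that walks through the interesting cases.
static void p2p_self_check() {
    unsetenv("GGML_CUDA_P2P");
    assert(!p2p_opted_in());          // unset: peer access stays off

    setenv("GGML_CUDA_P2P", "1", 1);
    assert(p2p_opted_in());           // set to 1: opted in

    setenv("GGML_CUDA_P2P", "0", 1);
    assert(p2p_opted_in());           // set to 0: still opted in

    setenv("GGML_CUDA_P2P", "", 1);
    assert(p2p_opted_in());           // empty but present: still opted in

    unsetenv("GGML_CUDA_P2P");        // leave the environment clean
}
```

If the check is as shown, turning peer access back off means unsetting the variable rather than setting it to `0`.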

ggml/src/ggml-cuda/ggml-cuda.cu

Lines changed: 12 additions & 10 deletions
@@ -324,16 +324,18 @@ static ggml_cuda_device_info ggml_cuda_init() {
     // configure logging to stdout
     // CUBLAS_CHECK(cublasLoggerConfigure(1, 1, 0, nullptr));
 
-    for (int id = 0; id < info.device_count; ++id) {
-        ggml_cuda_set_device(id);
-        for (int id_other = 0; id_other < info.device_count; ++id_other) {
-            if (id == id_other) {
-                continue;
-            }
-            int can_access_peer;
-            CUDA_CHECK(cudaDeviceCanAccessPeer(&can_access_peer, id, id_other));
-            if (can_access_peer) {
-                CUDA_CHECK(cudaDeviceEnablePeerAccess(id_other, 0));
+    if (getenv("GGML_CUDA_P2P") != nullptr) {
+        for (int id = 0; id < info.device_count; ++id) {
+            ggml_cuda_set_device(id);
+            for (int id_other = 0; id_other < info.device_count; ++id_other) {
+                if (id == id_other) {
+                    continue;
+                }
+                int can_access_peer;
+                CUDA_CHECK(cudaDeviceCanAccessPeer(&can_access_peer, id, id_other));
+                if (can_access_peer) {
+                    CUDA_CHECK(cudaDeviceEnablePeerAccess(id_other, 0));
+                }
             }
         }
     }
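
The gated loop above visits every ordered pair of distinct devices. The sketch below reproduces that control flow in plain C++ without the CUDA runtime, so it can be run on a machine with no GPU. In it, `can_access` stands in for `cudaDeviceCanAccessPeer`, `opted_in` stands in for the `GGML_CUDA_P2P` check, and the returned list stands in for the `cudaDeviceEnablePeerAccess` calls. The names `plan_peer_access` and `PeerPair` are hypothetical and not part of ggml.

```cpp
#include <functional>
#include <utility>
#include <vector>

// (device, peer) means: `device` would be granted access to memory on `peer`.
using PeerPair = std::pair<int, int>;

// Hypothetical stand-in for the loop in the diff. `opted_in` plays the role of
// the GGML_CUDA_P2P check; `can_access` plays the role of cudaDeviceCanAccessPeer.
static std::vector<PeerPair> plan_peer_access(
        int device_count, bool opted_in,
        const std::function<bool(int, int)> & can_access) {
    std::vector<PeerPair> enabled;
    if (!opted_in) {
        return enabled;  // without the opt-in nothing is enabled
    }
    for (int id = 0; id < device_count; ++id) {
        for (int id_other = 0; id_other < device_count; ++id_other) {
            if (id == id_other) {
                continue;  // a device is never its own peer
            }
            if (can_access(id, id_other)) {
                enabled.emplace_back(id, id_other);  // one direction only
            }
        }
    }
    return enabled;
}
```

With three devices that can all reach each other, this plan contains six ordered pairs, because peer access is granted per direction and the loop requests both directions separately.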
