From cbbe3b3c7c7bebbcb52cf33e1eaef4887ebc285f Mon Sep 17 00:00:00 2001
From: Gouri Yerra
Date: Wed, 27 May 2026 10:44:53 -0600
Subject: [PATCH 1/2] Update GPU README to document required system-probe.yaml flag and gpu.enabled for non-containerized hosts

Co-Authored-By: Claude Sonnet 4.6
---
 gpu/README.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/gpu/README.md b/gpu/README.md
index ca83e18b5aaa8..0ee917873a421 100644
--- a/gpu/README.md
+++ b/gpu/README.md
@@ -33,21 +33,32 @@ The check also uses eBPF probes to assign GPU usage and performance metrics to p

 #### Host

-The agent needs to be configured to enable GPU-related features. Add the following parameters to the `/etc/datadog-agent/datadog.yaml` configuration file and then restart the Agent:
+GPU monitoring requires configuration in **both** `/etc/datadog-agent/datadog.yaml` and `/etc/datadog-agent/system-probe.yaml`. Configuring only one of these files results in incomplete metrics collection.
+
+**Step 1**: Add the following parameters to `/etc/datadog-agent/datadog.yaml`:

 ```yaml
+gpu:
+  enabled: true
 collect_gpu_tags: true
 enable_nvml_detection: true
 ```

-Enabling the `gpu` integration requires `system-probe` to have the configuration option enabled for collecting per-process metrics. Inside the `/etc/datadog-agent/system-probe.yaml` configuration file, the following parameters must be set:
+**Step 2**: Add the following parameters to `/etc/datadog-agent/system-probe.yaml`. This flag loads the eBPF module responsible for per-process GPU metrics and is required even for non-containerized hosts:

 ```yaml
 gpu_monitoring:
   enabled: true
 ```

-The check in the Agent configuration file is enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. However, it can also be configured manually following these steps:
+**Step 3**: Restart both the Agent and system-probe:
+
+```shell
+sudo systemctl restart datadog-agent
+sudo systemctl restart datadog-agent-sysprobe
+```
+
+The check in the Agent configuration file is also enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. However, it can also be configured manually following these steps:

 1. Edit the `gpu.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your GPU performance data.

From 8947984301831b82635b2dfdf96b775ffb302bc6 Mon Sep 17 00:00:00 2001
From: Gouri Yerra
Date: Wed, 27 May 2026 12:41:15 -0600
Subject: [PATCH 2/2] Address review feedback: style and consistency fixes in GPU README

Co-Authored-By: Claude Sonnet 4.6
---
 gpu/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gpu/README.md b/gpu/README.md
index 0ee917873a421..31650d2207bcb 100644
--- a/gpu/README.md
+++ b/gpu/README.md
@@ -33,9 +33,9 @@ The check also uses eBPF probes to assign GPU usage and performance metrics to p

 #### Host

-GPU monitoring requires configuration in **both** `/etc/datadog-agent/datadog.yaml` and `/etc/datadog-agent/system-probe.yaml`. Configuring only one of these files results in incomplete metrics collection.
+GPU monitoring requires configuration in both `/etc/datadog-agent/datadog.yaml` and `/etc/datadog-agent/system-probe.yaml`. Configuring only one of these files results in incomplete metrics collection.

-**Step 1**: Add the following parameters to `/etc/datadog-agent/datadog.yaml`:
+1. Add the following parameters to `/etc/datadog-agent/datadog.yaml`:

 ```yaml
 gpu:
@@ -44,21 +44,21 @@ collect_gpu_tags: true
 enable_nvml_detection: true
 ```

-**Step 2**: Add the following parameters to `/etc/datadog-agent/system-probe.yaml`. This flag loads the eBPF module responsible for per-process GPU metrics and is required even for non-containerized hosts:
+2. Add the following parameter to `/etc/datadog-agent/system-probe.yaml`. This flag loads the eBPF module responsible for per-process GPU metrics and is required even for non-containerized hosts:

 ```yaml
 gpu_monitoring:
   enabled: true
 ```

-**Step 3**: Restart both the Agent and system-probe:
+3. Restart both the Agent and system-probe:

 ```shell
 sudo systemctl restart datadog-agent
 sudo systemctl restart datadog-agent-sysprobe
 ```

-The check in the Agent configuration file is also enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. However, it can also be configured manually following these steps:
+The check in the Agent configuration file is enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. The check can also be configured manually following these steps:

 1. Edit the `gpu.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your GPU performance data.
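
Since the README change states that configuring only one of the two files results in incomplete metrics collection, a quick pre-restart check may be useful when trying the documented steps on a host. The sketch below is only an illustration: `has_enabled_section` and `check_gpu_config` are hypothetical helper names, not part of the Datadog Agent, and the scan is a naive text match that assumes the simple block-style YAML layout shown in the patch rather than a real YAML parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Datadog Agent): succeeds when FILE has a
# top-level "SECTION:" key with an indented "enabled: true" beneath it.
# Naive text scan; assumes the block-style YAML layout shown in the README.
has_enabled_section() {
    file=$1
    section=$2
    awk -v section="$section" '
        # Any non-indented, non-comment line starts a new top-level key.
        /^[^ \t#]/ { in_section = ($0 ~ ("^" section ":[ \t]*$")) }
        # Inside the wanted section, look for an indented "enabled: true".
        in_section && /^[ \t]+enabled:[ \t]*true[ \t]*$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$file"
}

# Checks both files named in the README; the default paths are the ones it lists.
check_gpu_config() {
    agent_cfg=${1:-/etc/datadog-agent/datadog.yaml}
    probe_cfg=${2:-/etc/datadog-agent/system-probe.yaml}
    status=0
    has_enabled_section "$agent_cfg" gpu ||
        { echo "missing gpu.enabled in $agent_cfg"; status=1; }
    has_enabled_section "$probe_cfg" gpu_monitoring ||
        { echo "missing gpu_monitoring.enabled in $probe_cfg"; status=1; }
    return $status
}
```

Calling `check_gpu_config` with no arguments reads the two default paths and returns non-zero if either flag is absent, so it can gate the `systemctl restart` commands from step 3. Reading files under `/etc/datadog-agent/` may require elevated permissions on some hosts.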