Commit f2ec2d2
gouri-yerra and claude authored
gpu: document required system-probe.yaml flag and gpu.enabled for non-containerized Linux hosts (DataDog#23853)
* Update GPU README to document required system-probe.yaml flag and gpu.enabled for non-containerized hosts

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review feedback: style and consistency fixes in GPU README

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent faacce5 commit f2ec2d2

1 file changed: 14 additions & 3 deletions

gpu/README.md
@@ -33,21 +33,32 @@ The check also uses eBPF probes to assign GPU usage and performance metrics to p
 
 #### Host
 
-The agent needs to be configured to enable GPU-related features. Add the following parameters to the `/etc/datadog-agent/datadog.yaml` configuration file and then restart the Agent:
+GPU monitoring requires configuration in both `/etc/datadog-agent/datadog.yaml` and `/etc/datadog-agent/system-probe.yaml`. Configuring only one of these files results in incomplete metrics collection.
+
+1. Add the following parameters to `/etc/datadog-agent/datadog.yaml`:
 
 ```yaml
+gpu:
+  enabled: true
 collect_gpu_tags: true
 enable_nvml_detection: true
 ```
 
-Enabling the `gpu` integration requires `system-probe` to have the configuration option enabled for collecting per-process metrics. Inside the `/etc/datadog-agent/system-probe.yaml` configuration file, the following parameters must be set:
+2. Add the following parameter to `/etc/datadog-agent/system-probe.yaml`. This flag loads the eBPF module responsible for per-process GPU metrics and is required even for non-containerized hosts:
 
 ```yaml
 gpu_monitoring:
   enabled: true
 ```
 
-The check in the Agent configuration file is enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. However, it can also be configured manually following these steps:
+3. Restart both the Agent and system-probe:
+
+```shell
+sudo systemctl restart datadog-agent
+sudo systemctl restart datadog-agent-sysprobe
+```
+
+The check in the Agent configuration file is enabled by default whenever NVIDIA GPUs and their drivers are detected in the system, as long as the `enable_nvml_detection` parameter is set to `true`. The check can also be configured manually following these steps:
 
 1. Edit the `gpu.d/conf.yaml` file, in the `conf.d/` folder at the root of your
 Agent's configuration directory, to start collecting your GPU performance data.
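
The updated README says that both `datadog.yaml` and `system-probe.yaml` must carry an `enabled: true` flag, nested under `gpu:` and `gpu_monitoring:` respectively. Below is a minimal sketch of how a host could be checked for both settings. The helper names `flag_enabled` and `check_host` are hypothetical and not part of the Agent. The scanner is deliberately naive: it handles only the two-level layout shown in the diff and is not a general YAML parser.

```python
import re
from pathlib import Path

# File name -> top-level section that must contain `enabled: true`,
# as described in the README change above.
REQUIRED = {
    "datadog.yaml": "gpu",
    "system-probe.yaml": "gpu_monitoring",
}


def flag_enabled(yaml_text: str, section: str, key: str = "enabled") -> bool:
    """Return True if `section:` is followed by an indented `key: true` line.

    Hypothetical helper: a line scanner for simple two-level mappings only.
    """
    in_section = False
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue  # skip blank and comment-only lines
        if not line[0].isspace():
            # An unindented line is a top-level key: it opens or closes the section.
            in_section = line == f"{section}:"
            continue
        if in_section and re.fullmatch(rf"\s+{re.escape(key)}:\s*true", line):
            return True
    return False


def check_host(config_dir: str = "/etc/datadog-agent") -> dict[str, bool]:
    """Report, per config file, whether its required GPU flag is set.

    A missing file is reported as False rather than raising.
    """
    results = {}
    for filename, section in REQUIRED.items():
        path = Path(config_dir) / filename
        text = path.read_text() if path.is_file() else ""
        results[filename] = flag_enabled(text, section)
    return results
```

Calling `check_host()` on a configured host should return `True` for both files. A `False` for either one corresponds to the "incomplete metrics collection" case the README warns about. Reading files under `/etc/datadog-agent` may require elevated permissions.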
