gpu: document required system-probe.yaml flag and gpu.enabled for non-containerized Linux hosts #23853

Merged
gouri-yerra merged 2 commits into master from agent-16266-gpu-readme-missing-system-probe-flag on May 27, 2026

@gouri-yerra

Contributor

Summary

  • Fixes a documentation gap in the GPU integration README where the Linux (non-containerized) Host setup section was missing gpu.enabled: true from the datadog.yaml block and did not clearly communicate that system-probe.yaml must also be configured
  • Restructures the Host configuration section into numbered steps to make it unambiguous that both datadog.yaml and system-probe.yaml require changes
  • Adds an explicit note that gpu_monitoring.enabled: true in system-probe.yaml is required even for non-containerized hosts
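The two settings named above can be sketched as follows. This is a minimal illustration, assuming the dotted keys in the summary (`gpu.enabled`, `gpu_monitoring.enabled`) map to nested YAML in the usual way and that both files live in the default Linux Agent configuration directory (`/etc/datadog-agent/`). The merged README may list additional parameters that are not shown here.

```yaml
# /etc/datadog-agent/datadog.yaml (path assumed: default Linux install location)
# Core Agent side of the GPU check.
gpu:
  enabled: true
```

```yaml
# /etc/datadog-agent/system-probe.yaml (path assumed: default Linux install location)
# system-probe side; per the PR, required even on non-containerized hosts.
gpu_monitoring:
  enabled: true
```

Setting only one of the two flags matches the failure mode described in the motivation: system-probe can look healthy while the core-agent GPU check never starts.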

Motivation

Tracked in AGENT-16266. A customer (Norfolk Southern Corp) followed the official setup docs and the shipped datadog.yaml.example/system-probe.yaml.example templates, neither of which document the system-probe.yaml flag. The result was system-probe running healthy (eBPF loaded, T4 discovered, uprobes attached) while the core-agent GPU check never instantiated (Last Check from Core Agent: 1969-12-31). The integrations-core README already covered both files but the steps were easy to miss — this PR makes them explicit and sequential.

Test plan

  • Verify the rendered README displays the three-step Host configuration correctly
  • Confirm gpu.enabled: true now appears in the datadog.yaml code block
  • Confirm gpu_monitoring.enabled: true and the system-probe.yaml step are clearly presented alongside the datadog.yaml step

🤖 Generated with Claude Code

….enabled for non-containerized hosts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@gouri-yerra gouri-yerra added the qa/skip-qa label (Automatically skip this PR for the next QA) May 27, 2026
@buraizu buraizu self-assigned this May 27, 2026

@buraizu buraizu left a comment

Contributor

Thanks for the PR, just requesting a few minor edits for style and consistency. Let me know if you have any questions!

5 comment threads on gpu/README.md (all outdated)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gouri-yerra

Contributor Author
  1. Removed bold from **both** → both
  2. Step 1: → 1. (numbered list)
  3. Step 2: Add the following parameters → 2. Add the following parameter (singular, since it's just gpu_monitoring.enabled)
  4. Step 3: → 3.
  5. The check is also enabled... However, it can also → The check is enabled... The check can also

@gouri-yerra gouri-yerra requested a review from buraizu May 27, 2026 18:43
@dd-octo-sts

dd-octo-sts Bot commented May 27, 2026

Contributor

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

@gouri-yerra gouri-yerra added this pull request to the merge queue May 27, 2026
Merged via the queue into master with commit f2ec2d2 May 27, 2026
56 of 59 checks passed
@gouri-yerra gouri-yerra deleted the agent-16266-gpu-readme-missing-system-probe-flag branch May 27, 2026 19:59
@dd-octo-sts dd-octo-sts Bot added this to the 7.81.0 milestone May 27, 2026
2 participants