Skip to content

Move SNMP E2E device-tag assertions to the metadata payload#23706

Closed
AAraKKe wants to merge 16 commits into
masterfrom
aarakke/snmp-e2e-metadata-tag-assertions
Closed

Move SNMP E2E device-tag assertions to the metadata payload#23706
AAraKKe wants to merge 16 commits into
masterfrom
aarakke/snmp-e2e-metadata-tag-assertions

Conversation

@AAraKKe

@AAraKKe AAraKKe commented May 14, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Updates the SNMP integration's E2E test suite to align with the new SNMP corecheck tagging model introduced in DataDog/datadog-agent#49822. When collect_device_metadata is enabled (the default since the linked agent PR), per-device metrics are tagged with only dd.internal.resource:ndm_device:<ns>:<ip>; the backend joins the device, profile and metadata-derived tags onto metrics at ingest from the network-devices-metadata event-platform payload. The aggregator state visible to agent check --json (which is what dd_agent_check replays into the test stub) therefore no longer carries those tags on metric submissions.

Changes:

  • Adds common.DEVICE_METADATA_TAG_KEYS and common.filter_metric_tags() describing the keys now carried only on the metadata payload (identity tags, profile-level globals, common metadata.device.fields).
  • Updates assert_common_metrics, assert_common_check_run_metrics, assert_common_device_metrics in snmp/tests/common.py to filter device-level tags out of metric assertions when loader='core'.
  • Updates the assert_extend_* helpers in snmp/tests/test_e2e_core_profiles/utils.py to filter their common_tags argument before passing it to aggregator.assert_metric, and updates assert_all_profile_metrics_and_tags_covered to exclude device-metadata keys from its expected tag-key set.
  • Rewrites every direct aggregator.assert_metric call in snmp/tests/test_e2e_core.py and the 144 files under snmp/tests/test_e2e_core_profiles/ to compute a local metric_tags = common.filter_metric_tags(common_tags) and use it in place of common_tags for per-metric assertions. Metadata assertions (assert_device_metadata) continue to use the full device-level tag list.
  • Leaves snmp/tests/test_e2e_core_metadata.py untouched (it already validates the device tags via the network-devices-metadata payload and was unaffected by #49822).
  • Leaves snmp/tests/test_e2e_python.py untouched (Python loader is unaffected by the corecheck change).
  • Leaves snmp/tests/test_e2e_core_vs_python.py untouched (this comparator explicitly sets collect_device_metadata=false on the core-loader config it builds, so it exercises the legacy tagging path).
  • Defers snmp/tests/test_e2e_snmp_listener.py migration (the listener test is currently skipped in CI for unrelated reasons; will revisit when it is re-enabled).

No agent-shipped source files change; this is test-suite only.

Motivation

158 of 225 SNMP E2E tests have been failing deterministically on every master run since DataDog/datadog-agent#49822 landed at 2026-05-13 13:04 UTC. The previous mitigation (#23646) added a retry around agent check invocations, which targeted a different symptom (Error: no valid check found) and is not helping here — the agent invocation succeeds; the metrics just no longer carry device tags. The right fix is to update the test expectations to match the new agent contract.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

The SNMP corecheck now strips device-level tags from per-device metrics
when collect_device_metadata is true (the default), shipping only the
dd.internal.resource:ndm_device:<id> tag instead. The backend joins those
tags onto metrics from the network-devices-metadata event payload at
ingest, so the agent-level data the E2E tests see no longer includes
them. The change landed in DataDog/datadog-agent#49822 on 2026-05-13 and
broke 158 of 225 SNMP E2E tests deterministically.

This migration:

- Adds common.DEVICE_METADATA_TAG_KEYS / common.filter_metric_tags() to
  describe the keys now carried only on the metadata payload.
- Updates the shared assertion helpers (assert_common_metrics,
  assert_common_check_run_metrics, assert_common_device_metrics, and
  the assert_extend_* family in test_e2e_core_profiles/utils.py) to
  filter device-level tags out of metric assertions when the core
  loader is in use.
- Rewrites every direct aggregator.assert_metric call across
  test_e2e_core.py and test_e2e_core_profiles/test_profile_*.py to
  use a filtered local metric_tags instead of the device-level list.
- test_e2e_core_metadata.py already validates the device tags via the
  NDM metadata payload and is left untouched; test_e2e_python.py is on
  the Python loader and unaffected; test_e2e_core_vs_python.py pins
  collect_device_metadata=false and so still exercises the legacy
  tagging path.

No agent-shipped source files change; this is test-suite only.
@AAraKKe AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label May 14, 2026
@dd-octo-sts dd-octo-sts Bot added integration/snmp and removed qa/skip-qa Automatically skip this PR for the next QA labels May 14, 2026
- Fix: 135 profile tests were wrongly setting `device['tags'] = metric_tags`
  in the metadata-assertion section. Metadata payloads still carry the full
  device-level tag set (that's the whole point of #49822), so the assertion
  must compare against `common_tags`, not the filtered list. Caught by both
  the functional and structure reviewers.

- Drop the private `_metric_tags` wrapper in test_e2e_core_profiles/utils.py
  in favor of calling `common.filter_metric_tags` directly. Replace the
  parameter-shadowing pattern (`common_tags = _metric_tags(common_tags)`)
  with an explicit `metric_tags = common.filter_metric_tags(common_tags)`
  so the difference between the unfiltered argument and the filtered
  metric-assertion list is visible in the body.

- Remove the redundant filter call in `assert_extend_generic_host_resources`;
  the two delegates it calls already filter their own arguments.
In test_e2e_core.py::test_e2e_core_cisco_csr and test_profile_generic_lldp.py
the intermediate `metric_tags = common.filter_metric_tags(...)` binding was
immediately shadowed by `metric_tags = metric_tags + [row_tags]` on the next
line. Combine into a single statement so the intent reads top-to-bottom.

Also reformats test_e2e_core.py with the repo-root pyproject.toml ruff config
(this is what CI uses).
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 50.00%
Overall Coverage: 95.21% (+7.82%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 0651d3b | Docs | Datadog PR Page | Give us feedback!

@AAraKKe AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label May 21, 2026
AAraKKe added 10 commits May 21, 2026 14:19
…nly tests

filter_metric_tags is added by this PR (in common.py), so per AGENTS.md "new
code" rules it should carry modern annotations. The signature is also
load-bearing — callers pass either list[str] of device-level tags or None — so
typing both ends advertises the contract at the signature.

Drop the unused `metric_tags = common.filter_metric_tags(common_tags)` line
from 19 profile tests that delegate entirely to assert_extend_* helpers
(those helpers filter internally; the local binding was never read).
test_e2e_snmp_listener runs only in the SNMP_LISTENER_ENV=true env and only
fires the network-profile core-loader path through the same agent contract
that #49822 changed. Under that contract, per-device metric submissions
ship with only the dd.internal.resource:* tag — the listener's
user-defined `tags:` (here tag1/tag2 from conftest's datadog.yaml network
config) get stripped alongside device-identity tags.

Compute a metric_tags local for the two core-loader sections (network
profile and SNMP v3) that strips both DEVICE_METADATA_TAG_KEYS and the
two user-tag keys. Use it in every per-device metric assertion. Leave
common_tags wired into the apc_ups section (Python loader, legacy
behavior) and the ignored-IPs section (count=0, doesn't depend on tags).
The autodiscovery+rate path now submits each per-device COUNT/RATE metric
on both check cycles (previously the first cycle stored only a baseline).
With device-identity tags removed under the new contract, the aggregator
sees both submissions as identical stubs. Switch the section-1 metric
assertions to count=2 to match what the agent now emits.
Under autodiscovery + rate, the number of submissions per (device, metric, tag-set)
isn't fixed: overlapping subnets in conftest.py's listener config can cause
multiple checks to cover the same device, and each check then runs both rate
cycles. The exact count is a function of which subnets discover the device,
which the test isn't asserting and shouldn't.

Drop the count= constraint on the section-1 per-device assertions; rely on
the default "at least one match" semantics, which captures the actual
intent ("metric was emitted with these tags").
@codecov

codecov Bot commented May 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.06%. Comparing base (96797a3) to head (0651d3b).
⚠️ Report is 54 commits behind head on master.

Additional details and impacted files
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

AAraKKe added 3 commits May 21, 2026 15:59
The profile YAMLs declare per-profile global metric_tags whose values come
from metadata.device.fields lookups (huawei_hw_entity_system_model,
ups_ident_model, vertiv_product_title, …). Those follow the same agent
contract as the device-identity tags — stripped from per-device metrics
and joined back from the metadata payload at ingest — but they weren't
covered by the filter, so the 38 profile tests that append one to
common_tags failed their per-metric assertions while passing the metadata
equality check.

Add every such key observed across the profile suite to the constant so
filter_metric_tags strips them uniformly.
…monitored

test_e2e_core_discovery and test_e2e_regex_match both asserted on
snmp.devices_monitored carrying autodiscovery_subnet / device-identity
tags. Under the new agent contract, snmp.devices_monitored from per-device
telemetry ships only with loader:core; autodiscovery_subnet rides on
snmp.discovered_devices_count (the listener's own metric). Update the
expected tag lists accordingly.

Also drop the assert_all_profile_metrics_and_tags_covered call for the
avaya media gateway profile test. The cmgVoipEngineTable declares row
tags avaya_cmg_voip_admin_state and avaya_cmg_voip_current_ip_address
that the snmpsim data for this profile leaves blank, so the per-metric
coverage check would compare the YAML's full row-tag set against a
collected metric that genuinely doesn't carry those two keys. The other
per-row assertions in the same test still verify the rows that ARE
populated.
snmp.upsAdvBatteryTemperature ships from the discovered device check with
no device-level tags under the new contract, so the previous assertion's
`autodiscovery_subnet` expectation never matched. Expect an empty tag set.

The regex-match test's whole point was that the instance's regex-derived
`metric_tags` (digits/remainder/letter1/letter2) get applied. Those values
now ride on the metadata payload instead of the per-device telemetry
metric, so the original assertion can't verify them. Reduce the assertion
to "the check ran" (snmp.devices_monitored with loader:core).

Also clean up the now-unused `tags` local in test_e2e_core_discovery.
@AAraKKe AAraKKe marked this pull request as ready for review May 22, 2026 13:18
@AAraKKe AAraKKe requested review from a team as code owners May 22, 2026 13:18
@AAraKKe AAraKKe requested review from Pierre-L42 and removed request for a team May 22, 2026 13:18
@dd-octo-sts

dd-octo-sts Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Validation Report

All 21 validations passed.

Show details
Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and Codecov settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
qa-label Validate the pull request declares whether it needs QA for the next Agent release
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog

View full run

@FlorianVeaux FlorianVeaux left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded by #23808

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0651d3b44e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 65 to +68
]
# Per the new agent contract (#49822), only `autodiscovery_subnet` survives on
# per-device metrics under the core loader; device-identity tags and the
# listener's user-defined `tags:` (here tag1/tag2) are joined from the NDM

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Filter the listener APC assertions too

When SNMP_LISTENER_ENV=true and this listener test is re-enabled, the update only applies the new core-loader tag filtering to the first generic-device and SNMPv3 sections; the immediately following apc_ups listener section still calls assert_common_metrics(..., is_e2e=True) without loader='core' and still asserts profile/device tags such as snmp_profile, model, and serial_num directly on metric submissions. Under the same #49822 contract described here, those tags are now on metadata, so the APC listener assertions will continue to fail unless that section is migrated to use filtered metric_tags as well.

Useful? React with 👍 / 👎.

@AAraKKe

AAraKKe commented May 22, 2026

Copy link
Copy Markdown
Collaborator Author

Thasnks a lot @FlorianVeaux ! I was about to message you guys, this pr has been in the works for over a week and I was not sure this made a lot of sense xD

@AAraKKe AAraKKe closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration/snmp qa/skip-qa Automatically skip this PR for the next QA team/agent-integrations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants