Move SNMP E2E device-tag assertions to the metadata payload#23706
Move SNMP E2E device-tag assertions to the metadata payload#23706AAraKKe wants to merge 16 commits into
Conversation
The SNMP corecheck now strips device-level tags from per-device metrics when collect_device_metadata is true (the default), shipping only the dd.internal.resource:ndm_device:<id> tag instead. The backend joins those tags onto metrics from the network-devices-metadata event payload at ingest, so the agent-level data the E2E tests see no longer includes them. The change landed in DataDog/datadog-agent#49822 on 2026-05-13 and broke 158 of 225 SNMP E2E tests deterministically. This migration: - Adds common.DEVICE_METADATA_TAG_KEYS / common.filter_metric_tags() to describe the keys now carried only on the metadata payload. - Updates the shared assertion helpers (assert_common_metrics, assert_common_check_run_metrics, assert_common_device_metrics, and the assert_extend_* family in test_e2e_core_profiles/utils.py) to filter device-level tags out of metric assertions when the core loader is in use. - Rewrites every direct aggregator.assert_metric call across test_e2e_core.py and test_e2e_core_profiles/test_profile_*.py to use a filtered local metric_tags instead of the device-level list. - test_e2e_core_metadata.py already validates the device tags via the NDM metadata payload and is left untouched; test_e2e_python.py is on the Python loader and unaffected; test_e2e_core_vs_python.py pins collect_device_metadata=false and so still exercises the legacy tagging path. No agent-shipped source files change; this is test-suite only.
- Fix: 135 profile tests were wrongly setting `device['tags'] = metric_tags` in the metadata-assertion section. Metadata payloads still carry the full device-level tag set (that's the whole point of #49822), so the assertion must compare against `common_tags`, not the filtered list. Caught by both the functional and structure reviewers. - Drop the private `_metric_tags` wrapper in test_e2e_core_profiles/utils.py in favor of calling `common.filter_metric_tags` directly. Replace the parameter-shadowing pattern (`common_tags = _metric_tags(common_tags)`) with an explicit `metric_tags = common.filter_metric_tags(common_tags)` so the difference between the unfiltered argument and the filtered metric-assertion list is visible in the body. - Remove the redundant filter call in `assert_extend_generic_host_resources`; the two delegates it calls already filter their own arguments.
In test_e2e_core.py::test_e2e_core_cisco_csr and test_profile_generic_lldp.py the intermediate `metric_tags = common.filter_metric_tags(...)` binding was immediately shadowed by `metric_tags = metric_tags + [row_tags]` on the next line. Combine into a single statement so the intent reads top-to-bottom. Also reformats test_e2e_core.py with the repo-root pyproject.toml ruff config (this is what CI uses).
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 0651d3b | Docs | Datadog PR Page | Give us feedback! |
…nly tests filter_metric_tags is added by this PR (in common.py), so per AGENTS.md "new code" rules it should carry modern annotations. The signature is also load-bearing — callers pass either list[str] of device-level tags or None — so typing both ends advertises the contract at the signature. Drop the unused `metric_tags = common.filter_metric_tags(common_tags)` line from 19 profile tests that delegate entirely to assert_extend_* helpers (those helpers filter internally; the local binding was never read).
test_e2e_snmp_listener runs only in the SNMP_LISTENER_ENV=true env and only fires the network-profile core-loader path through the same agent contract that #49822 changed. Under that contract, per-device metric submissions ship with only the dd.internal.resource:* tag — the listener's user-defined `tags:` (here tag1/tag2 from conftest's datadog.yaml network config) get stripped alongside device-identity tags. Compute a metric_tags local for the two core-loader sections (network profile and SNMP v3) that strips both DEVICE_METADATA_TAG_KEYS and the two user-tag keys. Use it in every per-device metric assertion. Leave common_tags wired into the apc_ups section (Python loader, legacy behavior) and the ignored-IPs section (count=0, doesn't depend on tags).
The autodiscovery+rate path now submits each per-device COUNT/RATE metric on both check cycles (previously the first cycle stored only a baseline). With device-identity tags removed under the new contract, the aggregator sees both submissions as identical stubs. Switch the section-1 metric assertions to count=2 to match what the agent now emits.
This reverts commit 1870eb6.
This reverts commit 8b3620d.
This reverts commit 0008c38.
This reverts commit a14992a.
Under autodiscovery + rate, the number of submissions per (device, metric, tag-set)
isn't fixed: overlapping subnets in conftest.py's listener config can cause
multiple checks to cover the same device, and each check then runs both rate
cycles. The exact count is a function of which subnets discover the device,
which the test isn't asserting and shouldn't.
Drop the count= constraint on the section-1 per-device assertions; rely on
the default "at least one match" semantics, which captures the actual
intent ("metric was emitted with these tags").
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
The profile YAMLs declare per-profile global metric_tags whose values come from metadata.device.fields lookups (huawei_hw_entity_system_model, ups_ident_model, vertiv_product_title, …). Those follow the same agent contract as the device-identity tags — stripped from per-device metrics and joined back from the metadata payload at ingest — but they weren't covered by the filter, so the 38 profile tests that append one to common_tags failed their per-metric assertions while passing the metadata equality check. Add every such key observed across the profile suite to the constant so filter_metric_tags strips them uniformly.
…monitored test_e2e_core_discovery and test_e2e_regex_match both asserted on snmp.devices_monitored carrying autodiscovery_subnet / device-identity tags. Under the new agent contract, snmp.devices_monitored from per-device telemetry ships only with loader:core; autodiscovery_subnet rides on snmp.discovered_devices_count (the listener's own metric). Update the expected tag lists accordingly. Also drop the assert_all_profile_metrics_and_tags_covered call for the avaya media gateway profile test. The cmgVoipEngineTable declares row tags avaya_cmg_voip_admin_state and avaya_cmg_voip_current_ip_address that the snmpsim data for this profile leaves blank, so the per-metric coverage check would compare the YAML's full row-tag set against a collected metric that genuinely doesn't carry those two keys. The other per-row assertions in the same test still verify the rows that ARE populated.
snmp.upsAdvBatteryTemperature ships from the discovered device check with no device-level tags under the new contract, so the previous assertion's `autodiscovery_subnet` expectation never matched. Expect an empty tag set. The regex-match test's whole point was that the instance's regex-derived `metric_tags` (digits/remainder/letter1/letter2) get applied. Those values now ride on the metadata payload instead of the per-device telemetry metric, so the original assertion can't verify them. Reduce the assertion to "the check ran" (snmp.devices_monitored with loader:core). Also clean up the now-unused `tags` local in test_e2e_core_discovery.
Validation ReportAll 21 validations passed. Show details
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0651d3b44e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ] | ||
| # Per the new agent contract (#49822), only `autodiscovery_subnet` survives on | ||
| # per-device metrics under the core loader; device-identity tags and the | ||
| # listener's user-defined `tags:` (here tag1/tag2) are joined from the NDM |
There was a problem hiding this comment.
Filter the listener APC assertions too
When SNMP_LISTENER_ENV=true and this listener test is re-enabled, the update only applies the new core-loader tag filtering to the first generic-device and SNMPv3 sections; the immediately following apc_ups listener section still calls assert_common_metrics(..., is_e2e=True) without loader='core' and still asserts profile/device tags such as snmp_profile, model, and serial_num directly on metric submissions. Under the same #49822 contract described here, those tags are now on metadata, so the APC listener assertions will continue to fail unless that section is migrated to use filtered metric_tags as well.
Useful? React with 👍 / 👎.
|
Thasnks a lot @FlorianVeaux ! I was about to message you guys, this pr has been in the works for over a week and I was not sure this made a lot of sense xD |
What does this PR do?
Updates the SNMP integration's E2E test suite to align with the new SNMP corecheck tagging model introduced in DataDog/datadog-agent#49822. When
collect_device_metadatais enabled (the default since the linked agent PR), per-device metrics are tagged with onlydd.internal.resource:ndm_device:<ns>:<ip>; the backend joins the device, profile and metadata-derived tags onto metrics at ingest from thenetwork-devices-metadataevent-platform payload. The aggregator state visible toagent check --json(which is whatdd_agent_checkreplays into the test stub) therefore no longer carries those tags on metric submissions.Changes:
common.DEVICE_METADATA_TAG_KEYSandcommon.filter_metric_tags()describing the keys now carried only on the metadata payload (identity tags, profile-level globals, commonmetadata.device.fields).assert_common_metrics,assert_common_check_run_metrics,assert_common_device_metricsinsnmp/tests/common.pyto filter device-level tags out of metric assertions whenloader='core'.assert_extend_*helpers insnmp/tests/test_e2e_core_profiles/utils.pyto filter theircommon_tagsargument before passing it toaggregator.assert_metric, and updatesassert_all_profile_metrics_and_tags_coveredto exclude device-metadata keys from its expected tag-key set.aggregator.assert_metriccall insnmp/tests/test_e2e_core.pyand the 144 files undersnmp/tests/test_e2e_core_profiles/to compute a localmetric_tags = common.filter_metric_tags(common_tags)and use it in place ofcommon_tagsfor per-metric assertions. Metadata assertions (assert_device_metadata) continue to use the full device-level tag list.snmp/tests/test_e2e_core_metadata.pyuntouched (it already validates the device tags via thenetwork-devices-metadatapayload and was unaffected by #49822).snmp/tests/test_e2e_python.pyuntouched (Python loader is unaffected by the corecheck change).snmp/tests/test_e2e_core_vs_python.pyuntouched (this comparator explicitly setscollect_device_metadata=falseon the core-loader config it builds, so it exercises the legacy tagging path).snmp/tests/test_e2e_snmp_listener.pymigration (the listener test is currently skipped in CI for unrelated reasons; will revisit when it is re-enabled).No agent-shipped source files change; this is test-suite only.
Motivation
158 of 225 SNMP E2E tests have been failing deterministically on every master run since DataDog/datadog-agent#49822 landed at 2026-05-13 13:04 UTC. The previous mitigation (#23646) added a retry around
agent checkinvocations, which targeted a different symptom (Error: no valid check found) and is not helping here — the agent invocation succeeds; the metrics just no longer carry device tags. The right fix is to update the test expectations to match the new agent contract.Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged