[pull] master from DataDog:master #520
Merged
* feat(nutanix): add state tags and disk_status
Introduce resource state tags so capacity-planning workflows can filter
out hosts in maintenance, disconnected hosts, powered-off VMs, degraded
disks, and clusters in special operation modes:
- ntnx_maintenance_state, ntnx_connection_state on host.*
- ntnx_power_state on vm.* (was previously only on vm.status)
- ntnx_operation_mode on cluster.*
- ntnx_disk_status on host.storage_* (worst-status aggregation)
Disk status is sourced once per check from the cluster-wide
api/clustermgmt/v4.0/config/disks endpoint and cached by node ID.
_report_stats gains an extra_tags_by_key parameter so disk_status is
scoped to host.storage_* metrics only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
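The per-node cache, worst-status aggregation, and key-scoped tags described in this commit could look roughly like the sketch below. It is illustrative only and is not the integration's code: the function names, the `status` field, the stat key names, and the members of `DEGRADED_STATUSES` are assumptions. Only `nodeExtId` and the `ntnx_disk_status` tag come from the commit messages.

```python
from collections import defaultdict

# Hypothetical set of statuses treated as degraded; the real values are not
# listed in the commit messages.
DEGRADED_STATUSES = frozenset({"OFFLINE", "MARKED_FOR_REMOVAL"})


def build_disks_by_node(disks):
    """Group disk entries by nodeExtId, skipping disks that lack one."""
    by_node = defaultdict(list)
    for disk in disks:
        node_id = disk.get("nodeExtId")
        if node_id:
            by_node[node_id].append(disk)
    return dict(by_node)


def aggregate_disk_status(disks):
    """Worst-status aggregation: any degraded disk marks the host degraded."""
    statuses = {d.get("status") for d in disks}  # "status" is an assumed field
    if statuses & DEGRADED_STATUSES:
        return "degraded"
    return "normal" if statuses else None


def tags_for_stat(stat_key, base_tags, extra_tags_by_key=None):
    """Return base tags plus any extra tags registered for this stat key only."""
    extra = (extra_tags_by_key or {}).get(stat_key, [])
    return list(base_tags) + list(extra)
```

Keeping the extra tags in a mapping keyed by stat name is one way to attach `ntnx_disk_status` to storage stats without touching the tags of other host metrics.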
* test(nutanix): cover disk_status and state tag extraction
Add focused unit tests for the new state-tag and disk_status paths:
- test_disk_status: parametrized aggregation matrix (NORMAL, degraded
states, $UNKNOWN/$REDACTED, mixed, forward-compat enum values),
cache-build defensiveness (skips disks missing nodeExtId), and an
end-to-end check that a degraded disk flows ntnx_disk_status:degraded
onto storage_* metrics. Disks-endpoint failure is asserted to leave
storage metrics emitted without the tag.
- test_tag_extraction: defensive checks for missing hypervisor block,
missing config.operationMode, empty-string field values, and verbatim
PAUSED power-state preservation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(nutanix): record fixtures by resource selectively
Add --resources / -r flag to the fixture recorder accepting a comma-
separated or repeated list of resources to refresh, plus --list to
enumerate available names. When a dependent resource is requested
without its prerequisites (e.g., host_stats without clusters/hosts),
the prerequisites are fetched in memory but their fixtures are not
overwritten.
Adds record_disks() targeting api/clustermgmt/v4.0/config/disks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
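A selective recorder along these lines can be sketched with `argparse`. The resource names, the prerequisite map, and both function names are hypothetical; the real recorder's internals are not shown in the commit message, so treat this as one plausible shape rather than the actual script.

```python
import argparse

# Hypothetical prerequisite map, for illustration only.
PREREQUISITES = {"host_stats": ("clusters", "hosts")}


def parse_resources(argv):
    """Parse -r/--resources given as comma-separated and/or repeated values."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--resources", action="append", default=[])
    parser.add_argument("--list", action="store_true")
    args = parser.parse_args(argv)
    requested = [
        name.strip()
        for chunk in args.resources
        for name in chunk.split(",")
        if name.strip()
    ]
    return args.list, requested


def plan(requested):
    """Return (to_fetch, to_write); prerequisites are fetched but not written."""
    to_fetch = []
    for name in requested:
        for dep in PREREQUISITES.get(name, ()):
            if dep not in to_fetch:
                to_fetch.append(dep)
        if name not in to_fetch:
            to_fetch.append(name)
    return to_fetch, list(requested)
```

Separating what is fetched from what is written is what lets a dependent resource be refreshed without overwriting the fixtures of its prerequisites.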
* feat(nutanix): expose state tags in Capacity Planning
Consolidate the Capacity Planning section into three unified tables —
Clusters, Hosts, VMs — each grouped by their state tags so capacity
planners can filter inline by maintenance/connection/power/operation
state. Add a Storage Capacity by Host table grouped by ntnx_disk_status
to surface degraded storage. The intro note now summarizes the inputs
and the available state tags.
Datadog lowercases tag values at ingestion, so all tag-value filters in
the dashboard use lowercase (e.g., ntnx_connection_state:connected).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(nutanix): tighten disk_status helpers
Minor compression in the disk_status path with no behavior change:
- _aggregate_disk_status: drop the redundant empty-set guard and
inline the degraded_states constant — set ops on empty sets work,
the constant was used once.
- _report_stats: collapse the 4-line param docstring to one line.
- _process_single_host: inline the disk_status_extra_tags local;
it was used exactly once.
- _get_disk_status_storage_tags: fold the storage-keys tuple onto
one line.
Net -22 lines in infrastructure_monitor.py; all 136 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(nutanix): guard disks API consumers against malformed entries
The new /api/clustermgmt/v4.0/config/disks consumers
(_build_disks_by_host_cache, _aggregate_disk_status) iterate over response items and
call .get(...) on each — if Nutanix ever returns a null entry or a
non-dict, that would raise AttributeError. Add isinstance(d, dict)
filters at both sites and parametrized test coverage for None /
strings / ints mixed into the disk list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
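The guard amounts to filtering the response items before calling `.get(...)` on them. A minimal sketch, with hypothetical helper names:

```python
def only_dicts(items):
    """Keep only dict entries; None, strings, ints and other types are dropped."""
    return [item for item in (items or []) if isinstance(item, dict)]


def node_ids(items):
    """Read nodeExtId from each well-formed entry without risking AttributeError."""
    return [d.get("nodeExtId") for d in only_dicts(items)]
```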
* fix(nutanix): emit vm.disk_capacity_bytes from config
The metric was mapped from diskCapacityBytes in VM_STATS_METRICS, but
the Nutanix v4 stats endpoint never returns that field — so the metric
was effectively never emitted (the existing test list parked it under
VM_STATS_METRICS_OPTIONAL, which silently let it slip).
Source it instead from the VM config: sum vm.disks[].backingInfo.diskSizeBytes.
Same approach as memory.allocated_bytes from memorySizeBytes.
This makes the metric reliably available for capacity-planning queries
and the dashboard's per-VM Disk Allocated column now populates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
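Summing `disks[].backingInfo.diskSizeBytes` from a VM config dict could be sketched as follows. The function name mirrors the helper mentioned later in this log, but the body is an assumption; in particular, returning `None` when no disk size is found is a choice made for this example.

```python
def extract_vm_disk_capacity_bytes(vm):
    """Sum backingInfo.diskSizeBytes over a VM's disks, tolerating bad entries."""
    disks = vm.get("disks")
    if not isinstance(disks, list):
        return None
    total = 0
    found = False
    for disk in disks:
        if not isinstance(disk, dict):
            continue  # skip null or non-dict entries
        backing = disk.get("backingInfo")
        if not isinstance(backing, dict):
            continue
        size = backing.get("diskSizeBytes")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(size, (int, float)) and not isinstance(size, bool):
            total += size
            found = True
    return total if found else None
```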
* fix(nutanix): correct host power metric name in optional test list
HOST_STATS_METRICS_OPTIONAL had nutanix.host.power.consumption.instant_watt
(dots) but the canonical metric — emitted by the integration and matching
the cluster equivalent — is nutanix.host.power_consumption_instant_watt
(underscores). The typo made the OPTIONAL entry dead: when the metric ever
lands in a power-meter-equipped environment, assert_all_metrics_covered()
would fail because the OPTIONAL bucket entry didn't actually cover it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(nutanix): add per-metric state-tag coverage guards
Six tests that introspect the aggregator after a check run and assert
every new state tag lands on every applicable metric:
- ntnx_maintenance_state on every host metric for hosts whose source
data has the field
- ntnx_connection_state on every host metric (both fixture hosts have it)
- ntnx_power_state on every VM metric
- ntnx_operation_mode on every cluster metric
- ntnx_disk_status on every host.storage_* metric
- ntnx_disk_status NOT present on non-storage host metrics (scoping guard)
If someone removes a tag emission from one of the _extract_*_tags methods,
the failure message names every metric that lost the tag rather than
failing at the first bundled assertion in test_hosts/test_vms/test_clusters
and stopping there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix changelog pr number
* fix(nutanix): address review feedback for state tags
Bundle of changes driven by the PR review:
- Lowercase state-tag values (ntnx_connection_state, ntnx_operation_mode,
ntnx_power_state) at emission so the code matches Datadog's
ingestion-time normalization explicitly rather than relying on it
implicitly. ntnx_maintenance_state was already lowercase from the API.
- Always emit ntnx_power_state, falling back to "unknown" when the
source field is missing/empty. Previously the tag silently dropped
for those VMs, breaking dashboards/monitors that group by power_state.
- Hoist DEGRADED_DISK_STATUSES and HOST_STORAGE_STAT_KEYS to
datadog_checks/nutanix/metrics.py as module-level frozensets, and
derive HOST_STORAGE_METRICS in tests/metrics.py from them — eliminates
the three independent enumerations that could drift if a fifth
storage stat is added.
- Log the disk count alongside the host bucket count when caching
("Cached %d disks across %d hosts") for easier triage.
- Move the InfrastructureMonitor stub fixture used by tag-logic unit
tests into tests/conftest.py with the union of attributes both files
need; remove the divergent local fixtures.
- Add a parametrized unit test for _extract_vm_disk_capacity_bytes
covering missing disks, missing backingInfo/diskSizeBytes, and
non-dict entries — previously only happy-path e2e coverage existed.
156 unit tests pass; 1 dropped review finding (functional reviewer's
"silent failure" claim about the dashboard) was correctly contested
by the integrations reviewer — Datadog auto-lowercases tag values at
ingestion. Lowercasing at emission is purely a clarity improvement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(nutanix): guard state-tag .lower() against non-string values
The previous commit introduced .lower() on the four state tags
(maintenance, connection, operation_mode, power) to align emission
with Datadog's lowercased ingestion. If the Nutanix API ever returns
a non-string value (e.g. an int, bool, or list — implausible per the
v4 schema, but possible if the API misbehaves), .lower() would raise
AttributeError. The previous walrus-only pattern would have just
formatted the unusual value via __str__ — no crash.
Add a tiny module-level _norm_state(value) helper:
- Returns value.lower() when value is a non-empty str.
- Returns None for missing, empty, or non-string values.
Threading it through the four sites preserves the walrus :=
pattern and keeps each emission a single line. For VM powerState
the fallback to "unknown" is unchanged.
Adds parametrized regression tests covering int/bool/list inputs
on host, cluster, and VM tag extraction. 162 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
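The helper and the walrus pattern described in this commit might look like the sketch below. The helper body follows the two rules listed in the commit message; the extractor functions are hypothetical and use only the `powerState` and `config.operationMode` fields named elsewhere in this log. The walrus operator requires Python 3.8 or later.

```python
def norm_state(value):
    """Return value.lower() for a non-empty str, otherwise None."""
    return value.lower() if isinstance(value, str) and value else None


def extract_cluster_tags(cluster):
    """Emit ntnx_operation_mode only when a usable string value is present."""
    tags = []
    config = cluster.get("config")
    if not isinstance(config, dict):
        return tags
    if mode := norm_state(config.get("operationMode")):
        tags.append(f"ntnx_operation_mode:{mode}")
    return tags


def extract_vm_tags(vm):
    """Always emit ntnx_power_state, falling back to 'unknown'."""
    state = norm_state(vm.get("powerState")) or "unknown"
    return [f"ntnx_power_state:{state}"]
```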
* docs(nutanix): rewrite changelog entries for customer audience
- Drop 23578.fixed.1: the always-emit ntnx_power_state behavior is
part of the tag's first shipped contract (added in this same PR);
not a fix to a previously-released feature.
- Rewrite 23578.added to lead with the user benefit (capacity-planning
filtering) and structure the tags as bullets.
- Trim 23578.fixed of internal field names; surface only what users
observe ("the metric now reports values where it didn't before").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(nutanix): split vm.disk_capacity_bytes fix into separate PR
Move the vm.disk_capacity_bytes fix out of this PR; it ships separately
in PR #23583. This PR is now state-tags-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(nutanix): drop private-method tests in favor of check-level coverage
Remove test_tag_extraction.py and test_state_tag_coverage.py, and trim
test_disk_status.py to only the integration-style tests that exercise
the check end-to-end. Coverage of the new state tags is preserved
through HOST_TAGS/CLUSTER_TAGS/PCVM_TAGS in the existing per-entity
tests. Also drop the now-unused monitor fixture from conftest, simplify
_norm_state, and remove a redundant defensive check in
_aggregate_disk_status.
* docs(nutanix): rewrite state-tags changelog for users
* docs(nutanix): collapse state-tags changelog into one line
* docs(nutanix): frame state-tags changelog as a new feature
* fix(nutanix): always emit state tags with unknown fallback
Make ntnx_maintenance_state, ntnx_connection_state, ntnx_operation_mode,
and ntnx_disk_status emit consistently like ntnx_power_state already
does — always present, falling back to "unknown" when the source field
is missing. Also normalize the spec-defined sentinel values ($UNKNOWN,
$REDACTED, UNDETERMINED) to "unknown" so they don't surface as ugly
tag values like ntnx_operation_mode:$unknown.
Without this, dashboards and monitors filtering on these tags silently
drop entities whose source field is missing — the same failure mode
the power-state fallback was originally added to prevent.
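The always-emit behavior with sentinel normalization can be sketched as a single function. The function and constant names are hypothetical; the three sentinel values are the ones listed in the commit message.

```python
# Sentinels named in the commit message, compared after lowercasing.
SENTINELS = frozenset({"$unknown", "$redacted", "undetermined"})


def state_or_unknown(value):
    """Return a lowercase tag value, mapping missing or sentinel input to 'unknown'."""
    if not isinstance(value, str) or not value:
        return "unknown"
    lowered = value.lower()
    return "unknown" if lowered in SENTINELS else lowered


def state_tag(name, value):
    """Build a tag that is always present, e.g. ntnx_operation_mode:unknown."""
    return f"{name}:{state_or_unknown(value)}"
```

Because the tag is emitted for every entity, a dashboard query grouped by the tag keeps entities whose source field is missing instead of silently dropping them.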
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Isolate ddev release-notes extraction in an unprivileged job
The publish job holds contents: write and id-token: write. Installing ddev there pulled the full transitive dependency graph into the same workspace as the release artifacts and credentials, giving any compromised dependency a path to tamper with archives/installers before upload (AI-6799).
Move the install + ddev release changelog show into a new extract-release-notes job with permissions: contents: read. The publish job downloads release-notes.md by exact artifact name and no longer runs any pip install.
Extraction stays best-effort: on failure we still upload an empty file so the release proceeds with an empty body rather than blocking after PyPI has already published.
* Drop unused define-tags dependency from extract-release-notes
* Make release-notes extraction always succeed and always upload
Previously only the ddev release changelog show step handled failure gracefully. Setup, install, and ddev config could still fail and skip the artifact upload, which would in turn cause publish to be skipped because of the needs dependency, blocking PyPI and the GitHub release.
Pre-create an empty release-notes.md, mark the install/configure/extract steps continue-on-error: true, and run the upload with if: always(). The job's conclusion is now success regardless of transient setup or install failures, and an artifact (possibly empty) is always available for the publish job.
* Add n8n default
* Add changelog
* Apply suggestion from @sarah-witt
* validate
---------
Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
* Fill in metadata.csv descriptions
Populate empty descriptions for datadog.agent.python.version, datadog.agent.running, and datadog.agent.started, sourced from https://docs.datadoghq.com/getting_started/agent/#agent-metrics
* Refine metadata.csv descriptions
* Address review: natural language tags and uniform quoting
* Handle promotion failure on forks
* use pull_request_target
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)