feat(nutanix): add support for alerts tracking (DataDog#23538)

NouemanKHAL · claude · janine-c · web-flow · commit b8dceeef4691 · 2026-06-10T07:44:56.000Z
* feat(nutanix): alert lifecycle tracking with per-state metrics

Replace the cursor-based alert collection with a reconciliation loop
against the v4.0 unresolved-alerts API. Ship per-state lifecycle gauges
and a default monitor template:

  - nutanix.alert.open       — 1 while alert is unresolved + unacknowledged
  - nutanix.alert.acknowledged — 1 while alert is unresolved + acknowledged
  - nutanix.alert.resolved   — 1 once when alert enters the resolved state

State transitions emit explicit zeros to the previous state's metric so
per-alert monitor cases recover cleanly when the alert leaves a state.
Each metric carries ext_id for monitor grouping; the metric name itself
encodes the state, so monitor queries don't need a tag filter.

Lifecycle events (in addition to the metrics):

  - "Alert: <title>"               — created (or re-opened from resolved)
  - "Alert acknowledged: <title>"  — open -> acknowledged transition
  - "Alert reopened: <title>"      — acknowledged -> open transition
  - "Alert Resolved: <title>"      — resolution with resolvedTime / by /
                                     auto_resolved metadata

Reconciliation is the source of truth each cycle: alerts in the API but
not in the in-memory cache are new (emit open event); alerts in the cache
but absent from the API are resolved or deleted (emit resolution event +
.open or .acknowledged = 0, .resolved = 1); alerts in both have their
cached metadata refreshed and ack-state transitions emit dedicated
events. Nothing is persisted across agent restarts: a restart re-derives
state from the API, and the aggregation_key collapses any duplicate
creation events that become visible on restart.
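
The reconciliation rule above can be sketched as a set comparison. This is a minimal illustration under stated assumptions, not the integration's code: `reconcile`, the action labels, and the id-to-acknowledged-flag dict shape are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(
    cached: Dict[str, bool], current: Dict[str, bool]
) -> Tuple[Dict[str, bool], List[Tuple[str, str]]]:
    """Compare cached alerts with the current unresolved list.

    Both arguments map a (hypothetical) alert id to its acknowledged flag.
    Returns the refreshed cache and a list of (alert_id, action) pairs.
    """
    # Compute all three sets up front, before the cache is replaced.
    new_ids = current.keys() - cached.keys()
    gone_ids = cached.keys() - current.keys()
    tracked_ids = cached.keys() & current.keys()

    actions: List[Tuple[str, str]] = []
    for alert_id in sorted(new_ids):
        actions.append((alert_id, "created"))
    for alert_id in sorted(gone_ids):
        actions.append((alert_id, "resolved"))
    for alert_id in sorted(tracked_ids):
        # Only an acknowledgement flip counts as a transition.
        if cached[alert_id] != current[alert_id]:
            action = "acknowledged" if current[alert_id] else "reopened"
            actions.append((alert_id, action))

    return dict(current), actions
```

Because the returned cache is a copy of the current API view, a restart with an empty cache reports every unresolved alert as "created", which matches the restart behavior described above.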

Hardening:
- on transient API failure, re-emit cached gauges before re-raising so
  per-alert monitors don't auto-resolve while the alert is still open.
- pre-compute new/gone/still-tracked sets before mutating _open_alerts
  so loop ordering is safe.
- v4.2 fallback removed; v4.0 endpoint with $filter=isResolved eq false
  is the only path. The pre-existing client-side filter remains as a
  safety net.
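
The first hardening item, re-emitting cached gauges before re-raising, might look roughly like the sketch below. The function name, the `fetch_unresolved` and `emit_gauge` callables, and the cache shape are assumptions made for illustration; only the metric names come from this change.

```python
from typing import Callable, Dict


def collect_with_fallback(
    fetch_unresolved: Callable[[], Dict[str, bool]],
    cached: Dict[str, bool],
    emit_gauge: Callable[[str, int, str], None],
) -> Dict[str, bool]:
    """Fetch unresolved alerts; on failure, re-emit the last known gauges.

    `cached` maps a hypothetical alert id to its acknowledged flag.
    """
    try:
        return fetch_unresolved()
    except Exception:
        # Keep per-alert series alive so monitors do not recover spuriously.
        for alert_id, acknowledged in cached.items():
            name = "nutanix.alert.acknowledged" if acknowledged else "nutanix.alert.open"
            emit_gauge(name, 1, alert_id)
        # The failure still surfaces to the caller.
        raise
```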

Tags added to alert events and metrics:
  - ext_id, ntnx_alert_type, ntnx_alert_severity, ntnx_alert_status
    (events only — redundant on metrics where the name encodes state)
  - ntnx_originating_cluster_name, ntnx_alert_user_defined,
    ntnx_alert_service (Tier 1 — distinguish federated cluster, custom
    vs platform alerts, and Nutanix subsystem when present)
  - ntnx_cluster_name, ntnx_alert_classification, ntnx_alert_impact,
    ntnx_alert_auto_resolved (resolution events only), source-entity tags

Default monitor template at assets/monitors/alerts.json combines
nutanix.alert.open + nutanix.alert.acknowledged minus
nutanix.alert.resolved to alert on any unresolved alert (clamped to
non-negative). Auto-resolves on the resolved one-shot. Description
notes the agent-restart re-broadcast trade-off.

Test coverage: state transitions (open<->ack, ack->resolved from each
prior state), filter-add edge case (treated as spurious resolution),
deleted-alert (_get_alert returns None) graceful fallback, empty
unresolved list cold-start, and per-tag assertions for the new Tier 1
tags. The four "complete output" alertType tests are parametrized.
conftest mock has a _filter_after helper for the time-based fixture
branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): consolidate alert changelog entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): revert version bump and shorten changelog

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* working monitor

* feat(nutanix): heartbeat open alerts each cycle for event-based monitors

Re-emit one alert event per tracked alert per check cycle so event-count
monitors (last("Nm") > 0) stay firing while the alert is open. Transition
cycles skip the heartbeat — the dedicated transition event already lands
under the same aggregation_key. Resolved alerts are popped from
_open_alerts before the heartbeat loop, so they don't get a duplicate
heartbeat alongside their resolution event.
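
As a rough sketch of the heartbeat skip rule, assuming a hypothetical `plan_heartbeats` helper and plain dict/set inputs rather than the integration's real data structures:

```python
from typing import Dict, List, Set


def plan_heartbeats(
    open_alerts: Dict[str, dict],
    resolved_ids: Set[str],
    transitioned_ids: Set[str],
) -> List[str]:
    """Return the alert ids that should receive a heartbeat event this cycle."""
    # Resolved alerts leave the cache first, so the loop below never sees them.
    for alert_id in resolved_ids:
        open_alerts.pop(alert_id, None)

    # Alerts that already emitted a transition event this cycle are skipped.
    return [alert_id for alert_id in open_alerts if alert_id not in transitioned_ids]
```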

Ship the default monitor template at assets/monitors/alerts.json with a
real title and description.

Tests cover the new heartbeat skip-list (transitions, resolutions,
filter-exclusion), aggregation_key consistency across the full alert
lifecycle, and the cached-gauges-no-events contract on transient API
failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): revert version bump and simplify monitor threshold

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): address review feedback on alert lifecycle tracking

- Drop the leaky self.alerts cache; _get_alert is now a one-shot fetch.
- Warn loudly when the client-side isResolved safety net drops alerts.
- Note that lastUpdatedTime is the closest signal for ack->open transitions.
- Fix triple-space in monitor title, fix date format to YYYY-MM-DD.
- Document the alert lifecycle and agent-restart behavior in the README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(nutanix): address alert tracking review feedback

- Repoint NutanixCheck.alerts at _open_alerts so the public property
  matches what the activity monitor actually carries.
- Restore a per-cycle dedup cache on _get_alert so _process_task does
  not issue N x M GETs for tasks referencing the same alerts.
- Extract _reconcile_alerts into named helpers (new, resolved,
  transitioned, heartbeat, cached-gauge fallback) so the coordinator
  reads cleanly.
- Namespace ext_id as ntnx_alert_ext_id on metrics, events, the
  monitor template, and metadata.csv to avoid colliding with other
  sources in the global ext_id tag.
- Log a warning when the unresolved-alerts list call fails, and
  another when a gone alert cannot be fetched back from Prism Central.
- Switch nutanix.alert.resolved from gauge to count so a resolved
  alert reopening with the same extId does not leave a stuck
  resolved=1 series.
- Monitor template: query threshold > 0, use is_recovery consistently.
- Update record_fixtures.py to use the production isResolved eq false
  filter so re-recorded fixtures match what the integration queries.
- Document the alert lifecycle, agent restart behavior, and recommended
  metric monitor patterns in the README.
- Add a test for the resolved to open lifecycle with the same extId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(nutanix): quote metadata.csv description containing a comma

The nutanix.alert.resolved description contained an unquoted comma,
splitting the row into 12 columns at parse time and breaking metadata
validation.
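
The column split can be reproduced with Python's standard `csv` module. The header below is the one from metadata.csv; the description text is a shortened stand-in, not the real row.

```python
import csv
import io

HEADER = (
    "metric_name,metric_type,interval,unit_name,per_unit_name,description,"
    "orientation,integration,short_name,curated_metric,sample_tags"
)

# Shortened stand-in rows: same shape as the real entry, one comma in the description.
UNQUOTED = "nutanix.alert.resolved,count,,,,Incremented once, then idle.,0,nutanix,alert resolved,,ntnx_alert_ext_id"
QUOTED = 'nutanix.alert.resolved,count,,,,"Incremented once, then idle.",0,nutanix,alert resolved,,ntnx_alert_ext_id'


def column_count(row: str) -> int:
    """Parse a single CSV row and return how many fields it contains."""
    return len(next(csv.reader(io.StringIO(row))))


print(column_count(HEADER), column_count(UNQUOTED), column_count(QUOTED))
```

The header and the quoted row both parse to 11 fields, while the unquoted row parses to 12.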

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(nutanix): correct alert reconciliation edge cases from review

Two distinct correctness fixes to the alert lifecycle tracking path:

- _get_alert no longer swallows every exception. A 404 still returns None
  (the alert was deleted upstream), but transient HTTP failures (5xx,
  network, timeout) propagate so they are no longer silently misclassified
  as deletions that emit degraded "Resolved" events. _emit_resolved_alerts
  catches the propagated HTTPError per alert, restores tracking, and
  retries on the next cycle. The returned count now reflects actual emissions.

- _reconcile_alerts now distinguishes alerts that truly left the
  unresolved-alerts API (resolved/deleted upstream) from alerts that are
  still open in Prism Central but no longer match the configured
  resource_filters. The latter are dropped from tracking silently with an
  info log; no resolution event or nutanix.alert.resolved increment is
  emitted, since the alert is not resolved.
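
The first fix, separating a 404 from a transient failure, can be sketched as follows. This is an illustration under assumptions: `HTTPStatusError` is a local stand-in for `requests.exceptions.HTTPError`, and `get_alert`, `emit_resolved`, and the `fetch`/`emit` callables are hypothetical simplifications of the helpers named above.

```python
from typing import Callable, Dict, Iterable, Optional


class HTTPStatusError(Exception):
    """Stand-in for requests.exceptions.HTTPError, carrying only a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def get_alert(fetch: Callable[[str], dict], alert_id: str) -> Optional[dict]:
    """Return the alert, None if it was deleted (404); re-raise anything else."""
    try:
        return fetch(alert_id)
    except HTTPStatusError as exc:
        if exc.status_code == 404:
            return None
        raise


def emit_resolved(
    gone_ids: Iterable[str],
    cache: Dict[str, dict],
    fetch: Callable[[str], dict],
    emit: Callable[[str, dict], None],
) -> int:
    """Emit one resolution per gone alert; return how many were actually emitted."""
    emitted = 0
    for alert_id in gone_ids:
        cached = cache.pop(alert_id)
        try:
            latest = get_alert(fetch, alert_id)
        except HTTPStatusError:
            # Transient failure: restore tracking and retry next cycle.
            cache[alert_id] = cached
            continue
        # A deleted alert (None) falls back to the cached metadata.
        emit(alert_id, latest if latest is not None else cached)
        emitted += 1
    return emitted
```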

Test updates:
- New test_get_alert_returns_none_on_404, test_get_alert_propagates_on_transient_http_error[500/502/503/504], and test_transient_alert_get_failure_preserves_tracking.
- Existing test_alert_filter_excludes_tracked_alert_emits_spurious_resolution rewritten as test_alert_filter_excludes_tracked_alert_drops_without_resolution (pinned the prior bug; now asserts the correct behavior).
- New test_resolution_event_still_fires_when_alert_truly_leaves_unresolved_api guards the gone_ids semantics regression.
- conftest 404 mocks switched from bare Exception to requests.exceptions.HTTPError(response=...) so the 404 branch is actually exercised.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: Janine Chan <64388808+janine-c@users.noreply.github.com>

* fix(nutanix): shorten monitor description to under 300 chars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(nutanix): address review feedback on alert tracking

- Guard the best-effort _get_alert call in _process_task with try/except
  HTTPError so a transient failure no longer aborts the task collection
  cycle, matching the guard already used in _emit_resolved_alerts.
- Report currently-tracked open alerts in the check summary log instead
  of the per-cycle change count, which read 0 on quiet cycles. Only the
  INFO summary is affected; events and metrics are unchanged.
- Drop the leading underscore from the SEVERITY_TO_ALERT_TYPE constant.
- Move the fixture_alert helper to conftest.py and inline the
  complete-output parametrize cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(nutanix): stamp open alert event at observation time

The open/heartbeat alert event was timestamped at the alert's
creationTime. Event monitors window on occurrence time, so back-dated
heartbeats never entered the recommended monitor's trailing 5m window:
the monitor fired once at creation, then auto-recovered ~5m later and
stayed OK regardless of the alert's real state in Prism Central.

Stamp the open/heartbeat event at observation time so it lands in the
monitor's rolling window. Recovery is now driven by heartbeats ceasing,
so the monitor recovers ~5m after the alert actually resolves.

Resolution and transition events keep their real timestamps; they are
not counted by the status:open query and only feed the timeline.
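
A simplified model of the windowing problem, assuming the event monitor counts only events whose timestamp lies in the trailing 5 minutes. The helper name and the exact boundary handling are assumptions, not Datadog's evaluation logic.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # mirrors last("5m") in the monitor query


def counted_by_monitor(event_time: datetime, evaluated_at: datetime) -> bool:
    """True if an event stamped at event_time falls inside the trailing window."""
    return evaluated_at - WINDOW < event_time <= evaluated_at


now = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
creation_time = now - timedelta(hours=2)  # alert raised two hours earlier

# Heartbeat stamped at the alert's creationTime: outside the window.
backdated = counted_by_monitor(creation_time, now)
# Heartbeat stamped at observation time: inside the window.
observed = counted_by_monitor(now, now)
```

In this model a back-dated heartbeat for a two-hour-old alert is never counted, while one stamped at observation time always is.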

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(nutanix): drop fixed changelog fragments

Keep only the added fragments on this branch; the fixed entries are
removed at the maintainer's request.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Janine Chan <64388808+janine-c@users.noreply.github.com>
diff --git a/nutanix/README.md b/nutanix/README.md
@@ -63,6 +63,18 @@ Use the `collect_events`, `collect_alerts`, `collect_tasks`, and `collect_audits
 
 **Note**: By default, only parent tasks are collected. Set `collect_subtasks: true` to include subtasks.
 
+**Alert lifecycle.** Alerts are reconciled against Prism Central's unresolved-alerts API on every check cycle. While an alert is open, a heartbeat event (`msg_title: Alert: ...`) is emitted each cycle so event-based monitors stay firing; the first occurrence acts as the creation event. Transition events are emitted when an alert is acknowledged or reopened, and a resolution event is emitted when the alert is resolved or deleted. All events for the same alert share `aggregation_key=nutanix-alert-<extId>`, which collapses them into a single entry in the Events Explorer.
+
+**Agent restart.** The integration is stateless across restarts. On startup it fetches all currently-unresolved alerts and re-emits a heartbeat event for each; `aggregation_key` collapses these duplicates with any prior events. State changes (acknowledgement, reopening) that happen during Agent downtime are not retroactively emitted as transition events. The next check cycle picks up the current state and proceeds normally.
+
+**Building metric-based monitors for alerts.** The state of an alert is captured by `nutanix.alert.open` and `nutanix.alert.acknowledged` (gauges). `nutanix.alert.resolved` is a `count` of resolution transitions, not a state. Recommended patterns:
+
+- Active alerts: `avg:nutanix.alert.open{*}.default_zero() > 0` by `ntnx_alert_ext_id`.
+- Active or acknowledged: `avg:nutanix.alert.open{*} + avg:nutanix.alert.acknowledged{*}` with `default_zero` and threshold `> 0`, grouped by `ntnx_alert_ext_id`.
+- Resolution rate: `sum:nutanix.alert.resolved{*}.as_count()` for dashboards or backlog monitors.
+
+Because `nutanix.alert.resolved` is a count, do not subtract it from the open or acknowledged gauges; an alert can transition from resolved back to open with the same `ntnx_alert_ext_id`, and `.open` alone is the correct state signal.
+
 ### Service Checks
 
 The integration does not emit any service checks.
diff --git a/nutanix/assets/monitors/alerts.json b/nutanix/assets/monitors/alerts.json
@@ -0,0 +1,35 @@
+{
+	"version": 2,
+	"created_at": "2026-05-14",
+	"last_updated_at": "2026-05-14",
+	"title": "Nutanix alert is open in Prism Central",
+	"description": "Tracks open Nutanix alerts from Prism Central. Fires when an alert is unresolved, auto-recovers on acknowledgement or resolution. Lifecycle events (created, acknowledged, reopened, resolved) are emitted to the Events Explorer under the same aggregation key.",
+	"definition": {
+		"id": 281752599,
+		"name": "{{#is_alert}}OPEN{{/is_alert}}{{#is_recovery}}RESOLVED{{/is_recovery}} [Nutanix {{ntnx_alert_severity.name}}] [{{ntnx_alert_impact.name}}] [{{ntnx_originating_cluster_name.name}}] - {{ntnx_alert_ext_id.name}}",
+		"type": "event-v2 alert",
+		"query": "events(\"source:nutanix ntnx_type:alert ntnx_alert_status:open\").rollup(\"cardinality\", \"@aggregation_key\").by(\"@aggregation_key,ntnx_alert_severity,ntnx_alert_impact,ntnx_originating_cluster_name\").last(\"5m\") > 0",
+		"message": "{{#is_alert}}A Nutanix alert has been raised or escalated.{{/is_alert}}\n  {{#is_recovery}}The underlying Nutanix alert has recovered.{{/is_recovery}}\n\n  **Alert:** `{{@aggregation_key.name}}`\n  **Severity:** `{{ntnx_alert_severity.name}}`\n  **Impact:** `{{ntnx_alert_impact.name}}`\n  **Originating cluster:** `{{ntnx_originating_cluster_name.name}}`\n\n  **Transition observed at:** {{last_triggered_at}}",
+		"tags": [],
+		"options": {
+			"thresholds": {
+				"critical": 0
+			},
+			"enable_logs_sample": false,
+			"notify_audit": false,
+			"on_missing_data": "default",
+			"include_tags": true,
+			"new_group_delay": 60,
+			"renotify_interval": 0,
+			"escalation_message": "",
+			"silenced": {}
+		},
+		"priority": null,
+		"restriction_policy": {
+			"bindings": []
+		}
+	},
+	"tags": [
+		"integration:nutanix"
+	]
+}
diff --git a/nutanix/changelog.d/23538.added b/nutanix/changelog.d/23538.added
@@ -0,0 +1 @@
+Track each Nutanix alert through its lifecycle (open, acknowledged, resolved) with dedicated metrics, transition events, and a default monitor template.
diff --git a/nutanix/datadog_checks/nutanix/activity_monitor.py b/nutanix/datadog_checks/nutanix/activity_monitor.py
diff --git a/nutanix/datadog_checks/nutanix/check.py b/nutanix/datadog_checks/nutanix/check.py
@@ -72,7 +72,7 @@ def audits(self):
 
     @property
     def alerts(self):
-        return self.activity_monitor.alerts
+        return self.activity_monitor._open_alerts
 
     @property
     def tasks(self):
diff --git a/nutanix/manifest.json b/nutanix/manifest.json
@@ -62,7 +62,9 @@
         "Nutanix - Overview": "assets/dashboards/nutanix_overview.json",
         "Nutanix - Activity Monitoring": "assets/dashboards/nutanix_activity_monitoring.json"
     },
-    "monitors": {},
+    "monitors": {
+        "Nutanix alert is open": "assets/monitors/alerts.json"
+    },
     "saved_views": {}
   },
   "author": {
diff --git a/nutanix/metadata.csv b/nutanix/metadata.csv
@@ -1,4 +1,7 @@
 metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags
+nutanix.alert.acknowledged,gauge,,,,1 while a Nutanix alert is acknowledged but not yet resolved; 0 emitted once when leaving the acknowledged state. Tagged per-alert via ntnx_alert_ext_id.,0,nutanix,alert acknowledged,,ntnx_alert_ext_id
+nutanix.alert.open,gauge,,,,1 while a Nutanix alert is unresolved and unacknowledged; 0 emitted once when leaving the open state (acknowledged or resolved). Tagged per-alert via ntnx_alert_ext_id.,0,nutanix,alert open,,ntnx_alert_ext_id
+nutanix.alert.resolved,count,,,,"Incremented once each time a Nutanix alert is detected as resolved or deleted. Use for resolution-rate dashboards or backlog monitors; not a state metric, since alerts can transition from resolved back to open with the same ntnx_alert_ext_id. Use nutanix.alert.open for state.",0,nutanix,alert resolved,,ntnx_alert_ext_id
 nutanix.api.rate_limited,count,,,,Count of HTTP 429 rate limit responses from the Prism Central API.,0,nutanix,rate_limited,,
 nutanix.cluster.aggregate_hypervisor.memory_usage,gauge,,,,Total memory usage across all hypervisors in the cluster.,0,nutanix,usage,,
 nutanix.cluster.controller.avg_io_latency,gauge,,,,Average I/O latency of the cluster storage controller.,0,nutanix,latency,,
diff --git a/nutanix/tests/conftest.py b/nutanix/tests/conftest.py
@@ -5,12 +5,24 @@
 
 import json
 import os
+from datetime import datetime
 
 import pytest
+from requests.exceptions import HTTPError
 
 from datadog_checks.dev import docker_run, get_docker_hostname, get_here
 from datadog_checks.dev.conditions import CheckEndpoints
 
+
+def _filter_after(records, field, filter_param):
+    """Filter & sort records whose `field` ISO-8601 timestamp is after the value in `<field> gt …`."""
+    threshold = datetime.fromisoformat(filter_param.split(f"{field} gt ")[-1].strip().replace("Z", "+00:00"))
+    return sorted(
+        (r for r in records if r.get(field) and datetime.fromisoformat(r[field].replace("Z", "+00:00")) > threshold),
+        key=lambda r: datetime.fromisoformat(r[field].replace("Z", "+00:00")),
+    )
+
+
 HERE = get_here()
 HOST = get_docker_hostname()
 DOCKER_DIR = os.path.join(HERE, 'docker')
@@ -43,6 +55,15 @@ def load_fixture_page(filename, page):
     return {"data": [], "metadata": {"totalAvailableResults": 0}}
 
 
+def fixture_alert(alert_type, **overrides):
+    """Load the first fixture alert with the given alertType and apply overrides."""
+    for page in load_fixture('alerts.json'):
+        for alert in page.get('data', []):
+            if alert.get('alertType') == alert_type:
+                return {**alert, **overrides}
+    raise ValueError(f"No alert with alertType={alert_type} in fixture")
+
+
 # Test instance configurations
 INSTANCE = {
     "pc_ip": "10.0.0.197",
@@ -232,25 +253,8 @@ def mock_response(url, params=None, *args, **kwargs):
 
             filter_param = params.get('$filter', '') if params else ''
             if 'creationTime gt' in filter_param:
-                from datetime import datetime
-
-                filter_time_str = filter_param.split('creationTime gt ')[-1].strip()
-                filter_time = datetime.fromisoformat(filter_time_str.replace('Z', '+00:00'))
-
-                filtered_data = []
-                for event in response_data.get('data', []):
-                    event_time_str = event.get('creationTime', '')
-                    if event_time_str:
-                        event_time = datetime.fromisoformat(event_time_str.replace('Z', '+00:00'))
-                        if event_time > filter_time:
-                            filtered_data.append(event)
-
-                filtered_data.sort(
-                    key=lambda t: datetime.fromisoformat(t.get('creationTime', '').replace('Z', '+00:00'))
-                )
-
                 response_data = dict(response_data)
-                response_data['data'] = filtered_data
+                response_data['data'] = _filter_after(response_data.get('data', []), 'creationTime', filter_param)
 
             mock_resp.json = mocker.Mock(return_value=response_data)
             return mock_resp
@@ -260,33 +264,16 @@ def mock_response(url, params=None, *args, **kwargs):
 
             filter_param = params.get('$filter', '') if params else ''
             if 'creationTime gt' in filter_param:
-                from datetime import datetime
-
-                filter_time_str = filter_param.split('creationTime gt ')[-1].strip()
-                filter_time = datetime.fromisoformat(filter_time_str.replace('Z', '+00:00'))
-
-                filtered_data = []
-                for audit in response_data.get('data', []):
-                    audit_time_str = audit.get('creationTime', '')
-                    if audit_time_str:
-                        audit_time = datetime.fromisoformat(audit_time_str.replace('Z', '+00:00'))
-                        if audit_time > filter_time:
-                            filtered_data.append(audit)
-
-                filtered_data.sort(
-                    key=lambda t: datetime.fromisoformat(t.get('creationTime', '').replace('Z', '+00:00'))
-                )
-
                 response_data = dict(response_data)
-                response_data['data'] = filtered_data
+                response_data['data'] = _filter_after(response_data.get('data', []), 'creationTime', filter_param)
 
             mock_resp.json = mocker.Mock(return_value=response_data)
             return mock_resp
 
         # Individual alert fetch by ID (e.g. /alerts/{uuid})
         import re
 
-        alert_id_match = re.search(r'api/monitoring/v4\.\d/serviceability/alerts/([0-9a-f-]{36})', url)
+        alert_id_match = re.search(r'api/monitoring/v4\.0/serviceability/alerts/([0-9a-f-]{36})', url)
         if alert_id_match:
             alert_ext_id = alert_id_match.group(1)
             all_alerts = load_fixture_page("alerts.json", 0).get('data', [])
@@ -295,33 +282,22 @@ def mock_response(url, params=None, *args, **kwargs):
                 mock_resp.json = mocker.Mock(return_value={"data": alert_data})
             else:
                 mock_resp.status_code = 404
-                mock_resp.raise_for_status = mocker.Mock(side_effect=Exception("404 Not Found"))
+                mock_resp.raise_for_status = mocker.Mock(side_effect=HTTPError(response=mock_resp))
             return mock_resp
 
-        if 'api/monitoring/v4.0/serviceability/alerts' in url or 'api/monitoring/v4.2/serviceability/alerts' in url:
+        if 'api/monitoring/v4.0/serviceability/alerts' in url:
             response_data = load_fixture_page("alerts.json", page)
 
             filter_param = params.get('$filter', '') if params else ''
-            if 'creationTime gt' in filter_param:
-                from datetime import datetime
-
-                filter_time_str = filter_param.split('creationTime gt ')[-1].strip()
-                filter_time = datetime.fromisoformat(filter_time_str.replace('Z', '+00:00'))
-
-                filtered_data = []
-                for alert in response_data.get('data', []):
-                    alert_time_str = alert.get('creationTime', '')
-                    if alert_time_str:
-                        alert_time = datetime.fromisoformat(alert_time_str.replace('Z', '+00:00'))
-                        if alert_time > filter_time:
-                            filtered_data.append(alert)
-
-                filtered_data.sort(
-                    key=lambda t: datetime.fromisoformat(t.get('creationTime', '').replace('Z', '+00:00'))
-                )
-
+            if 'isResolved eq false' in filter_param:
                 response_data = dict(response_data)
-                response_data['data'] = filtered_data
+                response_data['data'] = [a for a in response_data.get('data', []) if not a.get('isResolved')]
+            elif 'lastUpdatedTime gt' in filter_param:
+                response_data = dict(response_data)
+                response_data['data'] = _filter_after(response_data.get('data', []), 'lastUpdatedTime', filter_param)
+            elif 'creationTime gt' in filter_param:
+                response_data = dict(response_data)
+                response_data['data'] = _filter_after(response_data.get('data', []), 'creationTime', filter_param)
 
             mock_resp.json = mocker.Mock(return_value=response_data)
             return mock_resp
@@ -330,32 +306,15 @@ def mock_response(url, params=None, *args, **kwargs):
 
             filter_param = params.get('$filter', '') if params else ''
             if 'createdTime gt' in filter_param:
-                from datetime import datetime
-
-                filter_time_str = filter_param.split('createdTime gt ')[-1].strip()
-                filter_time = datetime.fromisoformat(filter_time_str.replace('Z', '+00:00'))
-
-                filtered_data = []
-                for task in response_data.get('data', []):
-                    task_time_str = task.get('createdTime', '')
-                    if task_time_str:
-                        task_time = datetime.fromisoformat(task_time_str.replace('Z', '+00:00'))
-                        if task_time > filter_time:
-                            filtered_data.append(task)
-
-                filtered_data.sort(
-                    key=lambda t: datetime.fromisoformat(t.get('createdTime', '').replace('Z', '+00:00'))
-                )
-
                 response_data = dict(response_data)
-                response_data['data'] = filtered_data
+                response_data['data'] = _filter_after(response_data.get('data', []), 'createdTime', filter_param)
 
             mock_resp.json = mocker.Mock(return_value=response_data)
             return mock_resp
 
         print(f"[MOCK ERROR] No matching endpoint for URL: {url}")
         mock_resp.status_code = 404
-        mock_resp.raise_for_status = mocker.Mock(side_effect=Exception("404 Not Found"))
+        mock_resp.raise_for_status = mocker.Mock(side_effect=HTTPError(response=mock_resp))
         return mock_resp
 
     return mocker.patch('requests.Session.get', side_effect=mock_response)
diff --git a/nutanix/tests/docker/README.md b/nutanix/tests/docker/README.md
@@ -100,7 +100,6 @@ The Flask server mocks the following Nutanix Prism Central v4 APIs:
 - `GET /api/monitoring/v4.0/serviceability/events` - List events (paginated, time-filtered)
 - `GET /api/monitoring/v4.0/serviceability/audits` - List audits (paginated, time-filtered)
 - `GET /api/monitoring/v4.0/serviceability/alerts` - List alerts (paginated, time-filtered)
-- `GET /api/monitoring/v4.2/serviceability/alerts` - List alerts v4.2 (paginated, time-filtered)
 - `GET /api/prism/v4.0/config/tasks` - List tasks (paginated, time-filtered)
 
 ### Metadata APIs
diff --git a/nutanix/tests/docker/mock_server.py b/nutanix/tests/docker/mock_server.py
@@ -181,7 +181,6 @@ def audits():
 
 
 @app.route('/api/monitoring/v4.0/serviceability/alerts')
-@app.route('/api/monitoring/v4.2/serviceability/alerts')
 def alerts():
     """Alerts endpoint (paginated with time filtering)."""
     page = int(request.args.get('$page', 0))
diff --git a/nutanix/tests/metrics.py b/nutanix/tests/metrics.py
@@ -9,6 +9,12 @@
 # Host storage_* metric names — derived so test guards stay in sync with the production map.
 HOST_STORAGE_METRICS: frozenset[str] = frozenset(f"nutanix.{HOST_STATS_METRICS[k]}" for k in HOST_STORAGE_STAT_KEYS)
 
+ALERT_METRICS_OPTIONAL = [
+    "nutanix.alert.open",
+    "nutanix.alert.acknowledged",
+    "nutanix.alert.resolved",
+]
+
 CLUSTER_STATS_METRICS_REQUIRED = [
     "nutanix.cluster.aggregate_hypervisor.memory_usage",
     "nutanix.cluster.controller.avg_io_latency",
diff --git a/nutanix/tests/scripts/record_fixtures.py b/nutanix/tests/scripts/record_fixtures.py
@@ -395,34 +395,19 @@ def record_audits() -> None:
 
 
 def record_alerts() -> None:
-    """Record alerts fixture."""
-    # Get alerts from last 24 hours
-    now = datetime.now(timezone.utc)
-    start_time = now - timedelta(hours=24)
-    start_time_str = start_time.isoformat().replace("+00:00", "Z")
-
-    print(f"\nRecording alerts (from {start_time_str})")
+    """Record alerts fixture (currently-unresolved snapshot, matching production query)."""
+    print("\nRecording unresolved alerts")
 
     params = {
-        "$filter": f"creationTime gt {start_time_str}",
-        "$orderBy": "creationTime asc",
+        "$filter": "isResolved eq false",
+        "$orderBy": "lastUpdatedTime asc",
     }
 
-    # Try v4.2 first, fallback to v4.0
     try:
-        print("  Trying alerts API v4.2...")
-        pages = fetch_paginated_endpoint("api/monitoring/v4.2/serviceability/alerts", params=params)
+        pages = fetch_paginated_endpoint("api/monitoring/v4.0/serviceability/alerts", params=params)
         save_fixture("alerts.json", pages)
     except requests.exceptions.HTTPError as e:
-        print(f"  ⚠ v4.2 failed: {e}")
-        try:
-            print("  Falling back to alerts API v4.0...")
-            # v4.0 doesn't support filters
-            params_v40 = {}
-            pages = fetch_paginated_endpoint("api/monitoring/v4.0/serviceability/alerts", params=params_v40)
-            save_fixture("alerts.json", pages)
-        except requests.exceptions.HTTPError as e2:
-            print(f"  ⚠ v4.0 also failed: {e2}")
+        print(f"  ⚠ alerts fetch failed: {e}")
 
 
 def record_tasks() -> None:
diff --git a/nutanix/tests/test_alerts.py b/nutanix/tests/test_alerts.py
diff --git a/nutanix/tests/test_clusters.py b/nutanix/tests/test_clusters.py
diff --git a/nutanix/tests/test_metadata.py b/nutanix/tests/test_metadata.py
