Commit 3bce4ec

Fix silent drop of ztunnel counter metrics in Istio ambient mode (DataDog#23707)
* Fix silent drop of ztunnel counter metrics in Istio ambient mode

  Ztunnel emits the modern OpenMetrics counter convention (TYPE declared with the base name, samples emitted with the _total suffix). The integration's ztunnel sub-scraper was routed through the legacy prometheus_client parser, which does not add _total to a counter's allowed sample names and yields each sample as an untyped singleton. Combined with construct_metrics_config stripping _total from registration keys, every ztunnel counter was silently dropped (failure visible only at DEBUG level).

  Force use_latest_spec=True on the ztunnel sub-scraper so the OpenMetrics parser is used. Scoped to ztunnel only; waypoint (Envoy, legacy convention) and istiod (Go client_golang, legacy convention) are unaffected. The user can still opt out by setting use_latest_spec: false on the instance.

  Adds an ambient e2e environment (py3.13-1.24-ambient) that installs Istio 1.24.3 with the ambient profile, deploys bookinfo, applies a waypoint, and runs a traffic generator so ztunnel counters are non-zero. The unit test fixture is now the real output captured from ztunnel running in that environment; the prior hand-written 1.5/ztunnel.txt fixture used a non-realistic TYPE convention and is removed.

  Adds test_ambient_ztunnel_legacy_parser_drops_counters that pins the broken behavior under use_latest_spec=false so a future regression that removes the default cannot reintroduce the silent drop.

* Add changelog entry

* Make ambient e2e teardown robust to missing kubectl.pid

  In CI, ddev env test runs the dd_environment fixture's setup and teardown in different pytest invocations. The DDEV_E2E_ENV_* env vars that map per-port-forward TempDir names to actual paths do not survive that transition, so KillProcess opens a freshly mkdtemp'd path and raises FileNotFoundError on kubectl.pid lookup.

  The kubectl port-forward processes are already dying on their own because kind delete cluster has removed the API server they were forwarding to, so swallowing the missing-pid-file error is safe. Wrap port_forward in a safe_port_forward context manager that suppresses FileNotFoundError on exit. Sidecar (single port-forward) is not affected; ambient (three port-forwards) hits this every run.

* Use a Service for ztunnel port-forward instead of a dynamic pod

  The previous attempt port-forwarded pod/ztunnel-<random-suffix>. The dd_environment fixture runs in two pytest invocations under ddev env test (setup and teardown), and on the teardown invocation _get_first_ztunnel_pod runs against a kind cluster that has already been deleted, returning an empty string. port_forward then builds a different TempDir key than setup, the env-var bridge misses, and KillProcess opens an empty mkdtemp'd dir, raising FileNotFoundError on kubectl.pid.

  Other integrations like kuma avoid this by port-forwarding a Service (stable name). Create a Service in setup_istio_ambient via `kubectl expose daemonset ztunnel`, port-forward service/ztunnel-metrics. Drop the _get_first_ztunnel_pod helper and the safe_port_forward workaround.

* Create ztunnel Service via manifest, not kubectl expose

  kubectl expose does not support DaemonSets (error: cannot expose a DaemonSet.apps). The previous attempt to use kubectl expose silently failed and the port-forward of service/ztunnel-metrics then connected to nothing, surfacing as 'Connection refused' on the agent side. Apply a tracked Service manifest at tests/kind/ztunnel_service.yaml instead.

* Restructure matrix orthogonally: version x mode

  The matrix previously folded mode into a single version-like dimension (version = ["1.13", "1.24-ambient"]), which scales poorly: adding ambient support for a new Istio version meant introducing a new "1.xx-ambient" pseudo-version alongside the plain "1.xx" entry.

  Split into orthogonal dimensions:

      version = ["1.13"]  mode = ["sidecar"]  # block 1
      version = ["1.24"]  mode = ["ambient"]  # block 2

  Two matrix blocks express the constraint that 1.13 does not support ambient (it was GA in 1.24). Adding a future version is a one-line change to the relevant block; adding a new mode on an existing version is a one-line change too once the matching setup function is in place.

  Env names become py3.13-1.13-sidecar and py3.13-1.24-ambient, with ISTIO_VERSION and ISTIO_MODE env vars driven independently from the two matrix axes. conftest's fallthrough switches to `else` (i.e. the sidecar branch) to match the new matrix semantics rather than gating on a hard-coded VERSION string.

* Use matrix-axis shorthand for ISTIO_MODE env var

* Add Istio 1.24 to the sidecar e2e matrix

  Migrate setup_istio off the baked IstioOperator manifest and onto `istioctl install --set profile=demo`, which is version-agnostic. Drop the now-unused 1.13-only manifest and extend the sidecar matrix to also cover 1.24, so the integration is exercised end-to-end against a modern Istio release in sidecar mode.

* Mark Istio galley validation pass/fail metrics intermittent

  A fresh Istio 1.24 install on bookinfo emits istio.galley.validation.passed but not the .failed variant (no validation errors occur). The .passed.count v2 sibling was also missing from the list. Move both pairs into the intermittent list so the e2e tests pass on a clean cluster.

* Mark gc_cpu_fraction intermittent for Istio 1.24

* Mark memstats.lookups intermittent for Istio 1.24

* Mark all pilot.conflict variants intermittent

* Split legacy Go metrics from intermittent in Istio e2e

  The Go runtime metrics istio.go.memstats.gc_cpu_fraction and lookups_total are no longer emitted by Istio 1.24's binary (deprecated in Go 1.20 and later removed from client_golang's default exposition), but they are still emitted by Istio 1.13. Calling them "intermittent" on 1.13 silently hid any collection bug there.

  Pull them out into a LEGACY_GO_METRICS set that is strictly asserted on 1.13 envs and skipped on 1.24+. Genuinely-conditional metrics (validation failures, listener conflicts, sidecar injection events, etc.) remain in INTERMITTENT_METRICS and are asserted at_least=0 on every version.

* Parse Istio version numerically for legacy check

* Use packaging.version for Istio version comparison

* Address PR review feedback

  - test_e2e_ambient: replace the dead at_least=0 conditional on ISTIOD_V2_METRICS with the existing _assert_istiod_metric helper so ambient mode also strictly asserts non-intermittent istiod metrics.
  - check.py: log a warning when the user opts out of the OpenMetrics parser (use_latest_spec: false) while ztunnel is being scraped. Comment the asymmetry between namespace (always restored) and use_latest_spec (user-overridable default) inside _generate_config.
  - conftest.py: replace the hard-coded 15-second sleep after the traffic generator pod is created with a 300-second polling loop that waits for ztunnel to report a non-zero TCP connection counter.
  - conftest.py: return osx-amd64 (not osx) for intel macOS in the Istio release suffix helper so local-dev downloads do not 404.
  - changelog: rephrase to name the affected version (9.4.0) so customers searching the changelog for the regression window can find this fix.

* Fix ztunnel poll target and add Istio 1.29 to the matrix

  - _wait_for_ztunnel_traffic was exec'ing into the ztunnel pod, which runs a minimal Rust binary without curl. Move the kubectl exec into the traffic-gen pod (curlimages/curl) and target the ztunnel-metrics Service applied earlier in setup. Capture stderr instead of leaking it through CI logs.
  - Extend the matrix with Istio 1.29 (the current supported release as of 2026-05-18) in both sidecar and ambient mode. 1.24 stays since that is where ambient GA'd; 1.13 stays for the legacy Go-runtime safety net.

* Address round-2 review feedback

  - check.py: use is_affirmative for the use_latest_spec opt-out guard so YAML string booleans ("false") also trigger the warning.
  - check.py: replace use_latest_spec_default kwarg with a keyword-only scraper_defaults mapping; precedence (defaults -> instance -> per-scraper restore) is now visible at the signature without an inline comment.
  - test_unit_istio_v2: assert directly that the ztunnel scraper config carries use_latest_spec=True, so a refactor that drops the default fails on shape rather than on fixture parsing.
  - test_e2e: treat unparseable ISTIO_VERSION as legacy so misconfigured local runs fail loudly instead of silently skipping LEGACY_GO_METRICS.
  - conftest: justify the per-line ValueError skip in _ztunnel_has_traffic with an inline comment.

* Address round-3 review feedback

  - check.py: restore the invariant comment next to `config.update` so the namespace-must-follow-update rule is visible in code shape.
  - metadata.csv: rename the four ztunnel TCP counters from .total to .count to match the names the OpenMetrics V2 transformer actually submits (construct_metrics_config strips the _total suffix). This closes the metadata loop on the same regression class the PR fixes.
  - test_unit_istio_v2: add assert_metrics_using_metadata to the ztunnel metrics test so future mismatches between metrics.py, the transformer, and metadata.csv fail fast.
  - conftest: kubectl wait for traffic-gen Ready before polling so the 300s budget is spent on the actual wait-for-traffic signal instead of on "container not found" exec failures while the image pulls.
  - conftest: capture the last non-zero exec stderr and surface it in the RuntimeError so CI failures point at the real cause instead of a generic timeout message.

* Address round-4 review feedback

  - Raise on the kubectl wait for traffic-gen so a timeout / pod-schedule failure surfaces immediately instead of being swallowed and then misattributed by the 300s polling loop downstream.
  - Build the ztunnel /stats/prometheus URL from the existing module constants so a future port or service rename touches one place.
  - Capture the last stdout excerpt on iterations where curl exits 0 but ztunnel reports no traffic (-sm 5 does not pass -f, so HTTP 4xx/5xx bodies land in stdout); include it in the final RuntimeError so the failure points at the real cause rather than reading "<none>".

* Rename mechanical ztunnel .total counters to .count

  The OpenMetrics V2 transformer strips _total from the registered name and appends .count for counter-type metrics, so eight more ztunnel counter rows in metadata.csv (DNS, on_demand_dns, xds.connection_terminations, connection.{opens,closes,termination}) were declared under the wrong suffix and went orphan once the V2 parser engaged. Basenames already match what metrics.py registers and what ztunnel 1.24 emits, so the rename is mechanical.

  The four remaining .total rows (proxies_{started,stopped}, active/pending_proxy_count) have wrong basenames at the source (ztunnel 1.24 emits these under workload_manager_*) and need their registration corrected in a follow-up, not a suffix-only rewrite.

* Realign ztunnel metric registration with real exposition

  The four in-pod proxy management metrics were registered under istio_proxies_started_total, istio_active_proxy_count_total etc., but ztunnel 1.24 emits them under the workload_manager_* family. Two more counters ztunnel emits (istio_xds_message_total and istio_xds_message_bytes_total) were not registered at all.

  Realign the registrations against the captured ztunnel exposition, update the four corresponding metadata.csv rows (correct suffix per gauge/counter type), add metadata entries for the two new xds counters, and expand V2_ZTUNNEL_METRICS so the unit and e2e tests assert all nine ztunnel metrics that the integration is now collecting.

  Split V2_ZTUNNEL_METRICS into counter / gauge sub-lists so the legacy-parser regression test pins the broken behavior on the counter set only (gauges are not affected by the parser bug).

* Tighten prose hygiene and warning-silence regression

  - conftest: drop the seven-line docstring on the private _wait_for_ztunnel_traffic helper for a one-liner per AGENTS.md, with a short inline comment at the call site explaining the curl-from-traffic-gen indirection.
  - test_unit_istio_v2: assert in the happy-path ztunnel test that the opt-out warning stays silent, so a future change that emits it unconditionally fails the test instead of shipping noise.
  - common.py: collapse the multi-line block introducing the V2_ZTUNNEL_COUNTER_METRICS / V2_ZTUNNEL_GAUGE_METRICS split into two one-line comments.

* Collapse prose comments per AGENTS.md

  - check.py: replace the four-line block above the ztunnel if-block with a single line. The scraper_defaults kwarg and the opt-out warning already explain the why.
  - common.py: replace the wrapped comment above V2_ZTUNNEL_COUNTER_METRICS with a single line that names the format-level reason (TYPE base name + _total samples) instead of the PR-narrative "before this PR's fix" framing.

* Expand changelog to cover full ambient mode fix

* Consolidate ambient changelog into a single fix entry

* Tighten changelog wording

* Drop dead istio_connection_* registrations from ZTUNNEL_METRICS

  These three counter registrations point at metric names ztunnel never emits. A search across the ztunnel source confirms only the tcp_connections_* and xds_connection_terminations families exist; the istio_connection_opens_total / closes_total / termination_total entries were dead from the start and carried matching orphan rows in metadata.csv. Removing both.

* Document ambient mode and mark its settings fleet-configurable

  PR DataDog#22581 introduced ambient mode but never updated the docs or spec metadata. The three instance options it added (istio_mode, ztunnel_endpoint, waypoint_endpoint) were the only instance-level settings in the spec without fleet_configurable: true, so customers managing Istio from Datadog Fleet Automation could remote-configure every Istio setting except those three. The README similarly had no mention of ambient mode at all.

  - Add fleet_configurable: true to the three ambient settings in assets/configuration/spec.yaml; regenerate config_models.
  - Add an "Ambient mode configuration" subsection to the README under Metric collection, showing the ztunnel annotation pattern and pointing at waypoint as the L7 option.

* Correct README: one ambient instance scrapes all three endpoints

  The previous wording suggested a second instance for waypoint metrics, but _parse_ambient_config reads ztunnel_endpoint, waypoint_endpoint, and istiod_endpoint from the same instance and spawns sub-scrapers for each. Replace the per-pod annotation example with a static conf.yaml covering all three components in one instance.

* Drop hard-coded ports from ambient README prose

  Port 15020 is the Istio default but configurable; mentioning specific numbers in the prose can mislead a reader running a non-default exposition. The example URLs still show the conventional defaults, which is the right place for that information since users edit the URLs directly when their setup differs.

* Address documentation review feedback on ambient README section

  - Spell out GA as 'generally available' on first mention.
  - Split the sentence joined by an em dash into two complete sentences.
  - Capitalize Autodiscovery (Datadog feature name).

* Restore license headers on regenerated config_models

  Local ddev's model regeneration produced these files without the Datadog license header that the master branch tracks. CI's validate models check caught the drift.
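The version gate described in the "Split legacy Go metrics" and round-2 entries can be sketched as follows. This is a stdlib-only illustration, not the test code from the commit: the commit uses `packaging.version`, which is swapped here for a plain tuple parse, and the names `LEGACY_GO_CUTOFF`, `parse_version`, and `expects_legacy_go_metrics` as well as the exact `(1, 24)` boundary are assumptions.

```python
# Assumed cut-off: the commit message only says the legacy Go metrics are
# asserted on 1.13 and skipped on 1.24+, so (1, 24) is an illustrative boundary.
LEGACY_GO_CUTOFF = (1, 24)


def parse_version(raw):
    """Return (major, minor) as ints, or None when raw cannot be parsed."""
    try:
        major, minor = raw.strip().split(".")[:2]
        return int(major), int(minor)
    except (AttributeError, ValueError):
        return None


def expects_legacy_go_metrics(raw_version):
    """True when LEGACY_GO_METRICS should be strictly asserted."""
    parsed = parse_version(raw_version)
    if parsed is None:
        # Unparseable versions count as legacy, so a misconfigured run
        # asserts the metrics and fails loudly instead of skipping them.
        return True
    return parsed < LEGACY_GO_CUTOFF
```

Comparing integer tuples rather than strings matters here because a string comparison would order "1.9" after "1.13".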
1 parent 87825d7 commit 3bce4ec

15 files changed

Lines changed: 486 additions & 11456 deletions


istio/README.md

Lines changed: 19 additions & 0 deletions
@@ -90,6 +90,25 @@ This annotation specifies the container `discovery` to match the default contain

 The method for applying these annotations varies depending on the [Istio deployment strategy (Istioctl, Helm, Operator)][22] used. Consult the Istio documentation for the proper method to apply these pod annotations. See the [sample istio.d/conf.yaml][8] for all available configuration options.

+##### Ambient mode configuration
+
+Istio ambient mode, generally available in Istio v1.24, replaces sidecar injection with two shared components: the `ztunnel` DaemonSet (L4 zero-trust tunneling) and optional `waypoint` proxies (L7 HTTP/gRPC processing). Set `istio_mode: ambient` and configure one or more of `ztunnel_endpoint`, `waypoint_endpoint`, and `istiod_endpoint` on the same instance. The check scrapes each endpoint that is set. Adjust the URLs in the example below to match your cluster's hostnames and ports.
+
+Example static configuration in `istio.d/conf.yaml` covering all three components:
+
+```yaml
+init_config:
+
+instances:
+  - istio_mode: ambient
+    use_openmetrics: true
+    ztunnel_endpoint: http://ztunnel.istio-system.svc:15020/stats/prometheus
+    waypoint_endpoint: http://waypoint.<NAMESPACE>.svc:15020/stats/prometheus
+    istiod_endpoint: http://istiod.istio-system.svc:15014/metrics
+```
+
+Replace `<NAMESPACE>` with the namespace where you ran `istioctl waypoint apply`. Omit `waypoint_endpoint` if you have not deployed a waypoint proxy. The same options can be set via the Autodiscovery annotation syntax shown in the [Control plane configuration](#control-plane-configuration) section above.
+
 #### Disable sidecar injection for Datadog Agent pods

 If you are installing the [Datadog Agent in a container][10], Datadog recommends that you first disable Istio's sidecar injection.

istio/assets/configuration/spec.yaml

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,7 @@ files:
         Specify the Istio data plane mode to monitor.
         - `sidecar`: Monitor Istio sidecar proxies (traditional mode)
         - `ambient`: Monitor Istio ambient mode components (ztunnel, waypoint proxies)
+      fleet_configurable: true
       value:
         example: sidecar
         display_default: sidecar
@@ -35,6 +36,7 @@ files:
         Ztunnel is the L4 proxy that provides zero-trust tunneling and mTLS for ambient mesh.
         Only used when `istio_mode` is set to `ambient`.
         Ztunnel metrics are exposed on port 15020.
+      fleet_configurable: true
       value:
         display_default: null
         example: http://ztunnel.istio-system:15020/stats/prometheus
@@ -45,6 +47,7 @@ files:
         Waypoint proxies provide optional L7 processing (HTTP/gRPC traffic management) in ambient mesh.
         Only used when `istio_mode` is set to `ambient`.
         Waypoint metrics are exposed on port 15020.
+      fleet_configurable: true
       value:
         display_default: null
         example: http://waypoint.istio-system:15020/stats/prometheus

istio/changelog.d/23707.fixed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Restore Istio ambient mode metric collection broken in 9.4.0: ztunnel counters are no longer silently dropped, proxy management metrics use the `workload_manager_*` names ztunnel actually emits, and the missing xDS message counters are now registered.

istio/datadog_checks/istio/check.py

Lines changed: 20 additions & 10 deletions
@@ -3,7 +3,7 @@
 # Licensed under a 3-clause BSD style license (see LICENSE)
 from collections import ChainMap

-from datadog_checks.base import ConfigurationError, OpenMetricsBaseCheckV2
+from datadog_checks.base import ConfigurationError, OpenMetricsBaseCheckV2, is_affirmative
 from datadog_checks.base.checks.openmetrics.v2.scraper import OpenMetricsCompatibilityScraper

 from .constants import ISTIOD_NAMESPACE
@@ -72,10 +72,24 @@ def _parse_ambient_config(self, istiod_endpoint, istiod_namespace):
                 "`ztunnel_endpoint`, `waypoint_endpoint`, or `istiod_endpoint`."
             )

-        # Ztunnel provides L4 TCP metrics for ambient mesh
+        # Ztunnel uses the modern OpenMetrics counter convention; force the v2 parser so counters are not dropped.
         ztunnel_namespace = istiod_namespace + ".ztunnel"
         if ztunnel_endpoint:
-            self.scraper_configs.append(self._generate_config(ztunnel_endpoint, ZTUNNEL_METRICS, ztunnel_namespace))
+            if not is_affirmative(self.instance.get("use_latest_spec", True)):
+                self.log.warning(
+                    "`use_latest_spec: false` is set with `ztunnel_endpoint` configured. "
+                    "ztunnel emits the modern OpenMetrics counter convention which the "
+                    "legacy parser silently drops, so every ztunnel counter metric will be "
+                    "missed. Remove `use_latest_spec: false` to restore ztunnel metrics."
+                )
+            self.scraper_configs.append(
+                self._generate_config(
+                    ztunnel_endpoint,
+                    ZTUNNEL_METRICS,
+                    ztunnel_namespace,
+                    scraper_defaults={'use_latest_spec': True},
+                )
+            )

         # Waypoint provides L7 HTTP/gRPC metrics (optional in ambient mode)
         waypoint_namespace = istiod_namespace + ".waypoint"
@@ -86,16 +100,12 @@ def _parse_ambient_config(self, istiod_endpoint, istiod_namespace):
         if istiod_endpoint:
             self.scraper_configs.append(self._generate_config(istiod_endpoint, ISTIOD_METRICS, istiod_namespace))

-    def _generate_config(self, endpoint, metrics, namespace):
+    def _generate_config(self, endpoint, metrics, namespace, *, scraper_defaults=None):
         metrics = construct_metrics_config(metrics)
         metrics.append(ISTIOD_VERSION)
-        config = {
-            'openmetrics_endpoint': endpoint,
-            'metrics': metrics,
-            'namespace': namespace,
-        }
+        config = {**(scraper_defaults or {}), 'openmetrics_endpoint': endpoint, 'metrics': metrics}
+        # Instance keys override scraper_defaults; per-scraper namespace is restored on the next line.
         config.update(self.instance)
-        # Restore per-scraper namespace so custom ztunnel/waypoint/mesh namespaces are not overwritten by instance
         config['namespace'] = namespace
         return config

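The precedence that `_generate_config` now encodes (scraper defaults, then instance keys, then the per-scraper namespace) can be exercised in isolation. The sketch below is a standalone stand-in, not the check's method: `generate_config` takes the instance as an explicit argument instead of reading `self.instance`, and the endpoint URL and metric list are hypothetical.

```python
def generate_config(instance, endpoint, metrics, namespace, *, scraper_defaults=None):
    # 1. Scraper defaults go in first, so they are the weakest layer.
    config = {**(scraper_defaults or {}), "openmetrics_endpoint": endpoint, "metrics": metrics}
    # 2. Instance keys override the defaults (this is the user opt-out path).
    config.update(instance)
    # 3. The per-scraper namespace is restored last and always wins.
    config["namespace"] = namespace
    return config


ENDPOINT = "http://ztunnel.example:15020/stats/prometheus"  # hypothetical URL

# No instance override: the ztunnel default applies.
default_cfg = generate_config(
    {"namespace": "istio"}, ENDPOINT, [], "istio.ztunnel",
    scraper_defaults={"use_latest_spec": True},
)

# Explicit opt-out on the instance: the default is overridden.
opt_out_cfg = generate_config(
    {"namespace": "istio", "use_latest_spec": False}, ENDPOINT, [], "istio.ztunnel",
    scraper_defaults={"use_latest_spec": True},
)
```

In both cases the resulting namespace is `istio.ztunnel`, while `use_latest_spec` follows the instance only when the instance sets it. This is the asymmetry the inline comment in the diff refers to.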

istio/datadog_checks/istio/metrics.py

Lines changed: 8 additions & 9 deletions
@@ -276,17 +276,16 @@
     'istio_dns_upstream_failures_total': 'dns.upstream_failures.total',
     'istio_dns_upstream_request_duration_seconds': 'dns.upstream_request_duration_seconds',
     'istio_on_demand_dns_total': 'on_demand_dns.total',
-    # In-pod proxy management metrics (unstable)
-    'istio_active_proxy_count_total': 'active_proxy_count.total',
-    'istio_pending_proxy_count_total': 'pending_proxy_count.total',
-    'istio_proxies_started_total': 'proxies_started.total',
-    'istio_proxies_stopped_total': 'proxies_stopped.total',
+    # In-pod proxy management metrics (unstable). Ztunnel exposes these under the
+    # workload_manager_* family, not istio_*.
+    'workload_manager_active_proxy_count': 'active_proxy_count',
+    'workload_manager_pending_proxy_count': 'pending_proxy_count',
+    'workload_manager_proxies_started_total': 'proxies_started.total',
+    'workload_manager_proxies_stopped_total': 'proxies_stopped.total',
     # XDS metrics (unstable)
     'istio_xds_connection_terminations_total': 'xds.connection_terminations.total',
-    # Connection metrics (unstable)
-    'istio_connection_opens_total': 'connection.opens.total',
-    'istio_connection_closes_total': 'connection.closes.total',
-    'istio_connection_termination_total': 'connection.termination.total',
+    'istio_xds_message_total': 'xds.message.total',
+    'istio_xds_message_bytes_total': 'xds.message_bytes.total',
 }


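To see how a registration in this mapping relates to the names in metadata.csv, the following is a simplified model of the naming rule the commit message describes (`_total` stripped at registration, `.count` appended for counters). It is not the code of `construct_metrics_config` or of the OpenMetrics V2 transformer; the function `submitted_name` and the assumption that the `.total` suffix is stripped from the mapped name alongside the raw name are illustrative.

```python
def submitted_name(namespace, raw_name, mapped_name, metric_type):
    """Model the metric name that ends up submitted for one registration."""
    if raw_name.endswith("_total"):
        # Assumption: the suffix is dropped from both sides of the mapping.
        raw_name = raw_name[: -len("_total")]
        if mapped_name.endswith(".total"):
            mapped_name = mapped_name[: -len(".total")]
    # Counters are submitted with a .count suffix; gauges keep the bare name.
    suffix = ".count" if metric_type == "counter" else ""
    return f"{namespace}.{mapped_name}{suffix}"


started = submitted_name(
    "istio.ztunnel", "workload_manager_proxies_started_total", "proxies_started.total", "counter"
)
active = submitted_name(
    "istio.ztunnel", "workload_manager_active_proxy_count", "active_proxy_count", "gauge"
)
```

Under this model the counter resolves to `istio.ztunnel.proxies_started.count` and the gauge to `istio.ztunnel.active_proxy_count`, which are the names the metadata.csv rows in this commit use.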

istio/hatch.toml

Lines changed: 21 additions & 1 deletion
@@ -5,14 +5,34 @@ dependencies = [
   "requests-mock==1.4.0",
 ]

+# Istio supports two data plane modes: traditional sidecar injection, and ambient
+# (sidecar-less). Ambient mode graduated to GA in Istio 1.24. The matrix below is
+# split into version × mode blocks so each block declares only the (version, mode)
+# combinations that are actually supported by the version and have a working setup
+# function in conftest.py. To add a new Istio version, extend the relevant block's
+# `version` list; to add a new mode on an existing version, add the entry to the
+# right block (and extend conftest.setup_istio* accordingly).
+
+# Sidecar-mode envs. 1.13 stays for the legacy Go-runtime safety net; 1.24 is where
+# ambient GA'd; 1.29 is the current supported release.
+[[envs.default.matrix]]
+python = ["3.13"]
+version = ["1.13", "1.24", "1.29"]
+mode = ["sidecar"]
+
+# Ambient-mode envs. Requires Istio >= 1.24.
 [[envs.default.matrix]]
 python = ["3.13"]
-version = ["1.13"]
+version = ["1.24", "1.29"]
+mode = ["ambient"]

 [envs.default.overrides]
 matrix.version.env-vars = [
   { key = "ISTIO_VERSION", value = "1.13.3", if = ["1.13"] },
+  { key = "ISTIO_VERSION", value = "1.24.3", if = ["1.24"] },
+  { key = "ISTIO_VERSION", value = "1.29.2", if = ["1.29"] },
 ]
+matrix.mode.env-vars = "ISTIO_MODE"

 [envs.default.env-vars]
 DDEV_SKIP_GENERIC_TAGS_CHECK = "true"
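As a quick way to see which environments the two matrix blocks produce, the expansion can be modelled with a Cartesian product per block. This is a sketch of the naming pattern the commit message states (`py3.13-1.13-sidecar`, `py3.13-1.24-ambient`), not Hatch's implementation; `MATRIX_BLOCKS` and `env_names` are names introduced for illustration.

```python
from itertools import product

# The two [[envs.default.matrix]] blocks from hatch.toml, transcribed by hand.
MATRIX_BLOCKS = [
    {"python": ["3.13"], "version": ["1.13", "1.24", "1.29"], "mode": ["sidecar"]},
    {"python": ["3.13"], "version": ["1.24", "1.29"], "mode": ["ambient"]},
]


def env_names(blocks):
    """Expand each block independently; blocks are never crossed with each other."""
    names = []
    for block in blocks:
        for python, version, mode in product(block["python"], block["version"], block["mode"]):
            names.append(f"py{python}-{version}-{mode}")
    return names


ENVS = env_names(MATRIX_BLOCKS)
```

Because each block is expanded on its own, the unsupported combination of Istio 1.13 with ambient mode never appears, which is the constraint the two-block layout is meant to express.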

istio/metadata.csv

Lines changed: 15 additions & 16 deletions
@@ -447,24 +447,23 @@ istio.galley.istio.networking.virtualservices,gauge,,,,,0,istio,,
 istio.galley.istio.networking.destinationrules,gauge,,,,,0,istio,,
 istio.galley.istio.networking.gateways,gauge,,,,,0,istio,,
 istio.galley.istio.authentication.meshpolicies,gauge,,,,,0,istio,,
-istio.ztunnel.tcp.connections_opened.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections opened through ztunnel",0,istio,ztunnel connections opened,
-istio.ztunnel.tcp.connections_closed.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections closed through ztunnel",0,istio,ztunnel connections closed,
-istio.ztunnel.tcp.send_bytes.total,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes sent through ztunnel TCP connections",0,istio,ztunnel bytes sent,
-istio.ztunnel.tcp.received_bytes.total,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes received through ztunnel TCP connections",0,istio,ztunnel bytes received,
-istio.ztunnel.dns.requests.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests handled by ztunnel",0,istio,ztunnel dns requests,
-istio.ztunnel.dns.upstream_requests.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests forwarded to upstream by ztunnel",0,istio,ztunnel dns upstream requests,
-istio.ztunnel.dns.upstream_failures.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS upstream request failures in ztunnel",0,istio,ztunnel dns failures,
+istio.ztunnel.tcp.connections_opened.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections opened through ztunnel",0,istio,ztunnel connections opened,
+istio.ztunnel.tcp.connections_closed.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total TCP connections closed through ztunnel",0,istio,ztunnel connections closed,
+istio.ztunnel.tcp.send_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes sent through ztunnel TCP connections",0,istio,ztunnel bytes sent,
+istio.ztunnel.tcp.received_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes received through ztunnel TCP connections",0,istio,ztunnel bytes received,
+istio.ztunnel.dns.requests.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests handled by ztunnel",0,istio,ztunnel dns requests,
+istio.ztunnel.dns.upstream_requests.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS requests forwarded to upstream by ztunnel",0,istio,ztunnel dns upstream requests,
+istio.ztunnel.dns.upstream_failures.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total DNS upstream request failures in ztunnel",0,istio,ztunnel dns failures,
 istio.ztunnel.dns.upstream_request_duration_seconds.count,count,,second,,"[OpenMetrics V1 and V2 and Istio v1.24+] Count of DNS upstream request durations in ztunnel",0,istio,ztunnel dns duration count,
 istio.ztunnel.dns.upstream_request_duration_seconds.sum,count,,second,,"[OpenMetrics V1 and V2 and Istio v1.24+] Sum of DNS upstream request durations in ztunnel",0,istio,ztunnel dns duration sum,
-istio.ztunnel.on_demand_dns.total,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total on-demand DNS requests in ztunnel",0,istio,ztunnel on-demand dns,
-istio.ztunnel.active_proxy_count.total,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of active in-pod proxies managed by ztunnel",0,istio,ztunnel active proxies,
-istio.ztunnel.pending_proxy_count.total,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of pending in-pod proxies in ztunnel",0,istio,ztunnel pending proxies,
-istio.ztunnel.proxies_started.total,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies started by ztunnel",0,istio,ztunnel proxies started,
-istio.ztunnel.proxies_stopped.total,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies stopped by ztunnel",0,istio,ztunnel proxies stopped,
-istio.ztunnel.xds.connection_terminations.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS connection terminations in ztunnel",0,istio,ztunnel xds terminations,
-istio.ztunnel.connection.opens.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connections opened in ztunnel",0,istio,ztunnel connection opens,
-istio.ztunnel.connection.closes.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connections closed in ztunnel",0,istio,ztunnel connection closes,
-istio.ztunnel.connection.termination.total,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total connection terminations in ztunnel",0,istio,ztunnel connection terminations,
+istio.ztunnel.on_demand_dns.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total on-demand DNS requests in ztunnel",0,istio,ztunnel on-demand dns,
+istio.ztunnel.active_proxy_count,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of active in-pod proxies managed by ztunnel",0,istio,ztunnel active proxies,
+istio.ztunnel.pending_proxy_count,gauge,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Number of pending in-pod proxies in ztunnel",0,istio,ztunnel pending proxies,
+istio.ztunnel.proxies_started.count,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies started by ztunnel",0,istio,ztunnel proxies started,
+istio.ztunnel.proxies_stopped.count,count,,,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total number of in-pod proxies stopped by ztunnel",0,istio,ztunnel proxies stopped,
+istio.ztunnel.xds.connection_terminations.count,count,,connection,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS connection terminations in ztunnel",0,istio,ztunnel xds terminations,
+istio.ztunnel.xds.message.count,count,,message,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total XDS messages exchanged between ztunnel and istiod",0,istio,ztunnel xds messages,
+istio.ztunnel.xds.message_bytes.count,count,,byte,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total bytes of XDS messages exchanged between ztunnel and istiod",0,istio,ztunnel xds message bytes,
 istio.waypoint.request.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Total HTTP requests through waypoint proxy",0,istio,waypoint requests,
 istio.waypoint.request.duration.milliseconds.count,count,,request,,"[OpenMetrics V1 and V2 and Istio v1.24+] Count of HTTP request durations through waypoint proxy",0,istio,waypoint request duration count,
 istio.waypoint.request.duration.milliseconds.sum,count,,millisecond,,"[OpenMetrics V1 and V2 and Istio v1.24+] Sum of HTTP request durations through waypoint proxy",0,istio,waypoint request duration sum,

istio/tests/common.py

Lines changed: 14 additions & 7 deletions
@@ -409,19 +409,26 @@
     'istio.galley.istio.authentication.meshpolicies',
 ]

-# Ambient mode (ztunnel) - default namespace istio.ztunnel (OpenMetrics submits counters as .count)
-V2_ZTUNNEL_METRICS = [
+# Ambient mode (ztunnel) - default namespace istio.ztunnel.
+# Ztunnel counters use `# TYPE foo counter` + `foo_total{} N`, which the legacy parser drops; require the v2 parser.
+V2_ZTUNNEL_COUNTER_METRICS = [
     'istio.ztunnel.tcp.connections_opened.count',
     'istio.ztunnel.tcp.connections_closed.count',
     'istio.ztunnel.tcp.send_bytes.count',
     'istio.ztunnel.tcp.received_bytes.count',
-    'istio.ztunnel.dns.requests.count',
-    'istio.ztunnel.dns.upstream_requests.count',
-    'istio.ztunnel.dns.upstream_failures.count',
-    'istio.ztunnel.connection.opens.count',
-    'istio.ztunnel.connection.closes.count',
+    'istio.ztunnel.xds.message.count',
+    'istio.ztunnel.xds.message_bytes.count',
+    'istio.ztunnel.proxies_started.count',
 ]

+# Gauges, unaffected by the legacy-parser counter bug; split out so the regression test pins only counters.
+V2_ZTUNNEL_GAUGE_METRICS = [
+    'istio.ztunnel.active_proxy_count',
+    'istio.ztunnel.pending_proxy_count',
+]
+
+V2_ZTUNNEL_METRICS = V2_ZTUNNEL_COUNTER_METRICS + V2_ZTUNNEL_GAUGE_METRICS
+
 # Ambient mode (waypoint) - default namespace istio.waypoint
 V2_WAYPOINT_METRICS = [
     'istio.waypoint.request.count',
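The counter/gauge split above follows from how the two parser flavours match sample names against a `# TYPE` declaration. The toy parser below illustrates only that matching rule, as the commit message describes it; it is not prometheus_client's code, it ignores labels, `_created` samples, histograms, and summaries, and the exposition text and the function `sample_types` are made up for the example.

```python
# Illustrative exposition in the style the commit message attributes to ztunnel:
# TYPE declared on the base name, counter sample emitted with the _total suffix.
EXPOSITION = """\
# TYPE istio_tcp_connections_opened counter
istio_tcp_connections_opened_total 7
# TYPE workload_manager_active_proxy_count gauge
workload_manager_active_proxy_count 3
"""


def sample_types(text, latest_spec):
    """Map each sample name to the type a parser of that flavour would assign."""
    # Only the OpenMetrics flavour expects counter samples to carry _total.
    counter_suffix = "_total" if latest_spec else ""
    allowed = {}  # sample name -> declared type for the current family
    result = {}
    for line in text.splitlines():
        if line.startswith("# TYPE "):
            _, _, family, metric_type = line.split()
            suffix = counter_suffix if metric_type == "counter" else ""
            allowed = {family + suffix: metric_type}
        elif line and not line.startswith("#"):
            name = line.split()[0]
            # A sample that matches no allowed name becomes an untyped singleton.
            result[name] = allowed.get(name, "untyped")
    return result


LEGACY = sample_types(EXPOSITION, latest_spec=False)
LATEST = sample_types(EXPOSITION, latest_spec=True)
```

In the legacy flavour the counter sample comes out as `untyped` while the gauge keeps its type, which is why the regression test pins only the counter list.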
