Commit e3df1e1

fix(metrics): remove gen_ai.agent.id from all GenAI metric dimensions (#323)
gen_ai.agent.id was set to the span ID (unique per invocation), causing unbounded metric cardinality across all metric types. This attribute is useful for trace lookup on spans but has no meaningful role in metric aggregation, where per-run IDs create metric series explosion.

Changes:
- Remove gen_ai.agent.id from MetricsEmitter for LLMInvocation, EmbeddingInvocation, ToolCall, MCPToolCall, RetrievalInvocation, and AgentInvocation (where it was hardcoded to span_id)
- gen_ai.agent.id remains on all span attributes (no span emitter change)
- gen_ai.agent.name is unaffected (bounded cardinality, kept on metrics)
- Update test assertions to verify gen_ai.agent.id is absent from metric data points
- Document span-only constraint in semconv-reference.md

Verified: 213 unit tests pass; end-to-end SRE Copilot run confirmed telemetry forwarded to local OTel collector with no regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 9268959 commit e3df1e1
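
The cardinality argument in the commit message can be illustrated with a small, self-contained sketch. It does not use the OpenTelemetry SDK; it only assumes that a metric backend keeps one time series per distinct combination of attribute values. The helper `series_count`, the agent names, and the invocation count are hypothetical and chosen for illustration.

```python
def series_count(data_points, keys):
    """Count distinct series: one per unique combination of the given keys."""
    return len({tuple(dp.get(k) for k in keys) for dp in data_points})


# Simulate 1,000 invocations spread across 3 agents (names are made up).
agents = ["router_agent", "planner_agent", "sre_agent"]
data_points = [
    {
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": agents[i % len(agents)],
        # A per-invocation value, formatted like a 64-bit span ID.
        "gen_ai.agent.id": f"{i + 1:016x}",
    }
    for i in range(1000)
]

# With the per-invocation ID as a dimension, every invocation is its own series.
with_id = series_count(
    data_points,
    ["gen_ai.operation.name", "gen_ai.agent.name", "gen_ai.agent.id"],
)
# Without it, the series count is bounded by the number of agent names.
without_id = series_count(
    data_points,
    ["gen_ai.operation.name", "gen_ai.agent.name"],
)

print(with_id, without_id)  # 1000 3
```

In this toy model the series count grows linearly with invocations when the ID is a dimension, and stays at the number of agent names once it is dropped.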

4 files changed

Lines changed: 19 additions & 41 deletions

docs/semconv-reference.md

Lines changed: 2 additions & 2 deletions
@@ -282,7 +282,7 @@ Note: Including high-cardinality values into metrics association-properties may
 | Attribute | Type | Description | OTel Semconv |
 |---|---|---|---|
 | `gen_ai.agent.name` | string | Human-readable agent name | [Standard](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md) |
-| `gen_ai.agent.id` | string | Unique agent identifier | [Standard](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md) |
+| `gen_ai.agent.id` | string | Unique agent identifier. **Span-only** — excluded from metric dimensions due to unbounded per-invocation cardinality (value equals the span ID). | [Standard](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md) |
 | `gen_ai.agent.description` | string | Agent description | [Standard](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md) |
 | `gen_ai.agent.version` | string | Agent version | [Standard](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md) |
 | `gen_ai.agent.tools` | string[] | Available tool names | **SDOT extension** |
@@ -555,7 +555,7 @@ These attributes follow the current [OTel Gen AI semantic conventions](https://g
 | Category | Attributes |
 |---|---|
 | **Core** | `gen_ai.operation.name`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.response.id`, `gen_ai.output.type` |
-| **Agent** | `gen_ai.agent.name`, `gen_ai.agent.id`, `gen_ai.agent.description`, `gen_ai.agent.version` |
+| **Agent** | `gen_ai.agent.name`, `gen_ai.agent.description`, `gen_ai.agent.version` (`gen_ai.agent.id` is **span-only** — see note above) |
 | **Workflow** | `gen_ai.workflow.name` |
 | **Conversation** | `gen_ai.conversation.id`, `gen_ai.data_source.id` |
 | **Tokens** | `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cache_creation.input_tokens`, `gen_ai.usage.cache_read.input_tokens` |

util/opentelemetry-util-genai/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ All notable changes to this repository are documented in this file.

 ### Changed

+- **`gen_ai.agent.id` removed from all GenAI metric dimensions** — The attribute was set to the span ID (unique per invocation), causing unbounded metric cardinality. It remains available on spans where per-invocation identity is expected and useful. `gen_ai.agent.name` is unaffected.
 - **`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` now accepts mode values directly** — Accepts `NO_CONTENT`, `SPAN_ONLY`, `EVENT_ONLY`, `SPAN_AND_EVENT` in addition to legacy `true`/`false`. Aligns with upstream OpenTelemetry GenAI conventions.
 - **Removed experimental mode gating** — Content capture no longer requires an experimental stability flag.

util/opentelemetry-util-genai/src/opentelemetry/util/genai/emitters/metrics.py

Lines changed: 0 additions & 25 deletions
@@ -105,8 +105,6 @@ def on_end(self, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = (
                     llm_invocation.agent_name
                 )
-            if llm_invocation.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = llm_invocation.agent_id

             # Add session context if configured
             metric_attrs.update(get_context_metric_attributes(llm_invocation))
@@ -170,10 +168,6 @@ def on_end(self, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = (
                     embedding_invocation.agent_name
                 )
-            if embedding_invocation.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = (
-                    embedding_invocation.agent_id
-                )

             # Add session context if configured
             metric_attrs.update(
@@ -224,8 +218,6 @@ def on_error(self, error: Error, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = (
                     llm_invocation.agent_name
                 )
-            if llm_invocation.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = llm_invocation.agent_id
             if getattr(error, "type", None) is not None:
                 metric_attrs[ErrorAttributes.ERROR_TYPE] = (
                     error.type.__qualname__
@@ -252,8 +244,6 @@ def on_error(self, error: Error, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_TOOL_NAME] = obj.name
             if obj.agent_name:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = obj.agent_name
-            if obj.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = obj.agent_id
             if getattr(error, "type", None) is not None:
                 metric_attrs[ErrorAttributes.ERROR_TYPE] = (
                     error.type.__qualname__
@@ -289,8 +279,6 @@ def on_error(self, error: Error, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = (
                     tool_invocation.agent_name
                 )
-            if tool_invocation.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = tool_invocation.agent_id
             if getattr(error, "type", None) is not None:
                 metric_attrs[ErrorAttributes.ERROR_TYPE] = (
                     error.type.__qualname__
@@ -319,10 +307,6 @@ def on_error(self, error: Error, obj: Any) -> None:
                 metric_attrs[GenAI.GEN_AI_AGENT_NAME] = (
                     embedding_invocation.agent_name
                 )
-            if embedding_invocation.agent_id:
-                metric_attrs[GenAI.GEN_AI_AGENT_ID] = (
-                    embedding_invocation.agent_id
-                )
             if getattr(error, "type", None) is not None:
                 metric_attrs[ErrorAttributes.ERROR_TYPE] = (
                     error.type.__qualname__
@@ -395,11 +379,6 @@ def _record_agent_metrics(
         metric_attrs = {
             GenAI.GEN_AI_OPERATION_NAME: agent.operation,
             GenAI.GEN_AI_AGENT_NAME: agent.name,
-            GenAI.GEN_AI_AGENT_ID: (
-                f"{agent.span_id:016x}"
-                if agent.span_id is not None
-                else str(id(agent))
-            ),
         }
         if agent.agent_type:
             metric_attrs["gen_ai.agent.type"] = agent.agent_type
@@ -437,8 +416,6 @@ def _record_retrieval_metrics(
         # Add agent context if available
         if retrieval.agent_name:
             metric_attrs[GenAI.GEN_AI_AGENT_NAME] = retrieval.agent_name
-        if retrieval.agent_id:
-            metric_attrs[GenAI.GEN_AI_AGENT_ID] = retrieval.agent_id
         # Add error type if present
         if error is not None and getattr(error, "type", None) is not None:
             metric_attrs[ErrorAttributes.ERROR_TYPE] = error.type.__qualname__
@@ -467,8 +444,6 @@ def _record_execute_tool_metrics(self, tool: ToolCall) -> None:
             metric_attrs[GenAI.GEN_AI_TOOL_NAME] = tool.name
         if tool.agent_name:
             metric_attrs[GenAI.GEN_AI_AGENT_NAME] = tool.agent_name
-        if tool.agent_id:
-            metric_attrs[GenAI.GEN_AI_AGENT_ID] = tool.agent_id
         metric_attrs.update(get_context_metric_attributes(tool))
         _record_duration(
             self._duration_histogram,
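
The hunks above delete the same `agent_id` branch at each call site. One possible complement, sketched here as an assumption and not part of this commit, is a single filter that strips span-only keys before any attribute dict reaches a metric instrument. The names `SPAN_ONLY_ATTRIBUTES` and `build_metric_attrs` are hypothetical and do not exist in `opentelemetry-util-genai`.

```python
# Hypothetical denylist of attributes that may appear on spans but not on metrics.
SPAN_ONLY_ATTRIBUTES = frozenset({"gen_ai.agent.id"})


def build_metric_attrs(candidates):
    """Return a copy of `candidates` without unset values or span-only keys."""
    return {
        key: value
        for key, value in candidates.items()
        if value is not None and key not in SPAN_ONLY_ATTRIBUTES
    }


metric_attrs = build_metric_attrs(
    {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": "lookup_runbook",  # made-up tool name
        "gen_ai.agent.name": "router_agent",
        "gen_ai.agent.id": "00000000000000ab",  # dropped: span-only
        "gen_ai.agent.version": None,  # dropped: unset
    }
)
print(sorted(metric_attrs))
```

A central filter like this would keep a future call site from reintroducing the attribute by accident, at the cost of hiding the exclusion from readers of the individual recording methods.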

util/opentelemetry-util-genai/tests/test_metrics.py

Lines changed: 16 additions & 14 deletions
@@ -260,8 +260,7 @@ def test_llm_metrics_include_agent_identity_when_present(self):
             agent_id="agent-123",
         )
         metrics_list = self._collect_metrics()
-        # Collect token usage and duration datapoints and assert agent attrs present
-        # We flatten all datapoints for easier searching
+        # agent.name (bounded cardinality) is kept; agent.id (per-invocation) is excluded
         found_token_agent = False
         found_duration_agent = False
         for metric in metrics_list:
@@ -270,28 +269,29 @@ def test_llm_metrics_include_agent_identity_when_present(self):
                 "gen_ai.client.operation.duration",
             ):
                 continue
-            # metric.data.data_points for Histogram-like metrics
             data = getattr(metric, "data", None)
             if not data:
                 continue
             data_points = getattr(data, "data_points", []) or []
             for dp in data_points:
                 attrs = getattr(dp, "attributes", {}) or {}
-                if (
-                    attrs.get("gen_ai.agent.name") == "router_agent"
-                    and attrs.get("gen_ai.agent.id") == "agent-123"
-                ):
+                if attrs.get("gen_ai.agent.name") == "router_agent":
+                    self.assertNotIn(
+                        "gen_ai.agent.id",
+                        attrs,
+                        "gen_ai.agent.id must not appear on metric data points",
+                    )
                     if metric.name == "gen_ai.client.token.usage":
                         found_token_agent = True
                     if metric.name == "gen_ai.client.operation.duration":
                         found_duration_agent = True
         self.assertTrue(
             found_token_agent,
-            "Expected token usage metric datapoint to include agent.name and agent.id",
+            "Expected token usage metric datapoint to include agent.name",
         )
         self.assertTrue(
             found_duration_agent,
-            "Expected operation duration metric datapoint to include agent.name and agent.id",
+            "Expected operation duration metric datapoint to include agent.name",
         )

     def test_llm_metrics_include_server_attributes(self):
@@ -391,15 +391,17 @@ def test_llm_metrics_inherit_agent_identity_from_context(self):
                 continue
             for dp in getattr(data, "data_points", []) or []:
                 attrs = getattr(dp, "attributes", {}) or {}
-                if (
-                    attrs.get("gen_ai.agent.name") == "context_agent"
-                    and attrs.get("gen_ai.agent.id") == "agent-123"
-                ):
+                if attrs.get("gen_ai.agent.name") == "context_agent":
+                    self.assertNotIn(
+                        "gen_ai.agent.id",
+                        attrs,
+                        "gen_ai.agent.id must not appear on metric data points",
+                    )
                     inherited = True
                     break
         self.assertTrue(
             inherited,
-            "Expected metrics to inherit agent.name from active agent context"[:0] or "Expected metrics to inherit agent identity from active agent context",
+            "Expected metrics to inherit agent.name from active agent context",
         )

     def test_llm_duration_metric_includes_error_type_on_failure(self):
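
Both updated tests walk the collected metrics the same way and then assert that `gen_ai.agent.id` is absent. The sketch below factors that traversal into a reusable check. It is an illustration under assumptions: `find_forbidden` and `fake_metric` are hypothetical helpers, and the `SimpleNamespace` objects only mimic the `name`, `data.data_points`, and `attributes` fields that the tests above read, not real OpenTelemetry SDK types.

```python
from types import SimpleNamespace


def find_forbidden(metrics_list, metric_names, forbidden_key):
    """Return names of metrics that have a data point carrying `forbidden_key`."""
    offenders = []
    for metric in metrics_list:
        if metric.name not in metric_names:
            continue
        data = getattr(metric, "data", None)
        if not data:
            continue
        for dp in getattr(data, "data_points", []) or []:
            attrs = getattr(dp, "attributes", {}) or {}
            if forbidden_key in attrs:
                offenders.append(metric.name)
                break  # one offending data point per metric is enough
    return offenders


def fake_metric(name, *attribute_sets):
    """Build a stand-in metric exposing only the fields the check reads."""
    points = [SimpleNamespace(attributes=attrs) for attrs in attribute_sets]
    return SimpleNamespace(name=name, data=SimpleNamespace(data_points=points))


metrics_list = [
    fake_metric(
        "gen_ai.client.token.usage",
        {"gen_ai.agent.name": "router_agent"},
    ),
    fake_metric(
        "gen_ai.client.operation.duration",
        {"gen_ai.agent.name": "router_agent", "gen_ai.agent.id": "agent-123"},
    ),
]
watched = ("gen_ai.client.token.usage", "gen_ai.client.operation.duration")

offenders = find_forbidden(metrics_list, watched, "gen_ai.agent.id")
print(offenders)  # ['gen_ai.client.operation.duration']
```

In a `unittest.TestCase`, the result could be checked with `self.assertEqual(offenders, [])`, so a failure message names every metric that still carries the attribute.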
