From 9a3873adcb6755cd34ce8f071f2de14e72f32f2a Mon Sep 17 00:00:00 2001
From: Wrisa
Date: Thu, 2 Apr 2026 06:07:43 -0700
Subject: [PATCH] Proposal for adding agent name and workflow name on child spans

---
 .../agent-name-child-spans-proposal.md | 135 +++++++++++++++++
 .../workflow-name-child-spans-proposal.md | 137 ++++++++++++++++++
 2 files changed, 272 insertions(+)
 create mode 100644 util/opentelemetry-util-genai/agent-name-child-spans-proposal.md
 create mode 100644 util/opentelemetry-util-genai/workflow-name-child-spans-proposal.md

diff --git a/util/opentelemetry-util-genai/agent-name-child-spans-proposal.md b/util/opentelemetry-util-genai/agent-name-child-spans-proposal.md
new file mode 100644
index 00000000..27c81447
--- /dev/null
+++ b/util/opentelemetry-util-genai/agent-name-child-spans-proposal.md
@@ -0,0 +1,135 @@
+# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
+> **Scope:** `gen_ai.agent.name` only; **`gen_ai.agent.id` is explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.
+
+Today, operators must **reconstruct** agent context by **walking the trace** (e.g. from an `invoke_agent` parent). That is fragile when:
+
+- Traces are **incomplete**, **sampled**, or spans are analyzed **without** full parent chains.
+- Backends need **simple filters** (`operation` + `model` + **agent**) without graph joins on every query.
+
+**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
+- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** where it improves breakdown without mandating high cardinality.
+- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.
+
+---
+
+## 3. Non-goals
+
+- **`gen_ai.agent.id`** or any **instance-level** agent identity on these spans or metrics (explicitly out of scope).
+- Changing **required** attribute sets in a way that **breaks** existing instrumentations (prefer **recommended** / **opt-in** for metrics).
+
+---
+
+## 4. Proposed solution
+
+### 4.1 Semantic meaning
+
+**`gen_ai.agent.name`** on a **child** span or metric record means:
+
+> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** align with the name used when that agent is represented by an **`invoke_agent`** (or equivalent) span in the same system, when such a span exists.
+
+### 4.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:
+
+| Section | Span kinds / notes |
+|--------|---------------------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended**, when the instrumentation **knows** the agent name (typical for agent frameworks / wrappers). Omitted when there is **no** agent concept (raw model client).
+
+**Documentation notes (normative guidance):**
+
+- **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
+- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
+
+### 4.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Same low-cardinality rules as spans; implementations **MAY** omit when no agent context exists.
+
+---
+
+## 5. Use cases / rationale
+
+### 5.1 Spans
+
+- **Filtering and grouping** in trace UIs without inferring parent `invoke_agent`.
+- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
+- **Attribution** when spans are stored or analyzed **without** full trace context.
+
+### 5.2 Metrics
+
+- **Token and cost** breakdown by agent.
+- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.
+- **Alerting** scoped to a specific agent’s behavior.
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Respect existing GenAI **stability and opt-in** policy for emitting **latest experimental** vs legacy behavior.
+
+---
+
+## 7. Alternatives considered
+
+| Alternative | Why not chosen |
+|-------------|----------------|
+| Rely only on **trace parent** (`invoke_agent`) | Fails for partial traces, sampling, and simple backend queries. |
+| Use **custom** / vendor-specific attributes | Prevents **portable** dashboards and cross-vendor tooling. |
+| Add **`gen_ai.agent.id`** on metrics | High **cardinality** risk; explicitly excluded from this proposal. |
+| **Required** `gen_ai.agent.name` on all child spans | Breaks **non-agent** model client usage. |
+
+---
+
+## 8. Open questions
+
+1. **Nested agents:** Should the spec say **“nearest owning agent”** vs **“root workflow agent”** when multiple agents nest? (Pick one default; allow instrumentation notes.)
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` metrics (SIG preference for default cardinality).
+3. **Streaming metrics:** Include **`gen_ai.agent.name`** on **time_to_first_chunk** / **time_per_output_chunk** in v1 of the change or follow-up PR?
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
diff --git a/util/opentelemetry-util-genai/workflow-name-child-spans-proposal.md b/util/opentelemetry-util-genai/workflow-name-child-spans-proposal.md
new file mode 100644
index 00000000..39b0f818
--- /dev/null
+++ b/util/opentelemetry-util-genai/workflow-name-child-spans-proposal.md
@@ -0,0 +1,137 @@
+# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
+> **Scope:** `gen_ai.workflow.name` only; **workflow instance / id attributes are explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.
+
+Operators then depend on **trace hierarchy** or custom attributes to answer:
+
+- Which **workflow** drove this **chat** or **tool** span?
+- How do **token usage** and **latency** break down **by workflow** for the same model?
+
+Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
+- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
+- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.workflow.name`** on a **child** span or metric record means:
+
+> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:
+
+| Section | Notes |
+|--------|--------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended**, when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or explicit API). **Omitted** when there is **no** workflow context.
+
+**Normative guidance:**
+
+- **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
+- **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Omit when no workflow context exists; same **low-cardinality** rules as spans.
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
+- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).
+- **Attribution** when spans are **sampled**, **exported in isolation**, or parent links are missing.
+
+### 4.2 Metrics
+
+- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
+- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.
+- **Alerting** on a specific **workflow** without inferring it from trace structure.
+
+---
+
+## 5. Relationship to `gen_ai.agent.name`
+
+When **both** apply (agent inside a workflow):
+
+- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
+- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Align with GenAI **stability / opt-in** policy for experimental conventions.
+
+---
+
+## 7. Alternatives considered
+
+| Alternative | Why not chosen |
+|-------------|----------------|
+| Rely only on **parent** `invoke_workflow` span | Weak for partial traces, sampling, and metric breakdowns without trace joins. |
+| Use **custom** attributes (`workflow`, `pipeline_name`, …) | Hurts **interoperability** and shared dashboards. |
+| Add **`gen_ai.workflow.id`** on metrics | **Cardinality** risk; out of scope. |
+| **Required** `gen_ai.workflow.name` on all spans | Breaks **non-orchestrated** GenAI usage. |
+
+---
+
+## 8. Open questions
+
+1. **Nested workflows:** If a span sits inside **nested** orchestration, should instrumentation set the **innermost**, **outermost**, or **both** (outer + inner via a future convention)? Recommend **innermost** as default with a one-line note unless SIG wants **outermost** for product-level reporting.
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` given cardinality guidance.
+3. **Streaming metrics:** Include workflow name on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or a follow-up?
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
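Reviewer sketch (not part of the patch): both proposals say the name is set on a child span or metric record when the instrumentation knows it, is omitted otherwise, and that agent and workflow names may appear together. One way an instrumentation might do this is outlined below using only the Python standard library. The helpers `agent_scope`, `workflow_scope`, `child_attributes`, and `tokens_by`, the `(none)` bucket label, and the `example-model` value are all hypothetical illustrations; none of them is an OpenTelemetry API or something the proposals define. A real implementation would more likely carry the names through OpenTelemetry context.

```python
from __future__ import annotations

import contextvars
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterable, Iterator, Mapping, Optional

# Attribute keys named in the two proposals.
AGENT_NAME = "gen_ai.agent.name"
WORKFLOW_NAME = "gen_ai.workflow.name"

# Hypothetical holders for the active names (assumption: not an OpenTelemetry API).
_agent: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar("agent", default=None)
_workflow: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar("workflow", default=None)


@contextmanager
def _scope(var: contextvars.ContextVar, name: str) -> Iterator[None]:
    """Set `var` to `name` inside the block, then restore the previous value."""
    token = var.set(name)
    try:
        yield
    finally:
        var.reset(token)


def agent_scope(name: str):
    """Mark the enclosed operations as performed on behalf of agent `name`."""
    return _scope(_agent, name)


def workflow_scope(name: str):
    """Mark the enclosed operations as running inside workflow `name`."""
    return _scope(_workflow, name)


def child_attributes(operation: str, model: str) -> dict[str, str]:
    """Build attributes for a child span or metric record.

    The agent and workflow keys are added only when a name is known,
    and are left out entirely otherwise.
    """
    attrs = {"gen_ai.operation.name": operation, "gen_ai.request.model": model}
    for key, var in ((AGENT_NAME, _agent), (WORKFLOW_NAME, _workflow)):
        value = var.get()
        if value is not None:
            attrs[key] = value
    return attrs


def tokens_by(points: Iterable[tuple[Mapping[str, str], int]], key: str) -> dict[str, int]:
    """Sum token counts per value of `key`; records without it go to '(none)'."""
    totals: defaultdict[str, int] = defaultdict(int)
    for attrs, tokens in points:
        totals[attrs.get(key, "(none)")] += tokens
    return dict(totals)


# Usage: an agent inside a workflow, a workflow-only call, and a raw client call.
with workflow_scope("customer_support_pipeline"):
    with agent_scope("billing_support"):
        chat = child_attributes("chat", "example-model")
    embed = child_attributes("embeddings", "example-model")
raw = child_attributes("chat", "example-model")

usage = tokens_by([(chat, 120), (embed, 30), (raw, 50)], WORKFLOW_NAME)
```

Because each scope restores the previous value on exit, a nested scope reports the innermost name. That happens to match the default suggested in the workflow proposal's open question 1, but the question is still open, so this behavior is an assumption of the sketch rather than a settled rule.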