signalfx · wrisa · Apr 2, 2026
@@ -0,0 +1,135 @@
+# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).  
+> **Scope:** `gen_ai.agent.name` only — **`gen_ai.agent.id` is explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.
+
+Today, operators must **reconstruct** agent context by **walking the trace** (e.g. from an `invoke_agent` parent). That is fragile when:
+
+- Traces are **incomplete** or **sampled**, or spans are analyzed **without** their full parent chains.
+- Backends need **simple filters** (`operation` + `model` + **agent**) without graph joins on every query.
+
+**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
+- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** so that usage can be broken down per agent without introducing a high-cardinality dimension.
+- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.
+
+---
+
+## 3. Non-goals
+
+- **`gen_ai.agent.id`** or any **instance-level** agent identity on these spans or metrics (explicitly out of scope).
+- Changing **required** attribute sets in a way that **breaks** existing instrumentations (prefer **recommended** / **opt-in** for metrics).
+
+---
+
+## 4. Proposed solution
+
+### 4.1 Semantic meaning
+
+**`gen_ai.agent.name`** on a **child** span or metric record means:
+
+> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.
+
+When the same system also emits an **`invoke_agent`** (or equivalent) span for that agent, the value **SHOULD** match the agent name recorded on that span.
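A minimal sketch of this rule, written with plain Python dictionaries rather than calls into any OpenTelemetry SDK. The helper `child_span_attributes` and the model name `example-model` are hypothetical and exist only for illustration; the attribute keys are the ones named in this proposal.

```python
from typing import Dict, Optional


def child_span_attributes(
    operation_name: str,
    request_model: Optional[str] = None,
    agent_name: Optional[str] = None,
) -> Dict[str, str]:
    """Build the attribute map for a GenAI child span (hypothetical helper).

    `gen_ai.agent.name` is added only when the instrumentation knows the
    logical agent on whose behalf the operation runs.
    """
    attributes = {"gen_ai.operation.name": operation_name}
    if request_model is not None:
        attributes["gen_ai.request.model"] = request_model
    if agent_name is not None:
        attributes["gen_ai.agent.name"] = agent_name
    return attributes


# Example agent name taken from the proposal text.
AGENT_NAME = "research_agent"

# Attributes of the parent invoke_agent span.
invoke_agent_attrs = {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.agent.name": AGENT_NAME,
}

# A chat span issued on behalf of that agent reuses the same name.
chat_attrs = child_span_attributes("chat", "example-model", AGENT_NAME)

# A raw model client has no agent concept, so the attribute is left out.
raw_client_attrs = child_span_attributes("chat", "example-model")
```

With these inputs the child span carries the same `gen_ai.agent.name` as its `invoke_agent` parent, while the raw client call has no agent attribute at all.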
+
+### 4.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:
+
+| Section | `gen_ai.operation.name` values / notes |
+|--------|---------------------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** when the instrumentation **knows** the agent name (typical for agent frameworks / wrappers). The attribute is omitted when there is **no** agent concept (e.g. a raw model client).
+
+**Documentation notes (normative guidance):**
+
+- Instrumentations **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
+- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
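One way an instrumentation might enforce the two rules above is an allow-list check before the attribute is set. This is a sketch under assumptions: `KNOWN_AGENT_NAMES` and `safe_agent_name` are hypothetical names, and the proposal does not require any specific enforcement mechanism.

```python
from typing import FrozenSet, Optional

# Hypothetical allow-list; a real application would derive this from its own
# agent registry or configuration. The two names come from the proposal text.
KNOWN_AGENT_NAMES: FrozenSet[str] = frozenset({"billing_support", "research_agent"})


def safe_agent_name(candidate: Optional[str]) -> Optional[str]:
    """Return the candidate only if it is a known logical agent name.

    Returning None tells the caller to omit `gen_ai.agent.name` instead of
    emitting an unbounded value such as a request ID or end-user ID.
    """
    if candidate is None:
        return None
    return candidate if candidate in KNOWN_AGENT_NAMES else None
```

For example, `safe_agent_name("billing_support")` returns the name unchanged, while a UUID-like string returns `None` and the attribute is dropped. An allow-list keeps the value set bounded by construction; a pattern-based heuristic would be an alternative where no registry exists.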
+
+### 4.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** The same low-cardinality rules apply as for spans; implementations **MAY** omit the attribute when no agent context exists.
+
+---
+
+## 5. Use cases / rationale
+
+### 5.1 Spans
+
+- **Filtering and grouping** in trace UIs without inferring parent `invoke_agent`.
+- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
+- **Attribution** when spans are stored or analyzed **without** full trace context.
+
+### 5.2 Metrics
+
+- **Token and cost** breakdown by agent.
+- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.
+- **Alerting** scoped to a specific agent’s behavior.
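The token breakdown use case can be sketched as a plain aggregation over metric data points. The data-point shape, the `tokens_by_agent` helper, the `(no agent)` bucket label, and all values below are made-up illustrations, not output from a real backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Each data point is (attributes, token_count). Illustrative shape only.
DataPoint = Tuple[Mapping[str, str], int]


def tokens_by_agent(points: Iterable[DataPoint]) -> Dict[str, int]:
    """Sum token counts per `gen_ai.agent.name` value.

    Points without the attribute are grouped under "(no agent)".
    """
    totals: Dict[str, int] = defaultdict(int)
    for attributes, tokens in points:
        agent = attributes.get("gen_ai.agent.name", "(no agent)")
        totals[agent] += tokens
    return dict(totals)


# Made-up sample: two agents share one model, plus one raw client call.
points = [
    ({"gen_ai.request.model": "example-model", "gen_ai.agent.name": "planner"}, 120),
    ({"gen_ai.request.model": "example-model", "gen_ai.agent.name": "retriever"}, 45),
    ({"gen_ai.request.model": "example-model", "gen_ai.agent.name": "planner"}, 30),
    ({"gen_ai.request.model": "example-model"}, 10),
]
usage = tokens_by_agent(points)
# usage == {"planner": 150, "retriever": 45, "(no agent)": 10}
```

Because every point shares the same `gen_ai.request.model`, the model attribute alone could not separate the planner's usage from the retriever's. The agent dimension makes the split a simple group-by.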
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if the SIG prefers) attributes/dimensions.
+- Respect the existing GenAI **stability and opt-in** policy, which governs whether an instrumentation emits the **latest experimental** conventions or legacy behavior.
+
+---
+
+## 7. Alternatives considered
+
+| Alternative | Why not chosen |
+|-------------|----------------|
+| Rely only on **trace parent** (`invoke_agent`) | Fails for partial traces, sampling, and simple backend queries. |
+| Use **custom** / vendor-specific attributes | Prevents **portable** dashboards and cross-vendor tooling. |
+| Add **`gen_ai.agent.id`** on metrics | High **cardinality** risk; explicitly excluded from this proposal. |
+| **Required** `gen_ai.agent.name` on all child spans | Breaks **non-agent** model client usage. |
+
+---
+
+## 8. Open questions
+
+1. **Nested agents:** When multiple agents nest, should the spec say **“nearest owning agent”** or **“root workflow agent”**? (Pick one default; allow instrumentation notes.)
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` metrics, depending on the SIG's preference for default cardinality.
+3. **Streaming metrics:** Include **`gen_ai.agent.name`** on **time_to_first_chunk** / **time_per_output_chunk** in v1 of the change or in a follow-up PR?
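To make the nested-agents question concrete, the two candidate defaults can be sketched over a stack of active agent names. The stack representation and both function names are assumptions for illustration; the proposal does not prescribe how instrumentations track nesting.

```python
from typing import List, Optional


def nearest_owning_agent(agent_stack: List[str]) -> Optional[str]:
    """Innermost active agent: the most recently entered one."""
    return agent_stack[-1] if agent_stack else None


def root_workflow_agent(agent_stack: List[str]) -> Optional[str]:
    """Outermost active agent: the first one entered."""
    return agent_stack[0] if agent_stack else None


# Hypothetical nesting: the planner invoked the retriever, which is now
# issuing an embeddings call.
agent_stack = ["planner", "retriever"]

nearest = nearest_owning_agent(agent_stack)  # "retriever"
root = root_workflow_agent(agent_stack)      # "planner"
```

The two defaults agree whenever there is a single agent and differ only under nesting, so whichever the SIG picks affects attribution only in multi-agent call chains.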
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
@@ -0,0 +1,137 @@
+# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).  
+> **Scope:** `gen_ai.workflow.name` only — **workflow instance / id attributes are explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.
+
+Operators then depend on **trace hierarchy** or custom attributes to answer:
+
+- Which **workflow** drove this **chat** or **tool** span?
+- How do **token usage** and **latency** break down **by workflow** for the same model?
+
+Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
+- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
+- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.workflow.name`** on a **child** span or metric record means:
+
+> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:
+
+| Section | Notes |
+|--------|--------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or an explicit API). The attribute is omitted when there is **no** workflow context.
+
+**Normative guidance:**
+
+- Instrumentations **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
+- Instrumentations **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Omit the attribute when no workflow context exists; the same **low-cardinality** rules apply as for spans.
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
+- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).
+- **Attribution** when spans are **sampled**, **exported in isolation**, or parent links are missing.
+
+### 4.2 Metrics
+
+- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
+- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.
+- **Alerting** on a specific **workflow** without inferring it from trace structure.
+
+---
+
+## 5. Relationship to `gen_ai.agent.name`
+
+When **both** apply (agent inside a workflow):
+
+- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
+- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
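A minimal sketch of how a backend might group records that carry both attributes. The `count_by` helper and the agent names are hypothetical; the workflow names are the examples used earlier in this proposal.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def count_by(records: Iterable[Mapping[str, str]], keys: Sequence[str]) -> Counter:
    """Count records per combination of the given attribute keys."""
    counts: Counter = Counter()
    for attributes in records:
        counts[tuple(attributes.get(key) for key in keys)] += 1
    return counts


# Illustrative span attribute maps; agent names are made up.
spans = [
    {"gen_ai.operation.name": "chat",
     "gen_ai.workflow.name": "customer_support_pipeline",
     "gen_ai.agent.name": "planner"},
    {"gen_ai.operation.name": "execute_tool",
     "gen_ai.workflow.name": "customer_support_pipeline",
     "gen_ai.agent.name": "retriever"},
    {"gen_ai.operation.name": "chat",
     "gen_ai.workflow.name": "travel_planner_graph",
     "gen_ai.agent.name": "planner"},
]

by_workflow = count_by(spans, ["gen_ai.workflow.name"])
by_agent = count_by(spans, ["gen_ai.agent.name"])
by_both = count_by(spans, ["gen_ai.workflow.name", "gen_ai.agent.name"])
```

Here the `planner` agent appears in two different workflows, so grouping by agent alone merges them while grouping by both keeps them apart. That is why neither attribute replaces the other.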
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if the SIG prefers) attributes/dimensions.
+- Align with the GenAI **stability / opt-in** policy for experimental conventions.
+
+---
+
+## 7. Alternatives considered
+
+| Alternative | Why not chosen |
+|-------------|----------------|
+| Rely only on **parent** `invoke_workflow` span | Weak for partial traces, sampling, and metric breakdowns without trace joins. |
+| Use **custom** attributes (`workflow`, `pipeline_name`, …) | Hurts **interoperability** and shared dashboards. |
+| Add **`gen_ai.workflow.id`** on metrics | **Cardinality** risk; out of scope. |
+| **Required** `gen_ai.workflow.name` on all spans | Breaks **non-orchestrated** GenAI usage. |
+
+---
+
+## 8. Open questions
+
+1. **Nested workflows:** If a span sits inside **nested** orchestration, should instrumentation set the **innermost**, the **outermost**, or **both** (outer + inner via a future convention)? This proposal recommends **innermost** as the default, documented with a one-line note, unless the SIG wants **outermost** for product-level reporting.
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` given cardinality guidance.
+3. **Streaming metrics:** Include workflow name on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or a follow-up?
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)