signalfx · wrisa · Apr 5, 2026
@@ -0,0 +1,140 @@
+# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).  
+> **Scope:** `gen_ai.agent.name` only — **`gen_ai.agent.id` is explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.
+
+**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
+- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** where it improves breakdown without mandating high cardinality.
+- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.agent.name`** on a **child** span or metric record means:
+
+> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.
+
+When the same agent is also represented by an **`invoke_agent`** (or equivalent) span in the same system, the value **SHOULD** match the agent name recorded on that span.
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:
+
+| Section | Operation names / notes |
+|--------|---------------------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** when the instrumentation **knows** the agent name (typical for agent frameworks / wrappers). The attribute is omitted when there is **no** agent concept (e.g. a raw model client).
+
+**Documentation notes (normative guidance):**
+
+- Instrumentations **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
+- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** The same low-cardinality rules apply as for spans; implementations **MAY** omit the attribute when no agent context exists.
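To illustrate why the dimension helps and why cardinality matters, the sketch below aggregates token counts per attribute set, roughly as a metrics backend would keep one series per distinct combination. The in-memory store and the function names are assumptions made for illustration; this is not the OpenTelemetry metrics API.

```python
from collections import defaultdict


def record_token_usage(store, tokens, operation, model, agent_name=None):
    """Add `tokens` to the series identified by the attribute set."""
    attrs = {"gen_ai.operation.name": operation, "gen_ai.request.model": model}
    if agent_name is not None:
        # Omitted when no agent context exists.
        attrs["gen_ai.agent.name"] = agent_name
    store[frozenset(attrs.items())] += tokens


def tokens_by_agent(store):
    """Sum token usage per agent; series without the attribute are grouped apart."""
    totals = defaultdict(int)
    for key, tokens in store.items():
        agent = dict(key).get("gen_ai.agent.name", "(no agent)")
        totals[agent] += tokens
    return dict(totals)


usage = defaultdict(int)
record_token_usage(usage, 120, "chat", "gpt-4.1-mini", "coordinator")
record_token_usage(usage, 80, "chat", "gpt-4.1-mini", "flight_specialist")
record_token_usage(usage, 40, "chat", "gpt-4.1-mini", "coordinator")
record_token_usage(usage, 10, "chat", "gpt-4.1-mini")

breakdown = tokens_by_agent(usage)
series_count = len(usage)  # one series per distinct attribute set
```

Each distinct agent name adds a series for every operation and model combination, which is why the value has to stay within a small, stable set.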
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filtering and grouping** in trace UIs without having to infer the agent from a parent `invoke_agent` span.
+- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
+
+### 4.2 Metrics
+
+- **Token and cost** breakdown by agent.
+- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.
+
+---
+
+## 5. Sample screenshots (Splunk Observability Cloud)
+
+The following examples use a **travel-planner** style multi-agent app (LangGraph-style workflow with `invoke_workflow`, `invoke_agent`, `chat`, and `execute_tool` spans). They illustrate how **`gen_ai.agent.name`** on **child** spans and **client metrics** appears in **Splunk APM** and **chart builders** when using OpenTelemetry GenAI instrumentation.
+
+### 5.1 Trace view — inference (`chat`) span
+
+A **`chat`** span for `gpt-4.1-mini` under the **coordinator** agent shows **`gen_ai.agent.name`: `coordinator`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes, so the agent does not have to be inferred from a parent `invoke_agent` row in the UI.
+
+![Splunk APM trace view: chat span with gen_ai.agent.name](images/splunk-apm-trace-chat-span-agent-name.png)
+
+### 5.2 Trace view — `execute_tool` span
+
+An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.agent.name`: `flight_specialist`**, linking the tool execution to the agent that invoked it in the same trace.
+
+![Splunk APM trace view: execute_tool span with gen_ai.agent.name](images/splunk-apm-trace-execute-tool-span-agent-name.png)
+
+### 5.3 Metrics — duration by agent for `execute_tool`
+
+**`gen_ai.client.operation.duration`** can be filtered by operation (e.g. `gen_ai.operation.name: execute_tool`) and then grouped or further filtered by **`gen_ai.agent.name`** (`hotel_specialist`, `flight_specialist`, `activity_specialist`, …) in the plot editor.
+
+![Splunk chart: execute_tool duration with gen_ai.agent.name](images/splunk-chart-execute-tool-duration-by-agent.png)
+
+### 5.4 Metrics — duration by agent for `chat`
+
+The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.agent.name`** to compare **coordinator** vs specialist agents.
+
+![Splunk chart: chat duration with gen_ai.agent.name](images/splunk-chart-chat-duration-by-agent.png)
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Respect the existing GenAI **stability and opt-in** policy that governs whether an instrumentation emits the **latest experimental** conventions or legacy behavior.
+
+---
+
+## 7. Open questions
+
+1. **Nested agents:** When multiple agents nest, should the spec say **“nearest owning agent”** or **“root workflow agent”**? (Pick one default; allow instrumentation notes.)
+2. **Metrics requirement level:** Should the attribute be **Recommended** or **Opt-in** on `gen_ai.client.*` metrics? This depends on the SIG's preference for default cardinality.
+3. **Streaming metrics:** Should **`gen_ai.agent.name`** be included on **time_to_first_chunk** / **time_per_output_chunk** in the first version of the change, or in a follow-up PR?
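The two candidate defaults in open question 1 can be sketched as follows, assuming the instrumentation keeps a stack of active agent names ordered from outermost to innermost. The stack representation and both function names are hypothetical and only serve to make the difference concrete.

```python
def nearest_owning_agent(agent_stack):
    """Return the innermost active agent, or None when no agent is active."""
    return agent_stack[-1] if agent_stack else None


def root_workflow_agent(agent_stack):
    """Return the outermost active agent, or None when no agent is active."""
    return agent_stack[0] if agent_stack else None


# coordinator invoked flight_specialist, which is now calling a tool.
stack = ["coordinator", "flight_specialist"]

nearest = nearest_owning_agent(stack)
root = root_workflow_agent(stack)
```

With the first rule a tool span is attributed to `flight_specialist`; with the second it is attributed to `coordinator`, which collapses all nested work onto the entry-point agent.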
+
+---
+
+## 8. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.
+
+---
+
+## 9. References
+
+- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
@@ -0,0 +1,154 @@
+# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).  
+> **Scope:** `gen_ai.workflow.name` only — **workflow instance / id attributes are explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.
+
+Operators then depend on **trace hierarchy** or custom attributes to answer:
+
+- Which **workflow** drove this **chat** or **tool** span?
+- How do **token usage** and **latency** break down **by workflow** for the same model?
+
+Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
+- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
+- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.workflow.name`** on a **child** span or metric record means:
+
+> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:
+
+| Section | Notes |
+|--------|--------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or explicit API). The attribute is omitted when there is **no** workflow context.
+
+**Normative guidance:**
+
+- Instrumentations **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
+- Instrumentations **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
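One way an instrumentation could enforce the guidance above is an allowlist taken from application config, sketched below. The `KNOWN_WORKFLOWS` set and the `workflow_attributes` helper are assumptions for illustration; the proposal does not prescribe any particular enforcement mechanism.

```python
# Hypothetical set of pipeline names declared in application config.
KNOWN_WORKFLOWS = frozenset({"customer_support_pipeline", "travel_planner_graph"})


def workflow_attributes(name, known_names=KNOWN_WORKFLOWS):
    """Return the workflow attribute only for a known, stable name."""
    if name is None or name not in known_names:
        # No workflow context, or a value outside the bounded set
        # (e.g. a per-invocation UUID): omit the attribute entirely.
        return {}
    return {"gen_ai.workflow.name": name}


accepted = workflow_attributes("travel_planner_graph")
rejected = workflow_attributes("3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b")
missing = workflow_attributes(None)
```

Dropping the attribute instead of recording an unexpected value keeps the metric series count bounded by the size of the configured set.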
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Omit the attribute when no workflow context exists; the same **low-cardinality** rules apply as for spans.
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
+- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).
+
+### 4.2 Metrics
+
+- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
+- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.
+
+---
+
+## 5. Sample screenshots (Splunk Observability Cloud)
+
+The images below are **illustrative mockups** styled after Splunk Observability Cloud.
+
+### 5.1 Trace view — inference (`chat`) span
+
+A **`chat`** span for `gpt-4.1-mini` nested under a **LangGraph** workflow shows **`gen_ai.workflow.name`: `LangGraph`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes—so the pipeline is visible on the **child** span, not only on **`invoke_workflow`**.
+
+![Splunk-style trace mockup: chat span with gen_ai.workflow.name](images/splunk-apm-trace-chat-span-workflow-name.png)
+
+### 5.2 Trace view — `execute_tool` span
+
+An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.workflow.name`: `LangGraph`**, linking the tool execution to the **workflow** that owns the run.
+
+![Splunk-style trace mockup: execute_tool span with gen_ai.workflow.name](images/splunk-apm-trace-execute-tool-span-workflow-name.png)
+
+### 5.3 Metrics — duration by workflow for `execute_tool`
+
+**`gen_ai.client.operation.duration`** can be filtered by operation (e.g. `gen_ai.operation.name: execute_tool`) and then grouped or further filtered by **`gen_ai.workflow.name`** (`travel_booking_pipeline`, `support_triage`, `content_review`, …) in the plot editor.
+
+![Splunk-style chart mockup: execute_tool duration with gen_ai.workflow.name](images/splunk-chart-execute-tool-duration-by-workflow.png)
+
+### 5.4 Metrics — duration by workflow for `chat`
+
+The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.workflow.name`** to compare pipelines (e.g. **LangGraph** vs **`customer_support_pipeline`**).
+
+![Splunk-style chart mockup: chat duration with gen_ai.workflow.name](images/splunk-chart-chat-duration-by-workflow.png)
+
+---
+
+## 6. Relationship to `gen_ai.agent.name`
+
+When **both** apply (agent inside a workflow):
+
+- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
+- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
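The grouping options above can be sketched with a small aggregation over duration records that carry both attributes. The record shape (attribute dict plus seconds), the sample values, and `mean_duration_by` are hypothetical and stand in for whatever query language a backend provides.

```python
from collections import defaultdict


def mean_duration_by(records, *keys):
    """Average duration per group, where a group is the tuple of values for `keys`."""
    sums = defaultdict(lambda: [0.0, 0])
    for attrs, seconds in records:
        group = tuple(attrs.get(k) for k in keys)
        sums[group][0] += seconds
        sums[group][1] += 1
    return {group: total / count for group, (total, count) in sums.items()}


records = [
    ({"gen_ai.workflow.name": "travel_booking_pipeline",
      "gen_ai.agent.name": "flight_specialist"}, 0.5),
    ({"gen_ai.workflow.name": "travel_booking_pipeline",
      "gen_ai.agent.name": "hotel_specialist"}, 1.5),
    ({"gen_ai.workflow.name": "support_triage",
      "gen_ai.agent.name": "coordinator"}, 2.0),
]

by_workflow = mean_duration_by(records, "gen_ai.workflow.name")
by_agent = mean_duration_by(records, "gen_ai.agent.name")
by_both = mean_duration_by(records, "gen_ai.workflow.name", "gen_ai.agent.name")
```

Because neither attribute replaces the other, the same records answer per-workflow, per-agent, and combined questions without joining against parent spans.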
+
+---
+
+## 7. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Align with GenAI **stability / opt-in** policy for experimental conventions.
+
+---
+
+## 8. Open questions
+
+1. **Nested workflows:** If a span sits inside **nested** orchestration, should instrumentation set the **innermost** workflow name, the **outermost**, or **both** (outer + inner via a future convention)? This proposal recommends **innermost** as the default, with a one-line note, unless the SIG wants **outermost** for product-level reporting.
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` given cardinality guidance.
+3. **Streaming metrics:** Include workflow name on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or a follow-up?
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)