diff --git a/util/opentelemetry-util-genai/agent_name.md b/util/opentelemetry-util-genai/agent_name.md
new file mode 100644
index 00000000..ab67b493
--- /dev/null
+++ b/util/opentelemetry-util-genai/agent_name.md
@@ -0,0 +1,140 @@
+# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
+> **Scope:** `gen_ai.agent.name` only — **`gen_ai.agent.id` is explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.
+
+**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
+- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** where it improves breakdown without mandating high cardinality.
+- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.agent.name`** on a **child** span or metric record means:
+
+> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** align with the name used when that agent is represented by an **`invoke_agent`** (or equivalent) span in the same system, when such a span exists.
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:
+
+| Section | Span kinds / notes |
+|--------|---------------------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** — when the instrumentation **knows** the agent name (typical for agent frameworks / wrappers). Omitted when there is **no** agent concept (raw model client).
+
+**Documentation notes (normative guidance):**
+
+- **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
+- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Same low-cardinality rules as spans; implementations **MAY** omit when no agent context exists.
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filtering and grouping** in trace UIs without inferring parent `invoke_agent`.
+- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
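The span guidance in section 3.2 (set the attribute only when the agent name is known, and never with unbounded values) could be sketched roughly as follows. This is a minimal illustration, not the instrumentation's actual code: plain dictionaries stand in for span attributes, the helper names `looks_unbounded` and `child_span_attributes` are hypothetical, and the UUID and long-digit-run checks are assumed heuristics rather than anything the proposal mandates.

```python
import re
from typing import Dict, Optional

AGENT_NAME_KEY = "gen_ai.agent.name"

# Assumed heuristics for values that look like per-run or per-user identifiers.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_LONG_DIGIT_RUN_RE = re.compile(r"\d{6,}")


def looks_unbounded(value: str) -> bool:
    """Return True when the value resembles an ID rather than a stable logical name."""
    return bool(_UUID_RE.match(value) or _LONG_DIGIT_RUN_RE.search(value))


def child_span_attributes(
    operation_name: str, agent_name: Optional[str] = None
) -> Dict[str, str]:
    """Build attributes for a child span (hypothetical helper).

    The agent name is added only when it is known and does not look unbounded;
    otherwise the attribute is omitted, as for a raw model client.
    """
    attributes = {"gen_ai.operation.name": operation_name}
    if agent_name and not looks_unbounded(agent_name):
        attributes[AGENT_NAME_KEY] = agent_name
    return attributes


if __name__ == "__main__":
    print(child_span_attributes("chat", "research_agent"))
    print(child_span_attributes("execute_tool"))
```

Dropping a suspicious value silently is only one possible policy; an instrumentation might prefer to log a warning or leave validation to the application.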
+
+### 4.2 Metrics
+
+- **Token and cost** breakdown by agent.
+- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.
+
+---
+
+## 5. Sample screenshots (Splunk Observability Cloud)
+
+The following examples use a **travel-planner** style multi-agent app (LangGraph-style workflow with `invoke_workflow`, `invoke_agent`, `chat`, and `execute_tool` spans). They illustrate how **`gen_ai.agent.name`** on **child** spans and **client metrics** appears in **Splunk APM** and **chart builders** when using OpenTelemetry GenAI instrumentation.
+
+### 5.1 Trace view — inference (`chat`) span
+
+A **`chat`** span for `gpt-4.1-mini` under the **coordinator** agent shows **`gen_ai.agent.name`: `coordinator`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes—without inferring the agent only from a parent `invoke_agent` row in the UI.
+
+![Splunk APM trace view: chat span with gen_ai.agent.name](images/splunk-apm-trace-chat-span-agent-name.png)
+
+### 5.2 Trace view — `execute_tool` span
+
+An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.agent.name`: `flight_specialist`**, linking the tool execution to the agent that invoked it in the same trace.
+
+![Splunk APM trace view: execute_tool span with gen_ai.agent.name](images/splunk-apm-trace-execute-tool-span-agent-name.png)
+
+### 5.3 Metrics — duration by agent for `execute_tool`
+
+**`gen_ai.client.operation.duration`** can be filtered (e.g. `gen_ai.operation.name: execute_tool`) and broken down or filtered by **`gen_ai.agent.name`** (`hotel_specialist`, `flight_specialist`, `activity_specialist`, …) in the plot editor.
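The filter-then-group step described above can be approximated outside any backend. The sketch below assumes duration data points are available as `(attributes, seconds)` pairs; the function name `mean_duration_by_agent`, the `"(no agent)"` bucket, and the sample durations are all invented for illustration and are not taken from the screenshots.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One data point: (metric attributes, duration in seconds).
Record = Tuple[Dict[str, str], float]


def mean_duration_by_agent(
    records: Iterable[Record], operation_name: str
) -> Dict[str, float]:
    """Filter on gen_ai.operation.name, then group durations by gen_ai.agent.name."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for attributes, duration_s in records:
        if attributes.get("gen_ai.operation.name") != operation_name:
            continue
        # Data points without agent context land in a separate bucket.
        agent = attributes.get("gen_ai.agent.name", "(no agent)")
        grouped[agent].append(duration_s)
    return {agent: mean(values) for agent, values in grouped.items()}


# Made-up sample data, for illustration only.
sample_records: List[Record] = [
    ({"gen_ai.operation.name": "execute_tool", "gen_ai.agent.name": "flight_specialist"}, 0.5),
    ({"gen_ai.operation.name": "execute_tool", "gen_ai.agent.name": "flight_specialist"}, 1.5),
    ({"gen_ai.operation.name": "execute_tool", "gen_ai.agent.name": "hotel_specialist"}, 2.0),
    ({"gen_ai.operation.name": "chat", "gen_ai.agent.name": "coordinator"}, 9.0),
    ({"gen_ai.operation.name": "execute_tool"}, 4.0),
]

if __name__ == "__main__":
    print(mean_duration_by_agent(sample_records, "execute_tool"))
```

Without the attribute on the metric itself, the same breakdown would require joining each data point back to its parent `invoke_agent` span.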
+
+![Splunk chart: execute_tool duration with gen_ai.agent.name](images/splunk-chart-execute-tool-duration-by-agent.png)
+
+### 5.4 Metrics — duration by agent for `chat`
+
+The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.agent.name`** to compare **coordinator** vs specialist agents.
+
+![Splunk chart: chat duration with gen_ai.agent.name](images/splunk-chart-chat-duration-by-agent.png)
+
+---
+
+## 6. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Respect existing GenAI **stability and opt-in** policy for emitting **latest experimental** vs legacy behavior.
+
+---
+
+## 7. Open questions
+
+1. **Nested agents:** Should the spec say **“nearest owning agent”** vs **“root workflow agent”** when multiple agents nest? (Pick one default; allow instrumentation notes.)
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` metrics—SIG preference for default cardinality.
+3. **Streaming metrics:** Include **`gen_ai.agent.name`** on **time_to_first_chunk** / **time_per_output_chunk** in v1 of the change or follow-up PR?
+
+---
+
+## 8. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.
+
+---
+
+## 9. References
+
+- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
diff --git a/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-agent-name.png b/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-agent-name.png
new file mode 100644
index 00000000..641e089d
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-agent-name.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-workflow-name.png b/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-workflow-name.png
new file mode 100644
index 00000000..f8103bb1
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-apm-trace-chat-span-workflow-name.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-agent-name.png b/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-agent-name.png
new file mode 100644
index 00000000..2bffdf3d
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-agent-name.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-workflow-name.png b/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-workflow-name.png
new file mode 100644
index 00000000..548d7d46
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-apm-trace-execute-tool-span-workflow-name.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-agent.png b/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-agent.png
new file mode 100644
index 00000000..8817d781
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-agent.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-workflow.png b/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-workflow.png
new file mode 100644
index 00000000..2058e607
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-chart-chat-duration-by-workflow.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-agent.png b/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-agent.png
new file mode 100644
index 00000000..58d1a90c
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-agent.png differ
diff --git a/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-workflow.png b/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-workflow.png
new file mode 100644
index 00000000..58fcfbd3
Binary files /dev/null and b/util/opentelemetry-util-genai/images/splunk-chart-execute-tool-duration-by-workflow.png differ
diff --git a/util/opentelemetry-util-genai/workflow_name.md b/util/opentelemetry-util-genai/workflow_name.md
new file mode 100644
index 00000000..d795ea51
--- /dev/null
+++ b/util/opentelemetry-util-genai/workflow_name.md
@@ -0,0 +1,154 @@
+# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics
+
+> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
+> **Scope:** `gen_ai.workflow.name` only — **workflow instance / id attributes are explicitly out of scope.**
+
+---
+
+## 1. Motivation / Problem statement
+
+Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.
+
+Operators then depend on **trace hierarchy** or custom attributes to answer:
+
+- Which **workflow** drove this **chat** or **tool** span?
+- How do **token usage** and **latency** break down **by workflow** for the same model?
+
+Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.
+
+---
+
+## 2. Goals
+
+- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
+- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
+- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.
+
+---
+
+## 3. Proposed solution
+
+### 3.1 Semantic meaning
+
+**`gen_ai.workflow.name`** on a **child** span or metric record means:
+
+> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.
+
+It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.
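One way an instrumentation might keep the child value identical to the one on the `invoke_workflow` span is to carry the workflow name in ambient context and read it whenever a child operation starts. The sketch below uses only the standard-library `contextvars` module; `workflow_scope` and `child_attributes` are hypothetical names, and plain dictionaries stand in for real span attributes.

```python
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator

# Ambient workflow name; None means "no workflow context".
_current_workflow_name = contextvars.ContextVar("current_workflow_name", default=None)


@contextmanager
def workflow_scope(name: str) -> Iterator[None]:
    """Mark the enclosed code as running inside the named workflow (hypothetical helper)."""
    token = _current_workflow_name.set(name)
    try:
        yield
    finally:
        # Restore whatever workflow (if any) was active before this scope.
        _current_workflow_name.reset(token)


def child_attributes(operation_name: str) -> Dict[str, str]:
    """Attributes for a child span; the workflow name is omitted when unknown."""
    attributes = {"gen_ai.operation.name": operation_name}
    workflow_name = _current_workflow_name.get()
    if workflow_name is not None:
        attributes["gen_ai.workflow.name"] = workflow_name
    return attributes


if __name__ == "__main__":
    with workflow_scope("travel_planner_graph"):
        print(child_attributes("chat"))
    print(child_attributes("chat"))
```

With nested scopes this sketch reports the innermost workflow name. That is only one of the options listed under the nested-workflows open question, not a settled choice.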
+
+### 3.2 Span convention changes (`gen-ai-spans.md`)
+
+For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:
+
+| Section | Notes |
+|--------|--------|
+| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
+| Embeddings | `embeddings` |
+| Retrievals | `retrieval` |
+| Execute tool | `execute_tool` |
+
+**Suggested requirement level:** **Recommended** — when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or explicit API). **Omitted** when there is **no** workflow context.
+
+**Normative guidance:**
+
+- **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
+- **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
+
+### 3.3 Metric convention changes (`gen-ai-metrics.md`)
+
+Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:
+
+| Metric | Suggested requirement |
+|--------|------------------------|
+| `gen_ai.client.token.usage` | Recommended when available |
+| `gen_ai.client.operation.duration` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
+| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |
+
+**Guidance:** Omit when no workflow context exists; same **low-cardinality** rules as spans.
+
+---
+
+## 4. Use cases / rationale
+
+### 4.1 Spans
+
+- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
+- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).
+
+### 4.2 Metrics
+
+- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
+- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.
+
+---
+
+## 5. Sample screenshots (Splunk Observability Cloud)
+
+The images below are **illustrative mockups**.
+
+### 5.1 Trace view — inference (`chat`) span
+
+A **`chat`** span for `gpt-4.1-mini` nested under a **LangGraph** workflow shows **`gen_ai.workflow.name`: `LangGraph`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes—so the pipeline is visible on the **child** span, not only on **`invoke_workflow`**.
+
+![Splunk-style trace mockup: chat span with gen_ai.workflow.name](images/splunk-apm-trace-chat-span-workflow-name.png)
+
+### 5.2 Trace view — `execute_tool` span
+
+An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.workflow.name`: `LangGraph`**, linking the tool execution to the **workflow** that owns the run.
+
+![Splunk-style trace mockup: execute_tool span with gen_ai.workflow.name](images/splunk-apm-trace-execute-tool-span-workflow-name.png)
+
+### 5.3 Metrics — duration by workflow for `execute_tool`
+
+**`gen_ai.client.operation.duration`** can be filtered (e.g. `gen_ai.operation.name: execute_tool`) and broken down or filtered by **`gen_ai.workflow.name`** (`travel_booking_pipeline`, `support_triage`, `content_review`, …) in the plot editor.
+
+![Splunk-style chart mockup: execute_tool duration with gen_ai.workflow.name](images/splunk-chart-execute-tool-duration-by-workflow.png)
+
+### 5.4 Metrics — duration by workflow for `chat`
+
+The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.workflow.name`** to compare pipelines (e.g. **LangGraph** vs **`customer_support_pipeline`**).
+
+![Splunk-style chart mockup: chat duration with gen_ai.workflow.name](images/splunk-chart-chat-duration-by-workflow.png)
+
+---
+
+## 6. Relationship to `gen_ai.agent.name`
+
+When **both** apply (agent inside a workflow):
+
+- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
+- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
+
+---
+
+## 7. Backward compatibility
+
+- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
+- Align with GenAI **stability / opt-in** policy for experimental conventions.
+
+---
+
+## 8. Open questions
+
+1. **Nested workflows:** If a span sits inside **nested** orchestration, should instrumentation set the **innermost**, **outermost**, or **both** (outer + inner via a future convention)? Recommend **innermost** as default with a one-line note unless SIG wants **outermost** for product-level reporting.
+2. **Metrics requirement level:** **Recommended** vs **Opt-in** for `gen_ai.client.*` given cardinality guidance.
+3. **Streaming metrics:** Include workflow name on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or a follow-up?
+
+---
+
+## 9. Specification / implementation checklist
+
+- [ ] Update **`model/`** YAML for affected span and metric definitions.
+- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
+- [ ] **CHANGELOG** entry under GenAI.
+- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.
+
+---
+
+## 10. References
+
+- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
+- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
+- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
+- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)