140 changes: 140 additions & 0 deletions util/opentelemetry-util-genai/agent_name.md
@@ -0,0 +1,140 @@
# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics

> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
> **Scope:** `gen_ai.agent.name` only — **`gen_ai.agent.id` is explicitly out of scope.**

---

## 1. Motivation / Problem statement

Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.

**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.

---

## 2. Goals

- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** where it enables per-agent breakdowns without requiring high-cardinality values.
- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.

---

## 3. Proposed solution

### 3.1 Semantic meaning

**`gen_ai.agent.name`** on a **child** span or metric record means:

> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.

It **SHOULD** align with the name used when that agent is represented by an **`invoke_agent`** (or equivalent) span in the same system, when such a span exists.

### 3.2 Span convention changes (`gen-ai-spans.md`)

For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:

| Section | Span kinds / notes |
|--------|---------------------|
| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
| Embeddings | `embeddings` |
| Retrievals | `retrieval` |
| Execute tool | `execute_tool` |

**Suggested requirement level:** **Recommended** when the instrumentation **knows** the agent name (typical for agent frameworks and wrappers). Omit the attribute when there is **no** agent concept (for example, a raw model client).

**Documentation notes (normative guidance):**

- **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
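
One way an instrumentation could enforce the bounded-value guidance above is an explicit allowlist. The sketch below is an assumption about how that might look, not something this proposal prescribes: `bounded_agent_name` and `KNOWN_AGENTS` are hypothetical names, and returning `None` is taken to mean "do not set the attribute".

```python
from typing import Iterable, Optional

# Hypothetical allowlist; a real application would derive this from its agent registry.
KNOWN_AGENTS = frozenset({"billing_support", "research_agent"})


def bounded_agent_name(
    candidate: Optional[str], known_agents: Iterable[str] = KNOWN_AGENTS
) -> Optional[str]:
    """Return `candidate` only if it is a known logical agent name.

    Anything else (end-user IDs, request IDs, free-form text) yields None,
    which the caller treats as "omit gen_ai.agent.name".
    """
    if candidate is None:
        return None
    return candidate if candidate in set(known_agents) else None


kept = bounded_agent_name("billing_support")
dropped = bounded_agent_name("user-8f3a2c41-req-77120")
```

Dropping an unknown value rather than substituting a placeholder keeps the attribute's value set small and stable; whether to log such drops is left to the instrumentation.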

### 3.3 Metric convention changes (`gen-ai-metrics.md`)

Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:

| Metric | Suggested requirement |
|--------|------------------------|
| `gen_ai.client.token.usage` | Recommended when available |
| `gen_ai.client.operation.duration` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |

**Guidance:** Same low-cardinality rules as spans; implementations **MAY** omit when no agent context exists.
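
To illustrate what the added dimension buys, the sketch below records token counts against an attribute set that optionally includes `gen_ai.agent.name`, then rolls them up per agent. It uses a plain in-memory dictionary as a stand-in for a metrics backend (the real `gen_ai.client.token.usage` instrument is a histogram); `record_token_usage` and `tokens_by_agent` are hypothetical helpers, and the numbers are made up.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

AttrKey = Tuple[Tuple[str, str], ...]

# In-memory stand-in for a metrics backend: total tokens per attribute set.
token_totals: Dict[AttrKey, int] = defaultdict(int)


def record_token_usage(
    tokens: int, token_type: str, model: str, agent_name: Optional[str] = None
) -> None:
    """Record one measurement; the agent dimension is added only when known."""
    attrs = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.token.type": token_type,
    }
    if agent_name is not None:
        attrs["gen_ai.agent.name"] = agent_name
    token_totals[tuple(sorted(attrs.items()))] += tokens


def tokens_by_agent() -> Dict[str, int]:
    """Roll totals up by agent; records without the attribute land in "(none)"."""
    rollup: Dict[str, int] = defaultdict(int)
    for key, total in token_totals.items():
        rollup[dict(key).get("gen_ai.agent.name", "(none)")] += total
    return dict(rollup)


record_token_usage(120, "input", "gpt-4.1-mini", agent_name="coordinator")
record_token_usage(30, "output", "gpt-4.1-mini", agent_name="coordinator")
record_token_usage(200, "input", "gpt-4.1-mini", agent_name="flight_specialist")
record_token_usage(50, "input", "gpt-4.1-mini")  # no agent context

usage = tokens_by_agent()
```

Each distinct agent name multiplies the number of time series for the same model and operation, which is why the low-cardinality rule carries over from spans to metrics.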

---

## 4. Use cases / rationale

### 4.1 Spans

- **Filtering and grouping** in trace UIs without having to infer the agent from a parent `invoke_agent` span.
- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
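
The difference between the two lookup strategies can be sketched on a handful of flattened spans. The span dictionaries, `agent_from_ancestors`, and `spans_for_agent` below are illustrative assumptions, not a real exporter format or backend API: the first function shows the parent walk a backend needs today, the second shows the direct filter that becomes possible once child spans carry the attribute.

```python
from typing import Dict, List, Optional

# Hypothetical exported spans, flattened to plain dicts for illustration.
SPANS: List[dict] = [
    {"span_id": "s1", "parent_id": None, "attributes": {
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": "coordinator"}},
    {"span_id": "s2", "parent_id": "s1", "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.agent.name": "coordinator"}},
    {"span_id": "s3", "parent_id": "s1", "attributes": {
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": "flight_specialist"}},
    {"span_id": "s4", "parent_id": "s3", "attributes": {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": "mock_search_flights",
        "gen_ai.agent.name": "flight_specialist"}},
]
BY_ID: Dict[str, dict] = {span["span_id"]: span for span in SPANS}


def agent_from_ancestors(span: dict, by_id: Dict[str, dict]) -> Optional[str]:
    """Infer the agent by walking up to the nearest invoke_agent span."""
    current: Optional[dict] = span
    while current is not None:
        attrs = current["attributes"]
        if attrs.get("gen_ai.operation.name") == "invoke_agent":
            return attrs.get("gen_ai.agent.name")
        current = by_id.get(current["parent_id"])
    return None


def spans_for_agent(spans: List[dict], agent: str, operation: str) -> List[str]:
    """Filter directly on the child span's own attributes; no parent join."""
    return [
        s["span_id"]
        for s in spans
        if s["attributes"].get("gen_ai.agent.name") == agent
        and s["attributes"].get("gen_ai.operation.name") == operation
    ]


flight_tools = spans_for_agent(SPANS, "flight_specialist", "execute_tool")
inferred = agent_from_ancestors(BY_ID["s4"], BY_ID)
```

The parent walk only works when the whole trace is available and the `invoke_agent` span was sampled; the direct filter works on a single span in isolation.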

### 4.2 Metrics

- **Token and cost** breakdown by agent.
- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.

---

## 5. Sample screenshots (Splunk Observability Cloud)

The following examples use a **travel-planner** style multi-agent app (LangGraph-style workflow with `invoke_workflow`, `invoke_agent`, `chat`, and `execute_tool` spans). They illustrate how **`gen_ai.agent.name`** on **child** spans and **client metrics** appears in **Splunk APM** and **chart builders** when using OpenTelemetry GenAI instrumentation.

### 5.1 Trace view — inference (`chat`) span

A **`chat`** span for `gpt-4.1-mini` under the **coordinator** agent shows **`gen_ai.agent.name`: `coordinator`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes—without inferring the agent only from a parent `invoke_agent` row in the UI.

![Splunk APM trace view: chat span with gen_ai.agent.name](images/splunk-apm-trace-chat-span-agent-name.png)

### 5.2 Trace view — `execute_tool` span

An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.agent.name`: `flight_specialist`**, linking the tool execution to the agent that invoked it in the same trace.

![Splunk APM trace view: execute_tool span with gen_ai.agent.name](images/splunk-apm-trace-execute-tool-span-agent-name.png)

### 5.3 Metrics — duration by agent for `execute_tool`

**`gen_ai.client.operation.duration`** can be filtered (e.g. `gen_ai.operation.name: execute_tool`) and broken down or filtered by **`gen_ai.agent.name`** (`hotel_specialist`, `flight_specialist`, `activity_specialist`, …) in the plot editor.

![Splunk chart: execute_tool duration with gen_ai.agent.name](images/splunk-chart-execute-tool-duration-by-agent.png)

### 5.4 Metrics — duration by agent for `chat`

The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.agent.name`** to compare **coordinator** vs specialist agents.

![Splunk chart: chat duration with gen_ai.agent.name](images/splunk-chart-chat-duration-by-agent.png)

---

## 6. Backward compatibility

- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
- Follow the existing GenAI **stability and opt-in** policy that governs whether instrumentations emit the **latest experimental** conventions or legacy behavior.

---

## 7. Open questions

1. **Nested agents:** Should the spec mandate the **“nearest owning agent”** or the **“root workflow agent”** when multiple agents nest? (Pick one default; allow instrumentation notes.)
2. **Metrics requirement level:** Should `gen_ai.client.*` metrics carry the attribute as **Recommended** or **Opt-in**? The answer depends on the SIG's preference for default cardinality.
3. **Streaming metrics:** Include **`gen_ai.agent.name`** on **time_to_first_chunk** / **time_per_output_chunk** in v1 of the change or follow-up PR?
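
For the nested-agents question, the two candidate policies can be stated precisely against a stack of active agents. This is only a sketch to make the options concrete; the proposal does not yet choose between them, and the function names and the list-based stack are hypothetical.

```python
from typing import List, Optional


def nearest_owning_agent(active_agents: List[str]) -> Optional[str]:
    """Policy A: the innermost agent currently executing."""
    return active_agents[-1] if active_agents else None


def root_workflow_agent(active_agents: List[str]) -> Optional[str]:
    """Policy B: the outermost agent that started the nested chain."""
    return active_agents[0] if active_agents else None


# Outermost first: the coordinator delegated to the flight specialist.
stack = ["coordinator", "flight_specialist"]

nearest = nearest_owning_agent(stack)
root = root_workflow_agent(stack)
```

With a single agent on the stack both policies agree, so the choice only affects traces where agents actually nest.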

---

## 8. Specification / implementation checklist

- [ ] Update **`model/`** YAML for affected span and metric definitions.
- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
- [ ] **CHANGELOG** entry under GenAI.
- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.

---

## 9. References

- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
154 changes: 154 additions & 0 deletions util/opentelemetry-util-genai/workflow_name.md
@@ -0,0 +1,154 @@
# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics

> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
> **Scope:** `gen_ai.workflow.name` only — **workflow instance / id attributes are explicitly out of scope.**

---

## 1. Motivation / Problem statement

Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.

Operators then depend on **trace hierarchy** or custom attributes to answer:

- Which **workflow** drove this **chat** or **tool** span?
- How do **token usage** and **latency** break down **by workflow** for the same model?

Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.

---

## 2. Goals

- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.

---

## 3. Proposed solution

### 3.1 Semantic meaning

**`gen_ai.workflow.name`** on a **child** span or metric record means:

> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.

It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.

### 3.2 Span convention changes (`gen-ai-spans.md`)

For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:

| Section | Notes |
|--------|--------|
| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
| Embeddings | `embeddings` |
| Retrievals | `retrieval` |
| Execute tool | `execute_tool` |

**Suggested requirement level:** **Recommended** when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or an explicit API). Omit the attribute when there is **no** workflow context.

**Normative guidance:**

- **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
- **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
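
As one hedged example of guarding against the unbounded values listed above, an instrumentation could refuse candidates that look like per-invocation UUIDs. The helper name `workflow_name_or_none` is hypothetical, and the pattern check is a heuristic that catches only the canonical UUID shape; it does not detect raw user input or thread ids in other formats.

```python
import re
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID shape.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def workflow_name_or_none(candidate: Optional[str]) -> Optional[str]:
    """Return the candidate unless it is empty or looks like a per-run UUID.

    None means "do not set gen_ai.workflow.name on this span or metric".
    """
    if not candidate:
        return None
    if _UUID_RE.match(candidate):
        return None
    return candidate


stable = workflow_name_or_none("travel_planner_graph")
per_run = workflow_name_or_none("3f2b8c1e-9d4a-4e7b-8a6f-0c1d2e3f4a5b")
```

An explicit allowlist drawn from application config is stricter than this kind of pattern check and is the safer choice when the set of pipelines is known up front.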

### 3.3 Metric convention changes (`gen-ai-metrics.md`)

Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:

| Metric | Suggested requirement |
|--------|------------------------|
| `gen_ai.client.token.usage` | Recommended when available |
| `gen_ai.client.operation.duration` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |

**Guidance:** Omit when no workflow context exists; same **low-cardinality** rules as spans.

---

## 4. Use cases / rationale

### 4.1 Spans

- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).

### 4.2 Metrics

- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.

---

## 5. Sample screenshots (Splunk Observability Cloud)

The images below are **illustrative mockups** in the style of Splunk Observability Cloud.

### 5.1 Trace view — inference (`chat`) span

A **`chat`** span for `gpt-4.1-mini` nested under a **LangGraph** workflow shows **`gen_ai.workflow.name`: `LangGraph`** in span properties, alongside `gen_ai.operation.name`, token usage, and model attributes—so the pipeline is visible on the **child** span, not only on **`invoke_workflow`**.

![Splunk-style trace mockup: chat span with gen_ai.workflow.name](images/splunk-apm-trace-chat-span-workflow-name.png)

### 5.2 Trace view — `execute_tool` span

An **`execute_tool`** span (**`mock_search_flights`**) carries **`gen_ai.workflow.name`: `LangGraph`**, linking the tool execution to the **workflow** that owns the run.

![Splunk-style trace mockup: execute_tool span with gen_ai.workflow.name](images/splunk-apm-trace-execute-tool-span-workflow-name.png)

### 5.3 Metrics — duration by workflow for `execute_tool`

**`gen_ai.client.operation.duration`** can be filtered (e.g. `gen_ai.operation.name: execute_tool`) and broken down or filtered by **`gen_ai.workflow.name`** (`travel_booking_pipeline`, `support_triage`, `content_review`, …) in the plot editor.

![Splunk-style chart mockup: execute_tool duration with gen_ai.workflow.name](images/splunk-chart-execute-tool-duration-by-workflow.png)

### 5.4 Metrics — duration by workflow for `chat`

The same pattern applies to **`chat`** operations: filter on **`gen_ai.operation.name: chat`** and use **`gen_ai.workflow.name`** to compare pipelines (e.g. **LangGraph** vs **`customer_support_pipeline`**).

![Splunk-style chart mockup: chat duration with gen_ai.workflow.name](images/splunk-chart-chat-duration-by-workflow.png)

---

## 6. Relationship to `gen_ai.agent.name`

When **both** apply (agent inside a workflow):

- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
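
A small sketch of the "workflow, agent, or both" grouping, using made-up duration samples in seconds. `RECORDS` and `total_duration_by` are illustrative names only; a real backend would aggregate histogram data points rather than summing raw values.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical (attributes, duration in seconds) samples for execute_tool operations.
RECORDS: List[Tuple[Dict[str, str], float]] = [
    ({"gen_ai.workflow.name": "travel_booking_pipeline",
      "gen_ai.agent.name": "flight_specialist"}, 0.8),
    ({"gen_ai.workflow.name": "travel_booking_pipeline",
      "gen_ai.agent.name": "hotel_specialist"}, 1.2),
    ({"gen_ai.workflow.name": "support_triage",
      "gen_ai.agent.name": "coordinator"}, 0.5),
    ({"gen_ai.workflow.name": "travel_booking_pipeline",
      "gen_ai.agent.name": "flight_specialist"}, 0.4),
]


def total_duration_by(
    records: List[Tuple[Dict[str, str], float]], *keys: str
) -> Dict[Tuple[Optional[str], ...], float]:
    """Sum durations grouped by the given attribute keys (one, the other, or both)."""
    totals: Dict[Tuple[Optional[str], ...], float] = defaultdict(float)
    for attrs, seconds in records:
        totals[tuple(attrs.get(key) for key in keys)] += seconds
    return dict(totals)


by_workflow = total_duration_by(RECORDS, "gen_ai.workflow.name")
by_agent = total_duration_by(RECORDS, "gen_ai.agent.name")
by_both = total_duration_by(RECORDS, "gen_ai.workflow.name", "gen_ai.agent.name")
```

Because the two attributes are independent dimensions, grouping by both yields at most the product of their value counts, which stays manageable only if each one follows its low-cardinality guidance.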

---

## 7. Backward compatibility

- **Additive** only: new **recommended** (or **opt-in** for metrics, if SIG prefers) attributes/dimensions.
- Align with GenAI **stability / opt-in** policy for experimental conventions.

---

## 8. Open questions

1. **Nested workflows:** If a span sits inside **nested** orchestration, should instrumentation set the **innermost**, **outermost**, or **both** (outer + inner via a future convention)? Recommend **innermost** as default with a one-line note unless SIG wants **outermost** for product-level reporting.
2. **Metrics requirement level:** Should `gen_ai.client.*` metrics carry the attribute as **Recommended** or **Opt-in**, given the cardinality guidance?
3. **Streaming metrics:** Include workflow name on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or a follow-up?

---

## 9. Specification / implementation checklist

- [ ] Update **`model/`** YAML for affected span and metric definitions.
- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
- [ ] **CHANGELOG** entry under GenAI.
- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.

---

## 10. References

- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)