135 changes: 135 additions & 0 deletions util/opentelemetry-util-genai/agent-name-child-spans-proposal.md
@@ -0,0 +1,135 @@
# Proposal: `gen_ai.agent.name` on GenAI child spans and client metrics

> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
> **Scope:** `gen_ai.agent.name` only — **`gen_ai.agent.id` is explicitly out of scope.**

---

## 1. Motivation / Problem statement

Multi-agent and orchestrated applications emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans that share the same **`gen_ai.request.model`** or **`gen_ai.tool.name`**. Those attributes alone do not identify **which logical agent** (e.g. planner vs retriever) initiated the operation.

Today, operators must **reconstruct** agent context by **walking the trace** (e.g. from an `invoke_agent` parent). That is fragile when:

- Traces are **incomplete**, **sampled**, or spans are analyzed **without** full parent chains.
- Backends need **simple filters** (`operation` + `model` + **agent**) without graph joins on every query.

**Metrics** for `gen_ai.client.token.usage` and `gen_ai.client.operation.duration` are similarly hard to break down **by agent** without a standard attribute, which blocks **cost**, **latency**, and **SLO** views per agent.

---

## 2. Goals

- Standardize **`gen_ai.agent.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation is performed **on behalf of a named agent**.
- Add **`gen_ai.agent.name`** as a **documented** dimension on **GenAI client metrics** where it enables per-agent breakdowns without introducing high-cardinality dimensions.
- Keep **`gen_ai.agent.name`** as a **low-cardinality**, **logical** agent label (product/agent role), not a per-run identifier.

---

## 3. Non-goals

- **`gen_ai.agent.id`** or any **instance-level** agent identity on these spans or metrics (explicitly out of scope).
- Changing **required** attribute sets in a way that **breaks** existing instrumentations (prefer **recommended** / **opt-in** for metrics).

---

## 4. Proposed solution

### 4.1 Semantic meaning

**`gen_ai.agent.name`** on a **child** span or metric record means:

> The **logical name** of the agent **on whose behalf** this inference, embedding, retrieval, or tool execution was performed.

It **SHOULD** align with the name used when that agent is represented by an **`invoke_agent`** (or equivalent) span in the same system, when such a span exists.

### 4.2 Span convention changes (`gen-ai-spans.md`)

For each of the following sections, **add** `gen_ai.agent.name` to the span attribute table:

| Section | Span kinds / notes |
|--------|---------------------|
| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
| Embeddings | `embeddings` |
| Retrievals | `retrieval` |
| Execute tool | `execute_tool` |

**Suggested requirement level:** **Recommended** when the instrumentation **knows** the agent name (typical for agent frameworks / wrappers). Omit the attribute when there is **no** agent concept (for example, a raw model client).

**Documentation notes (normative guidance):**

- Instrumentations **MUST NOT** use this attribute for **end-user IDs**, **request IDs**, or other **unbounded** values.
- Instrumentations **SHOULD** use a **small, stable** set of names (e.g. `billing_support`, `research_agent`).
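
One way an instrumentation could enforce this guidance is an allowlist of configured agent names. The sketch below is an assumption about how that might look; `KNOWN_AGENT_NAMES` and `agent_name_attribute` are hypothetical names, and the proposal itself does not mandate any particular mechanism.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical allowlist, e.g. loaded from the application's agent registry.
KNOWN_AGENT_NAMES: FrozenSet[str] = frozenset({"billing_support", "research_agent"})


def agent_name_attribute(candidate: Optional[str]) -> Dict[str, str]:
    """Return the attribute only for a known, stable agent name.

    Anything else (user IDs, request IDs, missing values) yields no
    attribute, so unbounded values never reach spans or metrics.
    """
    if candidate in KNOWN_AGENT_NAMES:
        return {"gen_ai.agent.name": candidate}
    return {}


accepted = agent_name_attribute("billing_support")
rejected = agent_name_attribute("user-8f14e45f")
missing = agent_name_attribute(None)
```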

### 4.3 Metric convention changes (`gen-ai-metrics.md`)

Add **`gen_ai.agent.name`** to metric attribute tables where the operation can be tied to an agent, for example:

| Metric | Suggested requirement |
|--------|------------------------|
| `gen_ai.client.token.usage` | Recommended when available |
| `gen_ai.client.operation.duration` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |

**Guidance:** The same low-cardinality rules apply as for spans; implementations **MAY** omit the attribute when no agent context exists.

---

## 5. Use cases / rationale

### 5.1 Spans

- **Filtering and grouping** in trace UIs without inferring parent `invoke_agent`.
- **Disambiguation** when the same **model** or **tool** is used by **different** agents.
- **Attribution** when spans are stored or analyzed **without** full trace context.
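
The difference between the two lookup strategies can be sketched on flat span records. The record layout (`span_id`, `parent_id`, `attributes`) is a hypothetical export format chosen for illustration, and the example assumes the `invoke_agent` span reports `invoke_agent` as its `gen_ai.operation.name`.

```python
from typing import Dict, List, Optional

# Span "c" refers to a parent that was sampled out and is not in the export.
spans: List[dict] = [
    {"span_id": "a", "parent_id": None,
     "attributes": {"gen_ai.operation.name": "invoke_agent",
                    "gen_ai.agent.name": "planner"}},
    {"span_id": "b", "parent_id": "a",
     "attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.agent.name": "planner"}},
    {"span_id": "c", "parent_id": "dropped",
     "attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.agent.name": "retriever"}},
]
by_id: Dict[str, dict] = {span["span_id"]: span for span in spans}


def agent_by_parent_walk(span: dict) -> Optional[str]:
    """Walk up the parent chain until an invoke_agent span is found."""
    current = by_id.get(span["parent_id"])
    while current is not None:
        attributes = current["attributes"]
        if attributes.get("gen_ai.operation.name") == "invoke_agent":
            return attributes.get("gen_ai.agent.name")
        current = by_id.get(current["parent_id"])
    return None  # chain is broken or has no agent ancestor


def agent_by_attribute(span: dict) -> Optional[str]:
    """Read the agent name directly from the child span."""
    return span["attributes"].get("gen_ai.agent.name")


walked = [agent_by_parent_walk(by_id[i]) for i in ("b", "c")]
direct = [agent_by_attribute(by_id[i]) for i in ("b", "c")]
```

The parent walk loses attribution for span `c`, whereas the direct attribute lookup still resolves it.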

### 5.2 Metrics

- **Token and cost** breakdown by agent.
- **Latency and error** SLOs **per agent** for the same `gen_ai.operation.name` and model.
- **Alerting** scoped to a specific agent’s behavior.

---

## 6. Backward compatibility

- **Additive** only: new **recommended** (or **opt-in** for metrics, if the SIG prefers) attributes/dimensions.
- Follow the existing GenAI **stability and opt-in** policy that governs whether an instrumentation emits the **latest experimental** conventions or the legacy behavior.

---

## 7. Alternatives considered

| Alternative | Why not chosen |
|-------------|----------------|
| Rely only on **trace parent** (`invoke_agent`) | Fails for partial traces, sampling, and simple backend queries. |
| Use **custom** / vendor-specific attributes | Prevents **portable** dashboards and cross-vendor tooling. |
| Add **`gen_ai.agent.id`** on metrics | High **cardinality** risk; explicitly excluded from this proposal. |
| **Required** `gen_ai.agent.name` on all child spans | Breaks **non-agent** model client usage. |

---

## 8. Open questions

1. **Nested agents:** When multiple agents nest, should the spec say **“nearest owning agent”** or **“root workflow agent”**? (Pick one default; allow instrumentation notes.)
2. **Metrics requirement level:** Should `gen_ai.client.*` metrics carry the attribute as **Recommended** or **Opt-in**? This depends on the SIG's preference for default cardinality.
3. **Streaming metrics:** Should **`gen_ai.agent.name`** be included on **time_to_first_chunk** / **time_per_output_chunk** in v1 of the change, or in a follow-up PR?
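
For the nested-agents question, the two candidate defaults can be compared with a small sketch. It assumes the instrumentation keeps a stack of active agent names, outermost first; `resolve_agent_name` and the `policy` values are hypothetical and do not reflect a decision.

```python
from typing import Optional, Sequence


def resolve_agent_name(agent_stack: Sequence[str], policy: str = "nearest") -> Optional[str]:
    """Pick the agent name to report for a child span.

    `agent_stack` lists the active agents from outermost to innermost.
    """
    if not agent_stack:
        return None  # no agent context: omit the attribute
    if policy == "nearest":
        return agent_stack[-1]  # innermost agent that owns the operation
    if policy == "root":
        return agent_stack[0]  # top-level agent that started the work
    raise ValueError(f"unknown policy: {policy!r}")


stack = ["research_agent", "planner", "retriever"]
nearest = resolve_agent_name(stack, "nearest")
root = resolve_agent_name(stack, "root")
```

With "nearest", a shared tool called by `retriever` is attributed to `retriever`; with "root", every nested operation rolls up to `research_agent`.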

---

## 9. Specification / implementation checklist

- [ ] Update **`model/`** YAML for affected span and metric definitions.
- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
- [ ] **CHANGELOG** entry under GenAI.
- [ ] Optional: examples in **non-normative** docs showing agent-attributed chat + tool spans.

---

## 10. References

- [OpenTelemetry Semantic Conventions repository](https://github.com/open-telemetry/semantic-conventions)
- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)
137 changes: 137 additions & 0 deletions util/opentelemetry-util-genai/workflow-name-child-spans-proposal.md
@@ -0,0 +1,137 @@
# Proposal: `gen_ai.workflow.name` on GenAI child spans and client metrics

> **Intent:** Contribution draft for [open-telemetry/semantic-conventions](https://github.com/open-telemetry/semantic-conventions).
> **Scope:** `gen_ai.workflow.name` only — **workflow instance / id attributes are explicitly out of scope.**

---

## 1. Motivation / Problem statement

Orchestrated GenAI systems (graphs, crews, pipelines) emit many **inference**, **embeddings**, **retrieval**, and **execute_tool** spans under a single **logical workflow** (e.g. `customer_support_pipeline`, `travel_planner_graph`). Today, **`gen_ai.workflow.name`** is naturally present on **`invoke_workflow`** (or equivalent) spans, but **child** operations often only show **model**, **tool**, or **provider**, not **which pipeline** they belong to.

Operators then depend on **trace hierarchy** or custom attributes to answer:

- Which **workflow** drove this **chat** or **tool** span?
- How do **token usage** and **latency** break down **by workflow** for the same model?

Without a **standard** attribute on **child** spans and **client** metrics, backends cannot offer **portable** filters, dashboards, or SLOs **by workflow** without vendor-specific keys or parent-span joins.

---

## 2. Goals

- Standardize **`gen_ai.workflow.name`** on **inference**, **embeddings**, **retrieval**, and **execute_tool** **client** spans when the operation runs **in the context of a named workflow**.
- Add **`gen_ai.workflow.name`** as a **documented** dimension on **GenAI client metrics** (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, and optionally streaming metrics) when workflow context is known.
- Treat the value as **low-cardinality**: a **stable logical name** for the orchestration unit (pipeline / app / graph), not a per-run id.

---

## 3. Proposed solution

### 3.1 Semantic meaning

**`gen_ai.workflow.name`** on a **child** span or metric record means:

> The **logical name** of the **workflow** (orchestration / pipeline) **within which** this inference, embedding, retrieval, or tool execution was performed.

It **SHOULD** match the value used on the **`invoke_workflow`** span (or the workflow entity) for the same logical run when such a span exists.

### 3.2 Span convention changes (`gen-ai-spans.md`)

For each of the following sections, **add** `gen_ai.workflow.name` to the span attribute table:

| Section | Notes |
|--------|--------|
| Inference | e.g. `chat`, `generate_content`, `text_completion`, … |
| Embeddings | `embeddings` |
| Retrievals | `retrieval` |
| Execute tool | `execute_tool` |

**Suggested requirement level:** **Recommended** when the instrumentation **knows** the workflow name (e.g. from framework config, graph metadata, or explicit API). Omit the attribute when there is **no** workflow context.

**Normative guidance:**

- Instrumentations **MUST NOT** use this attribute for **unbounded** values (raw user input, thread ids as workflow names, UUIDs per invocation).
- Instrumentations **SHOULD** use a **small, stable** set of names aligned with how the application names its pipelines in config or UI.
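
Where no fixed list of workflow names is available, an instrumentation could apply a heuristic check before emitting the attribute. The sketch below is one possible approach under stated assumptions: the naming pattern (lowercase identifier, at most 63 characters) and the function name are hypothetical, not part of the proposal.

```python
import re
import uuid

# Assumed naming convention for illustration: lowercase snake_case identifier.
_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]{0,62}$")


def looks_like_stable_workflow_name(value: str) -> bool:
    """Heuristically reject values that are likely per-invocation or free text."""
    try:
        uuid.UUID(value)
        return False  # parses as a UUID, so likely a per-run identifier
    except ValueError:
        pass
    return bool(_NAME_PATTERN.match(value))


ok = looks_like_stable_workflow_name("customer_support_pipeline")
is_uuid = looks_like_stable_workflow_name("123e4567-e89b-12d3-a456-426614174000")
free_text = looks_like_stable_workflow_name("Plan my trip to Lisbon")
```

A heuristic like this cannot prove that a name is low-cardinality; it only filters out obvious violations such as UUIDs and raw user input.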

### 3.3 Metric convention changes (`gen-ai-metrics.md`)

Add **`gen_ai.workflow.name`** to metric attribute tables where the operation can be tied to a workflow, for example:

| Metric | Suggested requirement |
|--------|------------------------|
| `gen_ai.client.token.usage` | Recommended when available |
| `gen_ai.client.operation.duration` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_to_first_chunk` | Recommended when available |
| *(Optional)* `gen_ai.client.operation.time_per_output_chunk` | Recommended when available |

**Guidance:** Omit the attribute when no workflow context exists; the same **low-cardinality** rules apply as for spans.

---

## 4. Use cases / rationale

### 4.1 Spans

- **Filter and group** child spans **by pipeline** without walking to **`invoke_workflow`**.
- **Compare** the same **model** or **tool** across **different** workflows (e.g. staging vs production pipeline name, or two products sharing one model).
- **Attribution** when spans are **sampled**, **exported in isolation**, or parent links are missing.

### 4.2 Metrics

- **Cost and token** usage **by workflow** (which pipeline consumes the most input tokens).
- **Latency and error** SLOs **per workflow** for the same `gen_ai.operation.name` and model.
- **Alerting** on a specific **workflow** without inferring it from trace structure.

---

## 5. Relationship to `gen_ai.agent.name`

When **both** apply (agent inside a workflow):

- **Both** attributes **MAY** be set on the same span or metric record: workflow = **orchestration**, agent = **logical agent** within that orchestration.
- The spec **SHOULD** state that neither replaces the other; backends **MAY** group by workflow, agent, or both.
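
A short sketch of how a backend might group records that carry both attributes. The record layout and the `total_tokens` helper are hypothetical; the workflow and agent names reuse examples from this proposal.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical metric records carrying both dimensions.
records: List[dict] = [
    {"gen_ai.workflow.name": "customer_support_pipeline",
     "gen_ai.agent.name": "planner", "tokens": 100},
    {"gen_ai.workflow.name": "customer_support_pipeline",
     "gen_ai.agent.name": "retriever", "tokens": 40},
    {"gen_ai.workflow.name": "travel_planner_graph",
     "gen_ai.agent.name": "planner", "tokens": 70},
]


def total_tokens(rows: List[dict], group_by: Sequence[str]) -> Dict[Tuple, int]:
    """Sum tokens per unique combination of the requested attributes."""
    totals: Counter = Counter()
    for row in rows:
        key = tuple(row.get(name) for name in group_by)
        totals[key] += row["tokens"]
    return dict(totals)


by_workflow = total_tokens(records, ["gen_ai.workflow.name"])
by_agent = total_tokens(records, ["gen_ai.agent.name"])
by_both = total_tokens(records, ["gen_ai.workflow.name", "gen_ai.agent.name"])
```

Grouping by agent alone merges the `planner` agents of both workflows, which is why neither attribute can stand in for the other.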

---

## 6. Backward compatibility

- **Additive** only: new **recommended** (or **opt-in** for metrics, if the SIG prefers) attributes/dimensions.
- Align with the GenAI **stability / opt-in** policy for experimental conventions.

---

## 7. Alternatives considered

| Alternative | Why not chosen |
|-------------|----------------|
| Rely only on **parent** `invoke_workflow` span | Weak for partial traces, sampling, and metric breakdowns without trace joins. |
| Use **custom** attributes (`workflow`, `pipeline_name`, …) | Hurts **interoperability** and shared dashboards. |
| Add **`gen_ai.workflow.id`** on metrics | **Cardinality** risk; out of scope. |
| **Required** `gen_ai.workflow.name` on all spans | Breaks **non-orchestrated** GenAI usage. |

---

## 8. Open questions

1. **Nested workflows:** When a span sits inside **nested** orchestration, should instrumentation set the **innermost** workflow name, the **outermost**, or **both** (outer and inner via a future convention)? This proposal recommends **innermost** as the default, documented in a one-line note, unless the SIG prefers **outermost** for product-level reporting.
2. **Metrics requirement level:** Should `gen_ai.client.*` metrics carry the attribute as **Recommended** or **Opt-in**, given the cardinality guidance?
3. **Streaming metrics:** Should the workflow name be included on **time_to_first_chunk** / **time_per_output_chunk** in the first PR or in a follow-up?

---

## 9. Specification / implementation checklist

- [ ] Update **`model/`** YAML for affected span and metric definitions.
- [ ] Regenerate **`docs/gen-ai/gen-ai-spans.md`** and **`docs/gen-ai/gen-ai-metrics.md`**.
- [ ] **CHANGELOG** entry under GenAI.
- [ ] Optional: non-normative example (LangGraph / multi-step pipeline) showing workflow + agent on a **chat** and **execute_tool** span.

---

## 10. References

- [OpenTelemetry Semantic Conventions](https://github.com/open-telemetry/semantic-conventions)
- [GenAI spans](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md)
- [GenAI metrics](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md)
- [Contributing](https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md)