Generic Grouping and Relationship Primitives for GenAI Semantic Conventions

### Area(s)

area:gen-ai

### What's missing?

Two primitives that would let frameworks express structure without requiring new span types:

1. **Sub-trace grouping** - there is no way to say "these spans belong to the same logical unit" (a round, a task, a step) without creating a parent span solely for grouping purposes. Without this, backends must infer grouping from span order, which is brittle and breaks when retries, internal framework spans, or nested agents appear between the expected spans (as demonstrated in open-telemetry/semantic-conventions-genai#81 discussion).

2. **Typed relationships** - there is no way to express, for example, "this tool execution was triggered by that LLM output" or "this agent delegates to that agent". Without this, backends cannot trace causality through an agent's decision chain: they can see that a tool was called, but not which LLM output caused it. OTel [span links](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#link) already exist in the spec ([links-between-spans](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans)) and could serve this purpose, but they are not used anywhere in the GenAI conventions today.

### Current State

GenAI semantic conventions (as of v1.40.0+) provide a solid foundation:

| Type | Coverage |
|------|----------|
| Agent Spans | `create_agent`, `invoke_agent` |
| Workflow Spans | `invoke_workflow` |
| LLM Spans | `inference`, `embeddings`, `retrieval` |
| Tool Spans | `execute_tool` |
| Evaluation | `gen_ai.evaluation.result` event |
| Identity | `gen_ai.tool.call.id`, `gen_ai.agent.id`, `gen_ai.conversation.id` |

The SIG is currently receiving many proposals, each of which introduces a new span type to address a specific GenAI pattern:

| Issue | Proposed Span Type | Purpose |
|-------|-------------------|---------|
| open-telemetry/semantic-conventions-genai#81 | ReAct iteration span | Group LLM + tool cycles into rounds (ReAct) |
| open-telemetry/semantic-conventions-genai#55 | `gen_ai.workflow` + `gen_ai.task` | workflow orchestration and task decomposition |
| open-telemetry/semantic-conventions-genai#37 | `gen_ai.task.*` | Task lifecycle, subtasks, scheduling |
| open-telemetry/semantic-conventions-genai#57 | `orchestrate_tools` | Group tool invocation cycles |
| open-telemetry/semantic-conventions-genai#86 | `gen_ai.skill` | Skill lifecycle (loading, filtering, invocation) |
| open-telemetry/semantic-conventions-genai#41 | `gen_ai.agent.skills.*` | Agent skill attributes |
| open-telemetry/semantic-conventions-genai#35 | Tasks, actions, agents, teams, artifacts, memory | Full agentic ontology |

Each of these proposals is well motivated, but taken together they represent an **N+1 span type problem**: every new GenAI pattern (rounds, tasks, skills, orchestration, guardrails, memory) gets its own span type with its own attributes.

This creates several risks:
- **Instrumentation burden**: libraries must implement an ever-growing set of span types
- **Backend fragmentation**: each backend must understand each span type to render it meaningfully
- **Cross-framework inconsistency**: what LangChain calls a "round", CrewAI calls a "task", and DSPy calls a "module step"

### What already works well (pattern to follow)

OTel GenAI conventions already use generic primitives successfully:

- **[`gen_ai.operation.name`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/registry.yaml#L377)** - an extensible enum (`chat`, `invoke_agent`, `execute_tool`, etc.) that lets frameworks declare what a span represents without needing a separate span definition per operation
- **[`session.id`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/session/registry.yaml#L17)** - a generic attribute in the OTel registry (not GenAI-specific) that any domain can use for session grouping
- **[`gen_ai.tool.type`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/registry.yaml#L271)** - a generic field (`function`, `extension`, `datastore`) rather than separate span types per tool kind

These are "box of shapes" primitives: OTel defines the shape vocabulary, frameworks decide which shapes to use.

### Describe the solution you'd like

### 1. Generic Grouping Attributes

Two new attributes in the `gen_ai` registry:

| Attribute | Type | Description | Example |
|-----------|------|-------------|---------|
| `gen_ai.group.id` | string | Identifier for a logical group of spans within a trace | `"round-3"`, `"task-research"`, `"step-plan"` |
| `gen_ai.group.type` | string (open enum) | Type of the logical group | `"react_round"`, `"task"`, `"planning_step"`, `"skill"` |

**How this subsumes existing proposals:**

The idea is straightforward: instead of defining a new span type for each concept, instrumentations add `gen_ai.group.id` and `gen_ai.group.type` as attributes on the spans they are **already emitting** today (e.g., `chat`, `execute_tool`, `invoke_agent`). The grouping is expressed by giving related spans the same `gen_ai.group.id` value: no new span definitions and no new parent spans are required.

**Concrete example of grouping ReAct rounds:**

Consider a ReAct agent that takes 2 rounds to answer a question. Today an instrumentation already emits these spans:

```
invoke_agent research_agent
├── chat gpt-4              <- round 1: LLM decides to call a tool
├── execute_tool web_search  <- round 1: tool executes
├── chat gpt-4              <- round 2: LLM reviews result and calls another tool
├── execute_tool summarize   <- round 2: tool executes
└── chat gpt-4              <- final: LLM produces answer
```

Without a grouping primitive, a backend must guess which spans belong to which round by pattern-matching on span order, which breaks when retries, internal framework spans, or nested agents appear (as discussed in open-telemetry/semantic-conventions-genai#81).

With the grouping primitive, the instrumentation simply tags each span with a shared group ID:

```
invoke_agent research_agent
├── chat gpt-4               {gen_ai.group.id: "round-1", gen_ai.group.type: "react_round"}
├── execute_tool web_search   {gen_ai.group.id: "round-1", gen_ai.group.type: "react_round"}
├── chat gpt-4               {gen_ai.group.id: "round-2", gen_ai.group.type: "react_round"}
├── execute_tool summarize    {gen_ai.group.id: "round-2", gen_ai.group.type: "react_round"}
└── chat gpt-4               (no group: this is the final answer, not part of a ReAct cycle)
```

The spans are the same ones the instrumentation already produces today. No new span types or wrapper spans are introduced. The only addition is two attributes (`gen_ai.group.id` and `gen_ai.group.type`) on those spans. Now any backend can:
- **Count rounds** by counting distinct `gen_ai.group.id` values where `gen_ai.group.type` is `"react_round"` (in this case, 2)
- **Group spans by round** for visualization
- **Alert** on agents exceeding N rounds

The structure is stated explicitly by the instrumentation rather than inferred by the backend.
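As a rough illustration of the backend side, the sketch below groups the spans from the example above by `gen_ai.group.id` and derives the round count and an alert condition. The `Span` dataclass is a simplified stand-in rather than an OTel SDK type, and `group_spans`, `react_round`, and `MAX_ROUNDS` are hypothetical names used only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Simplified stand-in for an exported span (not an OTel SDK type)."""
    name: str
    attributes: dict = field(default_factory=dict)


def group_spans(spans, group_type):
    """Map each gen_ai.group.id of the given type to the names of its spans."""
    groups = defaultdict(list)
    for span in spans:
        attrs = span.attributes
        if attrs.get("gen_ai.group.type") == group_type and "gen_ai.group.id" in attrs:
            groups[attrs["gen_ai.group.id"]].append(span.name)
    return dict(groups)


def react_round(n):
    """Attributes an instrumentation would attach for ReAct round n."""
    return {"gen_ai.group.id": f"round-{n}", "gen_ai.group.type": "react_round"}


# The five spans from the example trace, in emission order.
trace = [
    Span("chat gpt-4", react_round(1)),
    Span("execute_tool web_search", react_round(1)),
    Span("chat gpt-4", react_round(2)),
    Span("execute_tool summarize", react_round(2)),
    Span("chat gpt-4"),  # final answer: carries no group attributes
]

rounds = group_spans(trace, "react_round")
round_count = len(rounds)

MAX_ROUNDS = 1  # hypothetical alert threshold
too_many_rounds = round_count > MAX_ROUNDS

print(round_count, too_many_rounds)
```

Because grouping relies only on attribute equality, the result does not change if retries or internal framework spans are interleaved between the tagged spans.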

By "add" we mean: include `gen_ai.group.id` and `gen_ai.group.type` as referenced attributes in the existing span definitions in [`spans.yaml`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml) (e.g., under [`span.gen_ai.invoke_agent.client`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L336) and [`span.gen_ai.execute_tool.internal`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L397)), the same way attributes like `gen_ai.agent.name` or `gen_ai.tool.call.id` are referenced today. Instrumentations would then set these attributes when creating those spans.

| Proposal | Current approach (new span type) | With grouping primitive (attributes on spans already being emitted) |
|----------|---------------------------|------------------------|
|#3419 ReAct Iterations Spans | Define a new `react_iteration` span that wraps each LLM+tool cycle | Add `gen_ai.group.type = "react_round"` and a shared `gen_ai.group.id` to the [`chat`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L104) and [`execute_tool`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L397) spans that the instrumentation already produces |
|#2912 Add Tasks | Define a new `gen_ai.task` span type | Add `gen_ai.group.type = "task"` and a shared `gen_ai.group.id` to the [`invoke_agent`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L336) / [`execute_tool`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L397) spans that already represent the work |
|#2993 Add Tool orchestration | Define a new `orchestrate_tools` span | Add `gen_ai.group.type = "tool_cycle"` and a shared `gen_ai.group.id` to the [`chat`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L104) and [`execute_tool`](https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/spans.yaml#L397) spans within the cycle |
|#3540 Add skill | Define a new `gen_ai.skill` span type | Add `gen_ai.group.type = "skill"` and a shared `gen_ai.group.id` to the spans emitted during skill execution |

**Key design decisions:**
- `gen_ai.group.type` is an **open enum**: frameworks define their own values, and OTel does not prescribe what a "round" or "task" means
- Group membership is expressed by recording the same `gen_ai.group.id` on every span that belongs to the group
- Requirement level: `recommended` on `invoke_agent`, `execute_tool`, and `invoke_workflow` spans
- This does not prevent frameworks from also creating parent spans for visualization hierarchy; it just provides an alternative that doesn't require them

### 2. Typed Span Links for GenAI Relationships

OTel [span links](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans) already support expressing non-parent/child relationships between spans. The GenAI conventions should document and recommend their use.

Span links are not unknown to the GenAI SIG and they have come up in a few specific discussions:
- open-telemetry/semantic-conventions-genai#33 proposes span links for evaluation -> target binding
- open-telemetry/semantic-conventions#2563 discusses them as a mechanism for correlating evaluation events with scored spans
- open-telemetry/semantic-conventions#2083 references them in the context of MCP semantic conventions

However, in each case span links are proposed as a solution for one narrow use case. They have not been considered as a general-purpose relationship primitive across GenAI conventions. This proposal suggests elevating them to that role.

**Proposed guidance:**

Instrumentations should consider using span links to express causal or semantic relationships between GenAI spans that are not captured by the parent-child hierarchy. When creating a span link, instrumentations should set an attribute on the link describing the relationship type.

**Suggested link relationship types:**

| Relationship | Description | Example |
|-------------|-------------|---------|
| `triggered_by` | The span was triggered by the linked span's output | Tool execution triggered by LLM `tool_call` response |
| `delegates_to` | This agent delegates work to the linked agent | parent agent invoking a sub-agent |
| `evaluates` | Evaluation targets the linked span | Evaluation event referencing the span it scores |

Note: open-telemetry/semantic-conventions-genai#33 proposes span links for evaluation -> target binding, which validates this approach; this proposal generalizes that pattern.

**Example:**

```
invoke_agent main_agent
├── chat gpt-4                     (span_id: A)
│   └── response includes a tool_call: "search"
├── execute_tool search            (span_id: B, link: {span_id: A, type: "triggered_by"})
└── gen_ai.evaluation.result       (link: {span_id: B, type: "evaluates"})
```
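To show how a backend might consume such links, here is a small sketch that walks `triggered_by` links backwards to recover the causal chain from the example above. The `Span` and `Link` dataclasses are simplified stand-ins, not OTel SDK types, and the attribute key `gen_ai.link.type` is an assumption made for illustration: this proposal does not yet name the link attribute. The evaluation result is modeled as a span-like record for simplicity.

```python
from dataclasses import dataclass, field

# Hypothetical attribute key for the relationship type; not defined by the proposal.
LINK_TYPE_KEY = "gen_ai.link.type"


@dataclass
class Link:
    span_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    span_id: str
    name: str
    links: list = field(default_factory=list)


def linked_span_ids(span, relationship):
    """IDs of spans that this span links to with the given relationship type."""
    return [
        link.span_id
        for link in span.links
        if link.attributes.get(LINK_TYPE_KEY) == relationship
    ]


def causal_chain(span, spans_by_id):
    """Follow triggered_by links backwards, returning span names from effect to cause."""
    chain = [span.name]
    seen = {span.span_id}
    current = span
    while True:
        causes = [
            sid
            for sid in linked_span_ids(current, "triggered_by")
            if sid in spans_by_id and sid not in seen
        ]
        if not causes:
            return chain
        current = spans_by_id[causes[0]]
        seen.add(current.span_id)
        chain.append(current.name)


chat = Span("A", "chat gpt-4")
tool = Span("B", "execute_tool search", [Link("A", {LINK_TYPE_KEY: "triggered_by"})])
evaluation = Span("E", "gen_ai.evaluation.result", [Link("B", {LINK_TYPE_KEY: "evaluates"})])
spans_by_id = {s.span_id: s for s in (chat, tool, evaluation)}

print(causal_chain(tool, spans_by_id))
print(linked_span_ids(evaluation, "evaluates"))
```

The `seen` set guards against cycles in malformed data, and only the first `triggered_by` link is followed at each step, which keeps the sketch short.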

### Relationship to Existing Proposals

This proposal is complementary to, not a replacement for, the existing proposals. It offers a design principle:

- **For proposals that need a new span type** (for example open-telemetry/semantic-conventions-genai#55 workflows): the span type may still be valuable, but the grouping primitive provides a lighter weight alternative for simpler cases
- **For proposals that primarily need grouping** (#3419 ReAct rounds, open-telemetry/semantic-conventions-genai#57 tool orchestration): the grouping primitive may be sufficient without a new span type
- **For proposals that need relationships** (#2626 evaluation spans): typed span links provide the mechanism


### Tip

<sub>[React](https://github.blog/news-insights/product-news/add-reactions-to-pull-requests-issues-and-comments/) with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding `+1` or `me too`, to help us triage it. Learn more [here](https://opentelemetry.io/community/end-user/issue-participation/).</sub>
