Generic Grouping and Relationship Primitives for GenAI Semantic Conventions #3575

@KazChe

Area(s)

area:gen-ai

What's missing?

Two primitives that would let frameworks express structure without requiring new span types:

  1. Sub-trace grouping - there is no way to say "these spans belong to the same logical unit" (a round, a task, a step) without creating a parent span solely for grouping purposes. Without this, backends must infer grouping from span order, which is brittle and breaks when retries, internal framework spans, or nested agents appear between the expected spans (as demonstrated in the discussion on semantic-conventions-genai#81, "Adding ReAct Iterations Spans in Reasoning-Acting Agents").

  2. Typed relationships - there is no way to express, for example, "this tool execution was triggered by that LLM output" or "this agent delegates to that agent". Without this, backends cannot trace causality through an agent's decision chain. They can see that a tool was called, but not which LLM output caused it. OTel span links already exist in the spec (links-between-spans) and could serve this purpose, but they are not used anywhere in the GenAI conventions today.

Current State

GenAI semantic conventions (as of v1.40.0+) provide a solid foundation:

| Type | Coverage |
| --- | --- |
| Agent Spans | create_agent, invoke_agent |
| Workflow Spans | invoke_workflow |
| LLM Spans | inference, embeddings, retrieval |
| Tool Spans | execute_tool |
| Evaluation | gen_ai.evaluation.result event |
| Identity | gen_ai.tool.call.id, gen_ai.agent.id, gen_ai.conversation.id |

The SIG is currently receiving many proposals, each of which introduces new span types to address specific GenAI patterns:

| Issue | Proposed Span Type | Purpose |
| --- | --- | --- |
| open-telemetry/semantic-conventions-genai#81 | ReAct iteration span | Group LLM + tool cycles into rounds (ReAct) |
| open-telemetry/semantic-conventions-genai#55 | gen_ai.workflow + gen_ai.task | Workflow orchestration and task decomposition |
| open-telemetry/semantic-conventions-genai#37 | gen_ai.task.* | Task lifecycle, subtasks, scheduling |
| open-telemetry/semantic-conventions-genai#57 | orchestrate_tools | Group tool invocation cycles |
| open-telemetry/semantic-conventions-genai#86 | gen_ai.skill | Skill lifecycle (loading, filtering, invocation) |
| open-telemetry/semantic-conventions-genai#41 | gen_ai.agent.skills.* | Agent skill attributes |
| open-telemetry/semantic-conventions-genai#35 | Tasks, actions, agents, teams, artifacts, memory | Full agentic ontology |

Each of these proposals is well motivated, but taken together they represent an N+1 span type problem: every new GenAI pattern (rounds, tasks, skills, orchestration, guardrails, memory) gets its own span type with its own attributes.

This creates several risks:

  • Instrumentation burden: libraries must implement an ever-growing set of span types
  • Backend fragmentation: each backend must understand each span type to render it meaningfully
  • Cross-framework inconsistency: what LangChain calls a "round", CrewAI calls a "task", and DSPy calls a "module step"

What already works well (pattern to follow)

OTel GenAI conventions already use generic primitives successfully:

  • gen_ai.operation.name - an extensible enum (chat, invoke_agent, execute_tool, etc.) that lets frameworks declare what a span represents without needing a separate span definition per operation
  • session.id - a generic attribute in the OTel registry (not GenAI-specific) that any domain can use for session grouping
  • gen_ai.tool.type - a generic field (function, extension, datastore) rather than separate span types per tool kind

These are "box of shapes" primitives: OTel defines the shape vocabulary, frameworks decide which shapes to use.

Describe the solution you'd like

1. Generic Grouping Attributes

Two new attributes in the gen_ai registry:

| Attribute | Type | Description | Example |
| --- | --- | --- | --- |
| gen_ai.group.id | string | Identifier for a logical group of spans within a trace | "round-3", "task-research", "step-plan" |
| gen_ai.group.type | string (open enum) | Type of the logical group | "react_round", "task", "planning_step", "skill" |

How this subsumes existing proposals:

The idea is straightforward: instead of defining a new span type for each concept, instrumentations add gen_ai.group.id and gen_ai.group.type as attributes on the spans they are already emitting today (e.g., chat, execute_tool, invoke_agent). The grouping is expressed by giving related spans the same gen_ai.group.id value. No new span definitions and no new parent spans are required.

Concrete example of grouping ReAct rounds:

Consider a ReAct agent that takes 2 rounds to answer a question. Today an instrumentation already emits these spans:

invoke_agent research_agent
├── chat gpt-4              <- round 1: LLM decides to call a tool
├── execute_tool web_search  <- round 1: tool executes
├── chat gpt-4              <- round 2: LLM reviews result and calls another tool
├── execute_tool summarize   <- round 2: tool executes
└── chat gpt-4              <- final: LLM produces answer

Without a grouping primitive, a backend must guess which spans belong to which round by pattern-matching on span order, which breaks when retries, internal framework spans, or nested agents appear (as discussed in open-telemetry/semantic-conventions-genai#81).

With the grouping primitive, the instrumentation simply tags each span with a shared group ID:

invoke_agent research_agent
├── chat gpt-4               {gen_ai.group.id: "round-1", gen_ai.group.type: "react_round"}
├── execute_tool web_search   {gen_ai.group.id: "round-1", gen_ai.group.type: "react_round"}
├── chat gpt-4               {gen_ai.group.id: "round-2", gen_ai.group.type: "react_round"}
├── execute_tool summarize    {gen_ai.group.id: "round-2", gen_ai.group.type: "react_round"}
└── chat gpt-4               (no group: this is the final answer, not part of a ReAct cycle)

The spans are the same ones the instrumentation already produces today. No new span types or wrapper spans are introduced. The only addition is two attributes (gen_ai.group.id and gen_ai.group.type) on those spans. Now any backend can:

  • Count rounds by counting distinct gen_ai.group.id values where gen_ai.group.type is "react_round" (2 in this case)
  • Group spans by round for visualization
  • Alert on agents exceeding N rounds

No new span type was defined and no wrapper span was created. The structure is explicit and not inferred.
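As a rough illustration of the backend side, the sketch below groups already-exported spans by gen_ai.group.id and counts the rounds. The spans are modelled as plain dicts and the helper name group_spans is hypothetical; a real backend would read the same two attributes from its own span store.

```python
from collections import defaultdict

# Hypothetical exported spans for the trace above, modelled as plain dicts.
spans = [
    {"name": "chat gpt-4",
     "attributes": {"gen_ai.group.id": "round-1", "gen_ai.group.type": "react_round"}},
    {"name": "execute_tool web_search",
     "attributes": {"gen_ai.group.id": "round-1", "gen_ai.group.type": "react_round"}},
    {"name": "chat gpt-4",
     "attributes": {"gen_ai.group.id": "round-2", "gen_ai.group.type": "react_round"}},
    {"name": "execute_tool summarize",
     "attributes": {"gen_ai.group.id": "round-2", "gen_ai.group.type": "react_round"}},
    {"name": "chat gpt-4", "attributes": {}},  # final answer, no group
]


def group_spans(spans, group_type):
    """Map each gen_ai.group.id of the given type to the names of its spans."""
    groups = defaultdict(list)
    for span in spans:
        attrs = span["attributes"]
        group_id = attrs.get("gen_ai.group.id")
        if attrs.get("gen_ai.group.type") == group_type and group_id is not None:
            groups[group_id].append(span["name"])
    return dict(groups)


rounds = group_spans(spans, "react_round")
round_count = len(rounds)  # number of distinct ReAct rounds in the trace
```

Because membership is read from the attributes rather than from span order, an extra retry or internal framework span between the chat and execute_tool spans would not change the result.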

By "add" we mean: include gen_ai.group.id and gen_ai.group.type as referenced attributes in the existing span definitions in spans.yaml (e.g., under span.gen_ai.invoke_agent.client and span.gen_ai.execute_tool.internal), the same way attributes like gen_ai.agent.name or gen_ai.tool.call.id are referenced today. Instrumentations would then set these attributes when creating those spans.
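One way an instrumentation might set these attributes is sketched below, using only the Python standard library. The names gen_ai_group, group_attributes, and start_span are hypothetical stand-ins, not part of any OTel API; start_span simply returns a dict where a real instrumentation would call its tracer.

```python
import contextvars
from contextlib import contextmanager

# Holds the (group_type, group_id) of the logical group currently in progress.
_current_group = contextvars.ContextVar("gen_ai_group", default=None)


@contextmanager
def gen_ai_group(group_type, group_id):
    """Mark everything created inside the block as belonging to one group."""
    token = _current_group.set((group_type, group_id))
    try:
        yield
    finally:
        _current_group.reset(token)


def group_attributes():
    """Return the grouping attributes for the active group, if any."""
    group = _current_group.get()
    if group is None:
        return {}
    group_type, group_id = group
    return {"gen_ai.group.id": group_id, "gen_ai.group.type": group_type}


def start_span(name, **attributes):
    """Stand-in for a real tracer call; returns the span as a dict."""
    return {"name": name, "attributes": {**attributes, **group_attributes()}}


emitted = []
for n, tool in enumerate(["web_search", "summarize"], start=1):
    with gen_ai_group("react_round", f"round-{n}"):
        emitted.append(start_span("chat gpt-4"))
        emitted.append(start_span(f"execute_tool {tool}"))
emitted.append(start_span("chat gpt-4"))  # final answer, outside any group
```

The framework only has to mark where a round starts and ends; the spans themselves are created exactly as before and pick up the two attributes from the surrounding context.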

| Proposal | Current approach (new span type) | With grouping primitive (attributes on spans already being emitted) |
| --- | --- | --- |
| #3419 ReAct Iterations Spans | Define a new react_iteration span that wraps each LLM+tool cycle | Add gen_ai.group.type = "react_round" and a shared gen_ai.group.id to the chat and execute_tool spans that the instrumentation already produces |
| #2912 Add Tasks | Define a new gen_ai.task span type | Add gen_ai.group.type = "task" and a shared gen_ai.group.id to the invoke_agent / execute_tool spans that already represent the work |
| #2993 Add Tool orchestration | Define a new orchestrate_tools span | Add gen_ai.group.type = "tool_cycle" and a shared gen_ai.group.id to the chat and execute_tool spans within the cycle |
| #3540 Add skill | Define a new gen_ai.skill span type | Add gen_ai.group.type = "skill" and a shared gen_ai.group.id to the spans emitted during skill execution |

Key design decisions:

  • gen_ai.group.type is an open enum: frameworks define their own values, and OTel does not prescribe what a "round" or "task" means
  • Multiple group memberships can be expressed by recording the attribute on multiple spans with the same gen_ai.group.id
  • Requirement level: recommended on invoke_agent, execute_tool, and invoke_workflow spans
  • This does not prevent frameworks from also creating parent spans for visualization hierarchy; it just provides an alternative that doesn't require them

2. Typed Span Links for GenAI Relationships

OTel span links already support expressing non-parent/child relationships between spans. The GenAI conventions should document and recommend their use.

Span links are not unknown to the GenAI SIG and they have come up in a few specific discussions:

However, in each case span links are proposed as a solution for one narrow use case. They have not been considered as a general-purpose relationship primitive across GenAI conventions. This proposal suggests elevating them to that role.

Proposed guidance:

Instrumentations should consider using span links to express causal or semantic relationships between GenAI spans that are not captured by the parent-child hierarchy. When creating a span link, instrumentations should set an attribute on the link describing the relationship type.

Suggested link relationship types:

| Relationship | Description | Example |
| --- | --- | --- |
| triggered_by | The span was triggered by the linked span's output | Tool execution triggered by LLM tool_call response |
| delegates_to | This agent delegates work to the linked agent | Parent agent invoking a sub-agent |
| evaluates | Evaluation targets the linked span | Evaluation event referencing the span it scores |

Note: open-telemetry/semantic-conventions-genai#33 proposes span links for evaluation->target binding, which validates this approach. This proposal is a generalization of that pattern.

Example:

invoke_agent main_agent
├── chat gpt-4                     (span_id: A)
│   └── response includes a tool_call: "search"
├── execute_tool search            (span_id: B, link: {span_id: A, type: "triggered_by"})
└── gen_ai.evaluation.result       (link: {span_id: B, type: "evaluates"})
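
To show how a backend might use such links, here is a minimal sketch that walks typed links backwards to reconstruct the causal chain in the example above. The Span and Link classes, the causal_chain helper, and the id "C" given to the evaluation record are all assumptions made for illustration; the attribute name that would carry the relationship type on a real link is not defined by this proposal's example.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    span_id: str  # id of the linked (target) span
    type: str     # relationship type, e.g. "triggered_by" or "evaluates"


@dataclass
class Span:
    span_id: str
    name: str
    links: list = field(default_factory=list)


# The example trace; "C" is a hypothetical id for the evaluation record.
spans = {
    "A": Span("A", "chat gpt-4"),
    "B": Span("B", "execute_tool search", [Link("A", "triggered_by")]),
    "C": Span("C", "gen_ai.evaluation.result", [Link("B", "evaluates")]),
}


def causal_chain(spans, start_id):
    """Follow the first link of each span back to the originating span."""
    chain = []
    seen = {start_id}
    current = spans[start_id]
    while current.links:
        link = current.links[0]  # sketch only: ignores any additional links
        target = spans[link.span_id]
        chain.append((current.name, link.type, target.name))
        if link.span_id in seen:  # guard against link cycles
            break
        seen.add(link.span_id)
        current = target
    return chain


chain = causal_chain(spans, "C")
```

Starting from the evaluation, the chain reads: the evaluation evaluates the tool execution, which was triggered by the chat span. None of this depends on the parent-child hierarchy.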

Relationship to Existing Proposals

This proposal is complementary to, not a replacement for, the existing proposals. It offers a design principle:
