
Adding missing attributes and metric for llamaindex instrumentation#274

Draft

shuningc wants to merge 1 commit into main from HYBIM-620-LlamaindexMissingAttrs

Conversation


shuningc (Contributor) commented Apr 15, 2026

LlamaIndex instrumentation was missing several GenAI semantic convention attributes that LangChain already supports. Without these, users get incomplete observability — no response model info, no finish reasons, no streaming metrics, and no tool visibility. This change brings LlamaIndex to parity so users get consistent telemetry regardless of which framework they use.
This change adds six new LLM span attributes and TTFT metric emission:

| Attribute | Description |
| --- | --- |
| `gen_ai.response.model` | Model name from response (fallback: raw → kwargs → request model) |
| `gen_ai.response.finish_reasons` | Completion finish reasons (e.g. `["stop"]`) |
| `gen_ai.request.max_tokens` | Max tokens from LLM metadata |
| `gen_ai.request.stream` | Whether the request was streaming |
| `gen_ai.response.time_to_first_chunk` | TTFT for streaming calls |
| `gen_ai.tool.definitions` | Tool definitions from agent context |

Plus the `gen_ai.client.operation.time_to_first_chunk` histogram metric, matching LangChain's metrics pipeline.
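The `gen_ai.response.model` fallback chain (raw → kwargs → request model) can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: the helper name and the assumed response shape (a `raw` payload and `additional_kwargs` dict, as LlamaIndex responses typically carry) are illustrative.

```python
def extract_response_model(response, request_model=None):
    """Resolve gen_ai.response.model: raw payload, then kwargs, then request model."""
    # 1. Prefer the model name reported in the raw provider response.
    raw = getattr(response, "raw", None)
    if raw is not None:
        model = raw.get("model") if isinstance(raw, dict) else getattr(raw, "model", None)
        if model:
            return model
    # 2. Fall back to additional kwargs attached to the response.
    kwargs = getattr(response, "additional_kwargs", None) or {}
    if kwargs.get("model"):
        return kwargs["model"]
    # 3. Last resort: the model that was requested.
    return request_model
```

The ordering matters because providers may serve a more specific model (e.g. a dated snapshot) than the alias that was requested.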

Changed files

  • `callback_handler.py` — Extract new attributes from raw LLM response, TTFT from tracker, tool definitions from agent context

  • `event_handler.py` (new) — `TTFTTracker` class and `LlamaindexEventHandler` listening to `LLMChatInProgressEvent` for streaming chunk timing

  • `invocation_manager.py` — TTFT tracking methods, `find_agent_with_tools()` fallback for async ContextVar propagation

  • `workflow_instrumentation.py` — Capture agent tools in `wrap_agent_run()` and register with invocation manager

  • `__init__.py` — Wire up TTFTTracker and EventHandler during instrumentation

  • `test_agent_attributes.py` (new) — 9 tests: response model, finish reasons, token usage, max tokens, tool definitions, TTFT span + metric, non-streaming

  • `test_ttft.py` (new) — 16 unit tests for TTFTTracker and ContextVar correlation

  • `test_circuit_agent.py` (new) — Live integration test with Circuit API (skipped in CI)

  • `CHANGELOG.md` — Document new features

  • `.env.example` (new) — Example environment config for Circuit credentials

Configuration: set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_TOOL_DEFINITIONS=true` to enable tool definitions capture.

TTFT challenge: two separate systems

Calculating Time To First Chunk in LlamaIndex is significantly harder than in LangChain. LangChain has a single callback system: `on_llm_new_token` fires for every streaming token, so TTFT is simply `time.perf_counter() - start_time` on the first token callback.
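For contrast, the single-callback pattern described above can be sketched like this. The hook names follow LangChain's `BaseCallbackHandler` interface; the bookkeeping fields are illustrative.

```python
import time


class TTFTCallbackHandler:
    """Sketch of LangChain-style TTFT: one handler sees both start and tokens."""

    def __init__(self):
        self._start = None
        self.ttft = None

    def on_llm_start(self, *args, **kwargs):
        # Record the wall-clock start of the LLM call.
        self._start = time.perf_counter()
        self.ttft = None

    def on_llm_new_token(self, token, **kwargs):
        # Only the FIRST token sets TTFT; later tokens are ignored.
        if self.ttft is None and self._start is not None:
            self.ttft = time.perf_counter() - self._start
```

Because start and tokens flow through the same object, no cross-system correlation is needed. That is exactly what LlamaIndex lacks, as described next.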

LlamaIndex has two separate instrumentation systems that don't share state:

  1. Callback system (`BaseCallbackHandler`) — fires `on_event_start`/`on_event_end` for LLM calls with an `event_id`. This is where we create spans and set attributes.
  2. Event system (`BaseEventHandler`) — fires `LLMChatStartEvent` and `LLMChatInProgressEvent` for streaming chunks with a `span_id`. This is the only place streaming tokens are visible.

The problem: the callback system never sees individual streaming tokens, and the event system has no access to the span/invocation objects. The two systems also use different ID schemes (`event_id` vs. `span_id`).

Solution: TTFTTracker + ContextVar bridge

We introduced three components to bridge the gap:

  • `TTFTTracker` — a standalone timing class that records start times keyed by `span_id` and calculates TTFT when the first token arrives. It also maintains a mapping between `event_id` (callback system) and `span_id` (event system).
  • `LlamaindexEventHandler` — a `BaseEventHandler` that listens for `LLMChatStartEvent` (records start time) and `LLMChatInProgressEvent` (records first-token arrival). It writes timing data into the shared `TTFTTracker`.
  • ContextVar `_current_llm_event_id` — when the callback handler starts an LLM event (`on_event_start`), it stores its `event_id` in a ContextVar. The event handler reads this ContextVar when `LLMChatStartEvent` fires (same execution context), creating the `event_id` ↔ `span_id` association in `TTFTTracker`.

When the callback handler's `on_event_end` fires, it calls `invocation_manager.get_ttft_for_event(event_id)`, which looks up the associated `span_id` in `TTFTTracker`, retrieves the calculated TTFT, and sets it on the `LLMInvocation`. The existing `MetricsEmitter` in the util layer then emits the `gen_ai.client.operation.time_to_first_chunk` histogram metric automatically — no changes needed in the emitter.

Tool definitions: async ContextVar propagation

Tool definitions faced a similar cross-boundary challenge. Agent tools are registered in `wrap_agent_run()` (a synchronous wrapper), but the LLM callback fires inside async agent execution, where ContextVars may not propagate. We solved this by:

  • Registering the agent with `invocation_manager` before the async wrapped call
  • Adding `find_agent_with_tools()` as a fallback search across all registered agents
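The register-before-async plus fallback-scan approach can be sketched as follows. `find_agent_with_tools()` is named in the PR; the manager class, `register_agent`, and the `tools` attribute check are illustrative assumptions.

```python
class InvocationManagerSketch:
    """Holds agents registered synchronously so async callbacks can find them."""

    def __init__(self):
        self._agents = {}  # run_id -> agent

    def register_agent(self, run_id, agent):
        # Called from the synchronous wrap_agent_run() BEFORE the async call,
        # so the agent is visible even if ContextVars fail to propagate
        # into the spawned task.
        self._agents[run_id] = agent

    def find_agent_with_tools(self):
        # Fallback: scan all registered agents for one that carries tools.
        for agent in self._agents.values():
            if getattr(agent, "tools", None):
                return agent
        return None
```

The trade-off of the fallback scan is that with several concurrent agents it picks the first match rather than the exact owner, which is why ContextVar correlation remains the primary path.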

What changed

| File | Change |
| --- | --- |
| `callback_handler.py` | Extract response model (fallback chain: raw → kwargs → request model), finish reasons, max tokens, TTFT from tracker, tool definitions from agent context, `request_stream` flag |
| `event_handler.py` (new) | `TTFTTracker`, `LlamaindexEventHandler`, ContextVar correlation functions |
| `invocation_manager.py` | TTFT tracking methods (`set_ttft_tracker`, `get_ttft_for_event`, `cleanup_event_tracking`), `find_agent_with_tools()` fallback |
| `workflow_instrumentation.py` | Capture agent tools in `wrap_agent_run()`, register with invocation manager before async execution |
| `__init__.py` | Wire up TTFTTracker, EventHandler, and dispatcher registration |
| `test_agent_attributes.py` (new) | 9 tests: response model, finish reasons, tokens, max tokens, tool definitions, TTFT span + metric, non-streaming |
| `test_ttft.py` (new) | 16 unit tests for TTFTTracker and ContextVar correlation |
| `test_circuit_agent.py` (new) | Live integration test with Circuit API (skipped in CI) |
| `CHANGELOG.md` | Document new features |
| `.env.example` (new) | Example environment config for examples |
