Adding missing attributes and metric for llamaindex instrumentation#274
LlamaIndex instrumentation was missing several GenAI semantic convention attributes that LangChain already supports. Without these, users get incomplete observability — no response model info, no finish reasons, no streaming metrics, and no tool visibility. This change brings LlamaIndex to parity so users get consistent telemetry regardless of which framework they use.
Added 6 new LLM span attributes and TTFT metric emission.
Changed files
- `callback_handler.py` — Extract the new attributes from the raw LLM response, TTFT from the tracker, and tool definitions from the agent context
- `event_handler.py` — `TTFTTracker` class and `LlamaindexEventHandler` listening to `LLMChatInProgressEvent` for streaming chunk timing
- `invocation_manager.py` — TTFT tracking methods and a `find_agent_with_tools()` fallback for async ContextVar propagation
- `workflow_instrumentation.py` — Capture agent tools in `wrap_agent_run()` and register them with the invocation manager
- `__init__.py` — Wire up the `TTFTTracker` and event handler during instrumentation
- `test_agent_attributes.py` (new) — 9 tests: response model, finish reasons, token usage, max tokens, tool definitions, TTFT span + metric, non-streaming
- `test_ttft.py` (new) — 16 unit tests for `TTFTTracker` and ContextVar correlation
- `test_circuit_agent.py` (new) — Live integration test against the Circuit API (skipped in CI)
- `CHANGELOG.md` — Document the new features
- `.env.example` (new) — Example environment config for Circuit credentials
Configuration
Set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_TOOL_DEFINITIONS=true` to enable capture of tool definitions.
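For illustration, a minimal sketch of how such an opt-in flag could be read; the helper name `tool_definitions_enabled` and the accepted truthy values are assumptions, not the instrumentation's actual code:

```python
import os

def tool_definitions_enabled() -> bool:
    # Hypothetical helper: reads the opt-in env var and treats
    # "true"/"1" (case-insensitive) as enabled. The real code
    # may parse this differently.
    value = os.environ.get(
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_TOOL_DEFINITIONS", "false"
    )
    return value.strip().lower() in ("true", "1")
```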
TTFT challenge: two separate systems
Calculating Time To First Chunk in LlamaIndex is significantly harder than in LangChain. LangChain has a single callback system — `on_llm_new_token` fires for every streaming token, so TTFT is simply `time.perf_counter() - start_time` on the first token callback.
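A minimal sketch of that single-callback approach; apart from `on_llm_new_token`, the class and method names here are illustrative, not LangChain's actual API surface:

```python
import time

class TTFTCallback:
    """Sketch of single-callback TTFT measurement: record a start time,
    then capture elapsed time on the first streamed token only."""

    def __init__(self):
        self.start_time = None
        self.ttft = None

    def on_llm_start(self):
        # Called once when the LLM request is issued.
        self.start_time = time.perf_counter()

    def on_llm_new_token(self, token: str):
        # Only the first token sets TTFT; later tokens are ignored.
        if self.ttft is None and self.start_time is not None:
            self.ttft = time.perf_counter() - self.start_time
```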
LlamaIndex has two separate instrumentation systems that don't share state: the callback system and the event/dispatcher system. The problem: the callback system never sees individual streaming tokens, and the event system has no access to the span/invocation objects. They also use different ID schemes (`event_id` vs. `span_id`).
Solution: TTFTTracker + ContextVar bridge
We introduced three components to bridge the gap: the `TTFTTracker`, the `LlamaindexEventHandler`, and TTFT lookup methods on the invocation manager.
When the callback handler's `on_event_end` fires, it calls `invocation_manager.get_ttft_for_event(event_id)`, which looks up the associated `span_id` in `TTFTTracker`, retrieves the calculated TTFT, and sets it on the `LLMInvocation`. The existing `MetricsEmitter` in the util layer then emits the `gen_ai.client.operation.time_to_first_chunk` histogram metric automatically — no changes needed in the emitter.
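The bridging logic above can be sketched roughly as follows. This is a simplified stand-in for the real `TTFTTracker` and invocation manager, collapsed into one class; every method name other than `get_ttft_for_event` is an assumption:

```python
import time

class TTFTTrackerSketch:
    """Illustrative sketch of correlating event-system chunk timing
    (keyed by event_id) with callback-system spans (keyed by span_id)."""

    def __init__(self):
        self._start = {}          # span_id -> request start time
        self._ttft = {}           # span_id -> seconds to first chunk
        self._event_to_span = {}  # event_id -> span_id (the ID bridge)

    def on_llm_start(self, span_id, event_id):
        # Record the start time and remember which event maps to this span.
        self._start[span_id] = time.perf_counter()
        self._event_to_span[event_id] = span_id

    def on_first_chunk(self, span_id):
        # Event system reports the first streaming chunk for a span.
        if span_id in self._start and span_id not in self._ttft:
            self._ttft[span_id] = time.perf_counter() - self._start[span_id]

    def get_ttft_for_event(self, event_id):
        # Callback system asks for TTFT using its own event_id.
        span_id = self._event_to_span.get(event_id)
        return self._ttft.get(span_id)
```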
Tool definitions: async ContextVar propagation
Tool definitions faced a similar cross-boundary challenge. Agent tools are registered in `wrap_agent_run()` (a synchronous wrapper), but the LLM callback fires inside async agent execution, where ContextVars may not propagate. We solved this by registering the agent's tools with the invocation manager in `wrap_agent_run()` and adding a `find_agent_with_tools()` fallback for cases where the ContextVar does not carry across the async boundary.
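A sketch of that registry-plus-fallback idea; all names except `find_agent_with_tools` are hypothetical, and the single-agent fallback heuristic is an assumption about how such a lookup could behave:

```python
import contextvars

# ContextVar: fast path when the async context propagates correctly.
_current_agent = contextvars.ContextVar("current_agent", default=None)
# Global registry: survives even when the ContextVar does not propagate.
_agents_with_tools = {}  # agent_id -> tool definitions

def register_agent_tools(agent_id, tools):
    # Called from the synchronous wrapper around agent execution.
    _agents_with_tools[agent_id] = tools
    _current_agent.set(agent_id)

def find_agent_with_tools():
    # Fast path: the ContextVar made it into this async context.
    agent_id = _current_agent.get()
    if agent_id is not None:
        return _agents_with_tools.get(agent_id)
    # Fallback: if exactly one agent is registered, attribute tools to it.
    if len(_agents_with_tools) == 1:
        return next(iter(_agents_with_tools.values()))
    return None
```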
What changed