HYBIM 665 Converting stop_llm calls from sync to async#319
Open
shuningc wants to merge 6 commits into
Open
Conversation
…el_planner Accidentally committed during local debugging; not related to async finalization work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
TelemetryHandler.stop_llm() (and all other stop_/fail_ methods) runs entirely on the caller's request thread. This includes:
string truncation)
In a multi-step agentic workflow, stop_llm is called once per LLM turn, so these costs stack. In high-throughput services that have added manual
instrumentation calls on the hot path, this adds measurable latency to every user-facing request.
Solution
Add an optional background ThreadPoolExecutor inside TelemetryHandler. When OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION=true, the expensive part of
each stop_/fail_ call is submitted to the thread pool instead of running inline. The flag defaults to false — no behavior change without opt-in.
File-by-file changes
Why: All configuration in this library is driven by env vars. Adding the two new variables here keeps them alongside the existing constants, makes them
importable by name, and ensures they appear in all for documentation tooling.
Why this approach: TelemetryHandler is the single shared singleton that all instrumented frameworks call. Fixing it here fixes LangChain, LlamaIndex,
CrewAI, and FastMCP in one place without changing any instrumentation code.
What runs inline (unchanged behaviour):
would corrupt parent/child span relationships for any code that runs after stop* returns
span.is_recording() to decide whether to open new child spans; if span.end() is deferred, it sees the parent still open and creates extra outer wrapper
spans)
What moves to the background:
_submit_finalization(fn) helper logic:
shutdown(wait=True): Drains the thread pool. Called automatically by _reset_for_testing() so tests don't leak background threads between cases.
Why: The async/inline boundary is subtle. Without tests it's easy to accidentally move something that must stay inline (like _pop_current_span) into the
background closure.
Four test classes:
Correctness verification
Two local test scripts run the same agent query twice (sync then async) against the real Circuit API and compare span output:
Both confirm same span count, same span names, same attributes between sync and async modes.
Non-breaking guarantee
are unaffected
langchain sync output
async span output