HYBIM 665 Converting stop_llm calls from sync to async by shuningc · Pull Request #319 · signalfx/splunk-otel-python-contrib

shuningc · 2026-05-08T07:01:29Z

Problem

TelemetryHandler.stop_llm() (and all other stop_/fail_ methods) runs entirely on the caller's request thread. This includes:

- Message serialization: _emitter.on_end() → SpanEmitter._apply_finish_attrs() serializes input/output messages into span attributes (JSON encoding, string truncation)
- Completion callbacks: _notify_completion() fans out to every registered callback (evaluators, persistence hooks, upload hooks), which may do I/O
- Metric flush: meter_provider.force_flush() waits for the metric export cycle

In a multi-step agentic workflow, stop_llm is called once per LLM turn, so these costs stack. In high-throughput services that have added manual
instrumentation calls on the hot path, this adds measurable latency to every user-facing request.

Solution

Add an optional background ThreadPoolExecutor inside TelemetryHandler. When OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION=true, the expensive part of
each stop_/fail_ call is submitted to the thread pool instead of running inline. The flag defaults to false — no behavior change without opt-in.

File-by-file changes

environment_variables.py — two new env var constants

Why: All configuration in this library is driven by env vars. Adding the two new variables here keeps them alongside the existing constants, makes them
importable by name, and ensures they appear in __all__ for documentation tooling.

OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION — opt-in flag (default false)
OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION_QUEUE_SIZE — max concurrent+queued tasks before falling back inline (default 128)
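
As a rough illustration of how these two variables might be read, here is a minimal sketch. The variable names and defaults come from this description; the helper functions, the case-insensitive "true" check, and the fallback to 128 on invalid input are assumptions, not the library's actual parsing code.

```python
import os
from typing import Mapping, Optional

# Names as listed in this PR description.
OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION = (
    "OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION"
)
OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION_QUEUE_SIZE = (
    "OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION_QUEUE_SIZE"
)

DEFAULT_QUEUE_SIZE = 128  # default stated in the description


def async_finalization_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: the flag is on only when the value is 'true'."""
    env = os.environ if environ is None else environ
    raw = env.get(OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION, "false")
    return raw.strip().lower() == "true"


def finalization_queue_size(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: fall back to the default on missing or invalid input."""
    env = os.environ if environ is None else environ
    raw = env.get(OTEL_INSTRUMENTATION_GENAI_ASYNC_FINALIZATION_QUEUE_SIZE, "")
    try:
        size = int(raw)
    except ValueError:
        return DEFAULT_QUEUE_SIZE
    return size if size > 0 else DEFAULT_QUEUE_SIZE
```

Passing a mapping instead of reading os.environ directly keeps the helpers easy to test without mutating process state.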

handler.py — core implementation

Why this approach: TelemetryHandler is the single shared singleton that all instrumented frameworks call. Fixing it here fixes LangChain, LlamaIndex,
CrewAI, and FastMCP in one place without changing any instrumentation code.

What runs inline (unchanged behaviour):

- invocation.end_time: must be set before the call returns so timing is accurate
- _pop_current_span: resets a ContextVar and detaches an OTel context token that belongs to the caller's thread. Moving this to a background thread would corrupt parent/child span relationships for any code that runs after stop_* returns
- _agent_context_stack pop in stop_agent/fail_agent: same thread-safety requirement
- _emitter.on_end() / _emitter.on_error(): calls span.end(), which must complete before the framework proceeds. LlamaIndex's workflow event loop checks span.is_recording() to decide whether to open new child spans; if span.end() is deferred, it sees the parent still open and creates extra outer wrapper spans
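
The thread-affinity constraint on the context reset can be demonstrated with the standard library alone. The sketch below uses a plain contextvars.ContextVar as a stand-in for the handler's current-span variable (the name current_span and the helper reset_on_worker are hypothetical). It does not involve OTel's own context API, which may report the mismatch differently.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the handler's "current span" ContextVar.
current_span = contextvars.ContextVar("current_span", default=None)


def reset_on_worker(token):
    """Try to reset the token on a pool thread.

    Returns the name of the exception type raised by the worker,
    or None if the reset succeeded.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(current_span.reset, token)
        error = future.exception()
    return type(error).__name__ if error is not None else None


# The token is created in the caller's Context.
token = current_span.set("llm-span")

# A worker thread runs in a different Context, so the reset is rejected.
worker_result = reset_on_worker(token)

# The same reset on the caller's thread succeeds and restores the default.
current_span.reset(token)
```

Because the token is bound to the Context that created it, the restore step has to stay on the caller's thread, which is why it is kept inline.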

What moves to the background:

_notify_completion(invocation) — fans out to completion callbacks (evaluators, upload hooks). These are the expensive part
meter_provider.force_flush() — waits for metric export
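
The inline/background split described above can be sketched as follows. This is a simplified illustration, not the handler's code: Invocation, SketchHandler, the injected submit callable, and the log list are hypothetical, and the relative order of the inline steps is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Invocation:
    """Hypothetical stand-in for an LLM invocation record."""
    end_time: Optional[float] = None


@dataclass
class SketchHandler:
    # Decides where the deferred part runs: inline or on a pool.
    submit: Callable[[Callable[[], None]], None]
    log: List[str] = field(default_factory=list)

    def stop_llm(self, invocation: Invocation) -> Invocation:
        # Inline: timing, context restore, and span end happen before returning.
        invocation.end_time = time.time()
        self.log.append("pop_current_span")
        self.log.append("on_end")

        def _finalize() -> None:
            # Deferred: completion callbacks and the metric flush.
            self.log.append("notify_completion")
            self.log.append("force_flush")

        self.submit(_finalize)
        return invocation
```

With submit=lambda fn: fn() the sketch behaves like the synchronous path; passing a function that hands fn to an executor gives the asynchronous path without touching the inline steps.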

_submit_finalization(fn) helper logic:

If _finalizer_executor is None (flag off or after shutdown): run fn() inline — identical to old behaviour
If the BoundedSemaphore cannot be acquired (queue full): run fn() inline — telemetry is never silently dropped
Otherwise: submit to the thread pool and release the semaphore in a done_callback when the task completes

shutdown(wait=True): Drains the thread pool. Called automatically by _reset_for_testing() so tests don't leak background threads between cases.
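
The submission and shutdown rules above can be sketched with a ThreadPoolExecutor and a BoundedSemaphore. The class name FinalizationSubmitter, the max_workers value, the thread name prefix, and the boolean return value are assumptions made for illustration; only the three-way decision and the done-callback release come from the description.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class FinalizationSubmitter:
    """Hypothetical sketch of the _submit_finalization / shutdown logic."""

    def __init__(self, enabled: bool, queue_size: int = 128, max_workers: int = 2):
        self._finalizer_executor = (
            ThreadPoolExecutor(
                max_workers=max_workers, thread_name_prefix="genai-finalizer"
            )
            if enabled
            else None
        )
        # Bounds the number of running plus queued tasks.
        self._slots = threading.BoundedSemaphore(queue_size)

    def submit(self, fn: Callable[[], None]) -> bool:
        """Return True if fn was handed to the pool, False if it ran inline."""
        executor = self._finalizer_executor
        if executor is None:
            fn()  # flag off, or already shut down
            return False
        if not self._slots.acquire(blocking=False):
            fn()  # queue full: run inline rather than drop telemetry
            return False
        try:
            future = executor.submit(fn)
        except RuntimeError:
            # The pool was shut down between the check and the submit.
            self._slots.release()
            fn()
            return False
        future.add_done_callback(lambda _f: self._slots.release())
        return True

    def shutdown(self, wait: bool = True) -> None:
        """Drain the pool; safe to call more than once."""
        executor, self._finalizer_executor = self._finalizer_executor, None
        if executor is not None:
            executor.shutdown(wait=wait)
```

The non-blocking acquire is what keeps the caller from ever waiting on the pool: under backpressure the work costs the same as the synchronous path, and nothing is discarded.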

tests/test_async_finalization.py — new test file (16 tests)

Why: The async/inline boundary is subtle. Without tests it's easy to accidentally move something that must stay inline (like _pop_current_span) into the
background closure.

Four test classes:

TestDefaultInlineBehavior — with flag off, confirms executor is None and on_end/callbacks are called synchronously before stop_llm returns
TestAsyncFinalizationEnabled — with flag on, confirms:
- on_end runs on the caller's thread before stop_llm returns
- _notify_completion runs on a background thread
- end_time is set inline
- _pop_current_span restores context inline
- callbacks eventually complete after shutdown(wait=True)
- shutdown drains all queued work
- all invocation types (stop_embedding, stop_workflow, fail_llm) follow the same pattern
TestQueueFullFallback — drains the semaphore to simulate a full queue, verifies on_end falls back to the caller's thread
TestShutdown — shutdown() is idempotent, clears executor, and subsequent stop_llm calls still emit telemetry inline
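
The kind of thread-identity assertion these classes rely on could look like the sketch below. StandInHandler is a hypothetical, much-reduced substitute for TelemetryHandler, and the test names are illustrative; the real tests in tests/test_async_finalization.py are not reproduced here.

```python
import threading
import unittest
from concurrent.futures import ThreadPoolExecutor


class StandInHandler:
    """Hypothetical stand-in; not the library's TelemetryHandler."""

    def __init__(self, async_finalization: bool):
        self._finalizer_executor = (
            ThreadPoolExecutor(max_workers=1) if async_finalization else None
        )
        self.on_end_thread = None
        self.callback_thread = None

    def stop_llm(self) -> None:
        self.on_end_thread = threading.get_ident()  # always inline

        def _finalize() -> None:
            self.callback_thread = threading.get_ident()

        if self._finalizer_executor is None:
            _finalize()
        else:
            self._finalizer_executor.submit(_finalize)

    def shutdown(self, wait: bool = True) -> None:
        executor, self._finalizer_executor = self._finalizer_executor, None
        if executor is not None:
            executor.shutdown(wait=wait)


class TestAsyncBoundary(unittest.TestCase):
    def test_flag_off_runs_callbacks_inline(self):
        handler = StandInHandler(async_finalization=False)
        handler.stop_llm()
        self.assertIsNone(handler._finalizer_executor)
        self.assertEqual(handler.callback_thread, threading.get_ident())

    def test_flag_on_defers_callbacks_only(self):
        handler = StandInHandler(async_finalization=True)
        handler.stop_llm()
        self.assertEqual(handler.on_end_thread, threading.get_ident())
        handler.shutdown(wait=True)  # drain before inspecting background work
        self.assertIsNotNone(handler.callback_thread)
        self.assertNotEqual(handler.callback_thread, threading.get_ident())

    def test_shutdown_is_idempotent_and_falls_back_inline(self):
        handler = StandInHandler(async_finalization=True)
        handler.shutdown()
        handler.shutdown()
        handler.stop_llm()
        self.assertEqual(handler.callback_thread, threading.get_ident())
```

Calling shutdown(wait=True) before asserting on background effects avoids sleep-based waits and keeps the tests deterministic.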

Correctness verification

Two local test scripts run the same agent query twice (sync then async) against the real Circuit API and compare span output:

.local/test_circuit_langchain.py — LangChain create_react_agent via LangchainInstrumentor (auto instrumentation)

Both confirm same span count, same span names, same attributes between sync and async modes.
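
A comparison of this kind has to ignore values that legitimately change between runs. The sketch below is one way to do it, assuming the span dumps are lists of dicts shaped like the output in this description. The function names and the choice of gen_ai.agent.id as a volatile attribute (it differs between the two runs shown here) are assumptions, and this is not the local script itself.

```python
from typing import Any, Dict, List, Tuple

# Assumption: attribute values expected to differ between two runs.
VOLATILE_ATTRIBUTES = {"gen_ai.agent.id"}


def comparable(spans: List[Dict[str, Any]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Reduce spans to (name, attributes), masking run-specific values.

    IDs, timestamps, and durations live outside "attributes" and are
    dropped entirely; volatile attributes keep their key but not their value.
    """
    reduced = []
    for span in spans:
        attrs = {
            key: ("<volatile>" if key in VOLATILE_ATTRIBUTES else value)
            for key, value in span.get("attributes", {}).items()
        }
        reduced.append((span["name"], attrs))
    # Sort by name so export order does not affect the comparison.
    return sorted(reduced, key=lambda item: item[0])


def same_span_shape(
    sync_spans: List[Dict[str, Any]], async_spans: List[Dict[str, Any]]
) -> bool:
    """True when both runs have the same span names and attributes."""
    return comparable(sync_spans) == comparable(async_spans)
```

Equal list lengths are implied by the equality check, so a missing or extra span in either mode makes the comparison fail.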

Non-breaking guarantee

- Flag defaults to false: zero behaviour change for all existing users
- When flag is on, span.end() still completes before stop_* returns, so all OTel span timing, parent/child relationships, and framework-level span checks are unaffected
- Queue-full fallback ensures telemetry is never dropped under backpressure

LangChain sync span output

[
  {
    "name": "chat gpt-5-nano",
    "trace_id": "c0d7afd89e3687f2967972ab19ccb102",
    "span_id": "888688537ba42111",
    "parent_span_id": "145768e7306eb692",
    "start_time": 1777946530406614000,
    "end_time": 1777946534038007000,
    "duration_ms": 3631.393,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.tool.definitions": "[{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Get the current weather for a city.\", \"parameters\": {\"properties\": {\"city\": {\"type\": \"string\"}}, \"required\": [\"city\"], \"type\": \"object\"}}}, {\"type\": \"function\", \"function\": {\"name\": \"get_time\", \"description\": \"Get the current time in a timezone.\", \"parameters\": {\"properties\": {\"timezone\": {\"type\": \"string\"}}, \"required\": [\"timezone\"], \"type\": \"object\"}}}, {\"type\": \"function\", \"function\": {\"name\": \"calculate\", \"description\": \"Calculate a math expression.\", \"parameters\": {\"properties\": {\"expression\": {\"type\": \"string\"}}, \"required\": [\"expression\"], \"type\": \"object\"}}}]",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.input.messages": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"You are a helpful assistant. Use tools when needed.\"}]}, {\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"What is 2 + 2?\"}]}]",
      "gen_ai.provider.name": "openai",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.agent.id": "c49542bbf7b48e41",
      "gen_ai.request.model": "gpt-5-nano",
      "gen_ai.operation.name": "chat",
      "gen_ai.response.model": "gpt-5-nano-2025-08-07",
      "gen_ai.usage.input_tokens": 188,
      "gen_ai.usage.output_tokens": 75,
      "gen_ai.response.finish_reasons": [
        "stop"
      ],
      "gen_ai.request.stream": false,
      "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"4\"}], \"finish_reason\": \"stop\"}]"
    },
    "events": []
  },
  {
    "name": "step model",
    "trace_id": "c0d7afd89e3687f2967972ab19ccb102",
    "span_id": "145768e7306eb692",
    "parent_span_id": "c49542bbf7b48e41",
    "start_time": 1777946530403703000,
    "end_time": 1777946534038877000,
    "duration_ms": 3635.174,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.step.name": "model",
      "gen_ai.step.type": "chain",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.agent.id": "c49542bbf7b48e41"
    },
    "events": []
  },
  {
    "name": "invoke_agent circuit-react-agent",
    "trace_id": "c0d7afd89e3687f2967972ab19ccb102",
    "span_id": "c49542bbf7b48e41",
    "parent_span_id": null,
    "start_time": 1777946530402301000,
    "end_time": 1777946534046669000,
    "duration_ms": 3644.368,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.agent.id": "c49542bbf7b48e41",
      "gen_ai.framework": "langchain",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"4\"}]}]",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.conversation_root": true,
      "gen_ai.operation.name": "invoke_agent"
    },
    "events": []
  }
]

LangChain async span output

[
  {
    "name": "chat gpt-5-nano",
    "trace_id": "7535042f0263427201f59e5f01475de5",
    "span_id": "dc5a3b3c6cf342f3",
    "parent_span_id": "9f1809b2d7c2fc17",
    "start_time": 1777946534057605000,
    "end_time": 1777946535507804000,
    "duration_ms": 1450.199,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.tool.definitions": "[{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Get the current weather for a city.\", \"parameters\": {\"properties\": {\"city\": {\"type\": \"string\"}}, \"required\": [\"city\"], \"type\": \"object\"}}}, {\"type\": \"function\", \"function\": {\"name\": \"get_time\", \"description\": \"Get the current time in a timezone.\", \"parameters\": {\"properties\": {\"timezone\": {\"type\": \"string\"}}, \"required\": [\"timezone\"], \"type\": \"object\"}}}, {\"type\": \"function\", \"function\": {\"name\": \"calculate\", \"description\": \"Calculate a math expression.\", \"parameters\": {\"properties\": {\"expression\": {\"type\": \"string\"}}, \"required\": [\"expression\"], \"type\": \"object\"}}}]",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.input.messages": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"You are a helpful assistant. Use tools when needed.\"}]}, {\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"What is 2 + 2?\"}]}]",
      "gen_ai.provider.name": "openai",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.agent.id": "69d14c88266c2d3b",
      "gen_ai.request.model": "gpt-5-nano",
      "gen_ai.operation.name": "chat",
      "gen_ai.response.model": "gpt-5-nano-2025-08-07",
      "gen_ai.usage.input_tokens": 188,
      "gen_ai.usage.output_tokens": 75,
      "gen_ai.response.finish_reasons": [
        "stop"
      ],
      "gen_ai.request.stream": false,
      "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"4\"}], \"finish_reason\": \"stop\"}]"
    },
    "events": []
  },
  {
    "name": "step model",
    "trace_id": "7535042f0263427201f59e5f01475de5",
    "span_id": "9f1809b2d7c2fc17",
    "parent_span_id": "69d14c88266c2d3b",
    "start_time": 1777946534054480000,
    "end_time": 1777946535508086000,
    "duration_ms": 1453.606,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.step.name": "model",
      "gen_ai.step.type": "chain",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.agent.id": "69d14c88266c2d3b"
    },
    "events": []
  },
  {
    "name": "invoke_agent circuit-react-agent",
    "trace_id": "7535042f0263427201f59e5f01475de5",
    "span_id": "69d14c88266c2d3b",
    "parent_span_id": null,
    "start_time": 1777946534053922000,
    "end_time": 1777946535508631000,
    "duration_ms": 1454.709,
    "status": "StatusCode.UNSET",
    "attributes": {
      "gen_ai.agent.id": "69d14c88266c2d3b",
      "gen_ai.framework": "langchain",
      "gen_ai.evaluation.sampled": true,
      "gen_ai.evaluation.error": "None",
      "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"4\"}]}]",
      "gen_ai.agent.name": "circuit-react-agent",
      "gen_ai.conversation_root": true,
      "gen_ai.operation.name": "invoke_agent"
    },
    "events": []
  }
]

shuningc and others added 6 commits May 7, 2026 23:46

Converting stop_llm call from sync to async

ef65901

revert(example): remove debug Console exporters from multi_agent_trav…

4035db9

…el_planner Accidentally committed during local debugging; not related to async finalization work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Linting fix

3130298

Ruff format fix

fadd509

Dealing with comments

9b35e3c

Removing evaluation error attribute from span emission

ed9abd4

shuningc requested review from a team as code owners May 8, 2026 07:01

shuningc wants to merge 6 commits into main from HYBIM-665-async-stop_llm
