Migrate Langfuse to typed LLM event#143
Merged
Merged
Conversation
Drive the openarmature.llm.complete Generation observation lifecycle from LlmCompletionEvent on the success path, mirroring PR 3b's OTel shape. Open and close the Generation in one shot at typed-event arrival with start_time back-dated by latency_ms so the Langfuse UI shows the adapter-boundary measurement rather than dispatcher queue delay. Sentinel pair stays for the failure path until the spec extends LlmCompletionEvent with error semantics (coord thread filed). Widen the LangfuseClient Protocol with optional start_time on generation() and end_time on Span/Generation handle end(). The SDK adapter forwards both to the v4 SDK; the InMemory client stores them on LangfuseObservation for test assertions. Drop the sentinel NodeEvent pair emission on the success path from OpenAIProvider.complete(). The bundled OTel and Langfuse observers consume the typed event directly; external custom observers consuming LLM events MUST migrate to isinstance discrimination. The sentinel completed event still fires on the failure path; sentinel started is no longer emitted.
There was a problem hiding this comment.
Pull request overview
This PR migrates Langfuse LLM observability to consume the typed LlmCompletionEvent on the success path (mirroring the OTel observer), including back-dated timestamps based on latency_ms, and removes success-path sentinel NodeEvent emissions from OpenAIProvider.complete().
Changes:
- Langfuse observer now opens+closes a Generation directly from
LlmCompletionEvent, withstart_timeback-dated fromlatency_ms; failure path remains sentinel-driven. - Langfuse client Protocol and SDK adapter add optional
start_time/end_timepassthrough for Generation creation and handle.end(...). - Provider/tests/CHANGELOG updated to reflect dropping success-path sentinel events and shifting observers to typed-event discrimination.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_observability_langfuse.py | Adds typed LlmCompletionEvent coverage for Langfuse Generation emission, timestamp back-dating, and failure sentinel behavior. |
| tests/unit/test_observability_langfuse_adapter.py | Verifies SDK adapter forwards start_time/end_time to Langfuse v4 SDK calls. |
| tests/unit/test_llm_provider.py | Migrates assertions from sentinel success events to typed-event-only success emission; updates failure expectations. |
| tests/integration/test_langfuse_sdk_adapter.py | Adds opt-in live integration test to round-trip start/end timestamps through Langfuse REST projection. |
| src/openarmature/observability/otel/observer.py | Updates sentinel/typed-event documentation to reflect v0.13.0 behavior. |
| src/openarmature/observability/langfuse/observer.py | Implements typed-event success-path Generation emission and switches sentinel handling to failure-only. |
| src/openarmature/observability/langfuse/client.py | Extends Protocol + in-memory client to track optional start/end timestamps and accept end_time in .end(). |
| src/openarmature/observability/langfuse/adapter.py | Forwards start_time to start_observation(...) and end_time to obs.end(end_time=...). |
| src/openarmature/llm/providers/openai.py | Removes success-path sentinel dispatch; keeps failure sentinel completed dispatch and emits typed event on success. |
| CHANGELOG.md | Updates release notes to reflect typed-event-driven observers, timestamp passthrough, and sentinel emission changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The v4.7 Langfuse SDK exposes timestamp control only on its internal OTel tracer, not on the public start_observation() API that the adapter was calling. Two quirks the original Protocol widening got wrong, both surfaced by live-account verification: start_observation() rejects a start_time kwarg with TypeError. When start_time is supplied, the adapter now mirrors the SDK's own create_event precedent: open the OTel span directly via _otel_tracer.start_span(name=, start_time=int_ns) within a trace context and wrap the result in LangfuseGeneration. The existing no-start_time path still uses the public API. LangfuseSpan.end(end_time) is typed Optional[int] (nanoseconds), not datetime. The adapter now converts the Protocol's datetime surface to nanoseconds before forwarding. Without the conversion the OTel span_processor's formatter crashes with TypeError on end_time / 1e9 deep in the SDK. Strengthen the unit tests: spy on both _otel_tracer.start_span and start_observation so the back-dated path asserts the OTel route is taken and the public-API path asserts the OTel tracer is NOT touched. The previous monkeypatch test accepted **kwargs and would have passed even with the broken implementation. Widen the integration test's REST poll budget to 180s and use server-side name+type filters; add a diagnostic that lists observation names actually projected under the trace_id when the target Generation doesn't appear, so a future name-mismatch SDK change surfaces explicitly.
Tighten three stale comments that still referred to a sentinel "pair" on the failure path. The provider now emits only a single sentinel completed event on failure (no started counterpart); the comments in langfuse/observer.py (dispatch site + handler docstring) and openai.py (failure-emission site) didn't catch up with the v0.13.0 emission change in the same PR.
Both descriptions were written before live-account verification revealed that v4.7 Langfuse SDK's start_observation rejects start_time with TypeError. The CHANGELOG entry claimed the adapter forwards via start_observation(start_time=...); the integration test docstring said unit tests validate that surface. Rewrote both to describe the actual routing: the back-dated path bypasses start_observation and goes through the private _otel_tracer.start_ span, wrapping the OTel span in LangfuseGeneration directly. The guarded failure mode shifts accordingly: not "SDK silently drops start_time" but "future SDK renames _otel_tracer or moves LangfuseGeneration", breaking the private-API path silently.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
openarmature.llm.completeGeneration observation lifecycle to drive offLlmCompletionEventon the success path. Mirrors PR 3b's OTel shape: open + close the Generation in one shot at typed-event arrival, withstart_timeback-dated bylatency_msso the Langfuse UI shows the adapter-boundary measurement rather than dispatcher queue delay.LangfuseClientProtocol with optionalstart_time/end_timeongeneration(...)and the Generation/Span handles'end(...). SDK adapter forwards both to the v4 SDK;InMemoryLangfuseClientstores them onLangfuseObservationfor test assertions.NodeEventpair emission on the success path fromOpenAIProvider.complete(). The bundled OTel and Langfuse observers now consume the typed event directly. Breaking for external custom observers: anyone filtering LLM calls byevent.namespace == LLM_NAMESPACEMUST migrate toisinstance(event, LlmCompletionEvent)to continue seeing successful LLM calls. Failure-path sentinel emission stays (until the spec extendsLlmCompletionEventwith error semantics — coord thread filed).Scope
PR 3c of the proposal 0049 + 0057 rollout. After this lands:
discuss-llm-completion-event-failure-pathcoord thread proposes either anLlmFailedEventvariant or error fields onLlmCompletionEvent; a future python release drops the failure-side sentinel once spec resolves.Test plan
uv run pytest tests/unit/test_observability_langfuse.py tests/unit/test_observability_langfuse_adapter.py— 51 passed.uv run pytest tests/— 1213 passed (added 10 new tests across the three areas).uv run pyrightclean.uv run ruff check+ruff format --checkclean.tests/integration/test_langfuse_sdk_adapter.py::test_sdk_adapter_generation_timestamps_round_trip_through_langfuse— opt-in; run locally withLANGFUSE_PUBLIC_KEY/LANGFUSE_SECRET_KEY/LANGFUSE_BASE_URLenv vars set.Test coverage added
test_observability_langfuse.py: happy-path field mapping, latency back-dating, zero-duration fallback, no-invocation drop,disable_llm_spansgate, sentinel-error path.test_observability_langfuse_adapter.py:start_timeflows tostart_observation(),end_timeflows toobs.end(), no-kwarg branch pinned._resolve_llm_parent_observation_idkeyword-only refactor.test_llm_provider.pymigrated from sentinel-pair assertions to single-typed-event assertions on the success path.