Skip to content

Migrate Langfuse to typed LLM event#143

Merged
chris-colinsky merged 4 commits into
mainfrom
feature/0049-pr3c-langfuse-typed-migration
Jun 9, 2026
Merged

Migrate Langfuse to typed LLM event#143
chris-colinsky merged 4 commits into
mainfrom
feature/0049-pr3c-langfuse-typed-migration

Conversation

@chris-colinsky

@chris-colinsky chris-colinsky commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

  • Migrates the Langfuse observer's openarmature.llm.complete Generation observation lifecycle to drive off LlmCompletionEvent on the success path. Mirrors PR 3b's OTel shape: open + close the Generation in one shot at typed-event arrival, with start_time back-dated by latency_ms so the Langfuse UI shows the adapter-boundary measurement rather than dispatcher queue delay.
  • Widens the LangfuseClient Protocol with optional start_time / end_time on generation(...) and the Generation/Span handles' end(...). SDK adapter forwards both to the v4 SDK; InMemoryLangfuseClient stores them on LangfuseObservation for test assertions.
  • Drops the sentinel NodeEvent pair emission on the success path from OpenAIProvider.complete(). The bundled OTel and Langfuse observers now consume the typed event directly. Breaking for external custom observers: anyone filtering LLM calls by event.namespace == LLM_NAMESPACE MUST migrate to isinstance(event, LlmCompletionEvent) to continue seeing successful LLM calls. Failure-path sentinel emission stays (until the spec extends LlmCompletionEvent with error semantics — coord thread filed).
  • Updates the PR 3b CHANGELOG entry that said "sentinel pair drops in v0.15.0" — that timeline collapsed to v0.13.0 for the success path; failure-side sentinel stays until spec evolves.

Scope

PR 3c of the proposal 0049 + 0057 rollout. After this lands:

  • 3d: Activate conformance fixtures 060-068 against the typed-event harness.
  • The discuss-llm-completion-event-failure-path coord thread proposes either an LlmFailedEvent variant or error fields on LlmCompletionEvent; a future python release drops the failure-side sentinel once spec resolves.

Test plan

  • uv run pytest tests/unit/test_observability_langfuse.py tests/unit/test_observability_langfuse_adapter.py — 51 passed.
  • uv run pytest tests/ — 1213 passed (added 10 new tests across the three areas).
  • uv run pyright clean.
  • uv run ruff check + ruff format --check clean.
  • Live-account verification of the v4 SDK timestamp passthrough via tests/integration/test_langfuse_sdk_adapter.py::test_sdk_adapter_generation_timestamps_round_trip_through_langfuse — opt-in; run locally with LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASE_URL env vars set.

Test coverage added

  • 6 new Langfuse typed-event tests in test_observability_langfuse.py: happy-path field mapping, latency back-dating, zero-duration fallback, no-invocation drop, disable_llm_spans gate, sentinel-error path.
  • 3 new SDK passthrough tests in test_observability_langfuse_adapter.py: start_time flows to start_observation(), end_time flows to obs.end(), no-kwarg branch pinned.
  • 1 white-box parallel-branches error-path test pinning the _resolve_llm_parent_observation_id keyword-only refactor.
  • 1 opt-in integration test round-tripping back-dated timestamps through Langfuse Cloud.
  • 5 existing tests in test_llm_provider.py migrated from sentinel-pair assertions to single-typed-event assertions on the success path.

Drive the openarmature.llm.complete Generation observation
lifecycle from LlmCompletionEvent on the success path, mirroring
PR 3b's OTel shape. Open and close the Generation in one shot at
typed-event arrival with start_time back-dated by latency_ms so
the Langfuse UI shows the adapter-boundary measurement rather
than dispatcher queue delay. Sentinel pair stays for the failure
path until the spec extends LlmCompletionEvent with error
semantics (coord thread filed).

Widen the LangfuseClient Protocol with optional start_time on
generation() and end_time on Span/Generation handle end(). The
SDK adapter forwards both to the v4 SDK; the InMemory client
stores them on LangfuseObservation for test assertions.

Drop the sentinel NodeEvent pair emission on the success path
from OpenAIProvider.complete(). The bundled OTel and Langfuse
observers consume the typed event directly; external custom
observers consuming LLM events MUST migrate to isinstance
discrimination. The sentinel completed event still fires on the
failure path; sentinel started is no longer emitted.
Copilot AI review requested due to automatic review settings June 9, 2026 00:28
Comment thread src/openarmature/observability/langfuse/client.py
Comment thread src/openarmature/observability/langfuse/client.py

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates Langfuse LLM observability to consume the typed LlmCompletionEvent on the success path (mirroring the OTel observer), including back-dated timestamps based on latency_ms, and removes success-path sentinel NodeEvent emissions from OpenAIProvider.complete().

Changes:

  • Langfuse observer now opens+closes a Generation directly from LlmCompletionEvent, with start_time back-dated from latency_ms; failure path remains sentinel-driven.
  • Langfuse client Protocol and SDK adapter add optional start_time/end_time passthrough for Generation creation and handle .end(...).
  • Provider/tests/CHANGELOG updated to reflect dropping success-path sentinel events and shifting observers to typed-event discrimination.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/test_observability_langfuse.py Adds typed LlmCompletionEvent coverage for Langfuse Generation emission, timestamp back-dating, and failure sentinel behavior.
tests/unit/test_observability_langfuse_adapter.py Verifies SDK adapter forwards start_time/end_time to Langfuse v4 SDK calls.
tests/unit/test_llm_provider.py Migrates assertions from sentinel success events to typed-event-only success emission; updates failure expectations.
tests/integration/test_langfuse_sdk_adapter.py Adds opt-in live integration test to round-trip start/end timestamps through Langfuse REST projection.
src/openarmature/observability/otel/observer.py Updates sentinel/typed-event documentation to reflect v0.13.0 behavior.
src/openarmature/observability/langfuse/observer.py Implements typed-event success-path Generation emission and switches sentinel handling to failure-only.
src/openarmature/observability/langfuse/client.py Extends Protocol + in-memory client to track optional start/end timestamps and accept end_time in .end().
src/openarmature/observability/langfuse/adapter.py Forwards start_time to start_observation(...) and end_time to obs.end(end_time=...).
src/openarmature/llm/providers/openai.py Removes success-path sentinel dispatch; keeps failure sentinel completed dispatch and emits typed event on success.
CHANGELOG.md Updates release notes to reflect typed-event-driven observers, timestamp passthrough, and sentinel emission changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/openarmature/observability/langfuse/observer.py Outdated
Comment thread src/openarmature/observability/langfuse/observer.py Outdated
Comment thread src/openarmature/llm/providers/openai.py Outdated
The v4.7 Langfuse SDK exposes timestamp control only on its
internal OTel tracer, not on the public start_observation() API
that the adapter was calling. Two quirks the original Protocol
widening got wrong, both surfaced by live-account verification:

start_observation() rejects a start_time kwarg with TypeError.
When start_time is supplied, the adapter now mirrors the SDK's
own create_event precedent: open the OTel span directly via
_otel_tracer.start_span(name=, start_time=int_ns) within a
trace context and wrap the result in LangfuseGeneration. The
existing no-start_time path still uses the public API.

LangfuseSpan.end(end_time) is typed Optional[int] (nanoseconds),
not datetime. The adapter now converts the Protocol's datetime
surface to nanoseconds before forwarding. Without the conversion
the OTel span_processor's formatter crashes with TypeError on
end_time / 1e9 deep in the SDK.

Strengthen the unit tests: spy on both _otel_tracer.start_span
and start_observation so the back-dated path asserts the OTel
route is taken and the public-API path asserts the OTel tracer
is NOT touched. The previous monkeypatch test accepted **kwargs
and would have passed even with the broken implementation.

Widen the integration test's REST poll budget to 180s and use
server-side name+type filters; add a diagnostic that lists
observation names actually projected under the trace_id when
the target Generation doesn't appear, so a future name-mismatch
SDK change surfaces explicitly.
Tighten three stale comments that still referred to a sentinel
"pair" on the failure path. The provider now emits only a single
sentinel completed event on failure (no started counterpart);
the comments in langfuse/observer.py (dispatch site + handler
docstring) and openai.py (failure-emission site) didn't catch up
with the v0.13.0 emission change in the same PR.
Copilot AI review requested due to automatic review settings June 9, 2026 02:03

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread CHANGELOG.md Outdated
Comment thread tests/integration/test_langfuse_sdk_adapter.py Outdated
Both descriptions were written before live-account verification
revealed that v4.7 Langfuse SDK's start_observation rejects
start_time with TypeError. The CHANGELOG entry claimed the adapter
forwards via start_observation(start_time=...); the integration
test docstring said unit tests validate that surface. Rewrote
both to describe the actual routing: the back-dated path bypasses
start_observation and goes through the private _otel_tracer.start_
span, wrapping the OTel span in LangfuseGeneration directly. The
guarded failure mode shifts accordingly: not "SDK silently drops
start_time" but "future SDK renames _otel_tracer or moves
LangfuseGeneration", breaking the private-API path silently.
@chris-colinsky chris-colinsky merged commit aaa3cc3 into main Jun 9, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/0049-pr3c-langfuse-typed-migration branch June 9, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants