Migrate Langfuse to typed LLM event by chris-colinsky · Pull Request #143 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-09T00:28:53Z

Summary

Migrates the Langfuse observer's openarmature.llm.complete Generation observation lifecycle to drive off LlmCompletionEvent on the success path. Mirrors PR 3b's OTel shape: open + close the Generation in one shot at typed-event arrival, with start_time back-dated by latency_ms so the Langfuse UI shows the adapter-boundary measurement rather than dispatcher queue delay.
Widens the LangfuseClient Protocol with optional start_time / end_time on generation(...) and the Generation/Span handles' end(...). SDK adapter forwards both to the v4 SDK; InMemoryLangfuseClient stores them on LangfuseObservation for test assertions.
Drops the sentinel NodeEvent pair emission on the success path from OpenAIProvider.complete(). The bundled OTel and Langfuse observers now consume the typed event directly. Breaking for external custom observers: anyone filtering LLM calls by event.namespace == LLM_NAMESPACE MUST migrate to isinstance(event, LlmCompletionEvent) to continue seeing successful LLM calls. Failure-path sentinel emission stays (until the spec extends LlmCompletionEvent with error semantics — coord thread filed).
Updates the PR 3b CHANGELOG entry that said "sentinel pair drops in v0.15.0": that timeline moved up to v0.13.0 for the success path, while the failure-side sentinel stays until the spec evolves.
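The back-dating in the first bullet can be illustrated with a sketch that derives a Generation's start/end pair from the moment the typed event arrives. It makes two assumptions. First, the end timestamp is the arrival time. Second, a missing or non-positive latency_ms falls back to a zero-duration Generation. The PR lists a "zero-duration fallback" test but does not spell out its trigger. `GenerationWindow` and `backdated_window` are hypothetical names, not openarmature APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GenerationWindow:
    """Start/end timestamps to attach to a Langfuse Generation (hypothetical helper)."""

    start_time: datetime
    end_time: datetime


def backdated_window(arrival: datetime, latency_ms: float | None) -> GenerationWindow:
    """Back-date the Generation start by the adapter-measured latency.

    `arrival` is when the observer received the typed event. `latency_ms` is the
    adapter-boundary measurement carried on the event. Assumption: a missing or
    non-positive latency yields a zero-duration Generation.
    """
    if latency_ms is None or latency_ms <= 0:
        return GenerationWindow(start_time=arrival, end_time=arrival)
    start = arrival - timedelta(milliseconds=latency_ms)
    return GenerationWindow(start_time=start, end_time=arrival)


if __name__ == "__main__":
    arrival = datetime(2026, 6, 9, 0, 0, 1, tzinfo=timezone.utc)
    window = backdated_window(arrival, 250.0)
    print(window.start_time.isoformat(), window.end_time.isoformat())
```

The window's duration equals latency_ms no matter how long the event waited in the dispatcher queue. As a result, the duration shown in the UI reflects the adapter-boundary measurement.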

Scope

PR 3c of the proposal 0049 + 0057 rollout. After this lands:

3d: Activate conformance fixtures 060-068 against the typed-event harness.
The discuss-llm-completion-event-failure-path coord thread proposes either an LlmFailedEvent variant or error fields on LlmCompletionEvent; a future Python release will drop the failure-side sentinel once the spec resolves the question.

Test plan

uv run pytest tests/unit/test_observability_langfuse.py tests/unit/test_observability_langfuse_adapter.py — 51 passed.
uv run pytest tests/ — 1213 passed (added 10 new tests across the three areas).
uv run pyright clean.
uv run ruff check + ruff format --check clean.
Live-account verification of the v4 SDK timestamp passthrough via tests/integration/test_langfuse_sdk_adapter.py::test_sdk_adapter_generation_timestamps_round_trip_through_langfuse — opt-in; run locally with LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASE_URL env vars set.
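The opt-in gate for the live-account test can be expressed as a helper that reports which of the three credential variables are unset. `missing_langfuse_env` is a hypothetical name, and the real test module may gate the test differently.

```python
from __future__ import annotations

import os
from typing import Mapping

# Variables named in the test plan for the live-account run.
REQUIRED_ENV = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL")


def missing_langfuse_env(environ: Mapping[str, str] | None = None) -> list[str]:
    """Return the credential variables that are unset or empty, in REQUIRED_ENV order."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV if not env.get(name)]
```

In a pytest module, the helper's result could feed `pytest.mark.skipif(bool(missing_langfuse_env()), reason="Langfuse credentials not set")`. That keeps the default `uv run pytest tests/` run hermetic.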

Test coverage added

6 new Langfuse typed-event tests in test_observability_langfuse.py: happy-path field mapping, latency back-dating, zero-duration fallback, no-invocation drop, disable_llm_spans gate, sentinel-error path.
3 new SDK passthrough tests in test_observability_langfuse_adapter.py: start_time flows to start_observation(), end_time flows to obs.end(), no-kwarg branch pinned.
1 white-box parallel-branches error-path test pinning the _resolve_llm_parent_observation_id keyword-only refactor.
1 opt-in integration test round-tripping back-dated timestamps through Langfuse Cloud.
5 existing tests in test_llm_provider.py migrated from sentinel-pair assertions to single-typed-event assertions on the success path.

Drive the openarmature.llm.complete Generation observation lifecycle from LlmCompletionEvent on the success path, mirroring PR 3b's OTel shape. Open and close the Generation in one shot at typed-event arrival, with start_time back-dated by latency_ms so the Langfuse UI shows the adapter-boundary measurement rather than dispatcher queue delay. Sentinel emission stays for the failure path until the spec extends LlmCompletionEvent with error semantics (coord thread filed).

Widen the LangfuseClient Protocol with optional start_time on generation() and end_time on Span/Generation handle end(). The SDK adapter forwards both to the v4 SDK; the InMemory client stores them on LangfuseObservation for test assertions.

Drop the sentinel NodeEvent pair emission on the success path from OpenAIProvider.complete(). The bundled OTel and Langfuse observers consume the typed event directly; external custom observers consuming LLM events MUST migrate to isinstance discrimination. The sentinel completed event still fires on the failure path; sentinel started is no longer emitted.
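For an external custom observer, the migration amounts to replacing a namespace check with type discrimination. The sketch below uses stand-in dataclasses because this PR does not show the real import paths or fields of `NodeEvent` and `LlmCompletionEvent`. The value of `LLM_NAMESPACE` here is a placeholder, and `classify_llm_event` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Placeholder value: the real LLM_NAMESPACE constant lives in openarmature.
LLM_NAMESPACE = "<llm-namespace>"


@dataclass
class NodeEvent:
    """Hypothetical stand-in for the sentinel event type."""

    namespace: str
    phase: str


@dataclass
class LlmCompletionEvent:
    """Hypothetical stand-in for the typed success event."""

    latency_ms: float


def classify_llm_event(event: object) -> str | None:
    # v0.13.0+: successful LLM calls arrive only as the typed event.
    if isinstance(event, LlmCompletionEvent):
        return "success"
    # Failures still arrive as a single sentinel "completed" event in the LLM namespace.
    if isinstance(event, NodeEvent) and event.namespace == LLM_NAMESPACE:
        return "failure"
    return None
```

An observer that kept only the namespace branch would silently stop seeing successful calls after this change. The isinstance branch is therefore the required migration.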

Copilot

Pull request overview

This PR migrates Langfuse LLM observability to consume the typed LlmCompletionEvent on the success path (mirroring the OTel observer), including back-dated timestamps based on latency_ms, and removes success-path sentinel NodeEvent emissions from OpenAIProvider.complete().

Changes:

Langfuse observer now opens+closes a Generation directly from LlmCompletionEvent, with start_time back-dated from latency_ms; failure path remains sentinel-driven.
Langfuse client Protocol and SDK adapter add optional start_time/end_time passthrough for Generation creation and handle .end(...).
Provider/tests/CHANGELOG updated to reflect dropping success-path sentinel events and shifting observers to typed-event discrimination.
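The Protocol widening can be modeled with `typing.Protocol` plus an in-memory recorder in the spirit of `InMemoryLangfuseClient`. All class and function names below are hypothetical, and the keyword-only parameters are an assumption. The sketch shows only that the timestamps are optional, default to `None`, and are captured for test assertions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Protocol


class GenerationHandle(Protocol):
    def end(self, *, end_time: datetime | None = None) -> None: ...


class LangfuseClientLike(Protocol):
    def generation(
        self, *, name: str, start_time: datetime | None = None, **kwargs: Any
    ) -> GenerationHandle: ...


@dataclass
class RecordedObservation:
    """Captures what the observer passed, so tests can assert on timestamps."""

    name: str
    start_time: datetime | None = None
    end_time: datetime | None = None
    ended: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)

    def end(self, *, end_time: datetime | None = None) -> None:
        self.ended = True
        self.end_time = end_time


class InMemoryClientSketch:
    def __init__(self) -> None:
        self.observations: list[RecordedObservation] = []

    def generation(
        self, *, name: str, start_time: datetime | None = None, **kwargs: Any
    ) -> RecordedObservation:
        obs = RecordedObservation(name=name, start_time=start_time, metadata=dict(kwargs))
        self.observations.append(obs)
        return obs


def open_and_close(
    client: LangfuseClientLike, name: str, start: datetime | None, end: datetime | None
) -> None:
    # One-shot open + close, as the observer does at typed-event arrival.
    client.generation(name=name, start_time=start).end(end_time=end)
```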

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File	Description
tests/unit/test_observability_langfuse.py	Adds typed `LlmCompletionEvent` coverage for Langfuse Generation emission, timestamp back-dating, and failure sentinel behavior.
tests/unit/test_observability_langfuse_adapter.py	Verifies SDK adapter forwards `start_time`/`end_time` to Langfuse v4 SDK calls.
tests/unit/test_llm_provider.py	Migrates assertions from sentinel success events to typed-event-only success emission; updates failure expectations.
tests/integration/test_langfuse_sdk_adapter.py	Adds opt-in live integration test to round-trip start/end timestamps through Langfuse REST projection.
src/openarmature/observability/otel/observer.py	Updates sentinel/typed-event documentation to reflect v0.13.0 behavior.
src/openarmature/observability/langfuse/observer.py	Implements typed-event success-path Generation emission and switches sentinel handling to failure-only.
src/openarmature/observability/langfuse/client.py	Extends Protocol + in-memory client to track optional start/end timestamps and accept `end_time` in `.end()`.
src/openarmature/observability/langfuse/adapter.py	Forwards `start_time` to `start_observation(...)` and `end_time` to `obs.end(end_time=...)`.
src/openarmature/llm/providers/openai.py	Removes success-path sentinel dispatch; keeps failure sentinel `completed` dispatch and emits typed event on success.
CHANGELOG.md	Updates release notes to reflect typed-event-driven observers, timestamp passthrough, and sentinel emission changes.

The v4.7 Langfuse SDK exposes timestamp control only on its internal OTel tracer, not on the public start_observation() API that the adapter was calling. Live-account verification surfaced two things the original Protocol widening got wrong:

1. start_observation() rejects a start_time kwarg with TypeError. When start_time is supplied, the adapter now mirrors the SDK's own create_event precedent: open the OTel span directly via _otel_tracer.start_span(name=, start_time=int_ns) within a trace context and wrap the result in LangfuseGeneration. The existing no-start_time path still uses the public API.

2. LangfuseSpan.end(end_time) is typed Optional[int] (nanoseconds), not datetime. The adapter now converts the Protocol's datetime surface to nanoseconds before forwarding. Without the conversion, the OTel span_processor's formatter crashes with TypeError on end_time / 1e9 deep in the SDK.

Strengthen the unit tests: spy on both _otel_tracer.start_span and start_observation, so the back-dated path asserts the OTel route is taken and the public-API path asserts the OTel tracer is NOT touched. The previous monkeypatch test accepted **kwargs and would have passed even with the broken implementation.

Widen the integration test's REST poll budget to 180s and use server-side name+type filters. Add a diagnostic that lists the observation names actually projected under the trace_id when the target Generation doesn't appear, so a future name-mismatch SDK change surfaces explicitly.

Tighten three stale comments that still referred to a sentinel "pair" on the failure path. The provider now emits only a single sentinel completed event on failure (no started counterpart); the comments in langfuse/observer.py (dispatch site + handler docstring) and openai.py (failure-emission site) didn't catch up with the v0.13.0 emission change in the same PR.
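The provider's resulting emission contract can be sketched as a wrapper. It emits exactly one typed event on success and exactly one sentinel completed event on failure, then re-raises the original exception. The event classes, `complete_with_events`, and the `dispatch` callable are hypothetical stand-ins. Timing the call with `time.perf_counter()` is an assumption about where the adapter boundary sits.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LlmCompletionEvent:
    """Hypothetical stand-in for the typed success event."""

    latency_ms: float
    output: Any


@dataclass
class SentinelCompleted:
    """Hypothetical stand-in for the failure-path sentinel 'completed' NodeEvent."""

    error: str


def complete_with_events(call: Callable[[], Any], dispatch: Callable[[object], None]) -> Any:
    started = time.perf_counter()
    try:
        result = call()
    except Exception as exc:
        # Failure path: a single sentinel "completed" event; no "started" counterpart.
        dispatch(SentinelCompleted(error=repr(exc)))
        raise
    latency_ms = (time.perf_counter() - started) * 1000.0
    # Success path: only the typed event; the sentinel pair is no longer emitted.
    dispatch(LlmCompletionEvent(latency_ms=latency_ms, output=result))
    return result
```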

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Both descriptions were written before live-account verification revealed that the v4.7 Langfuse SDK's start_observation rejects start_time with TypeError. The CHANGELOG entry claimed the adapter forwards via start_observation(start_time=...); the integration test docstring said the unit tests validate that surface. Rewrote both to describe the actual routing: the back-dated path bypasses start_observation and goes through the private _otel_tracer.start_span, wrapping the OTel span in LangfuseGeneration directly. The guarded failure mode shifts accordingly. It is no longer "SDK silently drops start_time" but "a future SDK renames _otel_tracer or moves LangfuseGeneration", which would silently break the private-API path.
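The routing above, together with the spy-based unit tests described in the earlier commit, can be sketched against a mocked client. `open_generation` is hypothetical. The real adapter also establishes a trace context and constructs `LangfuseGeneration`; this sketch replaces that step with a caller-supplied `wrap` function and simplifies the `start_observation(name=...)` call shape. The key property is that each branch asserts the other route was not touched, so an implementation that ignores `start_time` cannot pass.

```python
from __future__ import annotations

from typing import Any, Callable
from unittest.mock import MagicMock


def open_generation(
    client: Any,
    name: str,
    start_time_ns: int | None,
    wrap: Callable[[Any], Any],
) -> Any:
    """Choose the public API or the private OTel tracer depending on start_time."""
    if start_time_ns is None:
        # No back-dating requested: the public API is sufficient.
        return client.start_observation(name=name)
    # Back-dated path: start the raw OTel span with an explicit ns timestamp,
    # then wrap it (stand-in for constructing LangfuseGeneration).
    span = client._otel_tracer.start_span(name=name, start_time=start_time_ns)
    return wrap(span)


def check_routing() -> None:
    """Spy on both routes; each case asserts the other route stayed untouched."""
    backdated = MagicMock()
    result = open_generation(
        backdated, "openarmature.llm.complete", 1_000, wrap=lambda s: ("wrapped", s)
    )
    backdated._otel_tracer.start_span.assert_called_once_with(
        name="openarmature.llm.complete", start_time=1_000
    )
    backdated.start_observation.assert_not_called()
    assert result[0] == "wrapped"

    public = MagicMock()
    open_generation(public, "openarmature.llm.complete", None, wrap=lambda s: s)
    public.start_observation.assert_called_once_with(name="openarmature.llm.complete")
    public._otel_tracer.start_span.assert_not_called()
```

A test that only checked "some call happened with these kwargs" would not distinguish the two routes. Asserting `assert_not_called()` on the unused route closes that gap.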

Copilot AI review requested due to automatic review settings June 9, 2026 00:28

Copilot started reviewing on behalf of chris-colinsky June 9, 2026 00:29

github-code-quality Bot found potential problems Jun 9, 2026

Comment thread src/openarmature/observability/langfuse/client.py

Comment thread src/openarmature/observability/langfuse/client.py

Copilot AI reviewed Jun 9, 2026

Comment thread src/openarmature/observability/langfuse/observer.py Outdated

Comment thread src/openarmature/observability/langfuse/observer.py Outdated

Comment thread src/openarmature/llm/providers/openai.py Outdated

chris-colinsky added 2 commits June 8, 2026 17:59

Copilot AI review requested due to automatic review settings June 9, 2026 02:03

Copilot started reviewing on behalf of chris-colinsky June 9, 2026 02:03

Copilot AI reviewed Jun 9, 2026

Comment thread CHANGELOG.md Outdated

Comment thread tests/integration/test_langfuse_sdk_adapter.py Outdated

chris-colinsky merged commit aaa3cc3 into main Jun 9, 2026
6 checks passed

chris-colinsky deleted the feature/0049-pr3c-langfuse-typed-migration branch June 9, 2026 02:14

chris-colinsky merged 4 commits into main from feature/0049-pr3c-langfuse-typed-migration
