Commit be28eac

docs(langfuse): document trace naming and timeouts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
1 parent 62b6bf7 commit be28eac

4 files changed

Lines changed: 88 additions & 4 deletions

README.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -139,7 +139,7 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
139139
- **Tool Ecosystem** — Auto-discovery via `@tool` decorator, manual approval flows, secure `bwrap` sandboxing, and composable skills.
140140
- **MCP Integration** — Connect to external MCP tool servers via stdio, SSE, or HTTP transports with namespaced tool mapping.
141141
- **Prometheus Metrics**, Install low-cardinality runtime, LLM, tool, streaming, and runtime-control metrics on any `World` and expose them via render, ASGI/WSGI, or a standalone `/metrics` server.
142-
- **Langfuse Observability**, Capture full-fidelity traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation timing exported through Langfuse SDK v4 historical observation starts plus manual endings, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.
142+
- **Langfuse Observability**, Capture traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse; raw input and output capture remains enabled by default for backward compatibility and can be disabled with `LangfuseConfig(capture_input=False, capture_output=False)`. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation end timing through the Langfuse SDK v4 public lifecycle, optional private historical start-time backdating with `enable_private_v4_historical_otel=True`, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.
143143

144144
## Architecture
145145

@@ -415,7 +415,7 @@ See [`docs/`](docs/) for detailed guides:
415415
- [Models](docs/models.md), model selection, registry routing, and built-in model implementations
416416
- [Streaming](docs/features/streaming.md), SSE streaming setup and usage
417417
- [Prometheus Metrics](docs/features/metrics.md), low-cardinality metrics and `/metrics` exposure helpers
418-
- [Langfuse Observability](docs/features/langfuse.md), full-fidelity traces, spans, and observations
418+
- [Langfuse Observability](docs/features/langfuse.md), traces, spans, observations, raw capture controls, and optional historical timing
419419
- [Structured Output](docs/features/structured-output.md), Pydantic schema → JSON mode
420420
- [Serialization](docs/features/serialization.md), World state persistence
421421
- [Logging](docs/features/logging.md), structlog integration

docs/features/langfuse.md

Lines changed: 36 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,9 +47,43 @@ The integration uses the following environment variables for configuration:
4747
- `LANGFUSE_PUBLIC_KEY`: Your Langfuse project public key.
4848
- `LANGFUSE_SECRET_KEY`: Your Langfuse project secret key.
4949
- `LANGFUSE_HOST` or `LANGFUSE_BASE_URL`: The Langfuse API host.
50+
- `LANGFUSE_TIMEOUT`: Langfuse SDK HTTP timeout in seconds. This also controls the default HTTP OTLP span exporter timeout in Langfuse SDK v4.
51+
52+
`LangfuseConfig` also exposes runtime safety controls:
53+
54+
- `capture_input` / `capture_output`: Enabled by default for backward-compatible full-fidelity traces. Set either value to `False` to suppress raw inputs or outputs from Langfuse export while preserving metadata, timing, model, usage, and redaction reports.
55+
- `enable_private_v4_historical_otel`: Disabled by default. When enabled, the adapter may use Langfuse SDK v4 private OpenTelemetry hooks to backdate observation start times. Keep this off unless you have validated the exact Langfuse SDK version you run in production.
56+
- `timeout`: Optional Langfuse SDK HTTP timeout in seconds. Use `LangfuseConfig(timeout=30)` or `LANGFUSE_TIMEOUT` for slower self-hosted deployments.
57+
58+
### Export timeout tuning
59+
60+
Langfuse SDK v4 exports observations through the OpenTelemetry HTTP OTLP exporter. A log such as `Failed to export span batch ... Read timed out. (read timeout=...)` means the background span batch upload to the configured Langfuse host timed out; it does not indicate a failed agent run. For slower self-hosted endpoints, raise the SDK timeout explicitly:
61+
62+
```python
63+
install_langfuse_observability(
64+
    world,
65+
    LangfuseConfig(
66+
        timeout=30,
67+
        flush_at=32,
68+
        flush_interval=2.0,
69+
    ),
70+
)
71+
```
72+
73+
`flush_at` and `flush_interval` control how often batches are sent. They do not increase the per-request read timeout. If you configure OpenTelemetry directly, the corresponding OTLP environment variable is `OTEL_EXPORTER_OTLP_TRACES_TIMEOUT` (or the generic `OTEL_EXPORTER_OTLP_TIMEOUT`), in seconds.
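
As a minimal sketch of how the timeout settings named above could be resolved into one per-request value: the helper `resolve_export_timeout` and its precedence order (explicit config value first, then `LANGFUSE_TIMEOUT`, then the OTLP variables) are assumptions made for illustration, not the adapter's confirmed behavior.

```python
from typing import Mapping, Optional

# Hypothetical lookup order, for illustration only.
_ENV_ORDER = (
    "LANGFUSE_TIMEOUT",
    "OTEL_EXPORTER_OTLP_TRACES_TIMEOUT",
    "OTEL_EXPORTER_OTLP_TIMEOUT",
)


def resolve_export_timeout(
    config_timeout: Optional[float],
    env: Mapping[str, str],
) -> Optional[float]:
    """Return a timeout in seconds, or None when nothing usable is set."""
    # An explicit config value wins in this sketch.
    if config_timeout is not None:
        return float(config_timeout)
    for name in _ENV_ORDER:
        raw = env.get(name, "").strip()
        if not raw:
            continue
        try:
            value = float(raw)
        except ValueError:
            # Ignore values that are not numeric and try the next variable.
            continue
        if value > 0:
            return value
    return None
```

In real use the `env` argument would typically be `os.environ`; passing a plain mapping keeps the sketch easy to test.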
5074

5175
> **Security Note**: Never hardcode your secret keys in source code. Use environment variables or a secret manager. If your credentials have been exposed outside a secure environment, we recommend a full credential rotation immediately.
5276
77+
## Langfuse Data Model
78+
79+
Langfuse organizes telemetry as **Session > Trace > Observation**:
80+
81+
- **Session**: A conversation or workflow grouping that can contain many traces. `LangfuseConfig.session_id` sets this grouping attribute.
82+
- **Trace**: A trace container for one request, user turn, or one-shot agent run. The trace owns the shared `trace_id` and groups the observation tree.
83+
- **Observation**: A node inside a trace. Span / Generation / Event records are observation types with different payload shapes.
84+
85+
The distinction between a trace container and a root observation is important. A trace container is the Langfuse grouping object; the root observation is the first visible node in that trace's tree. In ecs-agent, `trace_id` identifies the trace container, `root_observation_id` is the conceptual root node for child observations, and `parent_observation_id` links each child observation to its parent. The current internal `TelemetryRecord(kind="trace")` represents a trace root record; if such a record also has `parent_observation_id`, the Langfuse adapter treats it as a child span observation rather than as a second top-level trace container.
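
The rule in the paragraph above can be sketched as a small classifier. The `Record` dataclass is a hypothetical stand-in for `TelemetryRecord`, and `export_role` with its return labels is invented for illustration; only the field names `trace_id` and `parent_observation_id` come from the docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for TelemetryRecord (fields are assumptions)."""

    kind: str
    trace_id: str
    observation_id: str
    parent_observation_id: Optional[str] = None


def export_role(record: Record) -> str:
    """Classify how a record would be placed in the Langfuse tree."""
    if record.kind == "trace":
        # A trace record with a parent is nested as a child span instead of
        # opening a second top-level trace container.
        if record.parent_observation_id is not None:
            return "child_span"
        return "trace_root"
    # Any other kind (span, generation, event) keeps its own observation type.
    return record.kind
```

For example, a subagent run recorded as `kind="trace"` with a parent observation would be classified as `child_span` and stay inside the enclosing trace.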
86+
5387
## Langfuse Sessions
5488

5589
Langfuse Sessions require `session_id` as a trace-level session attribute, not only as `metadata.session_id`. The SDK v4 adapter creates observations with `start_as_current_observation(...)` and calls `propagate_attributes(session_id=...)` while the root observation is active, so Langfuse can group the complete trace chain in the Sessions UI.
@@ -114,7 +148,7 @@ Observations captured include:
114148
- **Context Pressure**: Information about conversation compaction or windowing.
115149
- **Scores**: Automated evaluation scores if provided.
116150

117-
For Langfuse SDK v4, completed ECS records are exported with the manual observation lifecycle. When the SDK exposes its OpenTelemetry-backed manual span hooks, the adapter creates the observation with the recorded operation `start_time` and then calls `end(end_time=...)` with the recorded operation end timestamp, both converted to Langfuse's nanosecond epoch format. This keeps LLM reasoning, tool execution, and subagent span durations aligned with the actual work rather than the telemetry export call duration, preventing the Langfuse UI from displaying zero-latency observations for operations that already completed before export. Older or test clients that do not expose historical OTel start hooks fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
151+
For Langfuse SDK v4, completed ECS records are exported with the public manual observation lifecycle by default: the adapter starts the observation when it exports the record, then calls `end(end_time=...)` with the recorded operation end timestamp. If you explicitly set `LangfuseConfig(enable_private_v4_historical_otel=True)` and your validated SDK version exposes the required private OpenTelemetry-backed span hooks, the adapter can also backdate the observation start using the recorded operation `start_time`; older, unsupported, or non-opted-in clients fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
118152

119153
## Alerts and Monitoring
120154

@@ -127,6 +161,7 @@ The integration follows a strict data privacy policy. While raw prompts, respons
127161
- **Mandatory Redaction**: Sensitive patterns (like API keys or tokens) are automatically redacted from payloads.
128162
- **Redaction Reports**: Exported metadata includes counts and names of applied redaction rules, but never the redacted content itself.
129163
- **Model Names**: `LLM_MODEL` is intentionally not redacted because model identifiers drive Langfuse generation grouping and dashboard filters. Do not encode credentials, tenant secrets, or private data in model names.
164+
- **Raw Payload Capture**: Redaction is not a general privacy filter. User prompts, tool arguments, tool results, and model outputs may contain business-sensitive data that does not look like a secret. Use `LangfuseConfig(capture_input=False, capture_output=False)` when raw content should not leave the process.
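
A minimal sketch of what the capture controls imply for an exported payload: the function `build_export_payload` and the payload keys are hypothetical and only illustrate the documented behavior (raw input and output suppressed, metadata preserved).

```python
from typing import Any, Dict, Mapping


def build_export_payload(
    *,
    input_data: Any,
    output_data: Any,
    metadata: Mapping[str, Any],
    capture_input: bool = True,
    capture_output: bool = True,
) -> Dict[str, Any]:
    """Assemble an export payload, omitting raw content when capture is off."""
    # Metadata (timing, model, usage, redaction report) is always kept.
    payload: Dict[str, Any] = {"metadata": dict(metadata)}
    if capture_input:
        payload["input"] = input_data
    if capture_output:
        payload["output"] = output_data
    return payload
```

With both flags set to `False`, the payload contains only the `metadata` key, so raw prompts and results never reach the exporter in this sketch.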
130165

131166
## Telemetry Resilience
132167

examples/e2e/plan_and_task/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -191,7 +191,7 @@ Install the optional extra before enabling Langfuse for this example:
191191
uv pip install -e ".[langfuse]"
192192
```
193193

194-
When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations preserve the ECS-recorded start and end timestamps when exported to Langfuse SDK v4, so the Langfuse UI reports the actual operation latency instead of the near-zero telemetry export duration.
194+
When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations export their ECS-recorded end timestamps through the Langfuse SDK v4 public lifecycle; preserving historical start timestamps requires explicitly validating your SDK version and enabling `LangfuseConfig(enable_private_v4_historical_otel=True)` because that path uses private SDK hooks. Raw prompts, tool arguments, and outputs are captured by default for backward compatibility; use `LangfuseConfig(capture_input=False, capture_output=False)` if raw content should not leave the process.
195195

196196
Subagents are exported as `subagent.<name>` spans inside the active `user.turn` trace. Their child-world LLM calls are exported as `generation` observations under that subagent span, and child-world tool/retrieval/API work is exported as child spans/events under the same turn trace rather than creating another top-level Langfuse trace. When a child-world generation requests a tool, that tool observation stays attached to the requesting generation so the Langfuse hierarchy shows the exact delegation chain.
197197

tests/test_docs_langfuse_observability.py

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,21 @@ def test_langfuse_docs_mention_configuration() -> None:
5454
    assert has_host, f"{source} missing LANGFUSE_HOST or LANGFUSE_BASE_URL alias policy"
5555

5656

57+
def test_langfuse_docs_describe_export_timeout_controls() -> None:
58+
    """Docs must explain how to tune self-hosted Langfuse export timeouts."""
59+
    docs = get_langfuse_docs_content()
60+
61+
    required_phrases = [
62+
        "LangfuseConfig(timeout=",
63+
        "LANGFUSE_TIMEOUT",
64+
        "OTEL_EXPORTER_OTLP_TRACES_TIMEOUT",
65+
        "read timeout",
66+
        "flush_interval",
67+
    ]
68+
    for phrase in required_phrases:
69+
        assert phrase in docs, f"docs/features/langfuse.md missing timeout guidance: {phrase}"
70+
71+
5772
def test_langfuse_docs_describe_session_attribute_propagation() -> None:
5873
    """Docs must explain Langfuse Sessions need trace-level session propagation."""
5974
    readme = get_readme_content()
@@ -106,6 +121,40 @@ def test_langfuse_docs_mention_credential_rotation() -> None:
106121
    assert "rotation" in content.lower() or "rotate" in content.lower(), f"{source} must mention credential rotation"
107122

108123

124+
def test_langfuse_docs_describe_capture_controls_and_private_otel_opt_in() -> None:
125+
    """Docs must keep Langfuse safety controls visible where behavior is advertised."""
126+
    readme = get_readme_content()
127+
    docs = get_langfuse_docs_content()
128+
    plan_task_readme = (PROJECT_ROOT / "examples" / "e2e" / "plan_and_task" / "README.md").read_text(
129+
        encoding="utf-8"
130+
    )
131+
132+
    for content, source in [
133+
        (readme, "README"),
134+
        (docs, "docs/features/langfuse.md"),
135+
        (plan_task_readme, "examples/e2e/plan_and_task/README.md"),
136+
    ]:
137+
        assert "capture_input=False" in content, f"{source} missing capture_input opt-out guidance"
138+
        assert "capture_output=False" in content, f"{source} missing capture_output opt-out guidance"
139+
        assert "enable_private_v4_historical_otel" in content, f"{source} missing private OTel opt-in guidance"
140+
141+
142+
def test_langfuse_docs_explain_trace_and_observation_roots() -> None:
143+
    """Dedicated docs must distinguish Langfuse trace containers from root observations."""
144+
    docs = get_langfuse_docs_content()
145+
146+
    required_phrases = [
147+
        "Session > Trace > Observation",
148+
        "trace container",
149+
        "root observation",
150+
        "root_observation_id",
151+
        "parent_observation_id",
152+
        "Span / Generation / Event",
153+
    ]
154+
    for phrase in required_phrases:
155+
        assert phrase in docs, f"docs/features/langfuse.md missing trace-root concept: {phrase}"
156+
157+
109158
def test_langfuse_docs_do_not_include_secret_values() -> None:
110159
    """Docs must not contain secret values or assignment examples for secret env vars."""
111160
    readme = get_readme_content()
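
The new tests in this file share one pattern: assert that each required phrase appears verbatim in a document. A minimal sketch of that pattern as a reusable helper follows; `missing_phrases` is a hypothetical name and is not part of this commit.

```python
from typing import Iterable, List


def missing_phrases(text: str, required: Iterable[str]) -> List[str]:
    """Return the required phrases that do not occur verbatim in text."""
    return [phrase for phrase in required if phrase not in text]


# Usage with an inline sample instead of the real docs file.
sample_docs = "Use LangfuseConfig(timeout=30) or LANGFUSE_TIMEOUT."
gaps = missing_phrases(sample_docs, ["LangfuseConfig(timeout=", "flush_interval"])
```

Collecting every gap before asserting would report all missing phrases at once, whereas the loop-and-assert form used in the tests stops at the first one.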
