README.md: 2 additions & 2 deletions
@@ -139,7 +139,7 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
 - **Tool Ecosystem** — Auto-discovery via `@tool` decorator, manual approval flows, secure `bwrap` sandboxing, and composable skills.
 - **MCP Integration** — Connect to external MCP tool servers via stdio, SSE, or HTTP transports with namespaced tool mapping.
 - **Prometheus Metrics**, Install low-cardinality runtime, LLM, tool, streaming, and runtime-control metrics on any `World` and expose them via render, ASGI/WSGI, or a standalone `/metrics` server.
-- **Langfuse Observability**, Capture full-fidelity traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation timing exported through Langfuse SDK v4 historical observation starts plus manual endings, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.
+- **Langfuse Observability**, Capture traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse; raw input and output capture remains enabled by default for backward compatibility and can be disabled with `LangfuseConfig(capture_input=False, capture_output=False)`. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation end timing through the Langfuse SDK v4 public lifecycle, optional private historical start-time backdating with `enable_private_v4_historical_otel=True`, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.

 ## Architecture

@@ -415,7 +415,7 @@ See [`docs/`](docs/) for detailed guides:
 - [Models](docs/models.md), model selection, registry routing, and built-in model implementations
 - [Streaming](docs/features/streaming.md), SSE streaming setup and usage
 - [Prometheus Metrics](docs/features/metrics.md), low-cardinality metrics and `/metrics` exposure helpers
-- [Langfuse Observability](docs/features/langfuse.md), full-fidelity traces, spans, and observations
+- [Langfuse Observability](docs/features/langfuse.md), traces, spans, observations, raw capture controls, and optional historical timing
docs/features/langfuse.md: 36 additions & 1 deletion
@@ -47,9 +47,43 @@ The integration uses the following environment variables for configuration:
 - `LANGFUSE_PUBLIC_KEY`: Your Langfuse project public key.
 - `LANGFUSE_SECRET_KEY`: Your Langfuse project secret key.
 - `LANGFUSE_HOST` or `LANGFUSE_BASE_URL`: The Langfuse API host.
+- `LANGFUSE_TIMEOUT`: Langfuse SDK HTTP timeout in seconds. This also controls the default HTTP OTLP span exporter timeout in Langfuse SDK v4.
+
+`LangfuseConfig` also exposes runtime safety controls:
+
+- `capture_input` / `capture_output`: Enabled by default for backward-compatible full-fidelity traces. Set either value to `False` to suppress raw inputs or outputs from Langfuse export while preserving metadata, timing, model, usage, and redaction reports.
+- `enable_private_v4_historical_otel`: Disabled by default. When enabled, the adapter may use Langfuse SDK v4 private OpenTelemetry hooks to backdate observation start times. Keep this off unless you have validated the exact Langfuse SDK version you run in production.
+- `timeout`: Optional Langfuse SDK HTTP timeout in seconds. Use `LangfuseConfig(timeout=30)` or `LANGFUSE_TIMEOUT` for slower self-hosted deployments.
+
+### Export timeout tuning
+
+Langfuse SDK v4 exports observations through the OpenTelemetry HTTP OTLP exporter. A log such as `Failed to export span batch ... Read timed out. (read timeout=...)` means the background span batch upload to the configured Langfuse host timed out; it does not indicate a failed agent run. For slower self-hosted endpoints, raise the SDK timeout explicitly:
+
+```python
+install_langfuse_observability(
+    world,
+    LangfuseConfig(
+        timeout=30,
+        flush_at=32,
+        flush_interval=2.0,
+    ),
+)
+```
+
+`flush_at` and `flush_interval` control how often batches are sent. They do not increase the per-request read timeout. If you configure OpenTelemetry directly, the corresponding OTLP environment variable is `OTEL_EXPORTER_OTLP_TRACES_TIMEOUT` (or the generic `OTEL_EXPORTER_OTLP_TIMEOUT`), in seconds.
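
A minimal sketch of how an effective timeout could be resolved from either `LangfuseConfig(timeout=...)` or `LANGFUSE_TIMEOUT`, as described above. The helper name `resolve_timeout`, the precedence order (explicit value first, then the environment variable), and the `fallback` default are assumptions made for illustration; they are not confirmed ecs-agent or Langfuse SDK behavior.

```python
import os
from typing import Mapping, Optional


def resolve_timeout(
    explicit: Optional[float] = None,
    env: Optional[Mapping[str, str]] = None,
    fallback: float = 5.0,  # placeholder default, not taken from the SDK
) -> float:
    """Return a timeout in seconds from an explicit value or LANGFUSE_TIMEOUT.

    Hypothetical helper: the explicit value wins, then the environment
    variable, then the fallback. Batching knobs such as flush_at and
    flush_interval play no part in this value.
    """
    env = os.environ if env is None else env
    if explicit is not None:
        return float(explicit)
    raw = env.get("LANGFUSE_TIMEOUT")
    if raw:
        return float(raw)
    return fallback


# Example: a slower self-hosted deployment configured through the environment.
effective = resolve_timeout(env={"LANGFUSE_TIMEOUT": "30"})
```

Passing the environment as a mapping keeps the sketch easy to exercise without touching the real process environment.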

 > **Security Note**: Never hardcode your secret keys in source code. Use environment variables or a secret manager. If your credentials have been exposed outside a secure environment, we recommend a full credential rotation immediately.

+## Langfuse Data Model
+
+Langfuse organizes telemetry as **Session > Trace > Observation**:
+
+- **Session**: A conversation or workflow grouping that can contain many traces. `LangfuseConfig.session_id` sets this grouping attribute.
+- **Trace**: A trace container for one request, user turn, or one-shot agent run. The trace owns the shared `trace_id` and groups the observation tree.
+- **Observation**: A node inside a trace. Span / Generation / Event records are observation types with different payload shapes.
+
+The distinction between a trace container and a root observation is important. A trace container is the Langfuse grouping object; the root observation is the first visible node in that trace's tree. In ecs-agent, `trace_id` identifies the trace container, `root_observation_id` is the conceptual root node for child observations, and `parent_observation_id` links each child observation to its parent. The current internal `TelemetryRecord(kind="trace")` represents a trace root record; if such a record also has `parent_observation_id`, the Langfuse adapter treats it as a child span observation rather than as a second top-level trace container.
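
The rule in the paragraph above (a `kind="trace"` record with a parent becomes a child span, not a second trace container) can be sketched as a small classifier. The `TelemetryRecord` dataclass here is a simplified stand-in with assumed fields, and `export_role` is a hypothetical helper; the real ecs-agent record and adapter may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TelemetryRecord:
    """Simplified stand-in for the internal record (fields are assumptions)."""

    kind: str  # e.g. "trace", "span", "generation", "event"
    trace_id: str
    observation_id: str
    parent_observation_id: Optional[str] = None


def export_role(record: TelemetryRecord) -> str:
    """Classify how a record would map onto the Langfuse tree."""
    if record.kind == "trace":
        # A trace record without a parent is the root of its trace;
        # with a parent it is demoted to a child span observation.
        if record.parent_observation_id is None:
            return "trace_root"
        return "child_span"
    # Span, generation, and event records are ordinary observations.
    return "observation"


root = TelemetryRecord("trace", "t1", "o1")
nested = TelemetryRecord("trace", "t1", "o2", parent_observation_id="o1")
llm_call = TelemetryRecord("generation", "t1", "o3", parent_observation_id="o2")
```

All three example records share one `trace_id`, so they belong to the same trace container even though only `root` is a trace root.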
+
 ## Langfuse Sessions

 Langfuse Sessions require `session_id` as a trace-level session attribute, not only as `metadata.session_id`. The SDK v4 adapter creates observations with `start_as_current_observation(...)` and calls `propagate_attributes(session_id=...)` while the root observation is active, so Langfuse can group the complete trace chain in the Sessions UI.

 - **Context Pressure**: Information about conversation compaction or windowing.
 - **Scores**: Automated evaluation scores if provided.

-For Langfuse SDK v4, completed ECS records are exported with the manual observation lifecycle. When the SDK exposes its OpenTelemetry-backed manual span hooks, the adapter creates the observation with the recorded operation `start_time` and then calls `end(end_time=...)` with the recorded operation end timestamp, both converted to Langfuse's nanosecond epoch format. This keeps LLM reasoning, tool execution, and subagent span durations aligned with the actual work rather than the telemetry export call duration, preventing the Langfuse UI from displaying zero-latency observations for operations that already completed before export. Older or test clients that do not expose historical OTel start hooks fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
+For Langfuse SDK v4, completed ECS records are exported with the public manual observation lifecycle by default: the adapter starts the observation when it exports the record, then calls `end(end_time=...)` with the recorded operation end timestamp. If you explicitly set `LangfuseConfig(enable_private_v4_historical_otel=True)` and your validated SDK version exposes the required private OpenTelemetry-backed span hooks, the adapter can also backdate the observation start using the recorded operation `start_time`; older, unsupported, or non-opted-in clients fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
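
To make the two timing modes concrete, the sketch below picks which timestamps would be exported with and without start backdating, and converts a timestamp to nanoseconds since the Unix epoch (the format the earlier wording of this paragraph mentions). `exported_window` and `to_epoch_ns` are hypothetical helpers written for illustration; they do not call the Langfuse SDK and are not the adapter's actual code.

```python
from datetime import datetime, timezone
from typing import Tuple

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(ts: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    delta = ts - _EPOCH
    # Integer arithmetic avoids float rounding on large epoch values.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def exported_window(
    recorded_start: datetime,
    recorded_end: datetime,
    export_time: datetime,
    backdate_start: bool = False,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) pair an exporter would report.

    Default mode: the observation starts when it is exported and ends at the
    recorded end time. Backdated mode: the recorded start is used instead.
    """
    start = recorded_start if backdate_start else export_time
    return start, recorded_end


op_start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
op_end = datetime(2024, 1, 1, 12, 0, 2, tzinfo=timezone.utc)
exported_at = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)

default_window = exported_window(op_start, op_end, exported_at)
backdated_window = exported_window(op_start, op_end, exported_at, backdate_start=True)
```

Only the backdated window spans the recorded two seconds of work, which is why the opt-in flag matters when accurate durations are needed.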

 ## Alerts and Monitoring

@@ -127,6 +161,7 @@ The integration follows a strict data privacy policy. While raw prompts, respons
 - **Mandatory Redaction**: Sensitive patterns (like API keys or tokens) are automatically redacted from payloads.
 - **Redaction Reports**: Exported metadata includes counts and names of applied redaction rules, but never the redacted content itself.
 - **Model Names**: `LLM_MODEL` is intentionally not redacted because model identifiers drive Langfuse generation grouping and dashboard filters. Do not encode credentials, tenant secrets, or private data in model names.
+- **Raw Payload Capture**: Redaction is not a general privacy filter. User prompts, tool arguments, tool results, and model outputs may contain business-sensitive data that does not look like a secret. Use `LangfuseConfig(capture_input=False, capture_output=False)` when raw content should not leave the process.
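
The interaction between redaction and the capture switches can be sketched as follows. The rule set, the `[REDACTED]` placeholder, and the `redact` / `build_payload` helpers are assumptions for illustration only; the sketch also assumes the redaction report is still produced when raw capture is disabled, which matches the description of preserved redaction reports but is not verified against the adapter.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Hypothetical rule set: rule name -> pattern for secret-looking strings.
RULES = {"api_key": re.compile(r"sk-[A-Za-z0-9]{8,}")}


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace secret-looking substrings and count hits per rule name."""
    counts: Dict[str, int] = {}
    for name, pattern in RULES.items():
        text, hits = pattern.subn("[REDACTED]", text)
        if hits:
            counts[name] = hits
    return text, counts


def build_payload(
    raw_input: str,
    raw_output: str,
    *,
    capture_input: bool = True,
    capture_output: bool = True,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble an export payload that honors the capture flags."""
    payload: Dict[str, Any] = {"metadata": dict(metadata or {})}
    report: Dict[str, int] = {}
    fields = (("input", raw_input, capture_input), ("output", raw_output, capture_output))
    for key, raw, enabled in fields:
        cleaned, counts = redact(raw)
        for name, hits in counts.items():
            report[name] = report.get(name, 0) + hits
        if enabled:
            # Raw content leaves the process only when capture is enabled.
            payload[key] = cleaned
    # The report carries rule names and counts, never the matched text.
    payload["metadata"]["redaction_report"] = report
    return payload


full = build_payload("key sk-abcdefgh1234 please", "ok")
private = build_payload("key sk-abcdefgh1234 please", "ok", capture_input=False)
```

Note that a prompt such as "quarterly revenue forecast" would pass through `redact` unchanged, which is the reason the capture flags exist alongside redaction.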
examples/e2e/plan_and_task/README.md: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ Install the optional extra before enabling Langfuse for this example:
 uv pip install -e ".[langfuse]"
 ```

-When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations preserve the ECS-recorded start and end timestamps when exported to Langfuse SDK v4, so the Langfuse UI reports the actual operation latency instead of the near-zero telemetry export duration.
+When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations export their ECS-recorded end timestamps through the Langfuse SDK v4 public lifecycle; preserving historical start timestamps requires explicitly validating your SDK version and enabling `LangfuseConfig(enable_private_v4_historical_otel=True)` because that path uses private SDK hooks. Raw prompts, tool arguments, and outputs are captured by default for backward compatibility; use `LangfuseConfig(capture_input=False, capture_output=False)` if raw content should not leave the process.

 Subagents are exported as `subagent.<name>` spans inside the active `user.turn` trace. Their child-world LLM calls are exported as `generation` observations under that subagent span, and child-world tool/retrieval/API work is exported as child spans/events under the same turn trace rather than creating another top-level Langfuse trace. When a child-world generation requests a tool, that tool observation stays attached to the requesting generation so the Langfuse hierarchy shows the exact delegation chain.