docs(langfuse): document trace naming and timeouts

MoveCloudROY · sisyphus-dev-ai · MoveCloudROY · commit be28eacd511f · 2026-05-09T22:19:00.000+08:00
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
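
The trace-root rule this commit documents in `docs/features/langfuse.md` (a `kind="trace"` record that also carries `parent_observation_id` is exported as a child span, not as a second top-level trace container) can be illustrated with a minimal sketch. The `Record` dataclass, its `observation_id` field, and the `export_role` helper are hypothetical names invented for this example; they are not the ecs-agent `TelemetryRecord` type or the adapter's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for an internal telemetry record (assumed fields)."""

    kind: str                                    # e.g. "trace", "generation", "span"
    trace_id: str                                # identifies the trace container
    observation_id: str                          # assumed per-record identifier
    parent_observation_id: Optional[str] = None  # link to the parent node, if any


def export_role(record: Record) -> str:
    """Classify a record following the rule described in the docs."""
    if record.kind == "trace":
        if record.parent_observation_id is None:
            # No parent: this record is the root observation of its trace.
            return "root-observation"
        # A "trace" record with a parent is nested, so it becomes a child span
        # instead of opening a second top-level trace container.
        return "child-span"
    # Every other kind is an ordinary observation inside the trace tree.
    return "child-observation"


root = Record(kind="trace", trace_id="t1", observation_id="o1")
nested = Record(kind="trace", trace_id="t1", observation_id="o2", parent_observation_id="o1")
generation = Record(kind="generation", trace_id="t1", observation_id="o3", parent_observation_id="o2")
```

Under these assumptions, `root` is the only record that maps to the top of the tree, while `nested` and `generation` both share `trace_id="t1"` and attach below it through `parent_observation_id`.
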
diff --git a/README.md b/README.md
@@ -139,7 +139,7 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
 - **Tool Ecosystem** — Auto-discovery via `@tool` decorator, manual approval flows, secure `bwrap` sandboxing, and composable skills.
 - **MCP Integration** — Connect to external MCP tool servers via stdio, SSE, or HTTP transports with namespaced tool mapping.
 - **Prometheus Metrics**, Install low-cardinality runtime, LLM, tool, streaming, and runtime-control metrics on any `World` and expose them via render, ASGI/WSGI, or a standalone `/metrics` server.
-- **Langfuse Observability**, Capture full-fidelity traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation timing exported through Langfuse SDK v4 historical observation starts plus manual endings, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.
+- **Langfuse Observability**, Capture traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse; raw input and output capture remains enabled by default for backward compatibility and can be disabled with `LangfuseConfig(capture_input=False, capture_output=False)`. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation end timing through the Langfuse SDK v4 public lifecycle, optional private historical start-time backdating with `enable_private_v4_historical_otel=True`, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.
 
 ## Architecture
 
@@ -415,7 +415,7 @@ See [`docs/`](docs/) for detailed guides:
 - [Models](docs/models.md), model selection, registry routing, and built-in model implementations
 - [Streaming](docs/features/streaming.md), SSE streaming setup and usage
 - [Prometheus Metrics](docs/features/metrics.md), low-cardinality metrics and `/metrics` exposure helpers
-- [Langfuse Observability](docs/features/langfuse.md), full-fidelity traces, spans, and observations
+- [Langfuse Observability](docs/features/langfuse.md), traces, spans, observations, raw capture controls, and optional historical timing
 - [Structured Output](docs/features/structured-output.md), Pydantic schema → JSON mode
 - [Serialization](docs/features/serialization.md), World state persistence
 - [Logging](docs/features/logging.md), structlog integration
diff --git a/docs/features/langfuse.md b/docs/features/langfuse.md
@@ -47,9 +47,43 @@ The integration uses the following environment variables for configuration:
 - `LANGFUSE_PUBLIC_KEY`: Your Langfuse project public key.
 - `LANGFUSE_SECRET_KEY`: Your Langfuse project secret key.
 - `LANGFUSE_HOST` or `LANGFUSE_BASE_URL`: The Langfuse API host.
+- `LANGFUSE_TIMEOUT`: Langfuse SDK HTTP timeout in seconds. This also controls the default HTTP OTLP span exporter timeout in Langfuse SDK v4.
+
+`LangfuseConfig` also exposes runtime safety controls:
+
+- `capture_input` / `capture_output`: Enabled by default for backward-compatible full-fidelity traces. Set either value to `False` to suppress raw inputs or outputs from Langfuse export while preserving metadata, timing, model, usage, and redaction reports.
+- `enable_private_v4_historical_otel`: Disabled by default. When enabled, the adapter may use Langfuse SDK v4 private OpenTelemetry hooks to backdate observation start times. Keep this off unless you have validated the exact Langfuse SDK version you run in production.
+- `timeout`: Optional Langfuse SDK HTTP timeout in seconds. Use `LangfuseConfig(timeout=30)` or `LANGFUSE_TIMEOUT` for slower self-hosted deployments.
+
+### Export timeout tuning
+
+Langfuse SDK v4 exports observations through the OpenTelemetry HTTP OTLP exporter. A log such as `Failed to export span batch ... Read timed out. (read timeout=...)` means the background span batch upload to the configured Langfuse host timed out; it does not indicate a failed agent run. For slower self-hosted endpoints, raise the SDK timeout explicitly:
+
+```python
+install_langfuse_observability(
+    world,
+    LangfuseConfig(
+        timeout=30,
+        flush_at=32,
+        flush_interval=2.0,
+    ),
+)
+```
+
+`flush_at` and `flush_interval` control how often batches are sent. They do not increase the per-request read timeout. If you configure OpenTelemetry directly, the corresponding OTLP environment variable is `OTEL_EXPORTER_OTLP_TRACES_TIMEOUT` (or the generic `OTEL_EXPORTER_OTLP_TIMEOUT`), in seconds.
 
 > **Security Note**: Never hardcode your secret keys in source code. Use environment variables or a secret manager. If your credentials have been exposed outside a secure environment, we recommend a full credential rotation immediately.
 
+## Langfuse Data Model
+
+Langfuse organizes telemetry as **Session > Trace > Observation**:
+
+- **Session**: A conversation or workflow grouping that can contain many traces. `LangfuseConfig.session_id` sets this grouping attribute.
+- **Trace**: A trace container for one request, user turn, or one-shot agent run. The trace owns the shared `trace_id` and groups the observation tree.
+- **Observation**: A node inside a trace. Span / Generation / Event records are observation types with different payload shapes.
+
+The distinction between a trace container and a root observation is important. A trace container is the Langfuse grouping object; the root observation is the first visible node in that trace's tree. In ecs-agent, `trace_id` identifies the trace container, `root_observation_id` is the conceptual root node for child observations, and `parent_observation_id` links each child observation to its parent. The current internal `TelemetryRecord(kind="trace")` represents a trace root record; if such a record also has `parent_observation_id`, the Langfuse adapter treats it as a child span observation rather than as a second top-level trace container.
+
 ## Langfuse Sessions
 
 Langfuse Sessions require `session_id` as a trace-level session attribute, not only as `metadata.session_id`. The SDK v4 adapter creates observations with `start_as_current_observation(...)` and calls `propagate_attributes(session_id=...)` while the root observation is active, so Langfuse can group the complete trace chain in the Sessions UI.
@@ -114,7 +148,7 @@ Observations captured include:
 - **Context Pressure**: Information about conversation compaction or windowing.
 - **Scores**: Automated evaluation scores if provided.
 
-For Langfuse SDK v4, completed ECS records are exported with the manual observation lifecycle. When the SDK exposes its OpenTelemetry-backed manual span hooks, the adapter creates the observation with the recorded operation `start_time` and then calls `end(end_time=...)` with the recorded operation end timestamp, both converted to Langfuse's nanosecond epoch format. This keeps LLM reasoning, tool execution, and subagent span durations aligned with the actual work rather than the telemetry export call duration, preventing the Langfuse UI from displaying zero-latency observations for operations that already completed before export. Older or test clients that do not expose historical OTel start hooks fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
+For Langfuse SDK v4, completed ECS records are exported with the public manual observation lifecycle by default: the adapter starts the observation when it exports the record, then calls `end(end_time=...)` with the recorded operation end timestamp. If you explicitly set `LangfuseConfig(enable_private_v4_historical_otel=True)` and your validated SDK version exposes the required private OpenTelemetry-backed span hooks, the adapter can also backdate the observation start using the recorded operation `start_time`; older, unsupported, or non-opted-in clients fall back to `start_observation(...)` plus `end(end_time=...)`. Active root observations still use `start_as_current_observation(...)` so `session_id` can be propagated while the trace context is current.
 
 ## Alerts and Monitoring
 
@@ -127,6 +161,7 @@ The integration follows a strict data privacy policy. While raw prompts, respons
 - **Mandatory Redaction**: Sensitive patterns (like API keys or tokens) are automatically redacted from payloads.
 - **Redaction Reports**: Exported metadata includes counts and names of applied redaction rules, but never the redacted content itself.
 - **Model Names**: `LLM_MODEL` is intentionally not redacted because model identifiers drive Langfuse generation grouping and dashboard filters. Do not encode credentials, tenant secrets, or private data in model names.
+- **Raw Payload Capture**: Redaction is not a general privacy filter. User prompts, tool arguments, tool results, and model outputs may contain business-sensitive data that does not look like a secret. Use `LangfuseConfig(capture_input=False, capture_output=False)` when raw content should not leave the process.
 
 ## Telemetry Resilience
 
diff --git a/examples/e2e/plan_and_task/README.md b/examples/e2e/plan_and_task/README.md
@@ -191,7 +191,7 @@ Install the optional extra before enabling Langfuse for this example:
 uv pip install -e ".[langfuse]"
 ```
 
-When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations preserve the ECS-recorded start and end timestamps when exported to Langfuse SDK v4, so the Langfuse UI reports the actual operation latency instead of the near-zero telemetry export duration.
+When `PLAN_TASK_LANGFUSE` is enabled, `main.py` calls `install_plan_task_langfuse_observability()` after `build_plan_task_world(...)` creates the `World` and before `Runner.run(...)` starts. In interactive mode, every `UserInputReceivedEvent` starts a `user.turn` trace that covers the complete chain from that user input until the next user input or process exit: prompt normalization, retrieval/compaction, LLM generations, tool calls, subagent spans, retries, errors, context pressure, and completion scores all stay inside that turn trace. One-shot runs without interactive input keep the runner trace for backward compatibility. Completed LLM, tool, and subagent observations export their ECS-recorded end timestamps through the Langfuse SDK v4 public lifecycle; preserving historical start timestamps requires explicitly validating your SDK version and enabling `LangfuseConfig(enable_private_v4_historical_otel=True)` because that path uses private SDK hooks. Raw prompts, tool arguments, and outputs are captured by default for backward compatibility; use `LangfuseConfig(capture_input=False, capture_output=False)` if raw content should not leave the process.
 
 Subagents are exported as `subagent.<name>` spans inside the active `user.turn` trace. Their child-world LLM calls are exported as `generation` observations under that subagent span, and child-world tool/retrieval/API work is exported as child spans/events under the same turn trace rather than creating another top-level Langfuse trace. When a child-world generation requests a tool, that tool observation stays attached to the requesting generation so the Langfuse hierarchy shows the exact delegation chain.
 
diff --git a/tests/test_docs_langfuse_observability.py b/tests/test_docs_langfuse_observability.py
@@ -54,6 +54,21 @@ def test_langfuse_docs_mention_configuration() -> None:
         assert has_host, f"{source} missing LANGFUSE_HOST or LANGFUSE_BASE_URL alias policy"
 
 
+def test_langfuse_docs_describe_export_timeout_controls() -> None:
+    """Docs must explain how to tune self-hosted Langfuse export timeouts."""
+    docs = get_langfuse_docs_content()
+
+    required_phrases = [
+        "LangfuseConfig(timeout=",
+        "LANGFUSE_TIMEOUT",
+        "OTEL_EXPORTER_OTLP_TRACES_TIMEOUT",
+        "read timeout",
+        "flush_interval",
+    ]
+    for phrase in required_phrases:
+        assert phrase in docs, f"docs/features/langfuse.md missing timeout guidance: {phrase}"
+
+
 def test_langfuse_docs_describe_session_attribute_propagation() -> None:
     """Docs must explain Langfuse Sessions need trace-level session propagation."""
     readme = get_readme_content()
@@ -106,6 +121,40 @@ def test_langfuse_docs_mention_credential_rotation() -> None:
         assert "rotation" in content.lower() or "rotate" in content.lower(), f"{source} must mention credential rotation"
 
 
+def test_langfuse_docs_describe_capture_controls_and_private_otel_opt_in() -> None:
+    """Docs must keep Langfuse safety controls visible where behavior is advertised."""
+    readme = get_readme_content()
+    docs = get_langfuse_docs_content()
+    plan_task_readme = (PROJECT_ROOT / "examples" / "e2e" / "plan_and_task" / "README.md").read_text(
+        encoding="utf-8"
+    )
+
+    for content, source in [
+        (readme, "README"),
+        (docs, "docs/features/langfuse.md"),
+        (plan_task_readme, "examples/e2e/plan_and_task/README.md"),
+    ]:
+        assert "capture_input=False" in content, f"{source} missing capture_input opt-out guidance"
+        assert "capture_output=False" in content, f"{source} missing capture_output opt-out guidance"
+        assert "enable_private_v4_historical_otel" in content, f"{source} missing private OTel opt-in guidance"
+
+
+def test_langfuse_docs_explain_trace_and_observation_roots() -> None:
+    """Dedicated docs must distinguish Langfuse trace containers from root observations."""
+    docs = get_langfuse_docs_content()
+
+    required_phrases = [
+        "Session > Trace > Observation",
+        "trace container",
+        "root observation",
+        "root_observation_id",
+        "parent_observation_id",
+        "Span / Generation / Event",
+    ]
+    for phrase in required_phrases:
+        assert phrase in docs, f"docs/features/langfuse.md missing trace-root concept: {phrase}"
+
+
 def test_langfuse_docs_do_not_include_secret_values() -> None:
     """Docs must not contain secret values or assignment examples for secret env vars."""
     readme = get_readme_content()