Commit cc8c8d3

merge: integrate langfuse trace naming fixes
2 parents 1b3a3b3 + be28eac commit cc8c8d3

86 files changed

Lines changed: 15893 additions & 199 deletions

README.md

Lines changed: 51 additions & 2 deletions
@@ -14,14 +14,18 @@ Build modular, testable LLM agents by composing behavior from dataclass componen
 ## Installation

 ```bash
-# Clone and install with uv
+# Stable version
+uv pip install ecs-agent
+# Develop version
 git clone https://github.com/MoveCloudROY/ecs-agent.git
 cd ecs-agent
 uv sync --group dev
 # Install with embeddings support (optional)
 uv pip install -e ".[embeddings]"
 # Install with MCP support (optional)
 uv pip install -e ".[mcp]"
+# Install with Langfuse observability (optional)
+uv pip install -e ".[langfuse]"
 ```

 > **Requires Python ≥ 3.11**
@@ -103,7 +107,7 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-

 ### Scratchbook Artifact Registry
 - **`ArtifactRegistry`** — Canonical persistence layer for durable scratchbook records and mutable plan execution state.
-- **Canonical immutable records** — Tool and subagent outputs persist to `scratchbook/records/tool/tool_<uuid24>` and `scratchbook/records/subagent/subagent_<uuid24>`.
+- **Canonical immutable records** — Tool and subagent outputs persist to `scratchbook/records/tool/tool_<uuid24>` and `scratchbook/records/subagent/subagent_<uuid24>`. Tool result records are YAML documents with tool metadata under `metadata` and the full tool output under `content`.
 - **Canonical mutable plan state** — Plan markdown and Boulder machine state live at `scratchbook/<plan_slug>/plan.md` and `scratchbook/<plan_slug>/executes/boulder.json`.
 - **Trigger-to-Boulder lifecycle** — Plan-type script triggers create Boulder; planning/replanning/tool systems update it throughout execution.
 - **Inline payload policy** — Artifact inline content is populated only when UTF-8 payload size is `<= 2048` bytes. For larger results, `inline_content` is `None` and content is file-backed only. The persisted file always stores the full content — no truncation or summarisation.
@@ -134,6 +138,8 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
 - **Context Management** — Checkpoints (undo/resume), conversation compaction (XML system-prompt summaries), and memory windowing.
 - **Tool Ecosystem** — Auto-discovery via `@tool` decorator, manual approval flows, secure `bwrap` sandboxing, and composable skills.
 - **MCP Integration** — Connect to external MCP tool servers via stdio, SSE, or HTTP transports with namespaced tool mapping.
+- **Prometheus Metrics**, Install low-cardinality runtime, LLM, tool, streaming, and runtime-control metrics on any `World` and expose them via render, ASGI/WSGI, or a standalone `/metrics` server.
+- **Langfuse Observability**, Capture traces, spans, and observations via `ecs-agent[langfuse]`. Install `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse; raw input and output capture remains enabled by default for backward compatibility and can be disabled with `LangfuseConfig(capture_input=False, capture_output=False)`. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation end timing through the Langfuse SDK v4 public lifecycle, optional private historical start-time backdating with `enable_private_v4_historical_otel=True`, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Credential rotation is recommended if keys are exposed.

 ## Architecture
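The Prometheus Metrics bullet in the hunk above mentions exposing metrics via render. Whatever `render_metrics` does internally, a scrape returns the plain-text exposition format, which looks roughly like the output of this hand-rolled sketch. `ecs_agent_runs_total` is one of the documented families; the `render_counter` helper, the help text, and the `status` values are assumptions for illustration only:

```python
# Hand-rolled sketch of the Prometheus text exposition format (version 0.0.4).
# It is NOT ecs-agent's render_metrics(); it only illustrates the payload shape.


def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(
    name: str,
    help_text: str,
    samples: list[tuple[dict[str, str], float]],
) -> str:
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels.items())
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "ecs_agent_runs_total",
    "Agent runs by final status.",  # hypothetical help text
    [({"status": "ok"}, 3.0), ({"status": "error"}, 1.0)],
)
print(text, end="")
```

Each sample line is the metric name, an optional `{label="value"}` set, and the current value; bounded labels keep the number of such lines small.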

@@ -292,6 +298,42 @@ uv run python examples/chat_agent.py

 Model setup, registry-based construction, supported protocols, and model ID rules are documented in [`docs/models.md`](docs/models.md).

+## Prometheus Metrics
+
+Install metrics on a `World` before running agents, then expose the same private registry through whichever deployment shape fits your service:
+
+```python
+from ecs_agent.core import Runner, World
+from ecs_agent.metrics import (
+    install_prometheus_metrics,
+    make_metrics_asgi_app,
+    make_metrics_wsgi_app,
+    render_metrics,
+    start_metrics_server,
+)
+
+world = World()
+metrics = install_prometheus_metrics(world)
+
+# Direct scrape payload for tests, CLIs, or custom handlers.
+body = render_metrics(metrics)
+
+# Framework adapters.
+asgi_app = make_metrics_asgi_app(metrics) # mount at /metrics in an ASGI app
+wsgi_app = make_metrics_wsgi_app(metrics) # mount at /metrics in a WSGI app
+
+# Standalone endpoint helper.
+handle = start_metrics_server(9100, addr="127.0.0.1", metrics=metrics)
+try:
+    await Runner().run(world, max_ticks=3)
+finally:
+    handle.close(timeout=5)
+```
+
+The exposition uses `ecs_agent_*` metric families such as `ecs_agent_runs_total`, `ecs_agent_llm_invocations_total`, `ecs_agent_tool_calls_total`, and `ecs_agent_stream_events_total`. Labels are intentionally low-cardinality (`status`, `system`, `provider`, `model`, `operation`, `tool`, and similar bounded values); IDs, raw prompts/responses, tool arguments/results, paths, API keys, and tokens are never accepted as labels. See [`docs/features/metrics.md`](docs/features/metrics.md) for the complete metric contract, endpoint modes, install/uninstall behavior, and live smoke test instructions.
+
+To try the feature in real dashboards, run the local Prometheus + Grafana demo in [`examples/prometheus/`](examples/prometheus/). It exposes ecs-agent metrics at `127.0.0.1:9100/metrics`, starts Prometheus and Grafana with Docker Compose, and provisions an `ecs-agent Overview` dashboard at `http://localhost:3000`.
+
 ## Development

 ### Tests
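The label policy in the Prometheus Metrics section added above (bounded labels such as `status`, `provider`, and `tool`; never IDs, raw prompts, tool arguments, paths, or secrets) is easiest to picture as an allowlist. A minimal sketch with a hypothetical `check_labels` helper; the allowlist holds only the label names the README lists, whereas the authoritative list lives in `docs/features/metrics.md`:

```python
# Hypothetical guard illustrating an allowlist-based label policy.
# The allowlist contains only label names mentioned in the README.
ALLOWED_LABELS = frozenset({"status", "system", "provider", "model", "operation", "tool"})


def check_labels(labels: dict[str, str]) -> dict[str, str]:
    """Reject any label name outside the bounded allowlist."""
    unexpected = sorted(set(labels) - ALLOWED_LABELS)
    if unexpected:
        raise ValueError(f"unbounded or unknown metric labels: {unexpected}")
    return labels


check_labels({"provider": "openai", "status": "ok"})  # accepted
try:
    check_labels({"tool": "get_weather", "tool_call_id": "call_abc123"})
except ValueError as exc:
    print(exc)  # unbounded or unknown metric labels: ['tool_call_id']
```

An allowlist fails closed: a new per-request field such as `tool_call_id` is rejected by default instead of silently creating one time series per call.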
@@ -321,6 +363,11 @@ LLM_API_KEY="$LLM_API_KEY" \

 LLM_API_KEY="$LLM_API_KEY" \
 uv run pytest tests/live/test_compaction_live.py -v
+
+LLM_API_KEY="$LLM_API_KEY" \
+LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1 \
+LLM_MODEL=qwen3.5-flash \
+uv run pytest tests/live/test_prometheus_metrics_live.py -v
 ```

 See `tests/live/` for the available live suites.
@@ -367,6 +414,8 @@ See [`docs/`](docs/) for detailed guides:
 - [Systems](docs/systems.md), Built-in systems and configuration details
 - [Models](docs/models.md), model selection, registry routing, and built-in model implementations
 - [Streaming](docs/features/streaming.md), SSE streaming setup and usage
+- [Prometheus Metrics](docs/features/metrics.md), low-cardinality metrics and `/metrics` exposure helpers
+- [Langfuse Observability](docs/features/langfuse.md), traces, spans, observations, raw capture controls, and optional historical timing
 - [Structured Output](docs/features/structured-output.md), Pydantic schema → JSON mode
 - [Serialization](docs/features/serialization.md), World state persistence
 - [Logging](docs/features/logging.md), structlog integration
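The README's metrics example mounts framework-free WSGI and ASGI callables at `/metrics`. For orientation, this is roughly what a WSGI callable serving the text format involves under the PEP 3333 `environ`/`start_response` protocol. It is a stdlib-only sketch, not the source of `make_metrics_wsgi_app`; the `make_text_wsgi_app` factory and its `render` callback are hypothetical, and path routing is left to the host app:

```python
from collections.abc import Callable, Iterable
from typing import Any

# Content type of the Prometheus text exposition format, version 0.0.4.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"

StartResponse = Callable[[str, list[tuple[str, str]]], Any]


def make_text_wsgi_app(
    render: Callable[[], bytes],
) -> Callable[[dict[str, Any], StartResponse], Iterable[bytes]]:
    """Hypothetical WSGI app that answers every request with render()."""

    def app(environ: dict[str, Any], start_response: StartResponse) -> Iterable[bytes]:
        body = render()  # render fresh on every scrape
        start_response(
            "200 OK",
            [("Content-Type", CONTENT_TYPE), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app


# Usage: call the app directly with a minimal environ and a recording start_response.
captured: dict[str, Any] = {}


def start_response(status: str, headers: list[tuple[str, str]]) -> None:
    captured["status"] = status
    captured["headers"] = dict(headers)


app = make_text_wsgi_app(lambda: b"ecs_agent_runs_total 1.0\n")
body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/metrics"}, start_response))
print(captured["status"], captured["headers"]["Content-Type"])
```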

docs/api-reference.md

Lines changed: 61 additions & 0 deletions
@@ -14,6 +14,7 @@ The following types and classes are re-exported for convenience:
 - `RetryModel` from `ecs_agent.providers.retry_model`
 - `WorldSerializer` from `ecs_agent.serialization`
 - `configure_logging`, `get_logger` from `ecs_agent.logging`
+- `PrometheusMetrics`, `install_prometheus_metrics`, `uninstall_prometheus_metrics`, `render_metrics`, `make_metrics_asgi_app`, `make_metrics_wsgi_app`, `start_metrics_server` from `ecs_agent.metrics`
 - `StreamingComponent`, `CheckpointComponent`, `CompactionConfigComponent`, `ConversationArchiveComponent`, `RunnerStateComponent`, `UserInputComponent` from `ecs_agent.components`
 - `ClaudeModel` from `ecs_agent.providers.claude_model`
 - `LiteLLMModel` from `ecs_agent.providers.litellm_model`
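Among the newly re-exported helpers, `make_metrics_asgi_app` returns a framework-free ASGI callable. Under the ASGI HTTP protocol such an app answers a request with two `send` messages, `http.response.start` and `http.response.body`. A stdlib-only sketch of that shape, assuming a hypothetical `make_text_asgi_app(render)` factory (not the library's implementation) that only handles `http` scopes:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Content type of the Prometheus text exposition format, version 0.0.4.
CONTENT_TYPE = b"text/plain; version=0.0.4; charset=utf-8"

Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


def make_text_asgi_app(render: Callable[[], bytes]) -> Callable[..., Awaitable[None]]:
    """Hypothetical ASGI app that answers every HTTP request with render()."""

    async def app(scope: dict[str, Any], receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            # Only HTTP is handled here; lifespan and websocket scopes are not.
            raise NotImplementedError(f"unsupported scope type: {scope['type']}")
        body = render()  # render fresh on every scrape
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [
                (b"content-type", CONTENT_TYPE),
                (b"content-length", str(len(body)).encode("ascii")),
            ],
        })
        await send({"type": "http.response.body", "body": body})

    return app


async def demo() -> list[Message]:
    sent: list[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Message) -> None:
        sent.append(message)

    app = make_text_asgi_app(lambda: b"ecs_agent_runs_total 1.0\n")
    await app({"type": "http", "method": "GET", "path": "/metrics"}, receive, send)
    return sent


messages = asyncio.run(demo())
print([m["type"] for m in messages])  # ['http.response.start', 'http.response.body']
```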
@@ -27,6 +28,34 @@ The following types and classes are re-exported for convenience:
 - `AgentSpec`, `validate_agent_spec`, `discover_agent_sources`, `load_json_agents`, `load_markdown_agent`, `resolve_agent_specs`, `compile_agent_specs`, `resolve_prompt_file` from `ecs_agent.dsl`


+---
+
+## ecs_agent.metrics
+
+Prometheus metrics are available through `ecs_agent.metrics` and re-exported from `ecs_agent`.
+
+```python
+from ecs_agent.metrics import (
+    PrometheusMetrics,
+    install_prometheus_metrics,
+    uninstall_prometheus_metrics,
+    render_metrics,
+    make_metrics_asgi_app,
+    make_metrics_wsgi_app,
+    start_metrics_server,
+)
+```
+
+- `PrometheusMetrics(registry: CollectorRegistry | None = None)`: owns an isolated Prometheus registry and the fixed `ecs_agent_*` collectors.
+- `install_prometheus_metrics(world: World | None = None, *, registry: CollectorRegistry | None = None, metrics: PrometheusMetrics | None = None) -> PrometheusMetrics`: creates a recorder and, when a world is provided, subscribes it to that world's event bus idempotently.
+- `uninstall_prometheus_metrics(world: World) -> PrometheusMetrics | None`: removes the recorder's event-bus subscriptions from a world and returns the removed recorder, or `None` if metrics were not installed.
+- `render_metrics(metrics: PrometheusMetrics | CollectorRegistry | None = None) -> bytes`: renders Prometheus text format from the provided metrics surface or registry.
+- `make_metrics_asgi_app(metrics: PrometheusMetrics | CollectorRegistry | None = None)`: returns a framework-free ASGI callable suitable for `/metrics`.
+- `make_metrics_wsgi_app(metrics: PrometheusMetrics | CollectorRegistry | None = None)`: returns a framework-free WSGI callable suitable for `/metrics`.
+- `start_metrics_server(port: int, *, addr: str = "0.0.0.0", metrics: PrometheusMetrics | CollectorRegistry | None = None)`: starts a standalone metrics HTTP server and returns a cleanup handle.
+
+The metric contract is intentionally fixed and low-cardinality. Use [`docs/features/metrics.md`](features/metrics.md) for the complete metric list, label allowlist, endpoint examples, install/uninstall semantics, and label safety policy.
+
 ---

 ## ecs_agent.types
@@ -760,6 +789,38 @@ class ScratchbookIndexer:
     def lookup_by_category(self, category: str) -> list[dict[str, Any]]: ...
 ```

+### ToolResultsSink
+```python
+class ToolResultsSink:
+    def __init__(self, registry: ArtifactRegistry): ...
+    def persist_tool_result(
+        self,
+        tool_call_id: str,
+        tool_name: str,
+        result: str,
+        arguments: dict[str, Any] | None = None,
+    ) -> ArtifactPersistResult: ...
+    def read_tool_result(self, stable_id: str) -> dict[str, Any] | None: ...
+```
+
+`ToolResultsSink` persists each tool call result through `ArtifactRegistry` as an
+immutable YAML record under `scratchbook/records/tool/tool_<uuid24>`. The record
+separates tool metadata from the full tool output:
+
+```yaml
+metadata:
+  tool_call_id: call_abc123
+  tool_name: get_weather
+  timestamp: "2026-01-01T00:00:00.000000+00:00"
+  arguments:
+    city: Paris
+content: sunny in Paris
+```
+
+Small records may also populate `ArtifactPersistResult.inline_content` according
+to the shared 2 KB UTF-8 threshold; the persisted file always contains the full
+YAML document.
+
 ---

 ## ecs_agent.dsl
