@@ -103,7 +107,7 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
### Scratchbook Artifact Registry
-**`ArtifactRegistry`** — Canonical persistence layer for durable scratchbook records and mutable plan execution state.
-**Canonical immutable records** — Tool and subagent outputs persist to `scratchbook/records/tool/tool_<uuid24>` and `scratchbook/records/subagent/subagent_<uuid24>`. Tool result records are YAML documents with tool metadata under `metadata` and the full tool output under `content`.
-**Canonical mutable plan state** — Plan markdown and Boulder machine state live at `scratchbook/<plan_slug>/plan.md` and `scratchbook/<plan_slug>/executes/boulder.json`.
-**Trigger-to-Boulder lifecycle** — Plan-type script triggers create Boulder; planning/replanning/tool systems update it throughout execution.
-**Inline payload policy** — Artifact inline content is populated only when UTF-8 payload size is `<= 2048` bytes. For larger results, `inline_content` is `None` and content is file-backed only. The persisted file always stores the full content — no truncation or summarisation.
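The inline payload policy above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the registry's actual code: `Artifact` and `persist_artifact` are hypothetical names, and only the `<= 2048` byte threshold, the `None` fallback, and the "file always holds the full content" rule come from the text.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

INLINE_LIMIT_BYTES = 2048  # threshold stated in the policy above


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for a registry artifact (not the real class)."""

    path: Path
    inline_content: Optional[str]


def persist_artifact(path: Path, content: str) -> Artifact:
    """Write the full content to disk and inline it only when it is small."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The file always receives the complete payload, never a truncated copy.
    path.write_text(content, encoding="utf-8")
    # Measure the encoded size: multi-byte characters count by bytes, not code points.
    size = len(content.encode("utf-8"))
    inline = content if size <= INLINE_LIMIT_BYTES else None
    return Artifact(path=path, inline_content=inline)
```

Measuring the encoded length rather than `len(content)` matters for non-ASCII output: a string of 1025 two-byte characters is 2050 bytes and would be file-backed only.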
@@ -134,6 +138,8 @@ Mix 35+ components to build custom agents without inheritance bloat. The Entity-
-**Tool Ecosystem** — Auto-discovery via `@tool` decorator, manual approval flows, secure `bwrap` sandboxing, and composable skills.
-**MCP Integration** — Connect to external MCP tool servers via stdio, SSE, or HTTP transports with namespaced tool mapping.
- **Prometheus Metrics**: Install low-cardinality runtime, LLM, tool, streaming, and runtime-control metrics on any `World` and expose them via direct rendering, ASGI/WSGI adapters, or a standalone `/metrics` server.
- **Langfuse Observability**: Capture traces, spans, and observations via `ecs-agent[langfuse]`. Call `install_langfuse_observability()` on any `World` to export user input, LLM generations, tool calls, retries, subagent runs, and errors to Langfuse. Raw input and output capture remains enabled by default for backward compatibility and can be disabled with `LangfuseConfig(capture_input=False, capture_output=False)`. Supports mandatory redaction, one trace per interactive user turn (with one-shot run compatibility), nested `subagent.<name>` spans with child LLM/tool observations, tool calls that nest under the generation that requested them, recorded operation end timing through the Langfuse SDK v4 public lifecycle, optional private historical start-time backdating with `enable_private_v4_historical_otel=True`, readable model identifiers from `LLM_MODEL`, integer token usage, resilient background export, and Langfuse Sessions by propagating `session_id` as a trace-level attribute rather than metadata-only. See [`docs/features/langfuse.md`](docs/features/langfuse.md) for configuration via `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST`, plus live test commands (OpenAI/Anthropic) and skip behavior when credentials are missing. Rotate credentials if keys are exposed.
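The environment-based configuration and the "skip when credentials are missing" behavior mentioned in the Langfuse entry can be pictured roughly as follows. This is only a sketch: `langfuse_env` is a hypothetical helper, and treating `LANGFUSE_HOST` as optional is an assumption rather than documented behavior. Only the three variable names come from the text.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
OPTIONAL = ("LANGFUSE_HOST",)  # assumption: the host may fall back to a default


def langfuse_env(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, str]]:
    """Return the configured settings, or None when a required key is missing."""
    source = os.environ if env is None else env
    settings = {name: source.get(name, "").strip() for name in REQUIRED + OPTIONAL}
    if any(not settings[name] for name in REQUIRED):
        return None  # the caller can skip live tests or leave observability off
    # Drop unset optional values so callers see only what was configured.
    return {name: value for name, value in settings.items() if value}
```

A live test could call `langfuse_env()` first and skip itself when the result is `None`, which keeps secrets out of the test code.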
## Architecture
@@ -292,6 +298,42 @@ uv run python examples/chat_agent.py
Model setup, registry-based construction, supported protocols, and model ID rules are documented in [`docs/models.md`](docs/models.md).
## Prometheus Metrics
Install metrics on a `World` before running agents, then expose the same private registry through whichever deployment shape fits your service:
```python
from ecs_agent.core import Runner, World
from ecs_agent.metrics import (
    install_prometheus_metrics,
    make_metrics_asgi_app,
    make_metrics_wsgi_app,
    render_metrics,
    start_metrics_server,
)

world = World()
metrics = install_prometheus_metrics(world)

# Direct scrape payload for tests, CLIs, or custom handlers.
body = render_metrics(metrics)

# Framework adapters.
asgi_app = make_metrics_asgi_app(metrics)  # mount at /metrics in an ASGI app
wsgi_app = make_metrics_wsgi_app(metrics)  # mount at /metrics in a WSGI app
```
The exposition uses `ecs_agent_*` metric families such as `ecs_agent_runs_total`, `ecs_agent_llm_invocations_total`, `ecs_agent_tool_calls_total`, and `ecs_agent_stream_events_total`. Labels are intentionally low-cardinality (`status`, `system`, `provider`, `model`, `operation`, `tool`, and similar bounded values); IDs, raw prompts/responses, tool arguments/results, paths, API keys, and tokens are never accepted as labels. See [`docs/features/metrics.md`](docs/features/metrics.md) for the complete metric contract, endpoint modes, install/uninstall behavior, and live smoke test instructions.

To try the feature in real dashboards, run the local Prometheus + Grafana demo in [`examples/prometheus/`](examples/prometheus/). It exposes ecs-agent metrics at `127.0.0.1:9100/metrics`, starts Prometheus and Grafana with Docker Compose, and provisions an `ecs-agent Overview` dashboard at `http://localhost:3000`.
## Development
### Tests
@@ -321,6 +363,11 @@ LLM_API_KEY="$LLM_API_KEY" \
LLM_API_KEY="$LLM_API_KEY" \
uv run pytest tests/live/test_compaction_live.py -v
docs/api-reference.md (61 additions, 0 deletions)
@@ -14,6 +14,7 @@ The following types and classes are re-exported for convenience:
- `RetryModel` from `ecs_agent.providers.retry_model`
- `WorldSerializer` from `ecs_agent.serialization`
- `configure_logging`, `get_logger` from `ecs_agent.logging`
- `PrometheusMetrics`, `install_prometheus_metrics`, `uninstall_prometheus_metrics`, `render_metrics`, `make_metrics_asgi_app`, `make_metrics_wsgi_app`, `start_metrics_server` from `ecs_agent.metrics`
- `StreamingComponent`, `CheckpointComponent`, `CompactionConfigComponent`, `ConversationArchiveComponent`, `RunnerStateComponent`, `UserInputComponent` from `ecs_agent.components`
- `ClaudeModel` from `ecs_agent.providers.claude_model`
- `LiteLLMModel` from `ecs_agent.providers.litellm_model`
@@ -27,6 +28,34 @@ The following types and classes are re-exported for convenience:
- `AgentSpec`, `validate_agent_spec`, `discover_agent_sources`, `load_json_agents`, `load_markdown_agent`, `resolve_agent_specs`, `compile_agent_specs`, `resolve_prompt_file` from `ecs_agent.dsl`

---
## ecs_agent.metrics
Prometheus metrics are available through `ecs_agent.metrics` and re-exported from `ecs_agent`.
```python
from ecs_agent.metrics import (
    PrometheusMetrics,
    install_prometheus_metrics,
    uninstall_prometheus_metrics,
    render_metrics,
    make_metrics_asgi_app,
    make_metrics_wsgi_app,
    start_metrics_server,
)
```
- `PrometheusMetrics(registry: CollectorRegistry | None = None)`: owns an isolated Prometheus registry and the fixed `ecs_agent_*` collectors.
- `install_prometheus_metrics(world: World | None = None, *, registry: CollectorRegistry | None = None, metrics: PrometheusMetrics | None = None) -> PrometheusMetrics`: creates a recorder and, when a world is provided, subscribes it to that world's event bus idempotently.
- `uninstall_prometheus_metrics(world: World) -> PrometheusMetrics | None`: removes the recorder's event-bus subscriptions from a world and returns the removed recorder, or `None` if metrics were not installed.
- `render_metrics(metrics: PrometheusMetrics | CollectorRegistry | None = None) -> bytes`: renders Prometheus text format from the provided metrics surface or registry.
- `make_metrics_asgi_app(metrics: PrometheusMetrics | CollectorRegistry | None = None)`: returns a framework-free ASGI callable suitable for `/metrics`.
- `make_metrics_wsgi_app(metrics: PrometheusMetrics | CollectorRegistry | None = None)`: returns a framework-free WSGI callable suitable for `/metrics`.
- `start_metrics_server(port: int, *, addr: str = "0.0.0.0", metrics: PrometheusMetrics | CollectorRegistry | None = None)`: starts a standalone metrics HTTP server and returns a cleanup handle.

The metric contract is intentionally fixed and low-cardinality. Use [`docs/features/metrics.md`](features/metrics.md) for the complete metric list, label allowlist, endpoint examples, install/uninstall semantics, and label safety policy.
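The idempotent install and the `None`-returning uninstall described for `install_prometheus_metrics` and `uninstall_prometheus_metrics` can be modeled with toy classes. Everything in this sketch (`ToyEventBus`, `ToyWorld`, `ToyRecorder`, `install`, `uninstall`) is hypothetical and only mirrors the documented semantics; the real event bus and recorder in `ecs_agent` may be structured quite differently.

```python
from typing import Callable, Dict, List, Optional


class ToyRecorder:
    """Hypothetical recorder that counts events by name."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def on_event(self, event: str) -> None:
        self.counts[event] = self.counts.get(event, 0) + 1


class ToyEventBus:
    """Hypothetical minimal event bus."""

    def __init__(self) -> None:
        self.handlers: List[Callable[[str], None]] = []

    def publish(self, event: str) -> None:
        for handler in list(self.handlers):
            handler(event)


class ToyWorld:
    def __init__(self) -> None:
        self.event_bus = ToyEventBus()
        self.recorder: Optional[ToyRecorder] = None


def install(world: ToyWorld) -> ToyRecorder:
    """Subscribe a recorder once; repeated calls return the same recorder."""
    if world.recorder is None:
        world.recorder = ToyRecorder()
        world.event_bus.handlers.append(world.recorder.on_event)
    return world.recorder


def uninstall(world: ToyWorld) -> Optional[ToyRecorder]:
    """Remove the subscription and return the recorder, or None if absent."""
    recorder = world.recorder
    if recorder is None:
        return None
    world.event_bus.handlers.remove(recorder.on_event)
    world.recorder = None
    return recorder
```

Tracking the installed recorder on the world is what makes a second `install` call a no-op: without it, each call would add another subscription and every event would be counted twice.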
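For readers unfamiliar with what a "framework-free WSGI callable" looks like, the following standard-library sketch shows the general shape. It is not the code behind `make_metrics_wsgi_app`: `make_text_wsgi_app` and `sample_render` are hypothetical, the sample payload value is invented for illustration, and the content type is the one conventionally used for the Prometheus text exposition format.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]

# Conventional content type for the Prometheus text exposition format.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"


def make_text_wsgi_app(render: Callable[[], bytes]):
    """Wrap a render function in a plain WSGI callable (no framework needed)."""

    def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            start_response(
                "405 Method Not Allowed",
                [("Allow", "GET"), ("Content-Length", "0")],
            )
            return [b""]
        body = render()  # rendered on every scrape so values stay current
        start_response(
            "200 OK",
            [("Content-Type", CONTENT_TYPE), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app


def sample_render() -> bytes:
    """Illustrative payload only; real output comes from the metrics registry."""
    return b"# TYPE ecs_agent_runs_total counter\necs_agent_runs_total 1.0\n"


metrics_app = make_text_wsgi_app(sample_render)
```

Such a callable can be served locally with `wsgiref.simple_server.make_server("127.0.0.1", 9100, metrics_app)` or mounted under `/metrics` by any WSGI server.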