Skip to content

Commit bc30023

Browse files
committed
Document MCP tool-call observability contract
1 parent e854d37 commit bc30023

1 file changed

Lines changed: 54 additions & 1 deletion

File tree

CLAUDE.md

Lines changed: 54 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -172,6 +172,17 @@ This project uses **loguru** for structured JSON logging. All logs go to **stder
172172

173173
7. **Use `logger.configure(patcher=...)` for global context injection** (like OTel trace_id). Do NOT pass `patcher` to `logger.add()` — loguru 0.7.x does not support it there.
174174

175+
8. **Tool-call failures are warnings with full structured arguments.** Do not log MCP
176+
tool-call failures as `error` unless the whole server process is failing. A bad
177+
tool call is recoverable input for the model loop, same as backend agent tools.
178+
Log it as `logger.warning(..., tool_arguments={...})` or with `.bind(tool_arguments=...)`.
179+
Do not redact `tool_arguments` in the logger path; the purpose is to recover the
180+
exact failing invocation. Authorization headers remain masked via `log_api_request()`.
181+
182+
9. **Tool-call lifecycle logs are debug.** The per-tool "started" and "completed"
183+
messages from `ObservabilityMiddleware` must be `logger.debug`, not `logger.info`.
184+
Info-level logs should not be emitted for every agent step/tool step.
185+
175186
### OTel Trace Correlation
176187

177188
Every log record automatically gets `trace_id` and `span_id` injected by `_otel_patcher` (registered via `logger.configure`). The `ObservabilityMiddleware` also uses `logger.contextualize(trace_id=..., tool=...)` so all logs within a tool call carry the correlation ID. Do not duplicate this — it's automatic.
@@ -194,13 +205,55 @@ The `ObservabilityMiddleware` creates a span per tool call with these attributes
194205

195206
On errors, the span gets `StatusCode.ERROR` + `record_exception()`. Do not add redundant span creation inside tool functions — the middleware handles it.
196207

208+
#### Required MCP observability fix pattern
209+
210+
When touching MCP tool observability, update both the generic middleware and the
211+
tool-specific body:
212+
213+
- In `src/middleware/observability_middleware.py`, extract tool arguments from
214+
the incoming `tools/call` message (FastMCP currently exposes this through the
215+
message payload; keep the extraction defensive). Add them to the log context,
216+
e.g. `with logger.contextualize(trace_id=trace_id, tool=tool_name,
217+
tool_arguments=tool_arguments): ...`.
218+
- Change middleware lifecycle logs:
219+
- `logger.info("Tool call started...")` -> `logger.debug(...)`
220+
- `logger.info("Tool call completed...")` -> `logger.debug(...)`
221+
- `logger.error("Tool call failed...")` -> `logger.warning(...,
222+
tool_arguments=tool_arguments)`
223+
- Keep OTel span semantics unchanged: failed tool calls should still set
224+
`StatusCode.ERROR` and `record_exception(exc)` because tracing represents the
225+
tool invocation outcome, while logs use Warning to avoid misclassifying a
226+
recoverable model/tool-call error as a server crash.
227+
- In each tool body, log in-band validation failures before raising `ToolError`.
228+
Include all tool parameters in `tool_arguments`.
229+
230+
Concrete example: `src/tools/artifact_relationships.py::get_artifact_relationships`
231+
must log these branches as Warning with full arguments:
232+
233+
```python
234+
tool_arguments = {
235+
"identifier": identifier,
236+
"profile": profile,
237+
"max_count_per_type": max_count_per_type,
238+
}
239+
```
240+
241+
- missing/empty `identifier`
242+
- `max_count_per_type` outside `1..1000`
243+
- unsupported `profile` fallback branch
244+
- backend `HTTPStatusError` / unexpected exception before delegating to
245+
`handle_api_error(...)`
246+
247+
The API request/response helpers stay `Debug` and keep their existing masking
248+
rules. Do not put raw response bodies into warning logs.
249+
197250
### Adding New Tools — Observability Checklist
198251

199252
When adding a new tool, ensure:
200253
1. The tool receives `ctx: Context` as its first argument (required for lifespan context and logging)
201254
2. API requests include all four `X-CodeAlive-*` headers: `Integration`, `Tool`, `Client`, plus `Authorization`
202255
3. Call `log_api_request()` before and `log_api_response()` after the HTTP call
203-
4. Errors go through `handle_api_error(ctx, e, "description", method=_TOOL_NAME)` — this ensures the `[tool_name]` prefix in error messages
256+
4. Errors are logged as Warning with full `tool_arguments` before they go through `handle_api_error(ctx, e, "description", method=_TOOL_NAME)` — this ensures the `[tool_name]` prefix in error messages and preserves the exact failed call in logs
204257
5. The middleware automatically wraps the tool in an OTel span — no manual span creation needed
205258

206259
## Tool Response Conventions

0 commit comments

Comments
 (0)