| 1 | +# CI Debugging Refactor: Inspector as an Automated MCP Server Debugging Tool |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Transform the MCP Inspector CLI into a CI-first debugging tool that AI agents (Claude, etc.) can use to programmatically test, validate, and diagnose MCP servers — without a browser UI. |
| 6 | + |
| 7 | +This is **not** another general-purpose MCP client. For interactive command-line use of MCP servers, use [mcpc](https://github.com/apify/mcp-cli). Inspector's CLI is the **debugging companion**: structured diagnostics, batch debugging workflows, and CI-clean semantics. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Current State |
| 12 | + |
| 13 | +The Inspector CLI (`--cli` mode) supports single-method invocations across three transports (stdio, SSE, Streamable HTTP): |
| 14 | + |
| 15 | +| Method | Implemented | Tested | Notes | |
| 16 | +| -------------------------- | ----------- | ------ | ----------------------------------------------- | |
| 17 | +| `tools/list` | ✓ | ✓ | | |
| 18 | +| `tools/call` | ✓ | ✓ | Fetches schema first for type coercion (2 RPCs) | |
| 19 | +| `resources/list` | ✓ | ✗ | Zero test coverage | |
| 20 | +| `resources/read` | ✓ | stdio | Only tested over stdio | |
| 21 | +| `resources/templates/list` | ✓ | ✗ | Zero test coverage | |
| 22 | +| `prompts/list` | ✓ | ✓ | | |
| 23 | +| `prompts/get` | ✓ | ✓ | | |
| 24 | +| `logging/setLevel` | ✓ | HTTP | Sets level but discards all log notifications | |
| 25 | +| `ping` | ✗ | | | |
| 26 | +| `discover` | ✗ | | | |
| 27 | +| `completion/complete` | ✗ | | | |
| 28 | + |
| 29 | +**Key architectural limitations:** |
| 30 | + |
| 31 | +1. **One method per process** — each invocation connects, runs one call, disconnects. Stdio servers respawn every time. |
| 32 | +2. **No structured output envelope** — raw JSON on stdout, bare strings on stderr. No programmatic error categorization. |
| 33 | +3. **Server logs discarded** — debug logging is enabled on connect via `logging/setLevel`, but the `logging/message` notifications are never captured. |
| 34 | +4. **No capability gating** — methods are dispatched without checking server capabilities. Failures are ambiguous. |
| 35 | +5. **Exit code ambiguity** — server-side errors (`isError: true`) exit 0; only client-side failures exit non-zero. Invisible to CI pipelines using `set -e`. |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## Differentiation vs. mcpc |
| 40 | + |
| 41 | +| Feature | mcpc (apify) | Inspector CLI (this refactor) | |
| 42 | +| ---------------------- | ------------------------- | ---------------------------------------- | |
| 43 | +| Primary audience | Interactive CLI users | AI agents and CI pipelines | |
| 44 | +| Output format | Raw MCP JSON (`--json`) | Structured envelope with diagnostics | |
| 45 | +| Error handling | Undocumented exit codes | Typed error taxonomy, `--fail-on-error` | |
| 46 | +| Session model | Persistent named sessions | One-shot (default) + batch script mode | |
| 47 | +| Capability discovery | Implicit per-method | Explicit `discover` command | |
| 48 | +| Server log capture | ✗ | ✓ Buffer `logging/message` notifications | |
| 49 | +| Sampling/elicitation | ✗ | ✓ Reject + capture for inspection | |
| 50 | +| Batch workflows | Shell scripts | JSON script with `onError` control flow | |
| 51 | +| CI exit code semantics | ✗ | ✓ `--fail-on-error` | |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +## Design Decisions |
| 56 | + |
| 57 | +### 1. Invocation Model: One-Shot + Batch Script |
| 58 | + |
| 59 | +**Default** remains one-shot (backward compatible): connect, run one method, output result, disconnect. |
| 60 | + |
| 61 | +**New**: `--script <file>` flag accepts a JSON array of operations executed sequentially on a single persistent connection. |
| 62 | + |
| 63 | +```json |
| 64 | +[ |
| 65 | + { "method": "discover" }, |
| 66 | + { |
| 67 | + "method": "tools/call", |
| 68 | + "toolName": "echo", |
| 69 | + "toolArgs": { "message": "hello" }, |
| 70 | + "onError": "continue" |
| 71 | + }, |
| 72 | + { "method": "resources/list", "onError": "stop" }, |
| 73 | + { |
| 74 | + "method": "resources/read", |
| 75 | + "uri": "demo://example", |
| 76 | + "onError": "skip-to:4" |
| 77 | + }, |
| 78 | + { "method": "ping" } |
| 79 | +] |
| 80 | +``` |
| 81 | + |
| 82 | +**`onError` control flow** (per step): |
| 83 | + |
| 84 | +| Value | Behavior | |
| 85 | +| ------------- | --------------------------------------------- | |
| 86 | +| `"stop"` | Abort script, return results so far (default) | |
| 87 | +| `"continue"` | Record the error, proceed to next step | |
| 88 | +| `"skip-to:N"` | Jump to step index N on error (0-based) | |
| 89 | + |
| 90 | +**Rationale**: Claude expresses the whole debugging plan declaratively in one tool call. No shell scripting, no jq parsing between steps. |
| 91 | + |
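The `onError` rules above can be sketched as a small sequential executor. This is a minimal illustration, not the planned `script.ts`: the names `Step`, `StepResult`, `nextIndex`, and `runScript` are hypothetical, `runStep` stands in for the real MCP dispatch, and the sketch assumes `skip-to:N` may only jump forward so that a script cannot loop.

```typescript
// Hypothetical types; field names mirror the script example above.
type OnError = "stop" | "continue" | `skip-to:${number}`;

interface Step {
  method: string;
  onError?: OnError;
  [key: string]: unknown;
}

interface StepResult {
  index: number;
  method: string;
  success: boolean;
  error?: string;
}

// Decide which step index runs next; null ends the script.
function nextIndex(current: number, failed: boolean, onError: OnError = "stop"): number | null {
  if (!failed || onError === "continue") return current + 1;
  if (onError === "stop") return null;
  const target = Number(onError.slice("skip-to:".length));
  // Assumption: only forward jumps are valid; anything else aborts the script.
  return Number.isInteger(target) && target > current ? target : null;
}

// Run steps in order on one connection; `runStep` throws to signal a failed step.
async function runScript(
  steps: Step[],
  runStep: (step: Step) => Promise<void>,
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  let i: number | null = 0;
  while (i !== null && i < steps.length) {
    const step = steps[i];
    if (step === undefined) break;
    let failed = false;
    try {
      await runStep(step);
      results.push({ index: i, method: step.method, success: true });
    } catch (err) {
      failed = true;
      results.push({ index: i, method: step.method, success: false, error: String(err) });
    }
    i = nextIndex(i, failed, step.onError);
  }
  return results;
}
```

With this shape, a failed step marked `"skip-to:3"` at index 1 records its error, skips index 2, and resumes at index 3. A failed step with no `onError` ends the run and returns the results collected so far.
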
| 92 | +### 2. Structured Output Envelope |
| 93 | + |
| 94 | +Enabled via `--structured` flag. Raw JSON output remains the default for backward compatibility. |
| 95 | + |
| 96 | +```json |
| 97 | +{ |
| 98 | + "structuredVersion": 1, |
| 99 | + "success": true, |
| 100 | + "method": "tools/call", |
| 101 | + "durationMs": 234, |
| 102 | + "result": { "content": [{ "type": "text", "text": "Echo: hello" }] }, |
| 103 | + "error": null, |
| 104 | + "logs": [ |
| 105 | + { |
| 106 | + "level": "debug", |
| 107 | + "message": "tool echo invoked", |
| 108 | + "timestamp": "2026-01-30T12:00:00.000Z" |
| 109 | + } |
| 110 | + ] |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +In script mode, the top level becomes an array of these envelopes (one per step). |
| 115 | + |
| 116 | +**Error taxonomy** (mutually exclusive `error.category`): |
| 117 | + |
| 118 | +| Category | Meaning | |
| 119 | +| ------------- | -------------------------------------------------------------------- | |
| 120 | +| `transport` | Could not connect — bad URL, subprocess crash, ECONNREFUSED, timeout | |
| 121 | +| `capability` | Server does not support the requested method | |
| 122 | +| `protocol` | Malformed JSON-RPC, handshake failure | |
| 123 | +| `application` | Tool/resource/prompt returned an error in its content | |
| 124 | +| `validation` | Client-side failure — missing required arg, bad metadata | |
| 125 | + |
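The envelope and taxonomy above could be typed roughly as follows. This is a sketch under assumptions: `buildEnvelope` and `failureEnvelope` are hypothetical helper names, and treating a result with `isError: true` as an `application` error (while keeping the result in the envelope) is one reading of the table, not a confirmed design.

```typescript
type ErrorCategory = "transport" | "capability" | "protocol" | "application" | "validation";

interface StructuredError {
  category: ErrorCategory;
  message: string;
}

interface LogEntry {
  level: string;
  message: string;
  timestamp: string;
}

interface Envelope {
  structuredVersion: 1;
  success: boolean;
  method: string;
  durationMs: number;
  result: unknown;
  error: StructuredError | null;
  logs: LogEntry[];
}

// Wrap a server response. Assumption: `isError: true` maps to category "application".
function buildEnvelope(
  method: string,
  durationMs: number,
  result: unknown,
  logs: LogEntry[] = [],
): Envelope {
  const isToolError =
    typeof result === "object" &&
    result !== null &&
    (result as { isError?: unknown }).isError === true;
  return {
    structuredVersion: 1,
    success: !isToolError,
    method,
    durationMs,
    result,
    error: isToolError
      ? { category: "application", message: "Server returned isError: true" }
      : null,
    logs,
  };
}

// Wrap a failure that produced no server result (transport, capability, and so on).
function failureEnvelope(
  method: string,
  durationMs: number,
  error: StructuredError,
  logs: LogEntry[] = [],
): Envelope {
  return { structuredVersion: 1, success: false, method, durationMs, result: null, error, logs };
}
```

Because `category` is a closed union, a caller can branch on it without matching against free-form error strings.
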
| 126 | +### 3. Server Log Capture |
| 127 | + |
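| 128 | +Register a handler for server log notifications on every connection (the MCP specification names this notification `notifications/message`; this plan calls it `logging/message`). Buffer all log messages for the session lifetime and include them in the structured output envelope. |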
| 129 | + |
| 130 | +In non-structured mode, emit captured logs to stderr. |
| 131 | + |
| 132 | +**Rationale**: The CLI already enables debug-level logging on connect. Discarding the notifications is an existing bug — fixing it is the single highest-value diagnostic change. |
| 133 | + |
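A minimal sketch of the buffering side, kept independent of the SDK so it can be read in isolation. `LogBuffer` and its method names are hypothetical. The `level`, `logger`, and `data` fields follow the log notification parameters in the MCP specification. Wiring `handle` into the client's notification handler is left out because the exact hook in `connection.ts` is not shown here.

```typescript
// Parameters of an MCP log notification as described in the specification.
interface LoggingMessageParams {
  level: string;
  logger?: string;
  data: unknown;
}

interface CapturedLog {
  level: string;
  message: string;
  timestamp: string;
}

class LogBuffer {
  private readonly entries: CapturedLog[] = [];
  private readonly now: () => Date;

  // The clock is injectable so tests can produce stable timestamps.
  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  // Record one notification; non-string payloads are serialized to JSON.
  handle(params: LoggingMessageParams): void {
    const message =
      typeof params.data === "string" ? params.data : JSON.stringify(params.data) ?? "";
    this.entries.push({
      level: params.level,
      message,
      timestamp: this.now().toISOString(),
    });
  }

  // Copy of everything captured so far, for the envelope's `logs` field.
  snapshot(): CapturedLog[] {
    return [...this.entries];
  }

  // Plain-text rendering for non-structured mode (one line per log for stderr).
  toStderrLines(): string[] {
    return this.entries.map((e) => `[${e.level}] ${e.message}`);
  }
}
```

The same buffer serves both output modes: `snapshot()` feeds the structured envelope, and `toStderrLines()` covers the stderr path.
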
| 134 | +### 4. `discover` Command |
| 135 | + |
| 136 | +A pseudo-method that connects once and returns the full server shape: |
| 137 | + |
| 138 | +```json |
| 139 | +{ |
| 140 | + "serverInfo": { "name": "my-server", "version": "1.0.0" }, |
| 141 | + "capabilities": { |
| 142 | + "tools": true, |
| 143 | + "resources": true, |
| 144 | + "prompts": false, |
| 145 | + "logging": true, |
| 146 | + "completions": false |
| 147 | + }, |
| 148 | + "tools": [...], |
| 149 | + "resources": [...], |
| 150 | + "prompts": [] |
| 151 | +} |
| 152 | +``` |
| 153 | + |
| 154 | +Runs: `initialize` → read capabilities → conditionally call `tools/list`, `resources/list`, `prompts/list`. One connection, one output. |
| 155 | + |
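The flow above could look roughly like this. `DiscoverableClient` is a hypothetical, narrowed interface standing in for whatever the connection layer exposes after `initialize`. The sketch assumes a capability counts as advertised when its key is present in the server's capabilities object.

```typescript
interface ServerCapabilities {
  tools?: object;
  resources?: object;
  prompts?: object;
  logging?: object;
  completions?: object;
}

// Hypothetical view of an already-initialized client.
interface DiscoverableClient {
  getServerInfo(): { name: string; version: string };
  getCapabilities(): ServerCapabilities;
  listTools(): Promise<unknown[]>;
  listResources(): Promise<unknown[]>;
  listPrompts(): Promise<unknown[]>;
}

interface DiscoverResult {
  serverInfo: { name: string; version: string };
  capabilities: Record<keyof ServerCapabilities, boolean>;
  tools: unknown[];
  resources: unknown[];
  prompts: unknown[];
}

async function discover(client: DiscoverableClient): Promise<DiscoverResult> {
  const caps = client.getCapabilities();
  const has = (key: keyof ServerCapabilities): boolean => caps[key] !== undefined;
  return {
    serverInfo: client.getServerInfo(),
    capabilities: {
      tools: has("tools"),
      resources: has("resources"),
      prompts: has("prompts"),
      logging: has("logging"),
      completions: has("completions"),
    },
    // Only enumerate what the server advertises; unsupported lists stay empty.
    tools: has("tools") ? await client.listTools() : [],
    resources: has("resources") ? await client.listResources() : [],
    prompts: has("prompts") ? await client.listPrompts() : [],
  };
}
```

Skipping the list calls for unadvertised capabilities keeps `discover` from failing on a server that, for example, exposes tools but no prompts.
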
| 156 | +### 5. Exit Code Contract |
| 157 | + |
| 158 | +| Flag | Server `isError: true` | Client validation error | Transport error | |
| 159 | +| ----------------- | ---------------------- | ----------------------- | --------------- | |
| 160 | +| (default) | exit 0 | exit 1 | exit 1 | |
| 161 | +| `--fail-on-error` | exit 1 | exit 1 | exit 1 | |
| 162 | + |
| 163 | +**Rationale**: Backward compatible by default. CI pipelines opt into strict semantics explicitly. |
| 164 | + |
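The table above reduces to a small pure function. `Outcome` and `exitCodeFor` are hypothetical names used only for this sketch.

```typescript
type Outcome = "success" | "server-error" | "validation-error" | "transport-error";

// Map an invocation outcome to a process exit code per the contract table.
function exitCodeFor(outcome: Outcome, failOnError: boolean): 0 | 1 {
  switch (outcome) {
    case "success":
      return 0;
    case "server-error":
      // Server `isError: true` only fails the process when the caller opts in.
      return failOnError ? 1 : 0;
    case "validation-error":
    case "transport-error":
      return 1;
  }
}
```

Keeping the decision in one function makes the backward-compatible default easy to test separately from transport and dispatch code.
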
| 165 | +### 6. Sampling / Elicitation Policy |
| 166 | + |
| 167 | +Default policy: **reject/decline all** server-initiated requests. The incoming request payloads are captured and included in the structured output envelope so the caller can inspect what the server attempted. |
| 168 | + |
| 169 | +**Rationale**: A headless tool has no user to approve requests, so rejecting is the safe default. Capturing the payloads keeps the server's attempts visible. |
| 170 | + |
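One way to sketch reject-and-capture, under stated assumptions: `ServerRequestRecorder` is a hypothetical name, elicitation is declined with an `action: "decline"` result, sampling is refused with a JSON-RPC error, and the error code `-1` is a placeholder rather than a value this plan specifies.

```typescript
interface CapturedRequest {
  method: string;
  params: unknown;
}

type Reply =
  | { result: { action: "decline" } }
  | { error: { code: number; message: string } };

class ServerRequestRecorder {
  // Every server-initiated request seen during the session, in arrival order.
  readonly captured: CapturedRequest[] = [];

  // Record the request, then refuse it in the form suited to its method.
  handle(method: string, params: unknown): Reply {
    this.captured.push({ method, params });
    if (method === "elicitation/create") {
      return { result: { action: "decline" } };
    }
    // Assumption: all other server-initiated requests (e.g. sampling/createMessage)
    // are refused with an error; the code is a placeholder.
    return { error: { code: -1, message: `Rejected ${method}: no user in headless mode` } };
  }
}
```

The `captured` array is what would be attached to the structured envelope, so a caller can see which prompts or messages the server tried to send.
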
| 171 | +### 7. Capability Gating |
| 172 | + |
| 173 | +Before dispatching a method, check that the server's `initialize` response advertises the relevant capability (for example, `tools` for `tools/call`). If it does not, fail immediately with a `capability` category error instead of sending the request. |
| 174 | + |
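A gating check could be as small as a lookup table. This is a sketch: `REQUIRED_CAPABILITY` and `checkCapability` are hypothetical names, and treating methods absent from the table (such as `ping` and `discover`) as ungated is an assumption.

```typescript
// Which capability key each method depends on.
const REQUIRED_CAPABILITY = new Map<string, string>([
  ["tools/list", "tools"],
  ["tools/call", "tools"],
  ["resources/list", "resources"],
  ["resources/read", "resources"],
  ["resources/templates/list", "resources"],
  ["prompts/list", "prompts"],
  ["prompts/get", "prompts"],
  ["logging/setLevel", "logging"],
  ["completion/complete", "completions"],
]);

interface CapabilityError {
  category: "capability";
  message: string;
}

// Returns null when dispatch may proceed, or a categorized error otherwise.
function checkCapability(
  method: string,
  capabilities: Record<string, unknown>,
): CapabilityError | null {
  const required = REQUIRED_CAPABILITY.get(method);
  // Assumption: methods without an entry (e.g. ping, discover) are never gated.
  if (required === undefined) return null;
  if (capabilities[required] !== undefined) return null;
  return {
    category: "capability",
    message: `Server does not advertise "${required}"; cannot call ${method}`,
  };
}
```

Running this before dispatch turns an ambiguous server-side failure into a deterministic client-side result that names the missing capability.
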
| 175 | +--- |
| 176 | + |
| 177 | +## Implementation Phases |
| 178 | + |
| 179 | +### Phase 1 — Debugging Primitives |
| 180 | + |
| 181 | +**New files:** |
| 182 | + |
| 183 | +- `cli/src/output.ts` — Output envelope formatting, error categorization |
| 184 | +- `cli/src/discover.ts` — Capability discovery logic |
| 185 | + |
| 186 | +**Modified files:** |
| 187 | + |
| 188 | +- `cli/src/index.ts` — Add `discover` and `ping` methods; add `--structured` and `--fail-on-error` flags; add capability gating before dispatch |
| 189 | +- `cli/src/client/connection.ts` — Register `logging/message` notification handler; expose captured logs |
| 190 | +- `cli/src/error-handler.ts` — Produce categorized `StructuredError` objects |
| 191 | +- `.github/workflows/main.yml` — Add CLI test step (currently only in narrow `cli_tests.yml`) |
| 192 | + |
| 193 | +**New tests:** |
| 194 | + |
| 195 | +- `cli/__tests__/ci-debugging.test.ts` — Covers: `discover`, structured output, log capture, exit codes, `resources/list` (zero coverage today), `resources/templates/list` (zero coverage today), SSE transport success paths |
| 196 | + |
| 197 | +### Phase 2 — Batch Debugging Workflows |
| 198 | + |
| 199 | +**New files:** |
| 200 | + |
| 201 | +- `cli/src/script.ts` — Script parser, validator, and sequential executor with `onError` control flow |
| 202 | + |
| 203 | +**Modified files:** |
| 204 | + |
| 205 | +- `cli/src/index.ts` — Wire `--script` flag into the dispatch path |
| 206 | + |
| 207 | +**New tests:** |
| 208 | + |
| 209 | +- Multi-operation script over stdio |
| 210 | +- `onError` control flow (stop, continue, skip-to) |
| 211 | +- Malformed script validation |
| 212 | + |
| 213 | +### Phase 3 — Advanced Diagnostics |
| 214 | + |
| 215 | +- Sampling/elicitation capture (reject + include in envelope) |
| 216 | +- `completion/complete` subcommand |
| 217 | +- `--watch <duration>` notification capture mode |
| 218 | + |
| 219 | +--- |
| 220 | + |
| 221 | +## Out of Scope |
| 222 | + |
| 223 | +- Browser-based OAuth flows (require human interaction; use mcpc for these) |
| 224 | +- Full MCP server lifecycle management (we connect to servers, not manage them) |
| 225 | +- Interactive shell (mcpc does this) |
| 226 | +- Performance optimization of the double-RPC in `tools/call` |
| 227 | +- Streaming output during long-running tool calls (results are delivered atomically on completion) |
| 228 | + |
| 229 | +--- |
| 230 | + |
| 231 | +## File Layout After Refactor |
| 232 | + |
| 233 | +``` |
| 234 | +cli/ |
| 235 | +├── src/ |
| 236 | +│ ├── cli.ts # Entry point (unchanged) |
| 237 | +│ ├── index.ts # Main dispatch — refactored with capability gating, new flags |
| 238 | +│ ├── transport.ts # Transport factory (unchanged) |
| 239 | +│ ├── error-handler.ts # Produces StructuredError with category |
| 240 | +│ ├── output.ts # NEW: envelope formatting, structured/raw modes |
| 241 | +│ ├── discover.ts # NEW: capability discovery + list enumeration |
| 242 | +│ ├── script.ts # NEW (Phase 2): script parser and executor |
| 243 | +│ └── client/ |
| 244 | +│ ├── connection.ts # Log capture via logging/message handler |
| 245 | +│ ├── tools.ts # Unchanged |
| 246 | +│ ├── resources.ts # Unchanged |
| 247 | +│ └── prompts.ts # Unchanged |
| 248 | +└── __tests__/ |
| 249 | + ├── ci-debugging.test.ts # NEW: CI-focused coverage |
| 250 | + └── helpers/ # Unchanged |
| 251 | +``` |