mock-server
diff --git a/‎changelog.md‎
Lines changed: 1 addition & 0 deletions b/‎changelog.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎docs/code/llm-mocking.md‎
Lines changed: 13 additions & 1 deletion b/‎docs/code/llm-mocking.md‎
Lines changed: 13 additions & 1 deletion
diff --git a/‎docs/plans/mockserver-llm-mocking.md‎
Lines changed: 2 additions & 2 deletions b/‎docs/plans/mockserver-llm-mocking.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html‎
Lines changed: 70 additions & 3 deletions b/‎jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html‎
Lines changed: 70 additions & 3 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- Added two MCP tools for **agent-run analysis and tool-call assertions**, both backed by a new deterministic `org.mockserver.llm.analysis.AgentRunAnalyzer` that reconstructs an agent run by decoding the LLM requests MockServer recorded. `verify_tool_call` asserts that an agent called a named tool a given number of times (`atLeast`/`atMost`, with an optional regex over the tool-call arguments); `explain_agent_run` summarises the run's structure (message and assistant-turn counts, the ordered tool-call sequence, tool results, and the latest message role). Read-only and offline — no LLM call. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - Added a **runtime-LLM client SPI** (`org.mockserver.llm.client`) that lets MockServer call a real LLM you already run, as the foundation for opt-in features such as drift detection and exploratory semantic matching. Mirrors the existing codec registry: an `LlmClient` per provider (Ollama, OpenAI, OpenAI Responses, Azure OpenAI, Anthropic, Gemini, Bedrock) registered in `LlmClientRegistry`, an immutable `LlmBackend` config (with the API key redacted in logs), and a three-layer `LlmBackendResolver` (provider env vars → `mockserver.llmProvider`/`llmApiKey`/`llmModel`/`llmBaseUrl` → named-backends JSON via `mockserver.llmBackendsConfig`). All runtime-LLM use goes through `LlmCompletionService`, which is **off unless a backend is configured**, **fails closed** on any timeout/error/non-2xx (never flipping a deterministic result), and caches per normalised prompt for reproducibility. Ollama is the reference backend (no key, local); Bedrock builds the Anthropic-on-Bedrock request and relies on the `headers` escape hatch pending automatic SigV4 signing. See the configuration properties page and `docs/code/llm-mocking.md`.
 - LLM conversation mocks can now opt into deterministic **prompt normalisation** before the `latestMessageContains` / `latestMessageMatches` predicates are evaluated, so a match is not blocked by cosmetic differences in dynamically-assembled agent prompts. A new `normalization` block on `conversationPredicates` (also exposed per-turn in the `create_llm_conversation` MCP tool and the dashboard conversation wizard) supports collapsing whitespace, lowercasing, sorting JSON object keys, dropping built-in volatile values (ISO-8601 timestamps, UUIDs, `req_`/`msg_`/`call_` ids), and dropping named JSON fields. Normalisation is pure and idempotent — it never makes a test flaky — and has no effect unless a text predicate is set. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - DataFaker (`net.datafaker:datafaker:2.5.4`) is now bundled as a template helper. A single shared `Faker` instance is exposed as `faker` in all three response-template engines (Velocity, Mustache, JavaScript) via `TemplateFunctions.BUILT_IN_HELPERS`, giving templates access to 250+ realistic-fake-data providers (`faker.name().firstName()`, `faker.internet().emailAddress()`, `faker.address().city()`, etc.). The instance is thread-safe and produces fresh random values on each call. See the consumer docs (response templates page) for the full provider list and per-engine syntax. Java 17 unlocked this — DataFaker 2.x requires Java 17; the previous Java 11 floor pinned us to the abandoned 1.9.0 line.
 
@@ -162,8 +162,19 @@ Two MCP tools expose the LLM mocking feature to agents:
 |------|-------------|
 | `mock_llm_completion` | Creates a single LLM expectation from provider, path, text, tool calls, usage |
 | `create_llm_conversation` | Creates a multi-turn conversation with scenario state chain, optional isolation, and an optional per-turn `match.normalization` object |
+| `verify_tool_call` | Asserts an agent called a named tool `atLeast`/`atMost` times (optional args regex), by decoding recorded LLM requests |
+| `explain_agent_run` | Summarises a recorded agent run: turn/tool-call sequence, tool results, latest role |
 
-Both validate provider availability against `ProviderCodecRegistry` at registration time.
+The first two validate provider availability against `ProviderCodecRegistry` at registration time. The analysis tools delegate to `org.mockserver.llm.analysis.AgentRunAnalyzer`.
+
+## Agent-run analysis
+
+`AgentRunAnalyzer` (`org.mockserver.llm.analysis`) is a deterministic, read-only inspector. Given the LLM requests MockServer recorded (retrieved via the normal request log), it decodes each with the provider's `ProviderCodec` and treats the **richest** conversation (most messages — the latest dialogue snapshot) as the canonical run. From that it derives:
+
+- `inspectToolCalls(requests, provider, toolName, argsRegex)` → count + matched tool calls (powers `verify_tool_call`).
+- `summarise(requests, provider)` → message count, assistant-turn count, ordered tool-call name sequence, tool-result keys, latest message role (powers `explain_agent_run`).
+
+No LLM is called and no network is used — it reads the structure the codecs already produce, so assertions are reproducible. The MCP tools are thin wrappers that retrieve recorded requests (`/mockserver/retrieve?type=REQUESTS`) and format the analyzer's output. The dashboard surfacing of this analysis is the correlated call-graph view (roadmap item #11).
 
 ## Dashboard Rendering
 
@@ -334,3 +345,4 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
 | `llm/client/LlmBackendResolver.java` | Three-layer backend resolution (env / properties / named JSON) |
 | `llm/client/LlmCompletionService.java` | Orchestrator: off-unless-configured, fail-closed, cached |
 | `llm/client/LlmTransport.java` + `NettyHttpClientLlmTransport.java` | Transport seam over `NettyHttpClient` |
+| `llm/analysis/AgentRunAnalyzer.java` | Deterministic read-only agent-run inspection (tool-call counts, run summary) |
@@ -15,8 +15,8 @@ The original RFC (RFC-1 LLM Response Builder + RFC-2 Stateful Scripted Conversat
 |---|---|---|
 | 1 | LLM response builder (`llmMock`) — RFC-1 | ✅ Shipped (M0–M5) |
 | 2 | Stateful scripted conversations — RFC-2 Layer B | ✅ Shipped (M2) |
-| 3 | Tool-call assertions (`verify_tool_call`) | ❌ Not started |
-| 4 | Agent-run / LLM-session analysis (`explain_agent_run`) | ❌ Not started |
+| 3 | Tool-call assertions (`verify_tool_call`) | ✅ Shipped — `verify_tool_call` MCP tool over `AgentRunAnalyzer` (decodes recorded requests; asserts a named tool was called atLeast/atMost times, optional args regex) |
+| 4 | Agent-run / LLM-session analysis (`explain_agent_run`) | ✅ Shipped — `explain_agent_run` MCP tool (turn/tool-call sequence, tool results, latest role). UI surfacing is item #11 (call-graph view) |
 
 ### Tier 2 — high value
 
 
@@ -1,5 +1,6 @@
 ---
 title: MCP Tools Reference
+description: Full reference for every MCP tool exposed by MockServer, including create_expectation, verify_request, retrieve, debug, OpenAPI, and LLM mocking tools.
 shortTitle: MCP Tools Reference
 layout: page
 pageOrder: 2
@@ -45,6 +46,8 @@ <h2>Tool Overview</h2>
         <tr><td><a href="#load_expectations_from_file"><code>load_expectations_from_file</code></a></td><td>Load expectations from a fixture file for replay</td><td>High</td></tr>
         <tr><td><a href="#mock_llm_completion"><code>mock_llm_completion</code></a></td><td>Create a single-turn LLM completion expectation for any supported provider</td><td>High</td></tr>
         <tr><td><a href="#create_llm_conversation"><code>create_llm_conversation</code></a></td><td>Create a multi-turn scripted LLM conversation with optional per-session isolation</td><td>High</td></tr>
+        <tr><td><a href="#verify_tool_call"><code>verify_tool_call</code></a></td><td>Assert an agent called a named tool, from recorded LLM requests</td><td>High</td></tr>
+        <tr><td><a href="#explain_agent_run"><code>explain_agent_run</code></a></td><td>Summarise a recorded agent run (turns, tool-call sequence)</td><td>High</td></tr>
         <tr><td><a href="#raw_expectation"><code>raw_expectation</code></a></td><td>Full expectation JSON passthrough</td><td>Low</td></tr>
         <tr><td><a href="#raw_retrieve"><code>raw_retrieve</code></a></td><td>Full retrieve with correlation ID filtering</td><td>Low</td></tr>
         <tr><td><a href="#raw_verify"><code>raw_verify</code></a></td><td>Full verification JSON</td><td>Low</td></tr>
@@ -536,7 +539,7 @@ <h3>stop_server</h3>
 
 <h3>create_expectation_from_openapi</h3>
 
-<p>Generate mock expectations from an <a target="_blank" href="https://swagger.io/docs/specification/basic-structure/">OpenAPI v3</a> specification. MockServer will create one expectation per operation in the specification, using example responses where available.</p>
+<p>Generate mock expectations from an <a target="_blank" href="https://swagger.io/docs/specification/basic-structure/" rel="noopener noreferrer">OpenAPI v3</a> specification. MockServer will create one expectation per operation in the specification, using example responses where available.</p>
 
 <table>
     <thead>
@@ -1093,6 +1096,70 @@ <h3>create_llm_conversation</h3>
 
 <p>The <code>scenarioName</code> in the response is auto-generated and encodes the isolation key. The <code>states</code> array shows the scenario state progression: <code>Started</code> &rarr; <code>turn_1</code> &rarr; <code>__done</code>. Each concurrent session identified by a distinct <code>x-session-id</code> header value advances through its own copy of this state chain.</p>
 
+<a id="verify_tool_call" class="anchor" href="#verify_tool_call">&nbsp;</a>
+
+<h3>verify_tool_call</h3>
+
+<p>Assert that an agent called a particular tool, by decoding the LLM requests MockServer recorded and inspecting the assistant tool calls in the conversation. Deterministic and read-only &mdash; it does not call any LLM. Useful for testing that your agent decided to use the expected tool (and, optionally, with the expected arguments).</p>
+
+<table>
+    <thead>
+        <tr><th>Parameter</th><th>Type</th><th>Required</th><th>Description</th></tr>
+    </thead>
+    <tbody>
+        <tr><td><code>provider</code></td><td>string</td><td>Yes</td><td>LLM provider whose recorded requests to inspect (e.g. <code>ANTHROPIC</code>, <code>OPENAI</code>)</td></tr>
+        <tr><td><code>toolName</code></td><td>string</td><td>Yes</td><td>Name of the tool the agent should have called</td></tr>
+        <tr><td><code>path</code></td><td>string</td><td>No</td><td>Restrict to requests on this path (e.g. <code>/v1/messages</code>)</td></tr>
+        <tr><td><code>argumentsRegex</code></td><td>string</td><td>No</td><td>Java regex matched against the tool call's argument JSON</td></tr>
+        <tr><td><code>atLeast</code></td><td>integer</td><td>No</td><td>Minimum matching calls required (default 1)</td></tr>
+        <tr><td><code>atMost</code></td><td>integer</td><td>No</td><td>Maximum matching calls allowed</td></tr>
+    </tbody>
+</table>
+
+<p>The result reports <code>count</code> (matching tool calls found) and <code>satisfied</code> (whether the count met the <code>atLeast</code>/<code>atMost</code> constraints); when not satisfied it includes a human-readable <code>message</code>.</p>
+
+<pre class="prettyprint code"><code class="code">{
+  "jsonrpc": "2.0",
+  "id": 40,
+  "method": "tools/call",
+  "params": {
+    "name": "verify_tool_call",
+    "arguments": {
+      "provider": "ANTHROPIC",
+      "path": "/v1/messages",
+      "toolName": "get_weather",
+      "argumentsRegex": "Paris",
+      "atLeast": 1
+    }
+  }
+}</code></pre>
+
+<a id="explain_agent_run" class="anchor" href="#explain_agent_run">&nbsp;</a>
+
+<h3>explain_agent_run</h3>
+
+<p>Summarise an agent run reconstructed from recorded LLM requests &mdash; a quick way to see what an agent did without reading raw request bodies. Returns the message count, the number of assistant turns, the ordered sequence of tool-call names (<code>toolCallSequence</code>), the tool-use IDs a result was returned for (<code>toolResultsFor</code>, e.g. <code>"toolu_1"</code>), and the role of the latest message. Deterministic and read-only.</p>
+
+<table>
+    <thead>
+        <tr><th>Parameter</th><th>Type</th><th>Required</th><th>Description</th></tr>
+    </thead>
+    <tbody>
+        <tr><td><code>provider</code></td><td>string</td><td>Yes</td><td>LLM provider whose recorded requests to summarise</td></tr>
+        <tr><td><code>path</code></td><td>string</td><td>No</td><td>Restrict to requests on this path</td></tr>
+    </tbody>
+</table>
+
+<pre class="prettyprint code"><code class="code">{
+  "jsonrpc": "2.0",
+  "id": 41,
+  "method": "tools/call",
+  "params": {
+    "name": "explain_agent_run",
+    "arguments": { "provider": "ANTHROPIC", "path": "/v1/messages" }
+  }
+}</code></pre>
+
 <a id="low_level_tools" class="anchor" href="#low_level_tools">&nbsp;</a>
 
 <h2>Low-Level Tools</h2>
@@ -1110,7 +1177,7 @@ <h3>raw_expectation</h3>
         <tr><th>Parameter</th><th>Type</th><th>Required</th><th>Description</th></tr>
     </thead>
     <tbody>
-        <tr><td><code>expectation</code></td><td>object</td><td>Yes</td><td>Full expectation JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi">REST API specification</a></td></tr>
+        <tr><td><code>expectation</code></td><td>object</td><td>Yes</td><td>Full expectation JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi" rel="noopener noreferrer">REST API specification</a></td></tr>
     </tbody>
 </table>
 
@@ -1196,7 +1263,7 @@ <h3>raw_verify</h3>
         <tr><th>Parameter</th><th>Type</th><th>Required</th><th>Description</th></tr>
     </thead>
     <tbody>
-        <tr><td><code>verification</code></td><td>object</td><td>Yes</td><td>Full verification JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi">REST API specification</a></td></tr>
+        <tr><td><code>verification</code></td><td>object</td><td>Yes</td><td>Full verification JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi" rel="noopener noreferrer">REST API specification</a></td></tr>
     </tbody>
 </table>