feat(llm): tool-call assertions and agent-run analysis MCP tools
Add deterministic, read-only analysis of an agent run reconstructed from the
LLM requests MockServer recorded — no LLM call, fully reproducible.
- AgentRunAnalyzer (mockserver-core, org.mockserver.llm.analysis): decodes
recorded requests with the provider codec, treats the richest conversation
(most messages = latest snapshot) as canonical, and exposes inspectToolCalls
(count assistant tool calls by name + optional args regex) and summarise
(message/assistant-turn counts, ordered tool-call sequence, tool-result IDs,
latest message role). Pure and offline.
- verify_tool_call MCP tool: assert an agent called a named tool atLeast/atMost
times, optionally with arguments matching a regex.
- explain_agent_run MCP tool: structural summary of a recorded run.
Both retrieve recorded requests via /mockserver/retrieve and delegate to
AgentRunAnalyzer; validate provider + params (atMost >= atLeast).
The dashboard surfacing of this analysis is roadmap item #11 (correlated
call-graph view), which builds on AgentRunAnalyzer — so this phase is backend +
MCP + docs.
Docs: docs/code/llm-mocking.md (Agent-run analysis section + tools + source
refs), consumer AI/MCP tools page (two new tool sections), roadmap status,
changelog.
Tests: 7 AgentRunAnalyzerTest (counts, args-regex filter, run summary, richest
snapshot, empty/non-decodable, tool-result correlation) + 6 LlmMcpToolsTest
(verify satisfied/unsatisfied, args filter, missing toolName, explain with and
without recorded conversation). Core + netty tests green.
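
The counting behaviour described above (pick the richest decoded conversation, then count assistant tool calls by name with an optional arguments regex) can be sketched roughly as follows. This is an illustrative Python sketch, not the Java `AgentRunAnalyzer` itself: the decoded message shape (`role`, `tool_calls`, `name`, `arguments`), the function names, and the use of a substring-style regex search over the serialised arguments are all assumptions made for the example.

```python
import json
import re
from typing import Optional

# Hypothetical decoded shape: each recorded request decodes to a list of
# message dicts; assistant messages may carry a "tool_calls" list.
Conversation = list[dict]


def richest_conversation(conversations: list[Conversation]) -> Conversation:
    """Return the conversation with the most messages (the latest snapshot)."""
    return max(conversations, key=len, default=[])


def inspect_tool_calls(
    conversations: list[Conversation],
    tool_name: str,
    arguments_regex: Optional[str] = None,
) -> int:
    """Count assistant tool calls named tool_name in the richest conversation."""
    pattern = re.compile(arguments_regex) if arguments_regex else None
    count = 0
    for message in richest_conversation(conversations):
        if message.get("role") != "assistant":
            continue
        for call in message.get("tool_calls", []):
            if call.get("name") != tool_name:
                continue
            # Assumption: the regex is searched within the serialised argument JSON.
            arguments_json = json.dumps(call.get("arguments", {}), sort_keys=True)
            if pattern is None or pattern.search(arguments_json):
                count += 1
    return count
```

Because the function only reads already-decoded structures, the same input always yields the same count, which is the property the commit message calls "pure and offline".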
changelog.md (1 addition, 0 deletions)
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
+
- Added two MCP tools for **agent-run analysis and tool-call assertions**, both backed by a new deterministic `org.mockserver.llm.analysis.AgentRunAnalyzer` that reconstructs an agent run by decoding the LLM requests MockServer recorded. `verify_tool_call` asserts that an agent called a named tool a given number of times (`atLeast`/`atMost`, with an optional regex over the tool-call arguments); `explain_agent_run` summarises the run's structure (message and assistant-turn counts, the ordered tool-call sequence, tool results, and the latest message role). Read-only and offline — no LLM call. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
- Added a **runtime-LLM client SPI** (`org.mockserver.llm.client`) that lets MockServer call a real LLM you already run, as the foundation for opt-in features such as drift detection and exploratory semantic matching. Mirrors the existing codec registry: an `LlmClient` per provider (Ollama, OpenAI, OpenAI Responses, Azure OpenAI, Anthropic, Gemini, Bedrock) registered in `LlmClientRegistry`, an immutable `LlmBackend` config (with the API key redacted in logs), and a three-layer `LlmBackendResolver` (provider env vars → `mockserver.llmProvider`/`llmApiKey`/`llmModel`/`llmBaseUrl` → named-backends JSON via `mockserver.llmBackendsConfig`). All runtime-LLM use goes through `LlmCompletionService`, which is **off unless a backend is configured**, **fails closed** on any timeout/error/non-2xx (never flipping a deterministic result), and caches per normalised prompt for reproducibility. Ollama is the reference backend (no key, local); Bedrock builds the Anthropic-on-Bedrock request and relies on the `headers` escape hatch pending automatic SigV4 signing. See the configuration properties page and `docs/code/llm-mocking.md`.
- LLM conversation mocks can now opt into deterministic **prompt normalisation** before the `latestMessageContains` / `latestMessageMatches` predicates are evaluated, so a match is not blocked by cosmetic differences in dynamically-assembled agent prompts. A new `normalization` block on `conversationPredicates` (also exposed per-turn in the `create_llm_conversation` MCP tool and the dashboard conversation wizard) supports collapsing whitespace, lowercasing, sorting JSON object keys, dropping built-in volatile values (ISO-8601 timestamps, UUIDs, `req_`/`msg_`/`call_` ids), and dropping named JSON fields. Normalisation is pure and idempotent — it never makes a test flaky — and has no effect unless a text predicate is set. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
- DataFaker (`net.datafaker:datafaker:2.5.4`) is now bundled as a template helper. A single shared `Faker` instance is exposed as `faker` in all three response-template engines (Velocity, Mustache, JavaScript) via `TemplateFunctions.BUILT_IN_HELPERS`, giving templates access to 250+ realistic-fake-data providers (`faker.name().firstName()`, `faker.internet().emailAddress()`, `faker.address().city()`, etc.). The instance is thread-safe and produces fresh random values on each call. See the consumer docs (response templates page) for the full provider list and per-engine syntax. Java 17 unlocked this — DataFaker 2.x requires Java 17; the previous Java 11 floor pinned us to the abandoned 1.9.0 line.
docs/code/llm-mocking.md (13 additions, 1 deletion)
@@ -162,8 +162,19 @@ Two MCP tools expose the LLM mocking feature to agents:
|------|-------------|
|`mock_llm_completion`| Creates a single LLM expectation from provider, path, text, tool calls, usage |
|`create_llm_conversation`| Creates a multi-turn conversation with scenario state chain, optional isolation, and an optional per-turn `match.normalization` object |
+
|`verify_tool_call`| Asserts an agent called a named tool `atLeast`/`atMost` times (optional args regex), by decoding recorded LLM requests |
+
|`explain_agent_run`| Summarises a recorded agent run: turn/tool-call sequence, tool results, latest role |
-
Both validate provider availability against `ProviderCodecRegistry` at registration time.
+
The first two validate provider availability against `ProviderCodecRegistry` at registration time. The analysis tools delegate to `org.mockserver.llm.analysis.AgentRunAnalyzer`.
+
+
## Agent-run analysis
+
+
`AgentRunAnalyzer` (`org.mockserver.llm.analysis`) is a deterministic, read-only inspector. Given the LLM requests MockServer recorded (retrieved via the normal request log), it decodes each with the provider's `ProviderCodec` and treats the **richest** conversation (most messages — the latest dialogue snapshot) as the canonical run. From that it derives:
- `summarise(requests, provider)` → message count, assistant-turn count, ordered tool-call name sequence, tool-result keys, latest message role (powers `explain_agent_run`).
+
+
No LLM is called and no network is used — it reads the structure the codecs already produce, so assertions are reproducible. The MCP tools are thin wrappers that retrieve recorded requests (`/mockserver/retrieve?type=REQUESTS`) and format the analyzer's output. The dashboard surfacing of this analysis is the correlated call-graph view (roadmap item #11).
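
As a rough illustration of the summary described in this section, the Python sketch below derives the same fields from an already-decoded run. It is not the analyzer's Java code. The message shape (`role`, `tool_calls`, `tool_use_id`) and the keys `messageCount`, `assistantTurns` and `latestRole` are hypothetical; only `toolCallSequence` and `toolResultsFor` are names taken from the tool documentation.

```python
from typing import Any

Message = dict[str, Any]


def summarise(conversations: list[list[Message]]) -> dict[str, Any]:
    """Summarise the richest decoded conversation (most messages wins)."""
    canonical = max(conversations, key=len, default=[])

    # Ordered names of every tool call made in assistant messages.
    tool_call_sequence = [
        call["name"]
        for message in canonical
        if message.get("role") == "assistant"
        for call in message.get("tool_calls", [])
    ]
    # IDs of the tool uses for which a result message was sent back.
    # Assumption: results are messages with role "tool" and a "tool_use_id".
    tool_results_for = [
        message["tool_use_id"]
        for message in canonical
        if message.get("role") == "tool" and "tool_use_id" in message
    ]
    return {
        "messageCount": len(canonical),
        "assistantTurns": sum(1 for m in canonical if m.get("role") == "assistant"),
        "toolCallSequence": tool_call_sequence,
        "toolResultsFor": tool_results_for,
        "latestRole": canonical[-1].get("role") if canonical else None,
    }
```

Choosing the longest conversation as canonical reflects the fact that each later request in an agent loop typically resends the whole dialogue so far, so the longest snapshot contains the earlier ones.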
## Dashboard Rendering
@@ -334,3 +345,4 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
| 3 | Tool-call assertions (`verify_tool_call`) |✅ Shipped — `verify_tool_call` MCP tool over `AgentRunAnalyzer` (decodes recorded requests; asserts a named tool was called atLeast/atMost times, optional args regex)|
jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html (70 additions, 3 deletions)
@@ -1,5 +1,6 @@
---
title: MCP Tools Reference
+
description: Full reference for every MCP tool exposed by MockServer, including create_expectation, verify_request, retrieve, debug, OpenAPI, and LLM mocking tools.
shortTitle: MCP Tools Reference
layout: page
pageOrder: 2
@@ -45,6 +46,8 @@ <h2>Tool Overview</h2>
<tr><td><a href="#load_expectations_from_file"><code>load_expectations_from_file</code></a></td><td>Load expectations from a fixture file for replay</td><td>High</td></tr>
<tr><td><a href="#mock_llm_completion"><code>mock_llm_completion</code></a></td><td>Create a single-turn LLM completion expectation for any supported provider</td><td>High</td></tr>
<tr><td><a href="#create_llm_conversation"><code>create_llm_conversation</code></a></td><td>Create a multi-turn scripted LLM conversation with optional per-session isolation</td><td>High</td></tr>
+
<tr><td><a href="#verify_tool_call"><code>verify_tool_call</code></a></td><td>Assert an agent called a named tool, from recorded LLM requests</td><td>High</td></tr>
+
<tr><td><a href="#explain_agent_run"><code>explain_agent_run</code></a></td><td>Summarise a recorded agent run (turns, tool-call sequence)</td><td>High</td></tr>
<p>Generate mock expectations from an <a target="_blank" href="https://swagger.io/docs/specification/basic-structure/">OpenAPI v3</a> specification. MockServer will create one expectation per operation in the specification, using example responses where available.</p>
+
<p>Generate mock expectations from an <a target="_blank" href="https://swagger.io/docs/specification/basic-structure/" rel="noopener noreferrer">OpenAPI v3</a> specification. MockServer will create one expectation per operation in the specification, using example responses where available.</p>
<p>The <code>scenarioName</code> in the response is auto-generated and encodes the isolation key. The <code>states</code> array shows the scenario state progression: <code>Started</code> → <code>turn_1</code> → <code>__done</code>. Each concurrent session identified by a distinct <code>x-session-id</code> header value advances through its own copy of this state chain.</p>
<p>Assert that an agent called a particular tool, by decoding the LLM requests MockServer recorded and inspecting the assistant tool calls in the conversation. Deterministic and read-only — it does not call any LLM. Useful for testing that your agent decided to use the expected tool (and, optionally, with the expected arguments).</p>
<tr><td><code>provider</code></td><td>string</td><td>Yes</td><td>LLM provider whose recorded requests to inspect (e.g. <code>ANTHROPIC</code>, <code>OPENAI</code>)</td></tr>
+
<tr><td><code>toolName</code></td><td>string</td><td>Yes</td><td>Name of the tool the agent should have called</td></tr>
+
<tr><td><code>path</code></td><td>string</td><td>No</td><td>Restrict to requests on this path (e.g. <code>/v1/messages</code>)</td></tr>
+
<tr><td><code>argumentsRegex</code></td><td>string</td><td>No</td><td>Java regex matched against the tool call's argument JSON</td></tr>
<p>The result reports <code>count</code> (matching tool calls found) and <code>satisfied</code> (whether the count met the <code>atLeast</code>/<code>atMost</code> constraints); when not satisfied it includes a human-readable <code>message</code>.</p>
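
A minimal sketch of how a result of this shape could be derived from a count, written in Python for illustration only. The function name, the treatment of an omitted `atLeast`/`atMost` as "unconstrained", and the wording of `message` are assumptions; only the `count`, `satisfied` and `message` fields and the `atMost >= atLeast` validation come from the commit description.

```python
from typing import Any, Optional


def evaluate_tool_call_count(
    count: int,
    at_least: Optional[int] = None,
    at_most: Optional[int] = None,
) -> dict[str, Any]:
    """Build a verify_tool_call-style result from a matching-call count."""
    # Reject contradictory bounds up front (atMost must be >= atLeast).
    if at_least is not None and at_most is not None and at_most < at_least:
        raise ValueError("atMost must be >= atLeast")

    # Assumption: an omitted bound places no constraint on that side.
    satisfied = (at_least is None or count >= at_least) and (
        at_most is None or count <= at_most
    )
    result: dict[str, Any] = {"count": count, "satisfied": satisfied}
    if not satisfied:
        # Hypothetical wording; the real tool's message text may differ.
        result["message"] = (
            f"expected between {at_least} and {at_most} matching tool calls "
            f"but found {count}"
        )
    return result
```

For example, `evaluate_tool_call_count(2, at_least=1, at_most=3)` yields a satisfied result with no `message`, while `evaluate_tool_call_count(0, at_least=1)` yields an unsatisfied one that carries a `message`.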
<p>Summarise an agent run reconstructed from recorded LLM requests — a quick way to see what an agent did without reading raw request bodies. Returns the message count, the number of assistant turns, the ordered sequence of tool-call names (<code>toolCallSequence</code>), the tool-use IDs a result was returned for (<code>toolResultsFor</code>, e.g. <code>"toolu_1"</code>), and the role of the latest message. Deterministic and read-only.</p>
<tr><td><code>expectation</code></td><td>object</td><td>Yes</td><td>Full expectation JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi">REST API specification</a></td></tr>
+
<tr><td><code>expectation</code></td><td>object</td><td>Yes</td><td>Full expectation JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi" rel="noopener noreferrer">REST API specification</a></td></tr>
<tr><td><code>verification</code></td><td>object</td><td>Yes</td><td>Full verification JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi">REST API specification</a></td></tr>
+
<tr><td><code>verification</code></td><td>object</td><td>Yes</td><td>Full verification JSON as defined in the <a target="_blank" href="https://app.swaggerhub.com/apis/jamesdbloom/mock-server-openapi" rel="noopener noreferrer">REST API specification</a></td></tr>