
Commit 54d32c8

feat(llm): complete VCR toolkit — strict mode, body redaction, replay normalisation
Finish the LLM fixture record/replay toolkit (the U4 cassette manager was already shipped) with three operational additions, exposed via config + MCP.

- Body-field redaction: FixtureRedactor gains an optional JSON body-field list (recursive, case-insensitive, applied to copies) and redacts the named fields in recorded request/response bodies. Driven by record_llm_fixtures `redactBodyFields` or the mockserver.fixtureBodyRedactFields config. Non-JSON bodies are untouched, and the default (header-only) behaviour is unchanged. Reuses FixtureRedactor.defaultSensitiveHeaders() so the header list cannot diverge.
- Strict VCR mode: load_expectations_from_file `strict` (or mockserver.llmVcrStrict) registers a lowest-priority catch-all per cassette path, returning HTTP 599 so an unmatched request fails loudly. The catch-all has a stable id, so reloads upsert rather than accumulate. The error body is built via the ObjectMapper, so path values cannot inject JSON.
- Replay normalisation: load_expectations_from_file `normalizeRequestBodyFields` drops volatile JSON fields (case-insensitive, in objects and arrays) from each recorded request body and rewrites the matcher to JsonBody ONLY_MATCHING_FIELDS, so per-run ids/timestamps do not block replay.

Docs: docs/code/llm-mocking.md (VCR section, config rows, source ref), the consumer configuration-properties and AI/MCP tools pages, roadmap status, changelog.

Tests: FixtureRedactorTest body-redaction cases (request/response/nested/non-JSON/default-untouched) and LlmMcpToolsTest strict-mode and case-insensitive replay-normalisation cases (asserting the dropped field is gone from active expectations). Core + netty tests green.
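The body-field redaction described in the commit message (recursive, case-insensitive, applied to copies) can be illustrated with a minimal, stdlib-only Java sketch. It assumes the JSON body has already been parsed into `Map` (object), `List` (array) and scalar values. The real `FixtureRedactor` works on MockServer's own body types, so this is not its implementation. The sketch also assumes that a matching field's whole value, even a nested object, is replaced by the `***REDACTED***` placeholder mentioned in `docs/code/llm-mocking.md`. The class and method names are hypothetical.

```java
import java.util.*;

public class BodyFieldRedactionSketch {

    // Placeholder named in docs/code/llm-mocking.md for redacted values.
    static final String REDACTED = "***REDACTED***";

    /** Lower-cases the configured field names so lookups are case-insensitive. */
    static Set<String> lowerCaseSet(Collection<String> fields) {
        Set<String> result = new HashSet<>();
        for (String f : fields) result.add(f.toLowerCase(Locale.ROOT));
        return result;
    }

    /**
     * Returns a redacted deep copy of a parsed JSON tree (Map = object,
     * List = array, anything else = scalar). The input is never mutated.
     */
    @SuppressWarnings("unchecked")
    static Object redact(Object node, Set<String> fieldsLowerCase) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) node).entrySet()) {
                if (fieldsLowerCase.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                    copy.put(e.getKey(), REDACTED); // mask the whole value, even if it is nested
                } else {
                    copy.put(e.getKey(), redact(e.getValue(), fieldsLowerCase)); // recurse into children
                }
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<Object>) node) copy.add(redact(item, fieldsLowerCase));
            return copy;
        }
        return node; // strings, numbers, booleans and null are immutable, so sharing is safe
    }

    public static void main(String[] args) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("role", "user");
        message.put("token", "abc");
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", "gpt-x");
        body.put("Api_Key", "sk-123");
        body.put("messages", List.of(message));

        Object redacted = redact(body, lowerCaseSet(List.of("api_key", "token")));
        System.out.println(redacted); // {model=gpt-x, Api_Key=***REDACTED***, messages=[{role=user, token=***REDACTED***}]}
        System.out.println(body);     // original still holds sk-123 and abc
    }
}
```

Every level is copied, so the caller's parsed body is never mutated. This is the property the commit message describes as redacting "on copies".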
1 parent 8b40cf2 commit 54d32c8

10 files changed

Lines changed: 433 additions & 10 deletions


changelog.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- Completed the **VCR (record/replay) toolkit** for LLM fixtures with three additions. (1) **Strict mode**: `load_expectations_from_file` accepts `strict` (or set `mockserver.llmVcrStrict`), which registers a low-priority catch-all per cassette path so a request matching no recorded fixture returns HTTP 599 instead of silently falling through. (2) **Body-field redaction**: `record_llm_fixtures` accepts `redactBodyFields` (or set `mockserver.fixtureBodyRedactFields`) to redact named JSON fields from recorded request/response bodies, complementing the existing header redaction. (3) **Replay field normalisation**: `load_expectations_from_file` accepts `normalizeRequestBodyFields` to drop volatile JSON fields from each recorded request body and match the remainder loosely (ignoring extra fields), so per-run values (request ids, timestamps) do not block replay. These are operational settings exposed via config and MCP. See the AI/MCP tools and configuration properties pages.
 - Added declarative **LLM fault/chaos profiles** for resilience testing, attachable to any mock LLM response (`mock_llm_completion`, each `create_llm_conversation` turn, the Java `LlmConversationBuilder`, and raw expectation JSON via a `chaos` block). Supports probabilistic provider errors (e.g. 429/529 with a `Retry-After` header), mid-stream truncation of an SSE stream (keep a leading fraction of events), and appending a malformed (broken-JSON) SSE chunk. Errors are deterministic at probability 0.0/1.0 and reproducible at fractional probabilities via a `seed`; truncation and malformed-SSE are always deterministic. A new `LLM_CHAOS_INJECTED_COUNT` metric tracks injections. The dashboard conversation wizard exposes the profile per turn. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - Added two MCP tools for **agent-run analysis and tool-call assertions**, both backed by a new deterministic `org.mockserver.llm.analysis.AgentRunAnalyzer` that reconstructs an agent run by decoding the LLM requests MockServer recorded. `verify_tool_call` asserts that an agent called a named tool a given number of times (`atLeast`/`atMost`, with an optional regex over the tool-call arguments); `explain_agent_run` summarises the run's structure (message and assistant-turn counts, the ordered tool-call sequence, tool results, and the latest message role). Read-only and offline — no LLM call. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - Added a **runtime-LLM client SPI** (`org.mockserver.llm.client`) that lets MockServer call a real LLM you already run, as the foundation for opt-in features such as drift detection and exploratory semantic matching. Mirrors the existing codec registry: an `LlmClient` per provider (Ollama, OpenAI, OpenAI Responses, Azure OpenAI, Anthropic, Gemini, Bedrock) registered in `LlmClientRegistry`, an immutable `LlmBackend` config (with the API key redacted in logs), and a three-layer `LlmBackendResolver` (provider env vars → `mockserver.llmProvider`/`llmApiKey`/`llmModel`/`llmBaseUrl` → named-backends JSON via `mockserver.llmBackendsConfig`). All runtime-LLM use goes through `LlmCompletionService`, which is **off unless a backend is configured**, **fails closed** on any timeout/error/non-2xx (never flipping a deterministic result), and caches per normalised prompt for reproducibility. Ollama is the reference backend (no key, local); Bedrock builds the Anthropic-on-Bedrock request and relies on the `headers` escape hatch pending automatic SigV4 signing. See the configuration properties page and `docs/code/llm-mocking.md`.
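The strict-mode entry in the changelog hunk above can be modelled with a toy in-memory expectation store. This is a hypothetical sketch, not MockServer's matcher. Expectations match on path plus an optional exact body, the highest priority wins, and the catch-all sits at `Integer.MIN_VALUE`, so any recorded fixture beats it. "Cassette path" is read here as a request path that appears in the loaded fixture, which is how the configuration table phrases it. The stable-id scheme (`UUID.nameUUIDFromBytes` over the path) is an assumption. It only shows why re-registering under the same id upserts instead of accumulating. The commit builds the real 599 error body with Jackson's `ObjectMapper`. The hand-written `jsonString` escaper here keeps the sketch dependency-free while making the same point: a path value cannot break out of its JSON string.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class StrictVcrSketch {

    /** Simplified expectation: match on path plus an optional exact request body. */
    static final class Expectation {
        final String id;
        final int priority;
        final String path;
        final String requestBody; // null = any body (used by the catch-all)
        final int status;
        final String responseBody;

        Expectation(String id, int priority, String path, String requestBody, int status, String responseBody) {
            this.id = id;
            this.priority = priority;
            this.path = path;
            this.requestBody = requestBody;
            this.status = status;
            this.responseBody = responseBody;
        }
    }

    // Keyed by id, so registering the same id again replaces (upserts) the old entry.
    final Map<String, Expectation> active = new LinkedHashMap<>();

    void upsert(Expectation e) {
        active.put(e.id, e);
    }

    /** Hypothetical stable id: the same cassette path always yields the same id. */
    static String strictId(String path) {
        return "vcr-strict-" + UUID.nameUUIDFromBytes(path.getBytes(StandardCharsets.UTF_8));
    }

    /** Lowest possible priority, so any recorded fixture for the path wins over it. */
    void registerStrictCatchAll(String path) {
        String errorBody = "{\"error\":\"no recorded fixture matched\",\"path\":" + jsonString(path) + "}";
        upsert(new Expectation(strictId(path), Integer.MIN_VALUE, path, null, 599, errorBody));
    }

    /** Highest-priority match, or null when nothing matches (the request would fall through). */
    Expectation match(String path, String body) {
        Expectation best = null;
        for (Expectation e : active.values()) {
            boolean matches = e.path.equals(path) && (e.requestBody == null || e.requestBody.equals(body));
            if (matches && (best == null || e.priority > best.priority)) {
                best = e;
            }
        }
        return best;
    }

    /** Minimal JSON string escaper; the commit uses Jackson's ObjectMapper instead. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        StrictVcrSketch server = new StrictVcrSketch();
        String path = "/v1/chat/completions";
        server.upsert(new Expectation("fixture-1", 0, path, "{\"q\":\"hi\"}", 200, "{\"a\":\"hello\"}"));
        server.registerStrictCatchAll(path);
        server.registerStrictCatchAll(path); // simulated reload: upserts rather than accumulating

        System.out.println(server.active.size());                         // 2
        System.out.println(server.match(path, "{\"q\":\"hi\"}").status);  // 200: recorded fixture wins
        System.out.println(server.match(path, "{\"q\":\"bye\"}").status); // 599: only the catch-all matches
        System.out.println(server.match("/other", "{}"));                 // null: paths outside the cassette fall through
    }
}
```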

docs/code/llm-mocking.md

Lines changed: 12 additions & 0 deletions
@@ -300,11 +300,22 @@ Adding a provider = implement `LlmClient` + one `register(...)` line — the sam
 
 This SPI is never on the deterministic assertion/matching path. The features that consume it (drift detection, semantic matching) are tracked in `docs/plans/mockserver-llm-mocking.md`.
 
+## VCR (record / replay)
+
+LLM/MCP traffic forwarded through MockServer can be snapshotted to committable fixture files and replayed deterministically:
+
+- **Record**: `record_llm_fixtures` (MCP) converts recorded request/response pairs (including SSE) into expectations via `SseAwareExpectationConverter`, then `FixtureRedactor` masks sensitive **headers** and, when `redactBodyFields` / `mockserver.fixtureBodyRedactFields` is set, named **JSON body fields** (recursively, value → `***REDACTED***`).
+- **Replay**: `load_expectations_from_file` (MCP) loads the fixture as active expectations. Two replay aids: **strict mode** (`strict` param or `mockserver.llmVcrStrict`) registers a lowest-priority (`Integer.MIN_VALUE`) catch-all per cassette path returning HTTP 599 so an unmatched request fails loudly; **replay normalisation** (`normalizeRequestBodyFields`) drops volatile JSON fields from each recorded request body and rewrites the matcher to `JsonBody` with `MatchType.ONLY_MATCHING_FIELDS`, so per-run values do not block the match.
+
+These are operational settings (config + MCP, for CI/automation), not dashboard controls.
+
 ## Configuration
 
 | Property | Default | Range | Description |
 |----------|---------|-------|-------------|
 | `mockserver.maxLlmConversationBodySize` | `1048576` (1 MiB) | 16384 - 67108864 | Maximum request body size for conversation matcher parsing |
+| `mockserver.fixtureBodyRedactFields` | _(unset)_ || Comma-separated JSON field names redacted from recorded fixture bodies |
+| `mockserver.llmVcrStrict` | `false` || Strict VCR mode: unmatched requests on a cassette path return HTTP 599 |
 | `mockserver.llmProvider` | _(unset)_ || Default runtime-LLM backend provider (enables runtime-LLM features) |
 | `mockserver.llmApiKey` | _(unset)_ || API key for the default backend (secret; redacted in logs) |
 | `mockserver.llmModel` | _(provider default)_ || Model for the default backend |
@@ -358,3 +369,4 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
 | `llm/analysis/AgentRunAnalyzer.java` | Deterministic read-only agent-run inspection (tool-call counts, run summary) |
 | `model/LlmChaosProfile.java` | Fault/chaos profile carried on `HttpLlmResponse` |
 | `mock/action/http/HttpLlmResponseActionHandler.java` | Encodes LLM responses and applies chaos (error / truncation / malformed SSE) |
+| `fixture/FixtureRedactor.java` | Masks sensitive headers and (optional) JSON body fields when recording fixtures |
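The replay-normalisation aid in the VCR section above can be sketched the same way. First, drop the volatile fields from the recorded request body. Then compare with "only matching fields" semantics: every remaining recorded field must be present and equal in the incoming request, and extra incoming fields are ignored. This stdlib-only Java version assumes parsed `Map`/`List`/scalar trees and a lower-cased field set, and its names are hypothetical. It compares arrays element by element and requires equal length. That is a simplifying assumption, since MockServer's own `ONLY_MATCHING_FIELDS` matcher may treat arrays differently (for example, by ignoring element order).

```java
import java.util.*;

public class ReplayNormalisationSketch {

    /** Deep copy with the named fields (case-insensitive) removed from every object, including objects inside arrays. */
    @SuppressWarnings("unchecked")
    static Object dropFields(Object node, Set<String> fieldsLowerCase) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) node).entrySet()) {
                if (!fieldsLowerCase.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                    copy.put(e.getKey(), dropFields(e.getValue(), fieldsLowerCase));
                }
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<Object>) node) copy.add(dropFields(item, fieldsLowerCase));
            return copy;
        }
        return node;
    }

    /** Every field in expected must exist in actual with a matching value; extra fields in actual are ignored. */
    @SuppressWarnings("unchecked")
    static boolean onlyMatchingFields(Object expected, Object actual) {
        if (expected instanceof Map) {
            if (!(actual instanceof Map)) return false;
            Map<String, Object> a = (Map<String, Object>) actual;
            for (Map.Entry<String, Object> e : ((Map<String, Object>) expected).entrySet()) {
                if (!a.containsKey(e.getKey()) || !onlyMatchingFields(e.getValue(), a.get(e.getKey()))) return false;
            }
            return true;
        }
        if (expected instanceof List) {
            if (!(actual instanceof List)) return false;
            List<Object> x = (List<Object>) expected;
            List<Object> y = (List<Object>) actual;
            if (x.size() != y.size()) return false; // simplifying assumption: same length, same order
            for (int i = 0; i < x.size(); i++) {
                if (!onlyMatchingFields(x.get(i), y.get(i))) return false;
            }
            return true;
        }
        return Objects.equals(expected, actual); // scalars must be equal
    }

    public static void main(String[] args) {
        Map<String, Object> recordedMsg = new LinkedHashMap<>();
        recordedMsg.put("role", "user");
        recordedMsg.put("content", "hi");
        recordedMsg.put("timestamp", 1);
        Map<String, Object> recorded = new LinkedHashMap<>();
        recorded.put("model", "m");
        recorded.put("Request_Id", "run-1");
        recorded.put("messages", List.of(recordedMsg));

        // Same request on a later run: only the per-run values differ.
        Map<String, Object> incomingMsg = new LinkedHashMap<>(recordedMsg);
        incomingMsg.put("timestamp", 2);
        Map<String, Object> incoming = new LinkedHashMap<>(recorded);
        incoming.put("Request_Id", "run-2");
        incoming.put("messages", List.of(incomingMsg));

        Object normalised = dropFields(recorded, Set.of("request_id", "timestamp"));
        System.out.println(onlyMatchingFields(recorded, incoming));   // false: ids and timestamps differ
        System.out.println(onlyMatchingFields(normalised, incoming)); // true: volatile fields no longer compared
    }
}
```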

docs/plans/mockserver-llm-mocking.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ The original RFC (RFC-1 LLM Response Builder + RFC-2 Stateful Scripted Conversat
 |---|---|---|
 | 5 | Token/cost analytics + budget assertions | ✅ Shipped (U3 — token/cost rollup tile + session inspector) |
 | 6 | LLM fault/chaos profiles (429/529 + Retry-After, mid-stream truncation, malformed SSE, probabilistic error rates) | ✅ Shipped — `LlmChaosProfile` on `HttpLlmResponse`, applied in `HttpLlmResponseActionHandler` (+ dispatcher); MCP `chaos` on `mock_llm_completion` and per conversation turn; dashboard wizard control; `LLM_CHAOS_INJECTED_COUNT` metric |
-| 7 | VCR mode + strict mode + body redaction + field normalisation | 🟡 Partial — cassette manager shipped in U4; strict-mode, body redaction, and field normalisation still open |
+| 7 | VCR mode + strict mode + body redaction + field normalisation | ✅ Shipped: U4 cassette manager + strict mode (`mockserver.llmVcrStrict` / `load` `strict`, 599 catch-all), body-field redaction (`FixtureRedactor` + `mockserver.fixtureBodyRedactFields` / `record` `redactBodyFields`), replay normalisation (`load` `normalizeRequestBodyFields`) |
 
 ### Tier 3 — valuable / specialised

jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html

Lines changed: 3 additions & 0 deletions
@@ -807,6 +807,7 @@ <h3>record_llm_fixtures</h3>
 <tr><td><code>path</code></td><td>string</td><td>Yes</td><td>File path to write the fixture JSON to. The directory must exist.</td></tr>
 <tr><td><code>host</code></td><td>string</td><td>No</td><td>Only include recorded traffic whose request host matches this value</td></tr>
 <tr><td><code>requestPath</code></td><td>string</td><td>No</td><td>Only include recorded traffic whose request path matches this value</td></tr>
+<tr><td><code>redactBodyFields</code></td><td>array of string</td><td>No</td><td>JSON field names whose values are redacted from recorded request/response bodies (in addition to sensitive headers and the <code>mockserver.fixtureBodyRedactFields</code> config). Non-JSON bodies are left intact.</td></tr>
 </tbody>
 </table>
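The `redactBodyFields` parameter is documented as applying in addition to the `mockserver.fixtureBodyRedactFields` config, so the effective field set is the union of the two. The sketch below shows that merge. Trimming whitespace, lower-casing for the case-insensitive match, and skipping empty entries are assumptions about how the comma-separated value is parsed. The names are hypothetical.

```java
import java.util.*;

public class RedactFieldListSketch {

    /** Union of the comma-separated config value and the per-call list, normalised for case-insensitive lookup. */
    static Set<String> effectiveRedactFields(List<String> perCall, String commaSeparatedConfig) {
        Set<String> fields = new LinkedHashSet<>();
        if (commaSeparatedConfig != null) {
            for (String f : commaSeparatedConfig.split(",")) add(fields, f);
        }
        if (perCall != null) {
            for (String f : perCall) add(fields, f);
        }
        return fields;
    }

    private static void add(Set<String> fields, String raw) {
        if (raw == null) return;
        String f = raw.trim().toLowerCase(Locale.ROOT);
        if (!f.isEmpty()) fields.add(f); // skip blanks such as a trailing comma
    }

    public static void main(String[] args) {
        // "Token" duplicates the configured "token" once case is ignored.
        System.out.println(effectiveRedactFields(List.of("Token", "session_id"), "api_key, password,token,"));
        // prints [api_key, password, token, session_id]
    }
}
```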

@@ -853,6 +854,8 @@ <h3>load_expectations_from_file</h3>
 </thead>
 <tbody>
 <tr><td><code>path</code></td><td>string</td><td>Yes</td><td>File path to the fixture JSON file to load. Must contain expectations in MockServer's standard JSON format.</td></tr>
+<tr><td><code>strict</code></td><td>boolean</td><td>No</td><td>Strict VCR mode: register a low-priority catch-all per cassette path so a request matching no recorded fixture returns HTTP <code>599</code> instead of falling through. Defaults to the <code>mockserver.llmVcrStrict</code> config.</td></tr>
+<tr><td><code>normalizeRequestBodyFields</code></td><td>array of string</td><td>No</td><td>JSON field names to drop from each recorded request body on load; the remaining fields are matched loosely (extra fields in the incoming request are ignored), so volatile per-run values do not block replay.</td></tr>
 </tbody>
 </table>

jekyll-www.mock-server.com/mock_server/configuration_properties.html

Lines changed: 25 additions & 0 deletions
@@ -863,6 +863,31 @@ <h2>Streaming Proxy Configuration:</h2>
 <pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.llmProvider="OPENAI" -Dmockserver.llmApiKey="sk-..."</code></pre>
 </div>
 
+<button id="button_configuration_llm_vcr" class="accordion title"><strong>LLM Fixture (VCR) Recording &amp; Replay</strong></button>
+<div class="panel title">
+<p>Controls for recording LLM/MCP traffic to committable fixture files and replaying it deterministically (see the <a href="/mock_server/ai_mcp_tools.html"><code>record_llm_fixtures</code> and <code>load_expectations_from_file</code></a> MCP tools).</p>
+<p><strong>Body field redaction</strong> &mdash; comma-separated JSON field names whose values are redacted from recorded request/response bodies, in addition to the always-redacted sensitive headers. Empty by default.</p>
+<p>Type: <span class="keyword">String</span> Default: <span class="this_value">unset</span></p>
+<p>Java Code:</p>
+<pre class="prettyprint lang-java code"><code class="code">ConfigurationProperties.fixtureBodyRedactFields(String commaSeparatedFields)</code></pre>
+<p>System Property:</p>
+<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.fixtureBodyRedactFields=...</code></pre>
+<p>Environment Variable:</p>
+<pre class="code" style="padding: 2px;"><code class="code">MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS=...</code></pre>
+<p>Example:</p>
+<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.fixtureBodyRedactFields="api_key,password,token"</code></pre>
+<p><strong>Strict VCR mode</strong> &mdash; when <code>true</code>, loading a fixture registers a low-priority catch-all per cassette path so a request matching no recorded entry returns HTTP <code>599</code> rather than falling through. Useful for catching un-recorded calls in tests. Default <code>false</code> (can also be set per call via the <code>load_expectations_from_file</code> <code>strict</code> parameter).</p>
+<p>Type: <span class="keyword">boolean</span> Default: <span class="this_value">false</span></p>
+<p>Java Code:</p>
+<pre class="prettyprint lang-java code"><code class="code">ConfigurationProperties.llmVcrStrict(boolean strict)</code></pre>
+<p>System Property:</p>
+<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.llmVcrStrict=...</code></pre>
+<p>Environment Variable:</p>
+<pre class="code" style="padding: 2px;"><code class="code">MOCKSERVER_LLM_VCR_STRICT=...</code></pre>
+<p>Example:</p>
+<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.llmVcrStrict="true"</code></pre>
+</div>
+
 <button id="button_configuration_stream_idle_timeout_seconds" class="accordion title"><strong>Streaming Response Idle Timeout</strong></button>
 <div class="panel title">
 <p>The maximum time in seconds a streaming response connection may be idle (no chunk received from the upstream server) before MockServer closes it and logs the captured portion as truncated. This replaces the fixed global socket timeout for streaming responses, which would otherwise terminate long-lived LLM completions. The timeout resets on every chunk received, so a slow-but-active stream is never cut off prematurely.</p>
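The environment variable names on the configuration page follow a visible pattern. Dots become underscores, each camelCase hump starts a new underscore-separated word, and everything is upper-cased, so `mockserver.llmVcrStrict` becomes `MOCKSERVER_LLM_VCR_STRICT`. MockServer passes these names as string literals (see the `ConfigurationProperties` diff). The helper below is therefore only a hypothetical illustration of the pattern, useful for checking a new property's env-var name.

```java
public class EnvVarNameSketch {

    /** Derives an env-var style name from a dotted camelCase property key. */
    static String toEnvVarName(String propertyKey) {
        StringBuilder sb = new StringBuilder();
        for (char c : propertyKey.toCharArray()) {
            if (c == '.') {
                sb.append('_');                      // namespace separator
            } else if (Character.isUpperCase(c)) {
                sb.append('_').append(c);            // a camelCase hump starts a new word
            } else {
                sb.append(Character.toUpperCase(c)); // lower-case letters are upper-cased; digits pass through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEnvVarName("mockserver.fixtureBodyRedactFields")); // MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS
        System.out.println(toEnvVarName("mockserver.llmVcrStrict"));            // MOCKSERVER_LLM_VCR_STRICT
    }
}
```

The helper would split a run of capitals (an acronym written in upper case) into one word per letter. None of the properties in this commit use that form.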

mockserver/mockserver-core/src/main/java/org/mockserver/configuration/ConfigurationProperties.java

Lines changed: 28 additions & 0 deletions
@@ -91,6 +91,8 @@ public class ConfigurationProperties {
     private static final String MOCKSERVER_LLM_BASE_URL = "mockserver.llmBaseUrl";
     private static final String MOCKSERVER_LLM_BACKENDS_CONFIG = "mockserver.llmBackendsConfig";
     private static final String MOCKSERVER_LLM_REQUEST_TIMEOUT_MILLIS = "mockserver.llmRequestTimeoutMillis";
+    private static final String MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS = "mockserver.fixtureBodyRedactFields";
+    private static final String MOCKSERVER_LLM_VCR_STRICT = "mockserver.llmVcrStrict";
     private static final String MOCKSERVER_USE_SEMICOLON_AS_QUERY_PARAMETER_SEPARATOR = "mockserver.useSemicolonAsQueryParameterSeparator";
     private static final String MOCKSERVER_ASSUME_ALL_REQUESTS_ARE_HTTP = "mockserver.assumeAllRequestsAreHttp";
     private static final String MOCKSERVER_HTTP2_ENABLED = "mockserver.http2Enabled";
@@ -1040,6 +1042,32 @@ public static void llmRequestTimeoutMillis(long millis) {
         setProperty(MOCKSERVER_LLM_REQUEST_TIMEOUT_MILLIS, "" + millis);
     }
 
+    /**
+     * Comma-separated JSON field names whose values are redacted from recorded
+     * fixture request/response bodies (in addition to the always-redacted
+     * sensitive headers). Empty by default. Used by {@code record_llm_fixtures}.
+     */
+    public static String fixtureBodyRedactFields() {
+        return readPropertyHierarchically(PROPERTIES, MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS, "MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS", "");
+    }
+
+    public static void fixtureBodyRedactFields(String fields) {
+        setProperty(MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS, fields);
+    }
+
+    /**
+     * When true, loading LLM fixtures in strict VCR mode registers a low-priority
+     * catch-all per cassette path so a request that matches no recorded entry
+     * fails loudly (HTTP 599) instead of falling through. Default false.
+     */
+    public static boolean llmVcrStrict() {
+        return Boolean.parseBoolean(readPropertyHierarchically(PROPERTIES, MOCKSERVER_LLM_VCR_STRICT, "MOCKSERVER_LLM_VCR_STRICT", "" + false));
+    }
+
+    public static void llmVcrStrict(boolean strict) {
+        setProperty(MOCKSERVER_LLM_VCR_STRICT, "" + strict);
+    }
+
     public static long regexMatchingTimeoutMillis() {
         return readLongProperty(MOCKSERVER_REGEX_MATCHING_TIMEOUT_MILLIS, "MOCKSERVER_REGEX_MATCHING_TIMEOUT_MILLIS", 5000L);
     }
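The new getters delegate to `readPropertyHierarchically`, whose body is not part of this diff. The sketch below assumes one plausible precedence: JVM system property, then environment variable, then the default. It ignores the `PROPERTIES` argument, which presumably represents a properties file, and the real method's order may differ. The `Boolean.parseBoolean` step in `llmVcrStrict()` is not an assumption. Only the string "true", in any letter case, enables strict mode, so values such as "yes" or "1" leave it off.

```java
public class PropertyHierarchySketch {

    /** Assumed precedence: system property, then environment variable, then default. */
    static String readHierarchically(String systemPropertyKey, String envVarKey, String defaultValue) {
        String value = System.getProperty(systemPropertyKey);
        if (value == null) {
            value = System.getenv(envVarKey);
        }
        return value != null ? value : defaultValue;
    }

    /** Mirrors the parsing step of the new llmVcrStrict() getter. */
    static boolean llmVcrStrict() {
        return Boolean.parseBoolean(readHierarchically("mockserver.llmVcrStrict", "MOCKSERVER_LLM_VCR_STRICT", "" + false));
    }

    public static void main(String[] args) {
        System.setProperty("mockserver.llmVcrStrict", "TRUE");
        System.out.println(llmVcrStrict()); // true: parseBoolean ignores case
        System.setProperty("mockserver.llmVcrStrict", "yes");
        System.out.println(llmVcrStrict()); // false: only "true" enables strict mode
    }
}
```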
