feat(llm): declarative fault/chaos profiles for LLM responses
Add LlmChaosProfile, attachable to any HttpLlmResponse, for agent resilience
testing: probabilistic provider errors (e.g. 429/529 with Retry-After),
mid-stream SSE truncation, and a malformed (broken-JSON) SSE chunk.
- LlmChaosProfile model (nullable fields + TruncateMode enum); carried on
HttpLlmResponse, round-trips via DTO/serializer/schema.
- HttpLlmResponseActionHandler.chaosErrorResponseOrNull builds the error
response (status + Retry-After); applyStreamingChaos truncates the SSE event
list and/or appends a malformed chunk. HttpActionHandler checks the error
first and returns it on the normal (non-streaming) path even for a would-be
stream — a provider error is a plain HTTP response, not SSE.
- Determinism: error decision deterministic at probability 0.0/1.0 and
reproducible at fractional probability via seed; truncation and malformed-SSE
always deterministic. New LLM_CHAOS_INJECTED_COUNT metric (single Metrics
instance reused; incremented only when chaos actually applies).
- Wired through the client TurnBuilder/LlmConversationBuilder, the
mock_llm_completion and per-turn create_llm_conversation MCP tools, and the
dashboard conversation wizard (error status / Retry-After / probability /
truncate mode + fraction / malformed SSE / seed).
Docs: docs/code/llm-mocking.md (chaos section + source refs), consumer AI/MCP
tools page (chaos field tables), roadmap status, changelog.
Tests: 11 HttpLlmResponseActionHandlerChaosTest (error boundaries, seeded
determinism, truncation fraction/default, malformed append, combine) + chaos
round-trip + 2 MCP chaos tests + 5 UI codegen chaos tests. Core, netty, and UI
gates green.
changelog.md
1 line changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
+
- Added declarative **LLM fault/chaos profiles** for resilience testing, attachable to any mock LLM response (`mock_llm_completion`, each `create_llm_conversation` turn, the Java `LlmConversationBuilder`, and raw expectation JSON via a `chaos` block). Supports probabilistic provider errors (e.g. 429/529 with a `Retry-After` header), mid-stream truncation of an SSE stream (keep a leading fraction of events), and appending a malformed (broken-JSON) SSE chunk. Errors are deterministic at probability 0.0/1.0 and reproducible at fractional probabilities via a `seed`; truncation and malformed-SSE are always deterministic. A new `LLM_CHAOS_INJECTED_COUNT` metric tracks injections. The dashboard conversation wizard exposes the profile per turn. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
- Added two MCP tools for **agent-run analysis and tool-call assertions**, both backed by a new deterministic `org.mockserver.llm.analysis.AgentRunAnalyzer` that reconstructs an agent run by decoding the LLM requests MockServer recorded. `verify_tool_call` asserts that an agent called a named tool a given number of times (`atLeast`/`atMost`, with an optional regex over the tool-call arguments); `explain_agent_run` summarises the run's structure (message and assistant-turn counts, the ordered tool-call sequence, tool results, and the latest message role). Read-only and offline — no LLM call. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
- Added a **runtime-LLM client SPI** (`org.mockserver.llm.client`) that lets MockServer call a real LLM you already run, as the foundation for opt-in features such as drift detection and exploratory semantic matching. Mirrors the existing codec registry: an `LlmClient` per provider (Ollama, OpenAI, OpenAI Responses, Azure OpenAI, Anthropic, Gemini, Bedrock) registered in `LlmClientRegistry`, an immutable `LlmBackend` config (with the API key redacted in logs), and a three-layer `LlmBackendResolver` (provider env vars → `mockserver.llmProvider`/`llmApiKey`/`llmModel`/`llmBaseUrl` → named-backends JSON via `mockserver.llmBackendsConfig`). All runtime-LLM use goes through `LlmCompletionService`, which is **off unless a backend is configured**, **fails closed** on any timeout/error/non-2xx (never flipping a deterministic result), and caches per normalised prompt for reproducibility. Ollama is the reference backend (no key, local); Bedrock builds the Anthropic-on-Bedrock request and relies on the `headers` escape hatch pending automatic SigV4 signing. See the configuration properties page and `docs/code/llm-mocking.md`.
- LLM conversation mocks can now opt into deterministic **prompt normalisation** before the `latestMessageContains` / `latestMessageMatches` predicates are evaluated, so a match is not blocked by cosmetic differences in dynamically-assembled agent prompts. A new `normalization` block on `conversationPredicates` (also exposed per-turn in the `create_llm_conversation` MCP tool and the dashboard conversation wizard) supports collapsing whitespace, lowercasing, sorting JSON object keys, dropping built-in volatile values (ISO-8601 timestamps, UUIDs, `req_`/`msg_`/`call_` ids), and dropping named JSON fields. Normalisation is pure and idempotent — it never makes a test flaky — and has no effect unless a text predicate is set. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
docs/code/llm-mocking.md
12 lines changed: 12 additions & 0 deletions
@@ -167,6 +167,16 @@ Two MCP tools expose the LLM mocking feature to agents:
The first two validate provider availability against `ProviderCodecRegistry` at registration time. The analysis tools delegate to `org.mockserver.llm.analysis.AgentRunAnalyzer`.
+
## Fault / chaos injection
+
+
`LlmChaosProfile` (`org.mockserver.model`) attaches a fault profile to any `HttpLlmResponse` for resilience testing. Applied by `HttpLlmResponseActionHandler`:
+
+
- **Probabilistic error**: `chaosErrorResponseOrNull(...)` returns an error `HttpResponse` (`errorStatus` + optional `Retry-After`) when triggered. An `errorStatus` with no `errorProbability` always fires; a fractional probability draws once (reproducible via `seed`). `HttpActionHandler` checks this first and, if present, returns the error on the normal (non-streaming) path, because a provider error is a plain HTTP response, not an SSE stream, even for a would-be streaming completion.
+
- **Mid-stream truncation**: `applyStreamingChaos(...)` keeps a leading `truncateAtFraction` of the SSE events (default 0.5) so the stream ends early.
+
- **Malformed SSE**: appends a deliberately broken-JSON chunk so the client must handle a corrupt event.
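
The two streaming bullets above can be illustrated with a minimal, self-contained sketch. It is not the MockServer implementation: the class name `StreamingChaosSketch`, the method signature, the round-down rule for the number of kept events, the truncate-then-append order, and the `MALFORMED_CHUNK` payload are all assumptions made for illustration. Only the default fraction of 0.5 and the general behaviour (keep a leading fraction, optionally append a broken chunk) come from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingChaosSketch {

    // Hypothetical broken-JSON payload; the real chunk content is not shown in the source.
    static final String MALFORMED_CHUNK = "data: {\"broken\": ";

    /**
     * Returns a new event list with chaos applied; the input list is not modified.
     * truncateAtFraction may be null, in which case the documented default 0.5 is used.
     */
    static List<String> applyStreamingChaos(List<String> events,
                                            boolean truncate,
                                            Double truncateAtFraction,
                                            boolean malformedSse) {
        List<String> result = new ArrayList<>(events);
        if (truncate) {
            double fraction = truncateAtFraction != null ? truncateAtFraction : 0.5;
            // Assumption: round down, then clamp to the valid range [0, size].
            int keep = (int) Math.floor(events.size() * fraction);
            keep = Math.max(0, Math.min(events.size(), keep));
            result = new ArrayList<>(events.subList(0, keep));
        }
        if (malformedSse) {
            // Assumption: the malformed chunk is appended after any truncation.
            result.add(MALFORMED_CHUNK);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "data: {\"i\":0}", "data: {\"i\":1}", "data: {\"i\":2}", "data: {\"i\":3}");
        // Default fraction plus a malformed chunk.
        for (String event : applyStreamingChaos(events, true, null, true)) {
            System.out.println(event);
        }
    }
}
```

Because nothing here depends on randomness, the same input always yields the same output, which matches the claim that truncation and malformed-SSE are deterministic.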
+
+
Truncation and malformed-SSE are fully deterministic; the error path is deterministic at probability 0.0/1.0. Each injection increments the `LLM_CHAOS_INJECTED_COUNT` metric. The profile round-trips as the top-level `chaos` field on `HttpLlmResponse` (alongside `completion`, `embedding`, and `conversationPredicates`) and is exposed per turn in the dashboard wizard and via the `chaos` MCP parameter.
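
As a rough illustration of the error decision described above, the following sketch shows one way the rules could be expressed. The class name `ChaosErrorDecisionSketch`, the method `shouldInjectError`, and the choice to create a fresh `java.util.Random` from the seed for each decision are assumptions, not the actual `chaosErrorResponseOrNull` logic; the sketch only mirrors the documented rules (no `errorStatus` means no error, a missing probability or 1.0 always fires, 0.0 never fires, and a seeded fractional draw is reproducible).

```java
import java.util.Random;

public class ChaosErrorDecisionSketch {

    /**
     * Decides whether an error should be injected.
     * All three parameters are nullable, mirroring the nullable profile fields.
     */
    static boolean shouldInjectError(Integer errorStatus, Double errorProbability, Long seed) {
        if (errorStatus == null) {
            return false; // no error configured
        }
        if (errorProbability == null || errorProbability >= 1.0) {
            return true;  // always fires
        }
        if (errorProbability <= 0.0) {
            return false; // never fires
        }
        // Assumption: a single draw per decision; a seed makes that draw repeatable.
        Random random = seed != null ? new Random(seed) : new Random();
        return random.nextDouble() < errorProbability;
    }

    public static void main(String[] args) {
        System.out.println(shouldInjectError(429, null, null)); // true: status set, no probability
        System.out.println(shouldInjectError(429, 0.0, null));  // false: probability 0.0
        // Same seed, same fractional probability: the two decisions always agree.
        boolean first = shouldInjectError(529, 0.5, 42L);
        boolean second = shouldInjectError(529, 0.5, 42L);
        System.out.println(first == second);                    // true
    }
}
```

Without a seed, a fractional probability gives a different outcome from run to run, so a test that must be repeatable would either pin the probability to 0.0 or 1.0 or supply a seed.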
+
## Agent-run analysis
`AgentRunAnalyzer` (`org.mockserver.llm.analysis`) is a deterministic, read-only inspector. Given the LLM requests MockServer recorded (retrieved via the normal request log), it decodes each with the provider's `ProviderCodec` and treats the **richest** conversation (most messages — the latest dialogue snapshot) as the canonical run. From that it derives:
@@ -346,3 +356,5 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
| 6 | LLM fault/chaos profiles (429/529 + Retry-After, mid-stream truncation, malformed SSE, probabilistic error rates) |✅ Shipped — `LlmChaosProfile` on `HttpLlmResponse`, applied in `HttpLlmResponseActionHandler` (+ dispatcher); MCP `chaos` on `mock_llm_completion` and per conversation turn; dashboard wizard control; `LLM_CHAOS_INJECTED_COUNT` metric|
| 7 | VCR mode + strict mode + body redaction + field normalisation | 🟡 Partial — cassette manager shipped in U4; strict-mode, body redaction, and field normalisation still open |
jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html
22 lines changed: 22 additions & 0 deletions
@@ -907,9 +907,29 @@ <h3>mock_llm_completion</h3>
<tr><td><code>stopReason</code></td><td>string</td><td>No</td><td>Stop reason to encode in the provider format (e.g. <code>end_turn</code>, <code>tool_use</code>, <code>stop</code>)</td></tr>
<tr><td><code>usage</code></td><td>object</td><td>No</td><td>Token usage. Accepts <code>inputTokens</code> (integer) and <code>outputTokens</code> (integer).</td></tr>
<tr><td><code>streaming</code></td><td>boolean</td><td>No</td><td>When <code>true</code>, the response is delivered as a Server-Sent Events stream. Defaults to <code>false</code>.</td></tr>
+
<tr><td><code>chaos</code></td><td>object</td><td>No</td><td>Optional fault/chaos profile for resilience testing (see the table below). Also accepted per turn in <a href="#create_llm_conversation"><code>create_llm_conversation</code></a>.</td></tr>
<tr><td><code>errorStatus</code></td><td>integer</td><td>HTTP error status to return instead of a normal response (e.g. <code>429</code>, <code>529</code>). Fires every time unless <code>errorProbability</code> is set. A provider error is returned as a normal HTTP response even for a streaming completion.</td></tr>
+
<tr><td><code>retryAfter</code></td><td>string</td><td>Value for the <code>Retry-After</code> header on an injected error (e.g. <code>"30"</code>).</td></tr>
+
<tr><td><code>errorProbability</code></td><td>number</td><td>Probability 0.0–1.0 of injecting the error. <code>1.0</code> (or omitted with <code>errorStatus</code> set) always fires; <code>0.0</code> never does. Fractional values are non-deterministic unless <code>seed</code> is set.</td></tr>
+
<tr><td><code>truncateMode</code></td><td>string</td><td><code>NONE</code> or <code>MID_STREAM</code>. <code>MID_STREAM</code> truncates a streaming response after a leading fraction of events.</td></tr>
+
<tr><td><code>truncateAtFraction</code></td><td>number</td><td>Fraction 0.0–1.0 of SSE events to keep before truncating (default <code>0.5</code>).</td></tr>
+
<tr><td><code>malformedSse</code></td><td>boolean</td><td>Append a malformed (broken-JSON) SSE chunk so the client must handle a corrupt event.</td></tr>
+
<tr><td><code>seed</code></td><td>integer</td><td>Makes a fractional <code>errorProbability</code> reproducible.</td></tr>
+
</tbody>
+
</table>
+
+
<p>Chaos is deterministic for truncation, malformed SSE, and an <code>errorProbability</code> of 0.0 or 1.0 — safe for repeatable tests. Use a fractional probability (optionally with a <code>seed</code>) only when you intend flakiness.</p>
+
<p><strong>Example request (Anthropic text completion):</strong></p>
<p>Each turn may also carry an optional <code>chaos</code> object (a sibling of <code>match</code> and <code>response</code>) with the same fields as the <a href="#mock_llm_completion"><code>mock_llm_completion</code></a> <code>chaos</code> profile, to inject faults into that turn's response.</p>
+
<p><strong>Example request (2-turn conversation isolated by session header):</strong></p>