Commit 8b40cf2

feat(llm): declarative fault/chaos profiles for LLM responses
Add LlmChaosProfile, attachable to any HttpLlmResponse, for agent resilience testing: probabilistic provider errors (e.g. 429/529 with Retry-After), mid-stream SSE truncation, and a malformed (broken-JSON) SSE chunk.

- LlmChaosProfile model (nullable fields + TruncateMode enum); carried on HttpLlmResponse, round-trips via DTO/serializer/schema.
- HttpLlmResponseActionHandler.chaosErrorResponseOrNull builds the error response (status + Retry-After); applyStreamingChaos truncates the SSE event list and/or appends a malformed chunk. HttpActionHandler checks the error first and returns it on the normal (non-streaming) path even for a would-be stream — a provider error is a plain HTTP response, not SSE.
- Determinism: error decision deterministic at probability 0.0/1.0 and reproducible at fractional probability via seed; truncation and malformed-SSE always deterministic. New LLM_CHAOS_INJECTED_COUNT metric (single Metrics instance reused; incremented only when chaos actually applies).
- Wired through the client TurnBuilder/LlmConversationBuilder, the mock_llm_completion and per-turn create_llm_conversation MCP tools, and the dashboard conversation wizard (error status / Retry-After / probability / truncate mode + fraction / malformed SSE / seed).

Docs: docs/code/llm-mocking.md (chaos section + source refs), consumer AI/MCP tools page (chaos field tables), roadmap status, changelog.

Tests: 11 HttpLlmResponseActionHandlerChaosTest (error boundaries, seeded determinism, truncation fraction/default, malformed append, combine) + chaos round-trip + 2 MCP chaos tests + 5 UI codegen chaos tests.

Core, netty, and UI gates green.
1 parent 9ce24e3 commit 8b40cf2

21 files changed

Lines changed: 824 additions & 7 deletions

changelog.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- Added declarative **LLM fault/chaos profiles** for resilience testing, attachable to any mock LLM response (`mock_llm_completion`, each `create_llm_conversation` turn, the Java `LlmConversationBuilder`, and raw expectation JSON via a `chaos` block). Supports probabilistic provider errors (e.g. 429/529 with a `Retry-After` header), mid-stream truncation of an SSE stream (keep a leading fraction of events), and appending a malformed (broken-JSON) SSE chunk. Errors are deterministic at probability 0.0/1.0 and reproducible at fractional probabilities via a `seed`; truncation and malformed-SSE are always deterministic. A new `LLM_CHAOS_INJECTED_COUNT` metric tracks injections. The dashboard conversation wizard exposes the profile per turn. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - Added two MCP tools for **agent-run analysis and tool-call assertions**, both backed by a new deterministic `org.mockserver.llm.analysis.AgentRunAnalyzer` that reconstructs an agent run by decoding the LLM requests MockServer recorded. `verify_tool_call` asserts that an agent called a named tool a given number of times (`atLeast`/`atMost`, with an optional regex over the tool-call arguments); `explain_agent_run` summarises the run's structure (message and assistant-turn counts, the ordered tool-call sequence, tool results, and the latest message role). Read-only and offline — no LLM call. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
 - Added a **runtime-LLM client SPI** (`org.mockserver.llm.client`) that lets MockServer call a real LLM you already run, as the foundation for opt-in features such as drift detection and exploratory semantic matching. Mirrors the existing codec registry: an `LlmClient` per provider (Ollama, OpenAI, OpenAI Responses, Azure OpenAI, Anthropic, Gemini, Bedrock) registered in `LlmClientRegistry`, an immutable `LlmBackend` config (with the API key redacted in logs), and a three-layer `LlmBackendResolver` (provider env vars → `mockserver.llmProvider`/`llmApiKey`/`llmModel`/`llmBaseUrl` → named-backends JSON via `mockserver.llmBackendsConfig`). All runtime-LLM use goes through `LlmCompletionService`, which is **off unless a backend is configured**, **fails closed** on any timeout/error/non-2xx (never flipping a deterministic result), and caches per normalised prompt for reproducibility. Ollama is the reference backend (no key, local); Bedrock builds the Anthropic-on-Bedrock request and relies on the `headers` escape hatch pending automatic SigV4 signing. See the configuration properties page and `docs/code/llm-mocking.md`.
 - LLM conversation mocks can now opt into deterministic **prompt normalisation** before the `latestMessageContains` / `latestMessageMatches` predicates are evaluated, so a match is not blocked by cosmetic differences in dynamically-assembled agent prompts. A new `normalization` block on `conversationPredicates` (also exposed per-turn in the `create_llm_conversation` MCP tool and the dashboard conversation wizard) supports collapsing whitespace, lowercasing, sorting JSON object keys, dropping built-in volatile values (ISO-8601 timestamps, UUIDs, `req_`/`msg_`/`call_` ids), and dropping named JSON fields. Normalisation is pure and idempotent — it never makes a test flaky — and has no effect unless a text predicate is set. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
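
As a rough illustration of the `chaos` block mentioned in the changelog entry, the sketch below builds one as a typed object. The field names come from the `chaos` field table on the AI/MCP tools page and the `httpLlmResponse.chaos` location matches the UI codegen tests; the `ChaosProfile` type name and the `validateChaos` helper are hypothetical and not part of MockServer.

```typescript
// Hypothetical client-side mirror of the documented `chaos` fields (all optional).
type TruncateMode = 'NONE' | 'MID_STREAM';

type ChaosProfile = {
  errorStatus?: number;        // e.g. 429 or 529
  retryAfter?: string;         // value for the Retry-After header, e.g. "30"
  errorProbability?: number;   // documented range 0.0 to 1.0
  truncateMode?: TruncateMode;
  truncateAtFraction?: number; // documented range 0.0 to 1.0, default 0.5
  malformedSse?: boolean;
  seed?: number;
};

// Illustrative sanity check (an assumption, not MockServer behaviour):
// report fields that fall outside the documented 0.0 to 1.0 range.
function validateChaos(chaos: ChaosProfile): string[] {
  const problems: string[] = [];
  const inUnitRange = (n: number): boolean => n >= 0 && n <= 1;
  if (chaos.errorProbability !== undefined && !inUnitRange(chaos.errorProbability)) {
    problems.push('errorProbability must be between 0.0 and 1.0');
  }
  if (chaos.truncateAtFraction !== undefined && !inUnitRange(chaos.truncateAtFraction)) {
    problems.push('truncateAtFraction must be between 0.0 and 1.0');
  }
  return problems;
}

// A rate-limit fault that always fires (probability 1.0).
const rateLimited: ChaosProfile = { errorStatus: 429, retryAfter: '30', errorProbability: 1.0 };

// Fragment of a raw expectation: `chaos` sits under `httpLlmResponse`.
const expectationFragment = { httpLlmResponse: { chaos: rateLimited } };
```

Keeping the profile as a plain object means the same value can be passed to the MCP `chaos` parameter or embedded in raw expectation JSON.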

docs/code/llm-mocking.md

Lines changed: 12 additions & 0 deletions
@@ -167,6 +167,16 @@ Two MCP tools expose the LLM mocking feature to agents:
 
 The first two validate provider availability against `ProviderCodecRegistry` at registration time. The analysis tools delegate to `org.mockserver.llm.analysis.AgentRunAnalyzer`.
 
+## Fault / chaos injection
+
+`LlmChaosProfile` (`org.mockserver.model`) attaches a fault profile to any `HttpLlmResponse` for resilience testing. Applied by `HttpLlmResponseActionHandler`:
+
+- **Probabilistic error** `chaosErrorResponseOrNull(...)` returns an error `HttpResponse` (`errorStatus` + optional `Retry-After`) when triggered. An `errorStatus` with no `errorProbability` always fires; a fractional probability draws once (reproducible via `seed`). `HttpActionHandler` checks this first and, if present, returns the error on the normal (non-streaming) path — a provider error is a plain HTTP response, not an SSE stream, even for a would-be streaming completion.
+- **Mid-stream truncation** `applyStreamingChaos(...)` keeps a leading `truncateAtFraction` of the SSE events (default 0.5) so the stream ends early.
+- **Malformed SSE** — appends a deliberately broken-JSON chunk so the client must handle a corrupt event.
+
+Truncation and malformed-SSE are fully deterministic; the error path is deterministic at probability 0.0/1.0. Each injection increments the `LLM_CHAOS_INJECTED_COUNT` metric. The profile round-trips as the top-level `chaos` field on `HttpLlmResponse` (alongside `completion`, `embedding`, and `conversationPredicates`) and is exposed per turn in the dashboard wizard and via the `chaos` MCP parameter.
+
 ## Agent-run analysis
 
 `AgentRunAnalyzer` (`org.mockserver.llm.analysis`) is a deterministic, read-only inspector. Given the LLM requests MockServer recorded (retrieved via the normal request log), it decodes each with the provider's `ProviderCodec` and treats the **richest** conversation (most messages — the latest dialogue snapshot) as the canonical run. From that it derives:
@@ -346,3 +356,5 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
 | `llm/client/LlmCompletionService.java` | Orchestrator: off-unless-configured, fail-closed, cached |
 | `llm/client/LlmTransport.java` + `NettyHttpClientLlmTransport.java` | Transport seam over `NettyHttpClient` |
 | `llm/analysis/AgentRunAnalyzer.java` | Deterministic read-only agent-run inspection (tool-call counts, run summary) |
+| `model/LlmChaosProfile.java` | Fault/chaos profile carried on `HttpLlmResponse` |
+| `mock/action/http/HttpLlmResponseActionHandler.java` | Encodes LLM responses and applies chaos (error / truncation / malformed SSE) |
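
The decision rules in the new "Fault / chaos injection" section can be sketched as below. This is a minimal TypeScript model of the documented semantics, not the Java handler: the function names `shouldInjectError` and `truncateEvents` are hypothetical, the MINSTD generator stands in for whatever seeded source the real code uses, and rounding the kept event count down is an assumption the docs do not confirm.

```typescript
type ChaosProfile = {
  errorStatus?: number;
  errorProbability?: number;
  truncateMode?: 'NONE' | 'MID_STREAM';
  truncateAtFraction?: number;
  seed?: number;
};

// MINSTD linear congruential generator, used here only to show how a seed
// makes a fractional draw reproducible. Returns values in [0, 1).
function minstd(seed: number): () => number {
  let state = Math.floor(Math.abs(seed)) % 2147483647;
  if (state === 0) state = 1;
  return () => {
    state = (state * 48271) % 2147483647;
    return (state - 1) / 2147483646;
  };
}

// Error decision: no errorStatus means no error; a missing probability is
// treated as 1.0; 0.0 and 1.0 never consult the random source.
function shouldInjectError(chaos: ChaosProfile): boolean {
  if (chaos.errorStatus === undefined) return false;
  const p = chaos.errorProbability ?? 1.0;
  if (p <= 0) return false;
  if (p >= 1) return true;
  const draw = chaos.seed !== undefined ? minstd(chaos.seed)() : Math.random();
  return draw < p;
}

// Mid-stream truncation: keep a leading fraction of events (default 0.5).
// Math.floor is an assumed rounding rule.
function truncateEvents<T>(events: T[], chaos: ChaosProfile): T[] {
  if (chaos.truncateMode !== 'MID_STREAM') return events;
  const fraction = Math.min(1, Math.max(0, chaos.truncateAtFraction ?? 0.5));
  return events.slice(0, Math.floor(events.length * fraction));
}
```

In this model a seeded profile gives the same answer on every call because the generator is rebuilt from the seed each time; whether MockServer draws per request or per expectation is not stated in the docs above.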

docs/plans/mockserver-llm-mocking.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ The original RFC (RFC-1 LLM Response Builder + RFC-2 Stateful Scripted Conversat
 | # | Item | Status |
 |---|---|---|
 | 5 | Token/cost analytics + budget assertions | ✅ Shipped (U3 — token/cost rollup tile + session inspector) |
-| 6 | LLM fault/chaos profiles (429/529 + Retry-After, mid-stream truncation, malformed SSE, probabilistic error rates) | ❌ Not started (was U6, ~8–12 days) |
+| 6 | LLM fault/chaos profiles (429/529 + Retry-After, mid-stream truncation, malformed SSE, probabilistic error rates) | ✅ Shipped — `LlmChaosProfile` on `HttpLlmResponse`, applied in `HttpLlmResponseActionHandler` (+ dispatcher); MCP `chaos` on `mock_llm_completion` and per conversation turn; dashboard wizard control; `LLM_CHAOS_INJECTED_COUNT` metric |
 | 7 | VCR mode + strict mode + body redaction + field normalisation | 🟡 Partial — cassette manager shipped in U4; strict-mode, body redaction, and field normalisation still open |
 
 ### Tier 3 — valuable / specialised

jekyll-www.mock-server.com/mock_server/ai_mcp_tools.html

Lines changed: 22 additions & 0 deletions
@@ -907,9 +907,29 @@ <h3>mock_llm_completion</h3>
 <tr><td><code>stopReason</code></td><td>string</td><td>No</td><td>Stop reason to encode in the provider format (e.g. <code>end_turn</code>, <code>tool_use</code>, <code>stop</code>)</td></tr>
 <tr><td><code>usage</code></td><td>object</td><td>No</td><td>Token usage. Accepts <code>inputTokens</code> (integer) and <code>outputTokens</code> (integer).</td></tr>
 <tr><td><code>streaming</code></td><td>boolean</td><td>No</td><td>When <code>true</code>, the response is delivered as a Server-Sent Events stream. Defaults to <code>false</code>.</td></tr>
+<tr><td><code>chaos</code></td><td>object</td><td>No</td><td>Optional fault/chaos profile for resilience testing (see the table below). Also accepted per turn in <a href="#create_llm_conversation"><code>create_llm_conversation</code></a>.</td></tr>
 </tbody>
 </table>
 
+<p><strong><code>chaos</code> fields</strong> (all optional):</p>
+
+<table>
+<thead>
+<tr><th>Field</th><th>Type</th><th>Description</th></tr>
+</thead>
+<tbody>
+<tr><td><code>errorStatus</code></td><td>integer</td><td>HTTP error status to return instead of a normal response (e.g. <code>429</code>, <code>529</code>). Fires every time unless <code>errorProbability</code> is set. A provider error is returned as a normal HTTP response even for a streaming completion.</td></tr>
+<tr><td><code>retryAfter</code></td><td>string</td><td>Value for the <code>Retry-After</code> header on an injected error (e.g. <code>"30"</code>).</td></tr>
+<tr><td><code>errorProbability</code></td><td>number</td><td>Probability 0.0&ndash;1.0 of injecting the error. <code>1.0</code> (or omitted with <code>errorStatus</code> set) always fires; <code>0.0</code> never does. Fractional values are non-deterministic unless <code>seed</code> is set.</td></tr>
+<tr><td><code>truncateMode</code></td><td>string</td><td><code>NONE</code> or <code>MID_STREAM</code>. <code>MID_STREAM</code> truncates a streaming response after a leading fraction of events.</td></tr>
+<tr><td><code>truncateAtFraction</code></td><td>number</td><td>Fraction 0.0&ndash;1.0 of SSE events to keep before truncating (default <code>0.5</code>).</td></tr>
+<tr><td><code>malformedSse</code></td><td>boolean</td><td>Append a malformed (broken-JSON) SSE chunk so the client must handle a corrupt event.</td></tr>
+<tr><td><code>seed</code></td><td>integer</td><td>Makes a fractional <code>errorProbability</code> reproducible.</td></tr>
+</tbody>
+</table>
+
+<p>Chaos is deterministic for truncation, malformed SSE, and an <code>errorProbability</code> of 0.0 or 1.0 &mdash; safe for repeatable tests. Use a fractional probability (optionally with a <code>seed</code>) only when you intend flakiness.</p>
+
 <p><strong>Example request (Anthropic text completion):</strong></p>
 
 <pre class="prettyprint code"><code class="code">{
@@ -1033,6 +1053,8 @@ <h3>create_llm_conversation</h3>
 </tbody>
 </table>
 
+<p>Each turn may also carry an optional <code>chaos</code> object (a sibling of <code>match</code> and <code>response</code>) with the same fields as the <a href="#mock_llm_completion"><code>mock_llm_completion</code></a> <code>chaos</code> profile, to inject faults into that turn's response.</p>
+
 <pre-context-placeholder-removed></pre-context-placeholder-removed>
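
On the consuming side, an agent under test has to survive both a truncated stream and a corrupt chunk. The sketch below shows one way a client might parse an SSE body defensively. It is a generic illustration: the exact shape of the malformed chunk MockServer appends is not specified here, so any `data:` payload that fails `JSON.parse` is treated as corrupt, and the `isTerminal` predicate is a hypothetical hook because each provider marks the end of a stream differently.

```typescript
interface SseParseResult {
  events: unknown[];    // successfully parsed JSON payloads, in order
  malformed: string[];  // raw payloads that were not valid JSON
  sawTerminal: boolean; // false suggests the stream ended early
}

function parseSse(raw: string, isTerminal: (event: unknown) => boolean): SseParseResult {
  const result: SseParseResult = { events: [], malformed: [], sawTerminal: false };
  // SSE events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.indexOf('data:') === 0)
      .map((line) => line.slice(5).replace(/^ /, '')) // drop "data:" and one leading space
      .join('\n');
    if (data === '') continue;
    let event: unknown;
    try {
      event = JSON.parse(data);
    } catch (err) {
      // Corrupt chunk: record it and keep going instead of aborting the run.
      result.malformed.push(data);
      continue;
    }
    result.events.push(event);
    if (isTerminal(event)) result.sawTerminal = true;
  }
  return result;
}

// Usage: one good delta followed by a broken-JSON chunk and no stop event.
const rawStream = 'data: {"type":"delta","text":"Hel"}\n\ndata: {"type":"delta","te\n\n';
const parsedStream = parseSse(
  rawStream,
  (e) => typeof e === 'object' && e !== null && (e as { type?: string }).type === 'stop',
);
```

A test can then assert that the agent retries or surfaces an error when `sawTerminal` is false or `malformed` is non-empty, rather than silently accepting partial output.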

mockserver-ui/src/__tests__/conversationCodegen.test.ts

Lines changed: 58 additions & 0 deletions
@@ -305,3 +305,61 @@ describe('latestMessageMatches (regex predicate)', () => {
     expect(draft.turns[0]!.predicates.latestMessageMatches).toBe('weather.*paris');
   });
 });
+
+describe('chaos profile', () => {
+  function chaosDraft(): ConversationDraft {
+    const draft = baseDraft();
+    draft.turns[0]!.chaos = {
+      errorStatus: 429,
+      retryAfter: '30',
+      errorProbability: 1.0,
+      truncateMode: 'MID_STREAM',
+      truncateAtFraction: 0.5,
+      malformedSse: true,
+      seed: 7,
+    };
+    return draft;
+  }
+
+  it('emits withChaos in Java', () => {
+    const java = conversationToJava(chaosDraft());
+    expect(java).toContain('.withChaos(');
+    expect(java).toContain('.withErrorStatus(429)');
+    expect(java).toContain('.withTruncateMode(org.mockserver.model.LlmChaosProfile.TruncateMode.MID_STREAM)');
+    expect(java).toContain('.withSeed(7L)');
+  });
+
+  it('emits chaos object in JSON httpLlmResponse', () => {
+    const json = JSON.parse(conversationToJson(chaosDraft()));
+    const chaos = json[0].httpLlmResponse.chaos;
+    expect(chaos.errorStatus).toBe(429);
+    expect(chaos.malformedSse).toBe(true);
+  });
+
+  it('emits chaos object in MCP turn', () => {
+    const args = conversationToMcpArgs(chaosDraft());
+    const turns = args['turns'] as Array<Record<string, unknown>>;
+    const chaos = turns[0]!['chaos'] as Record<string, unknown>;
+    expect(chaos['errorStatus']).toBe(429);
+    expect(chaos['truncateMode']).toBe('MID_STREAM');
+  });
+
+  it('round-trips chaos through draftFromScenarioExpectations', () => {
+    const json = JSON.parse(conversationToJson(chaosDraft())) as Array<Record<string, unknown>>;
+    const { draft } = draftFromScenarioExpectations(
+      json.map((value, i) => ({ key: `k${i}`, value })),
+    );
+    expect(draft.turns[0]!.chaos?.errorStatus).toBe(429);
+    expect(draft.turns[0]!.chaos?.malformedSse).toBe(true);
+  });
+
+  it('omits NONE truncateMode from wire output', () => {
+    const draft = baseDraft();
+    draft.turns[0]!.chaos = { truncateMode: 'NONE', errorStatus: 500 };
+    const args = conversationToMcpArgs(draft);
+    const turns = args['turns'] as Array<Record<string, unknown>>;
+    const chaos = turns[0]!['chaos'] as Record<string, unknown>;
+    expect(chaos['truncateMode']).toBeUndefined();
+    expect(chaos['errorStatus']).toBe(500);
+  });
+});
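
The last test above pins down one wire-format rule: a `truncateMode` of `NONE` is dropped rather than serialised. The codegen module itself is not part of this excerpt, so the following is only a guess at the shape of such a conversion; `chaosToWire` is a hypothetical name and the local `ChaosDraft` type is a stand-in for the one exported by `conversationCodegen`.

```typescript
// Stand-in for the ChaosDraft type imported in the wizard (assumed shape).
type ChaosDraft = {
  errorStatus?: number;
  retryAfter?: string;
  errorProbability?: number;
  truncateMode?: 'NONE' | 'MID_STREAM';
  truncateAtFraction?: number;
  malformedSse?: boolean;
  seed?: number;
};

// Hypothetical draft-to-wire conversion: copy every defined field, but omit
// truncateMode when it is 'NONE' because that is the no-op default.
function chaosToWire(chaos: ChaosDraft | undefined): Record<string, unknown> | undefined {
  if (chaos === undefined) return undefined;
  const source: Record<string, unknown> = chaos;
  const wire: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value === undefined) continue;
    if (key === 'truncateMode' && value === 'NONE') continue;
    wire[key] = value;
  }
  return wire;
}
```

Whether an empty profile should serialise as `{}` or be omitted entirely is left open here; the tests shown do not cover that case.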

mockserver-ui/src/components/ConversationWizardStep2.tsx

Lines changed: 95 additions & 1 deletion
@@ -14,7 +14,7 @@ import Collapse from '@mui/material/Collapse';
 import AddIcon from '@mui/icons-material/Add';
 import DeleteIcon from '@mui/icons-material/Delete';
 import PredicatePills from './PredicatePills';
-import type { TurnDraft, TurnMatchPredicates, TurnResponse, NormalizationDraft } from '../lib/conversationCodegen';
+import type { TurnDraft, TurnMatchPredicates, TurnResponse, NormalizationDraft, ChaosDraft } from '../lib/conversationCodegen';
 import type { ToolCallDraft } from '../lib/expectationFromCapture';
 
 // ---------------------------------------------------------------------------
@@ -88,6 +88,21 @@ export default function ConversationWizardStep2({ turns, onTurnsChange }: Step2P
     [turns, updatePredicates],
   );
 
+  const toggleChaos = useCallback(
+    (index: number, enabled: boolean) => {
+      updateTurn(index, { chaos: enabled ? {} : undefined });
+    },
+    [updateTurn],
+  );
+
+  const updateChaos = useCallback(
+    (index: number, partial: Partial<ChaosDraft>) => {
+      const turn = turns[index]!;
+      updateTurn(index, { chaos: { ...(turn.chaos ?? {}), ...partial } });
+    },
+    [turns, updateTurn],
+  );
+
   const updateToolCall = useCallback(
     (turnIndex: number, tcIndex: number, partial: Partial<ToolCallDraft>) => {
       const turn = turns[turnIndex]!;
@@ -354,6 +369,85 @@ export default function ConversationWizardStep2({ turns, onTurnsChange }: Step2P
                 />
               </Box>
             </Box>
+
+            {/* Fault / chaos injection (resilience testing) */}
+            <FormControlLabel
+              control={
+                <Switch
+                  checked={turn.chaos != null}
+                  onChange={(e) => toggleChaos(i, e.target.checked)}
+                  size="small"
+                />
+              }
+              label="Inject fault / chaos"
+              sx={{ '& .MuiFormControlLabel-label': { fontSize: '0.75rem' }, mt: 0.5 }}
+            />
+            <Collapse in={turn.chaos != null} unmountOnExit>
+              <Box sx={{ pl: 1.5, mb: 1, display: 'flex', flexWrap: 'wrap', gap: 1, alignItems: 'center' }}>
+                <TextField
+                  label="Error status"
+                  size="small"
+                  type="number"
+                  value={turn.chaos?.errorStatus ?? ''}
+                  onChange={(e) => updateChaos(i, { errorStatus: e.target.value === '' ? undefined : parseInt(e.target.value, 10) })}
+                  sx={{ width: 110 }}
+                />
+                <TextField
+                  label="Retry-After"
+                  size="small"
+                  value={turn.chaos?.retryAfter ?? ''}
+                  onChange={(e) => updateChaos(i, { retryAfter: e.target.value || undefined })}
+                  sx={{ width: 110 }}
+                />
+                <TextField
+                  label="Error prob (0-1)"
+                  size="small"
+                  type="number"
+                  value={turn.chaos?.errorProbability ?? ''}
+                  onChange={(e) => updateChaos(i, { errorProbability: e.target.value === '' ? undefined : parseFloat(e.target.value) })}
+                  sx={{ width: 130 }}
+                />
+                <TextField
+                  label="Truncate"
+                  size="small"
+                  select
+                  value={turn.chaos?.truncateMode ?? 'NONE'}
+                  onChange={(e) => updateChaos(i, { truncateMode: e.target.value as ChaosDraft['truncateMode'] })}
+                  sx={{ width: 130 }}
+                >
+                  <MenuItem value="NONE">None</MenuItem>
+                  <MenuItem value="MID_STREAM">Mid-stream</MenuItem>
+                </TextField>
+                <TextField
+                  label="Truncate frac"
+                  size="small"
+                  type="number"
+                  value={turn.chaos?.truncateAtFraction ?? ''}
+                  onChange={(e) => updateChaos(i, { truncateAtFraction: e.target.value === '' ? undefined : parseFloat(e.target.value) })}
+                  sx={{ width: 120 }}
+                />
+                <FormControlLabel
+                  control={
+                    <Checkbox
+                      size="small"
+                      checked={turn.chaos?.malformedSse === true}
+                      onChange={(e) => updateChaos(i, { malformedSse: e.target.checked })}
+                    />
+                  }
+                  label="Malformed SSE"
+                  sx={{ '& .MuiFormControlLabel-label': { fontSize: '0.75rem' } }}
+                />
+                <TextField
+                  label="Seed"
+                  size="small"
+                  type="number"
+                  value={turn.chaos?.seed ?? ''}
+                  onChange={(e) => updateChaos(i, { seed: e.target.value === '' ? undefined : parseInt(e.target.value, 10) })}
+                  sx={{ width: 100 }}
+                  helperText="reproducible prob"
+                />
+              </Box>
+            </Collapse>
           </CardContent>
         </Card>
       ))}
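
The numeric fields in the wizard all repeat the same "empty string means unset" conversion inline. A small pair of helpers, sketched below, captures that pattern in one place. The names `parseOptionalInt` and `parseOptionalFloat` are hypothetical and do not appear in the commit, and mapping non-numeric input to `undefined` is an extra guard that the inline expressions above do not have.

```typescript
// Convert a text-field value to an optional integer.
// '' (or whitespace) means "unset"; non-numeric input is also treated as unset.
function parseOptionalInt(raw: string): number | undefined {
  if (raw.trim() === '') return undefined;
  const n = parseInt(raw, 10);
  return isNaN(n) ? undefined : n;
}

// Same idea for fractional fields such as errorProbability or truncateAtFraction.
function parseOptionalFloat(raw: string): number | undefined {
  if (raw.trim() === '') return undefined;
  const n = parseFloat(raw);
  return isNaN(n) ? undefined : n;
}
```

With helpers like these, a handler could read `updateChaos(i, { seed: parseOptionalInt(e.target.value) })`, keeping `NaN` out of the draft.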
