Commit 51ee2b1

feat: model-aware fixture recording with date-suffix normalization (#187)
## Summary

- **Record model in fixture match criteria** for all requests (not just fal.ai), preventing fixture collisions when apps make multiple LLM calls with the same user message but different models
- **Date-suffix normalization** strips version dates from model names (e.g., `claude-opus-4-20250514` → `claude-opus-4`) so fixtures survive provider version bumps
- **Dash+digit boundary matching** in the router: `claude-opus-4` matches `claude-opus-4-20250514` but `gpt-4` does NOT match `gpt-4o` and `gpt-4o` does NOT match `gpt-4o-mini`
- **Drift detection metadata** records `systemHash` and `toolsHash` (8-char SHA-256) alongside fixtures for detecting prompt/tool changes since recording
- **`recordFullModelVersion`** config option and CLI flag to disable normalization
- **Documentation** updates for recording behavior, config reference, and CLI

## Why

Issue #185: when an app triggers multiple distinct LLM calls from a single user message (e.g., Opus+tools for chat, Haiku for title gen, Haiku for suggestions), recorded fixtures collide because `buildFixtureMatch()` only keyed on `userMessage`/`turnIndex`/`hasToolResult`. Adding `model` to match criteria disambiguates the calls.

## Test plan

- [ ] `normalizeModelName` strips dates correctly across all provider formats
- [ ] Router dash+digit boundary: `gpt-4` ✗ `gpt-4o`, `gpt-4o` ✗ `gpt-4o-mini`, `claude-opus-4` ✓ `claude-opus-4-20250514`
- [ ] Recorder writes normalized model + metadata to fixture files
- [ ] `recordFullModelVersion: true` preserves full model string
- [ ] Old fixtures without model field still match (backwards compatible)
- [ ] Integration test: 3 distinct LLM calls produce distinguishable fixtures

Closes #185
2 parents c40da2d + b257210 commit 51ee2b1
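
The dash+digit boundary rule from the summary can be sketched as follows. This is a minimal illustration, not the router code from this commit (which is not among the files shown here); the helper name `modelMatches` and its signature are assumptions.

```typescript
// Hypothetical helper illustrating the dash+digit boundary rule.
// The real router implementation is not shown in this commit view.
function modelMatches(fixtureModel: string | undefined, requestModel: string): boolean {
  // Fixtures recorded before this change have no model field and keep matching.
  if (fixtureModel === undefined) return true;
  // An exact match always succeeds.
  if (requestModel === fixtureModel) return true;
  // Otherwise the fixture model must be a prefix of the request model...
  if (!requestModel.startsWith(fixtureModel)) return false;
  // ...and the remainder must start with "-<digit>", e.g. "-20250514" or "-2024-08-06".
  const rest = requestModel.slice(fixtureModel.length);
  return /^-\d/.test(rest);
}
```

With this rule, `gpt-4` is rejected for `gpt-4o` because the remainder `o` is not a dash followed by a digit. Note that the sketch would also accept `claude-3` for `claude-3-5-sonnet`, since `-5` satisfies the same boundary check; whether the actual router guards against that case is not visible here.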

12 files changed

Lines changed: 679 additions & 12 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -2,6 +2,12 @@
 
 ## [Unreleased]
 
+### Added
+
+- **Model-aware fixture recording** — recorded fixtures now include the model name in match criteria, preventing collisions when an app makes multiple LLM calls with the same user message but different models. Model names are normalized by stripping date/version suffixes (e.g., `claude-opus-4-20250514` → `claude-opus-4`) so fixtures survive version bumps. Disable with `recordFullModelVersion: true`. ([#185](https://github.com/CopilotKit/aimock/issues/185))
+- **Drift detection metadata** — recorded fixtures include `systemHash` and `toolsHash` in a `metadata` block for detecting system prompt or tool definition changes since recording.
+- **Prefix model matching** — fixture router uses `startsWith` for string model matching, so `model: "claude-opus-4"` matches any `claude-opus-4-*` version.
+
 ## [1.22.1] - 2026-05-12
 
 ### Fixed

docs/aimock-cli/index.html

Lines changed: 71 additions & 1 deletion
@@ -137,7 +137,7 @@ <h3>Config Fields</h3>
 LLMock configuration. Accepts <code>fixtures</code>, <code>latency</code>,
 <code>chunkSize</code>, <code>logLevel</code>, <code>validateOnLoad</code>,
 <code>metrics</code>, <code>strict</code>, <code>chaos</code>,
-<code>streamingProfile</code>.
+<code>streamingProfile</code>, <code>record</code>.
 </td>
 </tr>
 <tr>
@@ -151,6 +151,76 @@ <h3>Config Fields</h3>
 </tbody>
 </table>
 
+<h3>Recording Config (<code>llm.record</code>)</h3>
+<p>
+  When <code>llm.record</code> is present, aimock proxies unmatched requests to real
+  providers and saves the responses as fixtures. See
+  <a href="/record-replay">Record &amp; Replay</a> for full details.
+</p>
+
+<table class="endpoint-table">
+  <thead>
+    <tr>
+      <th>Option</th>
+      <th>Type</th>
+      <th>Default</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><code>providers</code></td>
+      <td>object</td>
+      <td>&mdash;</td>
+      <td>
+        Map of provider names to upstream URLs or API keys (e.g.
+        <code>{ "openai": "sk-..." }</code>)
+      </td>
+    </tr>
+    <tr>
+      <td><code>fixturePath</code></td>
+      <td>string</td>
+      <td><code>./fixtures/recorded</code></td>
+      <td>Directory where recorded fixtures are saved</td>
+    </tr>
+    <tr>
+      <td><code>proxyOnly</code></td>
+      <td>boolean</td>
+      <td><code>false</code></td>
+      <td>Proxy without saving fixtures to disk or caching in memory</td>
+    </tr>
+    <tr>
+      <td><code>recordFullModelVersion</code></td>
+      <td>boolean</td>
+      <td><code>false</code></td>
+      <td>
+        Record the exact model string without stripping date/version suffixes. Use when
+        tests depend on exact model version matching. See
+        <a href="/record-replay#model-aware-recording">Model-Aware Recording</a>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+<div class="code-block">
+  <div class="code-block-header">
+    Recording config example <span class="lang-tag">json</span>
+  </div>
+  <pre><code>{
+  <span class="prop">"llm"</span>: {
+    <span class="prop">"fixtures"</span>: <span class="str">"./fixtures"</span>,
+    <span class="prop">"record"</span>: {
+      <span class="prop">"providers"</span>: {
+        <span class="prop">"openai"</span>: <span class="str">"https://api.openai.com"</span>,
+        <span class="prop">"anthropic"</span>: <span class="str">"https://api.anthropic.com"</span>
+      },
+      <span class="prop">"fixturePath"</span>: <span class="str">"./fixtures/recorded"</span>,
+      <span class="prop">"recordFullModelVersion"</span>: <span class="kw">false</span>
+    }
+  }
+}</code></pre>
+</div>
+
 <h2>CLI Flags</h2>
 <table class="endpoint-table">
 <thead>
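
For reference, the option table added above could be modeled in TypeScript roughly like this. The field names and defaults come from the table; the `RecordConfig` type name and the `withRecordDefaults` helper are illustrative assumptions, not aimock exports.

```typescript
// Illustrative types only: field names and defaults mirror the docs table above;
// the type name and helper are hypothetical, not part of aimock's public API.
interface RecordConfig {
  providers: Record<string, string>; // provider name -> upstream URL or API key
  fixturePath?: string;
  proxyOnly?: boolean;
  recordFullModelVersion?: boolean;
}

// Fill in the documented defaults for any option the caller left out.
function withRecordDefaults(config: RecordConfig): Required<RecordConfig> {
  return {
    providers: config.providers,
    fixturePath: config.fixturePath ?? "./fixtures/recorded",
    proxyOnly: config.proxyOnly ?? false,
    recordFullModelVersion: config.recordFullModelVersion ?? false,
  };
}
```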

docs/record-replay/index.html

Lines changed: 100 additions & 0 deletions
@@ -465,6 +465,106 @@ <h2>Fixture Auto-Generation</h2>
 fixture is saved to disk with a warning but not registered in memory.
 </p>
 
+<h2 id="model-aware-recording">Model-Aware Recording</h2>
+<p>
+  When recording fixtures, aimock automatically includes the model name in match criteria.
+  This prevents collisions when your app makes multiple LLM calls with the same user message
+  but different models (e.g., Opus for chat + Haiku for title generation).
+</p>
+<p>
+  Model names are normalized by stripping date/version suffixes so fixtures survive provider
+  version bumps:
+</p>
+
+<table class="endpoint-table">
+  <thead>
+    <tr>
+      <th>Request Model</th>
+      <th>Recorded As</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><code>claude-opus-4-20250514</code></td>
+      <td><code>claude-opus-4</code></td>
+    </tr>
+    <tr>
+      <td><code>gpt-4o-2024-08-06</code></td>
+      <td><code>gpt-4o</code></td>
+    </tr>
+    <tr>
+      <td><code>claude-3-5-sonnet-20241022</code></td>
+      <td><code>claude-3-5-sonnet</code></td>
+    </tr>
+    <tr>
+      <td><code>llama3.1</code></td>
+      <td><code>llama3.1</code> (no date suffix &mdash; unchanged)</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>
+  Matching uses prefix comparison, so <code>model: "claude-opus-4"</code> in a fixture
+  matches requests for <code>claude-opus-4-20250514</code>,
+  <code>claude-opus-4-20250915</code>, or any future version.
+</p>
+
+<p>
+  To record the full model version instead (disabling normalization), set
+  <code>recordFullModelVersion</code> to <code>true</code> in the recording config:
+</p>
+
+<div class="code-block">
+  <div class="code-block-header">
+    Disable model normalization <span class="lang-tag">json</span>
+  </div>
+  <pre><code>{
+  <span class="key">"llm"</span>: {
+    <span class="key">"record"</span>: {
+      <span class="key">"providers"</span>: { <span class="key">"openai"</span>: <span class="str">"sk-..."</span> },
+      <span class="key">"recordFullModelVersion"</span>: <span class="kw">true</span>
+    }
+  }
+}</code></pre>
+</div>
+
+<p>Or programmatically:</p>
+
+<div class="code-block">
+  <div class="code-block-header">Programmatic usage <span class="lang-tag">ts</span></div>
+  <pre><code><span class="op">mock</span>.<span class="fn">enableRecording</span>({
+  <span class="prop">providers</span>: { <span class="prop">openai</span>: <span class="str">"https://api.openai.com"</span> },
+  <span class="prop">fixturePath</span>: <span class="str">"./fixtures/recorded"</span>,
+  <span class="prop">recordFullModelVersion</span>: <span class="kw">true</span>,
+});</code></pre>
+</div>
+
+<h2>Drift Detection Metadata</h2>
+<p>
+  Recorded fixtures include a <code>metadata</code> block with hashes of the system prompt
+  and tool definitions at recording time. These are informational only &mdash; not used for
+  matching &mdash; and help you detect when your prompts or tools have changed since the
+  fixture was recorded.
+</p>
+
+<div class="code-block">
+  <div class="code-block-header">
+    Recorded fixture with metadata <span class="lang-tag">json</span>
+  </div>
+  <pre><code>{
+  <span class="key">"match"</span>: { <span class="key">"userMessage"</span>: <span class="str">"hello"</span>, <span class="key">"model"</span>: <span class="str">"claude-opus-4"</span> },
+  <span class="key">"metadata"</span>: { <span class="key">"systemHash"</span>: <span class="str">"a7f3c291"</span>, <span class="key">"toolsHash"</span>: <span class="str">"e4b12d08"</span> },
+  <span class="key">"response"</span>: { <span class="key">"content"</span>: <span class="str">"Hi there!"</span> }
+}</code></pre>
+</div>
+
+<p>
+  When you re-record a fixture and the hashes differ from the previous version, it signals
+  that your application&rsquo;s prompts or tool definitions have evolved. This is useful for
+  auditing fixture freshness &mdash; if the hashes don&rsquo;t match, the recorded response
+  may no longer reflect what the real provider would return for the current prompt.
+</p>
+
 <h2 id="snapshot-style-recording">Snapshot-Style Recording</h2>
 <p>
 When the <code>X-Test-Id</code> header is present on a request, aimock uses
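
The 8-character `systemHash` and `toolsHash` values documented above could be produced along these lines. This is a sketch under assumptions: the recorder code is not shown in this view, so the helper names `shortHash` and `hasDrifted` and the way the input is serialized (`JSON.stringify` for non-string values) are guesses. Only the "first 8 hex characters of a SHA-256 digest" part comes from the commit message.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: first 8 hex characters of a SHA-256 digest.
// How aimock serializes the prompt and tools before hashing is an assumption.
function shortHash(value: unknown): string {
  const text = typeof value === "string" ? value : JSON.stringify(value ?? null);
  return createHash("sha256").update(text).digest("hex").slice(0, 8);
}

// Hypothetical drift check: compare a recorded hash with one computed from the current input.
function hasDrifted(recordedHash: string, current: unknown): boolean {
  return shortHash(current) !== recordedHash;
}
```

Because the hashes are informational and not part of matching, a check like `hasDrifted` would live in tooling or a test helper rather than in the fixture router.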

src/__tests__/model-utils.test.ts

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+import { describe, it, expect } from "vitest";
+import { normalizeModelName } from "../model-utils.js";
+
+describe("normalizeModelName", () => {
+  it("strips 8-digit date suffix", () => {
+    expect(normalizeModelName("claude-opus-4-20250514")).toBe("claude-opus-4");
+    expect(normalizeModelName("claude-3-5-sonnet-20241022")).toBe("claude-3-5-sonnet");
+    expect(normalizeModelName("gpt-4o-mini-20240718")).toBe("gpt-4o-mini");
+  });
+
+  it("strips YYYY-MM-DD date suffix", () => {
+    expect(normalizeModelName("gpt-4o-2024-08-06")).toBe("gpt-4o");
+    expect(normalizeModelName("gpt-4-turbo-2024-04-09")).toBe("gpt-4-turbo");
+  });
+
+  it("strips Bedrock version suffix after date", () => {
+    expect(normalizeModelName("anthropic.claude-3-5-sonnet-20241022-v2:0")).toBe(
+      "anthropic.claude-3-5-sonnet",
+    );
+  });
+
+  it("leaves models without date suffix unchanged", () => {
+    expect(normalizeModelName("gpt-4o")).toBe("gpt-4o");
+    expect(normalizeModelName("gpt-4o-mini")).toBe("gpt-4o-mini");
+    expect(normalizeModelName("llama3.1")).toBe("llama3.1");
+    expect(normalizeModelName("gemini-2.5-pro")).toBe("gemini-2.5-pro");
+    expect(normalizeModelName("fal-ai/flux/dev")).toBe("fal-ai/flux/dev");
+  });
+
+  it("leaves undefined/empty unchanged", () => {
+    expect(normalizeModelName(undefined)).toBeUndefined();
+    expect(normalizeModelName("")).toBe("");
+  });
+
+  it("respects skip flag", () => {
+    expect(normalizeModelName("claude-opus-4-20250514", true)).toBe("claude-opus-4-20250514");
+  });
+});
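
The `model-utils` source itself is not included in this view. As a rough sketch, a function with the behavior these tests describe could look like the following; the name `normalizeModelNameSketch` and the regular expression are assumptions and may differ from the real implementation.

```typescript
// Sketch only: written to satisfy the test cases above, not copied from model-utils.
// Matches a trailing "-YYYYMMDD" or "-YYYY-MM-DD", optionally followed by a
// Bedrock-style "-v2:0" version tag.
const DATE_SUFFIX = /-(?:\d{8}|\d{4}-\d{2}-\d{2})(?:-v\d+(?::\d+)?)?$/;

function normalizeModelNameSketch(
  model: string | undefined,
  skip = false,
): string | undefined {
  // undefined and "" pass through untouched; so does everything when skip is set.
  if (!model || skip) return model;
  return model.replace(DATE_SUFFIX, "");
}
```

Anchoring the pattern at the end of the string keeps version-like segments in the middle of a name, such as the `3-5` in `claude-3-5-sonnet`, out of reach of the replacement.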
