Commit 614576c

tombeckenham and claude authored
test(e2e): fix flaky multimodal + approval-flow specs (#818) (#819)
The multimodal and approval-flow E2E specs intermittently failed with empty assistant responses (surfaced as `chatStream fatal`). Root cause was a test-harness race, not aimock or the library: every request the harness actually sent succeeded.

Multimodal: `sendMessageWithImage` typed into a controlled React input and then attached the image, which auto-sends using that input's value. Under CPU load `pressSequentially` dropped leading characters, so the prompt reached aimock truncated (e.g. "cribe this image") and 404'd as "No fixture matched"; and React state could lag the committed DOM value so the auto-send fired with empty text and dispatched no request at all.

- helpers: type until the full prompt is committed, then retry the typing + attach until the send actually fires (user bubble renders).
- ChatUI: the image auto-send reads the live input DOM value instead of possibly-stale React state.

Approval-flow: `runTest` treated the optimistic user-message bump as "run started", returning before any real stream activity; a stalled run then timed out waiting for an approval that never appeared.

- runTest: require real stream activity (loading on, a tool call, completion, or an assistant message) before returning, and retry the click otherwise.

Verified: multimodal 200/200 with 0 flaky at 20x (4 workers, retries=2) and at 6 workers/retries=0; approval-flow 450/450 at 25x/8 workers/retries=0; full E2E suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent df40512 commit 614576c

3 files changed

Lines changed: 74 additions & 23 deletions

testing/e2e/src/components/ChatUI.tsx

Lines changed: 9 additions & 2 deletions
@@ -40,6 +40,7 @@ export function ChatUI({
 }: ChatUIProps) {
   const [input, setInput] = useState('')
   const messagesRef = useRef<HTMLDivElement>(null)
+  const inputRef = useRef<HTMLInputElement>(null)

   useEffect(() => {
     if (messagesRef.current) {
@@ -203,14 +204,20 @@ export function ChatUI({
               className="text-xs text-gray-400"
               onChange={(e) => {
                 const file = e.target.files?.[0]
-                if (file && input.trim() && onSendMessageWithImage) {
-                  onSendMessageWithImage(input.trim(), file)
+                // Read the prompt from the live input DOM value rather than the
+                // `input` React state. Attaching a file auto-sends, and under
+                // load a controlled input's state can lag the committed DOM
+                // value — reading state here would send an empty/partial prompt.
+                const text = (inputRef.current?.value ?? input).trim()
+                if (file && text && onSendMessageWithImage) {
+                  onSendMessageWithImage(text, file)
                   setInput('')
                 }
               }}
             />
           )}
           <input
+            ref={inputRef}
             data-testid="chat-input"
             type="text"
             value={input}
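
The fallback expression in the `onChange` handler above can be isolated as a pure function. The sketch below is illustrative only: `InputLike` and `resolvePrompt` are hypothetical names that do not appear in the commit, and the input element is modeled as a plain object so the snippet runs without a DOM.

```typescript
// Hypothetical stand-in for the only part of an <input> element used here.
interface InputLike {
  value: string
}

// Prefer the live DOM value; fall back to React state only when no element
// is mounted (the ref is null). Mirrors the shape of
// `(inputRef.current?.value ?? input).trim()` from the diff.
function resolvePrompt(el: InputLike | null, stateValue: string): string {
  return (el?.value ?? stateValue).trim()
}

// State lags the DOM: the element already holds the full prompt.
const lagging = resolvePrompt({ value: 'describe this image ' }, 'describe th')

// No mounted element: state is the only available source.
const unmounted = resolvePrompt(null, '  hello  ')

// An empty DOM value is still a value: `??` only falls through on
// null/undefined, so stale state is not used here.
const emptyDom = resolvePrompt({ value: '' }, 'stale text')

console.log({ lagging, unmounted, emptyDom })
```

Because the handler uses `??` rather than `||`, an empty DOM value yields an empty `text`, and the `file && text` guard then skips the send instead of dispatching whatever stale text React state still holds.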

testing/e2e/tests/helpers.ts

Lines changed: 25 additions & 5 deletions
@@ -37,11 +37,31 @@ export async function sendMessageWithImage(
   imagePath: string,
 ) {
   const input = page.getByTestId('chat-input')
-  await input.click()
-  await input.pressSequentially(text, { delay: 30 })
-  // Wait for React state to settle before attaching file
-  await page.waitForTimeout(200)
-  await page.getByTestId('image-attachment-input').setInputFiles(imagePath)
+  const fileInput = page.getByTestId('image-attachment-input')
+  const userMessages = page.getByTestId('user-message')
+
+  // Attaching the image auto-sends, using the prompt currently in the chat
+  // input, and the matched aimock fixture keys on the exact user text. A
+  // *controlled* React input is fragile here under CPU load (CI, parallel
+  // workers) in two ways: typing char-by-char can drop characters, leaving a
+  // truncated value like "cribe this image" (which 404s as "No fixture
+  // matched" → empty `chatStream fatal`); and the attach's onChange can land
+  // before the typed value is committed, dispatching nothing at all. So drive
+  // the interaction to its observable outcome — the user bubble rendering —
+  // retrying both the typing and the attach until the send actually fires with
+  // the full prompt. A redundant re-attach is harmless: the client ignores a
+  // second send while the first is still streaming.
+  await expect(async () => {
+    await input.click()
+    await input.fill('')
+    await input.pressSequentially(text, { delay: 15 })
+    // Confirm the full prompt is committed before attaching.
+    expect(await input.inputValue()).toBe(text)
+    // Reset the selection so re-attaching the same path re-fires onChange.
+    await fileInput.setInputFiles([])
+    await fileInput.setInputFiles(imagePath)
+    await expect(userMessages.first()).toBeVisible({ timeout: 2_000 })
+  }).toPass({ timeout: 15_000, intervals: [250, 500, 1000] })
 }

 export async function waitForResponse(page: Page, timeout = 15_000) {
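
The retry-to-outcome pattern in `sendMessageWithImage` can be illustrated without a browser. The helper below is a simplified, hypothetical stand-in for the idea behind `expect(...).toPass()`. It is not Playwright's implementation, and `retryUntilPass`, `timeoutMs`, `intervalsMs` and `makeFlakyAttempt` are names assumed for this example. Reusing the last interval once the list is exhausted is likewise an assumption of the sketch.

```typescript
// Hypothetical helper: re-run `attempt` until it stops throwing or the
// deadline passes. Resolves with the number of tries it took.
async function retryUntilPass(
  attempt: () => Promise<void>,
  opts: { timeoutMs: number; intervalsMs: number[] },
): Promise<number> {
  const deadline = Date.now() + opts.timeoutMs
  let lastError: unknown
  for (let tries = 1; ; tries++) {
    try {
      await attempt()
      return tries
    } catch (err) {
      lastError = err
    }
    // Walk the interval list, then keep reusing its last entry.
    const idx = Math.min(tries - 1, opts.intervalsMs.length - 1)
    const wait = opts.intervalsMs[idx] ?? 0
    // Give up if the next wait would overrun the deadline.
    if (Date.now() + wait > deadline) throw lastError
    await new Promise<void>((resolve) => setTimeout(resolve, wait))
  }
}

// Simulated interaction: the first `failures` attempts "type" a truncated
// prompt (leading characters dropped), later attempts type it in full.
function makeFlakyAttempt(failures: number): () => Promise<void> {
  let calls = 0
  return async () => {
    calls++
    const typed = calls <= failures ? 'cribe this image' : 'describe this image'
    // Same role as `expect(await input.inputValue()).toBe(text)` in the diff.
    if (typed !== 'describe this image') {
      throw new Error(`prompt not committed: "${typed}"`)
    }
  }
}

retryUntilPass(makeFlakyAttempt(2), { timeoutMs: 1_000, intervalsMs: [5, 10, 20] })
  .then((tries) => console.log(`passed after ${tries} tries`))
```

The point of the design is that the whole interaction (type, verify, attach, observe) is one retried unit, so a failure at any step restarts from a clean input rather than relying on a fixed sleep.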

testing/e2e/tests/tools-test/helpers.ts

Lines changed: 40 additions & 16 deletions
@@ -108,23 +108,47 @@ export async function runTest(page: Page): Promise<void> {
   for (let attempt = 0; attempt < 5; attempt++) {
     const baselineMessageCount = await readMessageCount()
     await page.click('#run-test-button')
-    await page.waitForTimeout(300)

-    const started = await page.evaluate((baseline) => {
-      const metadata = document.getElementById('test-metadata')
-      if (metadata?.getAttribute('data-is-loading') === 'true') {
-        return true
-      }
-
-      const text =
-        document.getElementById('messages-json-content')?.textContent || '[]'
-      try {
-        const parsed = JSON.parse(text)
-        return Array.isArray(parsed) && parsed.length > baseline
-      } catch {
-        return false
-      }
-    }, baselineMessageCount)
+    // A run "starts" only when real stream activity appears — not when the
+    // optimistic user message lands. Clicking adds one user message
+    // synchronously (baseline + 1); that alone must NOT count as started, or a
+    // stalled run (the click registered but the stream produced nothing) would
+    // be reported as started and the test would later time out waiting for an
+    // approval / completion that never comes. Real activity is: loading turned
+    // on, a tool call appeared, the test completed, or a *second* message (the
+    // assistant response) was added beyond the optimistic user message. Poll
+    // briefly so a slow-but-real run under CI load isn't mistaken for a stall.
+    const started = await page
+      .waitForFunction(
+        (baseline) => {
+          const metadata = document.getElementById('test-metadata')
+          if (metadata?.getAttribute('data-is-loading') === 'true') return true
+          if (
+            parseInt(
+              metadata?.getAttribute('data-tool-call-count') || '0',
+              10,
+            ) > 0
+          )
+            return true
+          if (metadata?.getAttribute('data-test-complete') === 'true')
+            return true
+          const text =
+            document.getElementById('messages-json-content')?.textContent ||
+            '[]'
+          try {
+            const parsed = JSON.parse(text)
+            // > baseline + 1: the assistant message arrived (a real response),
+            // not just the optimistic user message.
+            return Array.isArray(parsed) && parsed.length > baseline + 1
+          } catch {
+            return false
+          }
+        },
+        baselineMessageCount,
+        { timeout: 2000 },
+      )
+      .then(() => true)
+      .catch(() => false)

     if (started) {
       return
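
To make the change in the "started" condition concrete, the two predicates can be compared as pure functions over a snapshot of the page. This is only a sketch: `RunSnapshot`, `startedBefore` and `startedAfter` are hypothetical names, and the snapshot fields stand in for the `data-is-loading`, `data-tool-call-count` and `data-test-complete` attributes and the parsed message array read in the diff.

```typescript
// Hypothetical snapshot of what the harness reads from the page.
interface RunSnapshot {
  isLoading: boolean
  toolCallCount: number
  testComplete: boolean
  messageCount: number
}

// Old condition: loading, or any growth in the message list. The optimistic
// user message (baseline + 1) is enough to satisfy it.
function startedBefore(s: RunSnapshot, baseline: number): boolean {
  return s.isLoading || s.messageCount > baseline
}

// New condition: require real stream activity. The message list must grow
// past the optimistic user message (baseline + 1) to count on its own.
function startedAfter(s: RunSnapshot, baseline: number): boolean {
  if (s.isLoading) return true
  if (s.toolCallCount > 0) return true
  if (s.testComplete) return true
  return s.messageCount > baseline + 1
}

const baseline = 4

// Stalled run: the click added the user message, but nothing streamed.
const stalled: RunSnapshot = {
  isLoading: false,
  toolCallCount: 0,
  testComplete: false,
  messageCount: baseline + 1,
}

// Real run: an assistant message followed the user message.
const responded: RunSnapshot = { ...stalled, messageCount: baseline + 2 }

console.log(startedBefore(stalled, baseline), startedAfter(stalled, baseline))
console.log(startedBefore(responded, baseline), startedAfter(responded, baseline))
```

The stalled snapshot is the case the commit targets: the old predicate reports it as started, while the new one does not, so the retry loop clicks again rather than waiting on an approval that never arrives.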
