Commit b5830ee
authored
test(world-state): make delayed-close fork queue-cleanup wait deterministic (#24106)
## Why the merge train was dequeued
PR #24053 (`merge-train/spartan-v5` → `v5-next`) was dequeued from the
merge queue because its `merge-queue-heavy` CI run failed. Tracing the
failure:
- CI3 merge_group run `27568575541` → shard `x4-full`
(`ci-full-no-test-cache`) failed.
- The failing test was
`yarn-project/world-state/src/native/native_world_state.test.ts` (`code:
1`), specifically:
> `NativeWorldState › Pending and Proven chain › does not fail when a
delayed-close fork is destroyed by a reorg before its close fires`
- Failing assertion (`native_world_state.test.ts:966`):
```
expect((ws as any).instance.queues.has(forkId)).toBe(false); //
Expected: false, Received: true
```
The `warnSpy` assertion on the preceding line passed; only the
queue-cleanup assertion failed.
This test and the world-state instance/facade code it exercises are
**not modified by this train** — the test already exists unchanged in
`v5-next`. So this is a **pre-existing timing flake** that happened to
fail in this grind and blocked the train.
## Root cause
The delayed-close path is correct and always eventually cleans up:
- `asyncDispose()` schedules `sleep(closeDelayMs).then(() => close())`
(fire-and-forget).
- `close()` → `doClose()` → `instance.call(DELETE_FORK)`, and the
per-fork queue entry is deleted in `call()`'s `finally` regardless of
whether the native side rejects with `Fork not found` (which it does
here, because the reorg/unwind already destroyed the native fork).
The test then waited a **fixed** `sleep(closeDelayMs * 3)` (3s) before
asserting the queue was gone. Under a loaded merge-queue grind, the
scheduled close (fired at +`closeDelayMs`) plus its async `DELETE_FORK`
round-trip and `queue.stop()` can take longer than that fixed margin, so
the queue entry was still present when the assertion ran — Received
`true`.
## Fix
Replace the racy fixed sleep with a deterministic wait on the exact
asserted condition:
```ts
await retryUntil(() => !(ws as any).instance.queues.has(forkId), 'delayed-close fork queue cleanup', 30, 0.1);
```
This polls until the queue entry is removed (the cleanup runs inside the
same close path), with a 30s timeout. It preserves the regression
intent: if cleanup genuinely stopped happening, `retryUntil` throws a
clear `Timeout awaiting delayed-close fork queue cleanup` instead of a
bare `toBe(false)`. The production code is unchanged; no test is skipped
or weakened. The now-unused `sleep` import is swapped for `retryUntil`.
## Verification
This is a test-only, single-statement timing change. I was unable to run
`./bootstrap.sh ci` in this environment — the workspace has no
`node_modules` and no barretenberg/native `world_state_napi` build
artifacts, so exercising this native test would require a multi-hour C++
+ TS build that isn't proportionate to a wait-deterministically change.
The change is type-checked by inspection: `retryUntil(fn, name,
timeoutSec, intervalSec)` matches the foundation signature, and the
import path (`@aztec/foundation/retry`) is the one used across
yarn-project. CI on this PR will validate it.
---
*Created by
[claudebox](https://claudebox.work/v2/sessions/6fb41d03fbbd7aaa) ·
group: `slackbot`*1 parent 092c30f commit b5830ee
1 file changed
Lines changed: 5 additions & 2 deletions
Lines changed: 5 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
960 | 960 | | |
961 | 961 | | |
962 | 962 | | |
963 | | - | |
| 963 | + | |
| 964 | + | |
| 965 | + | |
| 966 | + | |
964 | 967 | | |
965 | 968 | | |
966 | 969 | | |
| |||
0 commit comments