test(world-state): make delayed-close fork queue-cleanup wait deterministic (#24106)

AztecBot · web-flow · commit b5830eeed7fb · 2026-06-15T21:19:09.000+01:00
## Why the merge train was dequeued PR #24053 (`merge-train/spartan-v5` → `v5-next`) was dequeued from the merge queue because its `merge-queue-heavy` CI run failed. Tracing the failure: - CI3 merge_group run `27568575541` → shard `x4-full` (`ci-full-no-test-cache`) failed. - The failing test was `yarn-project/world-state/src/native/native_world_state.test.ts` (`code: 1`), specifically: > `NativeWorldState › Pending and Proven chain › does not fail when a delayed-close fork is destroyed by a reorg before its close fires` - Failing assertion (`native_world_state.test.ts:966`): ``` expect((ws as any).instance.queues.has(forkId)).toBe(false); // Expected: false, Received: true ``` The `warnSpy` assertion on the preceding line passed; only the queue-cleanup assertion failed. This test and the world-state instance/facade code it exercises are **not modified by this train** — the test already exists unchanged in `v5-next`. So this is a **pre-existing timing flake** that happened to fail in this grind and blocked the train. ## Root cause The delayed-close path is correct and always eventually cleans up: - `asyncDispose()` schedules `sleep(closeDelayMs).then(() => close())` (fire-and-forget). - `close()` → `doClose()` → `instance.call(DELETE_FORK)`, and the per-fork queue entry is deleted in `call()`'s `finally` regardless of whether the native side rejects with `Fork not found` (which it does here, because the reorg/unwind already destroyed the native fork). The test then waited a **fixed** `sleep(closeDelayMs * 3)` (3s) before asserting the queue was gone. Under a loaded merge-queue grind, the scheduled close (fired at +`closeDelayMs`) plus its async `DELETE_FORK` round-trip and `queue.stop()` can take longer than that fixed margin, so the queue entry was still present when the assertion ran — Received `true`. ## Fix Replace the racy fixed sleep with a deterministic wait on the exact asserted condition: ```ts await retryUntil(() => !(ws as any).instance.queues.has(forkId), 'delayed-close fork queue cleanup', 30, 0.1); ``` This polls until the queue entry is removed (the cleanup runs inside the same close path), with a 30s timeout. It preserves the regression intent: if cleanup genuinely stopped happening, `retryUntil` throws a clear `Timeout awaiting delayed-close fork queue cleanup` instead of a bare `toBe(false)`. The production code is unchanged; no test is skipped or weakened. The now-unused `sleep` import is swapped for `retryUntil`. ## Verification This is a test-only, single-statement timing change. I was unable to run `./bootstrap.sh ci` in this environment — the workspace has no `node_modules` and no barretenberg/native `world_state_napi` build artifacts, so exercising this native test would require a multi-hour C++ + TS build that isn't proportionate to a wait-deterministically change. The change is type-checked by inspection: `retryUntil(fn, name, timeoutSec, intervalSec)` matches the foundation signature, and the import path (`@aztec/foundation/retry`) is the one used across yarn-project. CI on this PR will validate it. --- *Created by [claudebox](https://claudebox.work/v2/sessions/6fb41d03fbbd7aaa) · group: `slackbot`*
diff --git a/yarn-project/world-state/src/native/native_world_state.test.ts b/yarn-project/world-state/src/native/native_world_state.test.ts
@@ -14,7 +14,7 @@ import { timesAsync } from '@aztec/foundation/collection';
 import { randomBytes } from '@aztec/foundation/crypto/random';
 import { Fr } from '@aztec/foundation/curves/bn254';
 import { EthAddress } from '@aztec/foundation/eth-address';
-import { sleep } from '@aztec/foundation/sleep';
+import { retryUntil } from '@aztec/foundation/retry';
 import type { SiblingPath } from '@aztec/foundation/trees';
 import { PublicDataWrite } from '@aztec/stdlib/avm';
 import { L2Block } from '@aztec/stdlib/block';
@@ -960,7 +960,10 @@ describe('NativeWorldState', () => {
       await ws.unwindBlocks(BlockNumber.fromBigInt(2n));
       await expect(delayedFork.getSiblingPath(MerkleTreeId.NULLIFIER_TREE, 0n)).rejects.toThrow('Fork not found');
 
-      await sleep(closeDelayMs * 3);
+      // The delayed close fires after closeDelayMs and then asynchronously deletes the per-fork queue entry
+      // once its DELETE_FORK round-trip settles. Under a loaded CI grind that cleanup can take longer than a
+      // fixed sleep, so wait for it deterministically instead of racing closeDelayMs * 3.
+      await retryUntil(() => !(ws as any).instance.queues.has(forkId), 'delayed-close fork queue cleanup', 30, 0.1);
 
       expect(warnSpy).not.toHaveBeenCalled();
       expect((ws as any).instance.queues.has(forkId)).toBe(false);