Skip to content

test(world-state): make delayed-close fork queue-cleanup wait deterministic#24106

Merged
PhilWindle merged 1 commit into
merge-train/spartan-v5from
cb/fix-flaky-delayed-close-fork-queue
Jun 15, 2026
Merged

test(world-state): make delayed-close fork queue-cleanup wait deterministic#24106
PhilWindle merged 1 commit into
merge-train/spartan-v5from
cb/fix-flaky-delayed-close-fork-queue

Conversation

@AztecBot

Copy link
Copy Markdown
Collaborator

Why the merge train was dequeued

PR #24053 (merge-train/spartan-v5v5-next) was dequeued from the merge queue because its merge-queue-heavy CI run failed. Tracing the failure:

  • CI3 merge_group run 27568575541 → shard x4-full (ci-full-no-test-cache) failed.
  • The failing test was yarn-project/world-state/src/native/native_world_state.test.ts (code: 1), specifically:

    NativeWorldState › Pending and Proven chain › does not fail when a delayed-close fork is destroyed by a reorg before its close fires

  • Failing assertion (native_world_state.test.ts:966):
    expect((ws as any).instance.queues.has(forkId)).toBe(false);  // Expected: false, Received: true
    
    The warnSpy assertion on the preceding line passed; only the queue-cleanup assertion failed.

This test and the world-state instance/facade code it exercises are not modified by this train — the test already exists unchanged in v5-next. So this is a pre-existing timing flake that happened to fail in this grind and blocked the train.

Root cause

The delayed-close path is correct and always eventually cleans up:

  • asyncDispose() schedules sleep(closeDelayMs).then(() => close()) (fire-and-forget).
  • close()doClose()instance.call(DELETE_FORK), and the per-fork queue entry is deleted in call()'s finally regardless of whether the native side rejects with Fork not found (which it does here, because the reorg/unwind already destroyed the native fork).

The test then waited a fixed sleep(closeDelayMs * 3) (3s) before asserting the queue was gone. Under a loaded merge-queue grind, the scheduled close (fired at +closeDelayMs) plus its async DELETE_FORK round-trip and queue.stop() can take longer than that fixed margin, so the queue entry was still present when the assertion ran — Received true.

Fix

Replace the racy fixed sleep with a deterministic wait on the exact asserted condition:

await retryUntil(() => !(ws as any).instance.queues.has(forkId), 'delayed-close fork queue cleanup', 30, 0.1);

This polls until the queue entry is removed (the cleanup runs inside the same close path), with a 30s timeout. It preserves the regression intent: if cleanup genuinely stopped happening, retryUntil throws a clear Timeout awaiting delayed-close fork queue cleanup instead of a bare toBe(false). The production code is unchanged; no test is skipped or weakened. The now-unused sleep import is swapped for retryUntil.

Verification

This is a test-only, single-statement timing change. I was unable to run ./bootstrap.sh ci in this environment — the workspace has no node_modules and no barretenberg/native world_state_napi build artifacts, so exercising this native test would require a multi-hour C++ + TS build that isn't proportionate to a wait-deterministically change. The change is type-checked by inspection: retryUntil(fn, name, timeoutSec, intervalSec) matches the foundation signature, and the import path (@aztec/foundation/retry) is the one used across yarn-project. CI on this PR will validate it.


Created by claudebox · group: slackbot

@AztecBot AztecBot added ci-draft Run CI on draft PRs. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure claudebox Owned by claudebox. it can push to this PR. labels Jun 15, 2026
@AztecBot

Copy link
Copy Markdown
Collaborator Author

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

\033FLAKED\033 (8;;http://ci.aztec-labs.com/92066922fecb4633�92066922fecb46338;;�): yarn-project/kv-store/scripts/run_test.sh src/bench/sqlite-opfs-encrypted/map_bench.test.ts (2s) (code: 0)

@PhilWindle PhilWindle marked this pull request as ready for review June 15, 2026 20:19
@PhilWindle PhilWindle merged commit b5830ee into merge-train/spartan-v5 Jun 15, 2026
38 of 46 checks passed
@PhilWindle PhilWindle deleted the cb/fix-flaky-delayed-close-fork-queue branch June 15, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-draft Run CI on draft PRs. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure claudebox Owned by claudebox. it can push to this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants