e2e-tests: Fix bulletin_fetch flake by staging the relay snapshot per-validator by BigTava · Pull Request #3249 · paritytech/smoldot

BigTava · 2026-05-08T16:33:28Z

Problem

bulletin_fetch flakes intermittently with a panic in zombienet-provider's archive.unpack().unwrap():

called `Result::unwrap()` on an `Err` value: Custom { kind: UnexpectedEof,
  error: TarError { desc: "failed to unpack `data/chains/.../db/full/000008.log`", ... } }

Reproduced 3/5 in a rerun loop on a single commit. Failure path was inside zombienet's local copy/read/extract pipeline.

Cause

alice and bob both passed the same relay tarball path to with_db_snapshot(...). zombienet-provider keys its internal extraction cache by sha256(path_string), so the two validators landed in the same cache slot and raced writing/reading the same intermediate <hash>.tgz in the namespace dir. When the read won the race against an in-progress write, tar parsing hit UnexpectedEof mid-entry.
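The collision can be sketched in a few lines. This is only an illustration under stated assumptions: `cache_key` is a hypothetical helper, the paths are made up, and the standard library's `DefaultHasher` stands in for sha256 so that the sketch needs no external crate. The point it shows is that a key derived from the path string alone cannot tell two validators apart when they pass the same path.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the provider's cache key.
/// The key described above is sha256(path_string); DefaultHasher is
/// swapped in here purely to keep the sketch dependency-free.
fn cache_key(path: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Only the path string feeds the key, not the node name.
    path.hash(&mut hasher);
    hasher.finish()
}
```

With a key like this, `cache_key("/tmp/relay.tgz")` is identical for alice and bob, so both map to the same slot, while two different path strings such as `/tmp/alice/relay.tgz` and `/tmp/bob/relay.tgz` map to different slots.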

…lake Alice and bob both passed the same relay tarball path to with_db_snapshot. zombienet-provider keys its extraction cache by sha256(path), so they raced on the same intermediate file and one validator panicked mid-extract with UnexpectedEof. Per-validator copies hash to distinct cache slots.
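A minimal sketch of per-validator staging, assuming a plain file copy is enough. `stage_snapshot_per_validator` and its parameters are hypothetical names for illustration and are not taken from the PR diff; the returned paths are what each validator would then pass to with_db_snapshot.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies `snapshot` once per validator into `staging_dir` and returns one
/// distinct path per validator, in the same order as `validators`.
/// Hypothetical helper: the naming scheme is an assumption.
fn stage_snapshot_per_validator(
    snapshot: &Path,
    staging_dir: &Path,
    validators: &[&str],
) -> io::Result<Vec<PathBuf>> {
    let file_name = snapshot.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "snapshot path has no file name")
    })?;
    fs::create_dir_all(staging_dir)?;

    let mut staged = Vec::with_capacity(validators.len());
    for name in validators {
        // Build "<validator>-<original file name>", e.g. "alice-relay.tgz".
        let mut target_name = OsString::from(format!("{}-", name));
        target_name.push(file_name);
        let target = staging_dir.join(target_name);
        // Each validator gets its own copy, hence its own path string.
        fs::copy(snapshot, &target)?;
        staged.push(target);
    }
    Ok(staged)
}
```

The trade-off is extra disk space and copy time per validator in exchange for path strings that no longer collide in a path-keyed cache.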

skunert · 2026-05-12T16:52:34Z

Thanks! Can you maybe unify this with the other prefetchers we have in the tests?

I think the proper solution would be for zombienet-sdk to handle this gracefully; it seems totally normal to have multiple nodes reference the same snapshot. I opened an issue on zombienet-sdk. In the meantime we should keep the workarounds.

cc @pepoviola

pepoviola · 2026-05-13T08:13:35Z

Hi @BigTava / @skunert, thanks for reporting this issue. It looks like in the native provider we decided not to copy the snapshot per node, so the described race condition can occur when multiple nodes use the same snapshot. An easy workaround, until I draft a new release with this fixed, is to set spawn_concurrency = 1 in the global settings. This makes the spawning logic sequential and should work as expected.
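For a network defined in a TOML config file, the workaround might look like the fragment below. Only the `spawn_concurrency = 1` key and value come from the comment above; placing it under a `[settings]` table is an assumption based on where zombienet configs usually keep global settings, so check it against the zombienet-sdk version in use.

```toml
# Assumption: global settings are declared under the [settings] table.
[settings]
# Spawn nodes one at a time so that no two nodes work on the shared
# snapshot cache slot concurrently.
spawn_concurrency = 1
```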

I will work on the issue and ping you to bump the zombienet version when it is ready.

Thx!

skunert · 2026-05-13T08:21:59Z

Okay, then let's use spawn_concurrency = 1 for now, thanks!

BigTava mentioned this pull request May 8, 2026

e2e-tests: Stage relay snapshot per-validator (fix bulletin_fetch flake) #3248

Closed

2 tasks

BigTava marked this pull request as draft May 8, 2026 16:42

BigTava force-pushed the tiago-fix-flaky-bulletin-e2e-test branch from bb1088c to 40b4755 Compare May 8, 2026 17:05

skunert mentioned this pull request May 12, 2026

Multiple snapshots pointing at the same URL collide during extraction paritytech/zombienet-sdk#540

Open

BigTava wants to merge 1 commit into main from tiago-fix-flaky-bulletin-e2e-test
