feat: resume pre-run replay against existing datadir (#219)
## Summary
Adds two pieces that together make pre-run replay resumable after an
instance crashes mid-replay (e.g. reth dies on payload N, you don't want
to re-execute payloads 1..N from scratch):
- **`direct` datadir method** — bypasses copy/snapshot/clone and
bind-mounts `source_dir` as-is into the container. Cleanup is a no-op so
the directory survives the run. Intended to point at a ZFS clone left
behind by `--debug.stop-after-prerun`.
- **Skip already-applied blocks during pre-run** — after RPC-ready, the
lifecycle reads the client's latest block number and passes it as
`ExecuteOptions.SkipUntilBlockNumber`. `runStepLines` drops every
leading line until it sees the first `engine_newPayload` whose
`blockNumber` exceeds that target, then resumes normal processing. Wired
into both the `--debug.stop-after-prerun` path (resume) and the main
`ExecuteTests` path (idempotent normal runs).
Resume workflow:
1. Find the ZFS clone path the prior `--debug.stop-after-prerun` run
logged (`data_mount=...` field on the final exit log).
2. Edit your config:
```yaml
datadirs:
  reth:
    source_dir: /path/to/benchmarkoor-clone-reth/...
    method: direct
```
3. Re-run with `--debug.stop-after-prerun --limit-instance-id=reth`. The
container mounts the existing clone, the latest applied block is
detected, and pre-run resumes from the next block.
## Test plan
- [x] `go build`, `go test ./pkg/executor/... ./pkg/datadir/...
./pkg/config/...`, and `golangci-lint run
--new-from-rev="origin/master"` clean
- [x] End-to-end: trigger a mid-pre-run failure (reth crash),
reconfigure with `method: direct` pointing at the surviving clone,
re-run with `--debug.stop-after-prerun`, verify the "Resuming pre-run
replay" log fires with the expected `resumed_at_block` and the replay
continues from the next payload
## Notes
- The `runner-level` rollback strategy paths (`strategy_container.go`
ZFS-snapshot path, `strategy_checkpoint.go`) build their own
`ExecuteOptions` and call `RunPreRunSteps` directly. They don't yet
receive `SkipUntilBlockNumber`; supporting it would require threading
`blockNum` through the strategy signatures. That is out of scope here
since the resume case goes through `--debug.stop-after-prerun`.