fix(fault-proof): robustness follow-ups to #865 + cost-estimator parity #894

Open
seolaoh wants to merge 4 commits into succinctlabs:main from celo-org:seolaoh/fault-proof-robustness-followups

Conversation

@seolaoh
Contributor

@seolaoh seolaoh commented Apr 29, 2026

Summary

Three independent follow-ups to #865 in the proposer, plus a cost-estimator flag that builds on #869 to align the estimator's split behavior with how the proposer actually proves.

1. fault-proof: warn-log L1 head regression in sync_state

Distinguish the two skip cases: WARN when confirmed_number moves backwards (load-balanced backend regression or deep reorg) so operators can detect unhealthy backends, DEBUG for the normal "L1 hasn't ticked" path. Pure observability; no behavior change.
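The two skip cases can be told apart by a plain ordering comparison. The following is a minimal sketch, not the PR's actual code: `HeadCheck` and `classify_head` are hypothetical names, and the real `sync_state` presumably emits the log lines through its own logging macros rather than returning an enum.

```rust
use std::cmp::Ordering;

/// Result of comparing the newly observed confirmed L1 head against the
/// last head the proposer synced to. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
enum HeadCheck {
    /// Head moved forward: run the sync.
    Advanced,
    /// Head is unchanged: skip, log at DEBUG ("L1 hasn't ticked").
    Unchanged,
    /// Head moved backwards: skip, log at WARN (backend regression or deep reorg).
    Regressed { by: u64 },
}

/// Classify the head movement. `last_synced` and `confirmed_number` are L1 block numbers.
fn classify_head(last_synced: u64, confirmed_number: u64) -> HeadCheck {
    match confirmed_number.cmp(&last_synced) {
        Ordering::Greater => HeadCheck::Advanced,
        Ordering::Equal => HeadCheck::Unchanged,
        // Safe subtraction: this arm is only reached when confirmed_number < last_synced.
        Ordering::Less => HeadCheck::Regressed { by: last_synced - confirmed_number },
    }
}
```

Both `Unchanged` and `Regressed` lead to the same skip, which matches the "no behavior change" claim: only the log level differs.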

2. fault-proof: reset creation guard when tracked game is pruned

When sync_games prunes future games due to abnormal cache states (backup restore into a shorter chain, deep L1 reorg, factory reset), the duplicate-creation guard could point at a game that no longer exists. Without a reset, should_create_game blocks indefinitely. The reset is gated on the guarded address being among the entries this prune actually removes. Checking "absent from post-prune cache" instead would over-clear when a just-created game hasn't been added to the cache yet, allowing duplicate submission.
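The gating condition can be sketched as follows. This is an illustration under assumptions: `GameCache`, its fields, and `prune_above` are hypothetical stand-ins for the proposer's real state, and the cache is modeled as a simple index-to-address map.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// 20-byte game address (stand-in for the real address type).
type Address = [u8; 20];

/// Hypothetical, simplified view of the proposer's cached state.
struct GameCache {
    /// Game index -> game address.
    games: BTreeMap<u64, Address>,
    /// Address of the game whose creation is currently guarded, if any.
    creation_guard: Option<Address>,
}

impl GameCache {
    /// Remove every cached game with index > `pinned_latest_index`.
    /// The guard is cleared only if the guarded game is among the removed entries.
    fn prune_above(&mut self, pinned_latest_index: u64) {
        // Evaluate the removal set before mutating the cache.
        let removed: Vec<(u64, Address)> = self
            .games
            .range((Bound::Excluded(pinned_latest_index), Bound::Unbounded))
            .map(|(index, addr)| (*index, *addr))
            .collect();

        if let Some(guarded) = self.creation_guard {
            // Not "is the guarded address absent from the cache?": a just-created
            // game may simply not have been cached yet and must keep its guard.
            if removed.iter().any(|(_, addr)| *addr == guarded) {
                self.creation_guard = None;
            }
        }

        for (index, _) in removed {
            self.games.remove(&index);
        }
    }
}
```

With this shape, a prune that removes unrelated games leaves a guard on a not-yet-cached game untouched, which is the duplicate-submission case the description warns about.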

3. fault-proof: pre-flight on-chain status check before prove/resolve/claim

With sync_l1_confirmations > 0, the cache lags by ~sync_l1_confirmations × block_time. During that window, recently confirmed prove/resolve/claimCredit txs are invisible to the cache, so should_* flags re-fire and the proposer re-submits. That wastes prover-network spend (full range + agg proof regeneration) and gas (reverted txs).

Each path now does one eth_call at latest before submission:

  • resolve_games: skip if GameStatus != IN_PROGRESS
  • claim_bonds: skip if credit(signer) == 0
  • should_skip_proving: skip if ProposalStatus is *ValidProofProvided or Resolved (covers both already-proven and timeout default-loss; Resolved is set whenever GameStatus moves out of IN_PROGRESS)

On RPC failure, the check logs a warning and proceeds, so transient backend issues don't block legitimate work.
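The three pre-flight checks share a "skip if already done, proceed on RPC error" shape. A minimal sketch of that decision logic follows, with the `eth_call` result passed in as a `Result` so the example stays self-contained. The helper names are hypothetical, and the `ProposalStatus` variants are an assumption expanded from the `*ValidProofProvided` wildcard above.

```rust
use std::fmt::Debug;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GameStatus { InProgress, ChallengerWins, DefenderWins }

/// Assumed variant set; only the *ValidProofProvided and Resolved cases matter here.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProposalStatus {
    Unchallenged,
    Challenged,
    UnchallengedAndValidProofProvided,
    ChallengedAndValidProofProvided,
    Resolved,
}

/// Returns true when the action should be skipped.
/// On RPC failure it warns and returns false, so the action proceeds.
fn preflight_skip<T, E: Debug>(
    action: &str,
    onchain: Result<T, E>,
    already_done: impl Fn(&T) -> bool,
) -> bool {
    match onchain {
        Ok(value) => already_done(&value),
        Err(err) => {
            eprintln!("WARN: {action} pre-flight check failed, proceeding: {err:?}");
            false
        }
    }
}

fn should_skip_resolve<E: Debug>(status: Result<GameStatus, E>) -> bool {
    preflight_skip("resolve", status, |s| *s != GameStatus::InProgress)
}

fn should_skip_claim<E: Debug>(credit: Result<u128, E>) -> bool {
    preflight_skip("claim", credit, |c| *c == 0)
}

fn should_skip_proving<E: Debug>(status: Result<ProposalStatus, E>) -> bool {
    preflight_skip("prove", status, |s| {
        matches!(
            s,
            ProposalStatus::UnchallengedAndValidProofProvided
                | ProposalStatus::ChallengedAndValidProofProvided
                | ProposalStatus::Resolved
        )
    })
}
```

Failing open on RPC errors means the worst case is the pre-PR behavior (a possibly redundant submission), never a stalled proposer.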

4. scripts: add --no-safe-head-split to cost-estimator (follow-up to #869)

#869 fixed --batch-size precedence so the flag is honored when explicit --start/--end are given. But with SafeDB active, the splitter still cuts at span batch boundaries via split_range_based_on_safe_heads, regardless of the now-correct effective_batch_size. The proposer with RANGE_SPLIT_COUNT=1 (default) does not split that way — it produces one range proof per proposal interval. --no-safe-head-split forces split_range_basic so the estimator partitions only by --batch-size, giving a closer estimate of the per-segment cost the proposer actually incurs on the prover network.
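To make the difference concrete, here is a simplified sketch of the two partitioning strategies. Neither function is the repository's implementation: `split_range_basic_sketch` and `split_at_boundaries_sketch` are hypothetical stand-ins for split_range_basic and split_range_based_on_safe_heads, and the boundary list is passed in directly instead of being read from SafeDB.

```rust
/// Partition [start, end) into consecutive chunks of at most `batch_size` blocks.
fn split_range_basic_sketch(start: u64, end: u64, batch_size: u64) -> Vec<(u64, u64)> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut ranges = Vec::new();
    let mut cur = start;
    while cur < end {
        // saturating_add avoids overflow near u64::MAX; min clamps the last chunk.
        let next = cur.saturating_add(batch_size).min(end);
        ranges.push((cur, next));
        cur = next;
    }
    ranges
}

/// Cut [start, end) at every boundary strictly inside the range, then apply
/// the batch-size split within each piece. `boundaries` models span batch
/// (safe head) block numbers.
fn split_at_boundaries_sketch(
    start: u64,
    end: u64,
    boundaries: &[u64],
    batch_size: u64,
) -> Vec<(u64, u64)> {
    let mut cuts: Vec<u64> = boundaries
        .iter()
        .copied()
        .filter(|b| *b > start && *b < end)
        .collect();
    cuts.sort_unstable();
    cuts.dedup();
    cuts.push(end);

    let mut ranges = Vec::new();
    let mut cur = start;
    for cut in cuts {
        ranges.extend(split_range_basic_sketch(cur, cut, batch_size));
        cur = cut;
    }
    ranges
}
```

For a 300-block range with `batch_size = 300` and three boundaries inside it, the basic split yields one range while the boundary-aware split yields four, which is the gap between the estimate and the proposer's actual workload that the flag is meant to close.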

Test plan

  • cargo check --all-targets --all-features --tests && cargo fmt --all -- --check && cargo clippy --all-features --all-targets -- -D warnings -A incomplete-features
  • Manual: cost-estimator with --no-safe-head-split produces a single batch instead of N span-batch-aligned chunks

@seolaoh seolaoh changed the title fault-proof: robustness follow-ups to #865 + cost-estimator parity fix(fault-proof): robustness follow-ups to #865 + cost-estimator parity Apr 29, 2026
seolaoh and others added 4 commits May 4, 2026 17:54
Distinguish the two skip cases: log at WARN when confirmed_number
moves backwards (load-balanced RPC backend regression or deep L1
reorg) so operators can detect unhealthy backends, and keep DEBUG
for the normal equal case where L1 simply hasn't ticked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When sync_games prunes future games (above pinned_latest_index) due to
abnormal cache states like backup restore into a shorter chain or deep
L1 reorg, the duplicate-creation guard could point at a game that no
longer exists on chain. Without resetting, should_create_game blocks
indefinitely because canonical_head_l2_block cannot advance through an
orphaned game.

Reset is gated on the guarded address being among the entries this
prune actually removes (evaluated before the removal loop). Checking
"absent from post-prune cache" would over-clear in the case where the
just-created game has not yet been added to the cache and an unrelated
prune fires, allowing should_create_game to re-submit a duplicate at
the same L2 block before the cache catches up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ve/claim

With sync_l1_confirmations > 0, the pinned cache lags behind the chain
tip by sync_l1_confirmations × block_time, so a recently confirmed
prove(), resolve(), or claimCredit() tx may not yet be reflected in
should_attempt_* flags. Without a pre-flight check, the proposer would
re-submit duplicate transactions that revert on chain — wasting gas for
resolve/claim, and re-running expensive proof generation for prove.

Each path now does one eth_call at `latest` before submission:
- resolve_games: skip if GameStatus != IN_PROGRESS
- claim_bonds: skip if credit(signer) == 0
- should_skip_proving: skip if ProposalStatus is *ValidProofProvided or
  Resolved (single check covers both already-proven and timeout
  default-loss cases since Resolved is set whenever GameStatus moves
  out of IN_PROGRESS)

On RPC failure the check logs a warn and proceeds, so transient backend
issues don't block legitimate work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When SafeDB is active, cost-estimator splits the requested range at every
span batch boundary via split_range_based_on_safe_heads, producing one
zkVM execution per span batch regardless of --batch-size. That mirrors a
hypothetical "split each proposal at span batch boundaries" workload, not
what the proposer actually does (RANGE_SPLIT_COUNT-driven arithmetic split,
default 1 = single execution per proposal interval).

The new --no-safe-head-split flag forces split_range_basic so the range
is partitioned solely by --batch-size, giving a closer estimate of the
per-segment cost the proposer sees on the prover network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>