
feat(slasher): per-slot data-withholding watcher (A-523, A-525)#23116

Draft
PhilWindle wants to merge 9 commits into merge-train/spartan from
phil/a-523-move-data-withholding-check-to-end-of-slot-instead-of-at-l2

Conversation

@PhilWindle
Collaborator

Summary

Moves the data-withholding slash from the L1-prune path to a per-slot check at slotStart(checkpoint.slot + slashDataWithholdingToleranceSlots), and removes the now-unnecessary VALID_EPOCH_PRUNED offense and EpochPruneWatcher.

Per AZIP-7: validators are responsible for making tx data available, not for ensuring proofs land. The new DataWithholdingWatcher ticks at quarter-eth-slot cadence and, for each published checkpoint older than slashDataWithholdingToleranceSlots (default 3), probes the local mempool for the txs in the checkpoint's blocks. Missing txs trigger a DATA_WITHHOLDING slash for the validators who actually attested to that checkpoint.
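The mempool probe can be sketched roughly as below. This is an illustrative shape only: the `TxPool` interface and `findMissingTxs` helper are assumptions for the example, not the PR's actual API.

```typescript
// Hypothetical minimal mempool interface; the real pool's
// getTxsByHash is assumed here to return `undefined` per missing hash.
interface Tx {
  hash: string;
}

interface TxPool {
  getTxsByHash(hashes: string[]): (Tx | undefined)[];
}

// Return the hashes whose tx bodies the local mempool cannot produce.
// Any non-empty result would be grounds for a DATA_WITHHOLDING offense.
function findMissingTxs(pool: TxPool, hashes: string[]): string[] {
  const txs = pool.getTxsByHash(hashes);
  return hashes.filter((_, i) => txs[i] === undefined);
}
```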

Highlights

  • New DataWithholdingWatcher (yarn-project/slasher/src/watchers/data_withholding_watcher.ts) with full unit-test coverage. Sentinel-style tick + restart floor (no KV).
  • Slot-keyed DATA_WITHHOLDING — moved from 'epoch' to 'slot' in getTimeUnitForOffense. Offense identity is now per-checkpoint, not per-epoch.
  • Single source of truth for tolerance. P2PClient.collectingMissingTxs anchors its tx-collection deadline to slotStart(block.slot + slashDataWithholdingToleranceSlots) so the collection effort runs to exactly the wall-clock instant the watcher renders its verdict. The ad-hoc p2pMissingTxCollectionDeadlineMs is removed.
  • A-525 deletions bundled in: OffenseType.VALID_EPOCH_PRUNED, slashPrunePenalty config + env var + spartan plumbing, EpochPruneWatcher class + tests, valid_epoch_pruned_slash.test.ts.
  • e2e test rewritten in data_withholding_slash.test.ts: 4 validators, slashSelfAllowed, tx is mined normally then stubbed missing on every node, watcher fires, slash executes on-chain, committee is kicked. Asserts slot-keyed offense identity + on-chain effect.
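The "single source of truth for tolerance" bullet can be illustrated with a small sketch. The slot duration and genesis timestamp below are placeholder values for the example, and `slotStart` stands in for the project's actual helper:

```typescript
// Placeholder timing constants, purely for illustration.
const SLOT_DURATION_MS = 36_000;
const GENESIS_MS = 0;

// Stand-in for the real slotStart helper.
function slotStart(slot: number): number {
  return GENESIS_MS + slot * SLOT_DURATION_MS;
}

// Both the P2P tx-collection deadline and the watcher's verdict instant
// derive from this one expression, so the two can never drift apart.
function withholdingDeadlineMs(checkpointSlot: number, toleranceSlots = 3): number {
  return slotStart(checkpointSlot + toleranceSlots);
}
```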

Test plan

  • yarn workspace @aztec/slasher test — 76 tests pass (incl. 9 new DataWithholdingWatcher tests).
  • yarn workspace @aztec/stdlib test src/slashing — 55 tests pass after the keying flip.
  • yarn workspace @aztec/p2p test src/client/p2p_client.test.ts — 20 tests pass with the new slot-anchored deadline.
  • yarn build clean across the monorepo.
  • The e2e (e2e_p2p_data_withholding_slash) is in place but only runs in CI.

Out of scope

  • L1 contract changes — none needed (offense type code is purely off-chain).
  • The L2PruneUnproven event/emitter is left in place; nothing subscribes to it after this PR but the event itself stays available for future observers.
  • Re-execution of checkpoints (covered by sibling A-1022).

Closes A-523.
Closes A-525.

PhilWindle added 4 commits May 8, 2026 19:34
Introduces a slasher watcher that fires the data-withholding check at
slot S + slashDataWithholdingToleranceSlots (default 3) for every
published checkpoint, rather than waiting for L1 to prune the epoch.

- New SLASH_DATA_WITHHOLDING_TOLERANCE_SLOTS config knob (slasher +
  network-defaults) wired through the SlasherConfig.
- DATA_WITHHOLDING flips to slot-keyed in getTimeUnitForOffense; the new
  watcher emits epochOrSlot = checkpoint slot.
- P2PClient.collectingMissingTxs anchors its tx-collection deadline to
  slotStart(block.slot + tolerance) so the collection effort runs to
  exactly the wall-clock instant the watcher renders its verdict. The
  ad-hoc p2pMissingTxCollectionDeadlineMs is removed.
- The legacy EpochPruneWatcher stays around to emit VALID_EPOCH_PRUNED;
  A-525 will remove it in a follow-on commit.

Ticking, restart-floor, and signature-context plumbing follow the
existing Sentinel patterns.
…hPruneWatcher (A-525)

The VALID_EPOCH_PRUNED offense punished committees for failing to get
their epoch proven; AZIP-7 reframes this responsibility — validators are
on the hook for making data available, not for ensuring proofs land.
With the per-slot DataWithholdingWatcher in place, the L1-prune-driven
EpochPruneWatcher is no longer needed.

Removes:
- OffenseType.VALID_EPOCH_PRUNED + every reference (helpers, votes,
  serialization, README, slashing proposer doc).
- slashPrunePenalty config + SLASH_PRUNE_PENALTY env var + spartan
  deployment plumbing.
- EpochPruneWatcher class + tests.
- e2e/valid_epoch_pruned_slash.test.ts.
- Legacy e2e/data_withholding_slash.test.ts (rewritten in a follow-up).

Replaces the deleted test that exercised the old L1-prune path. The new
scenario:

- 4 validators, all in the committee, slashSelfAllowed.
- A tx is sent normally, gossiped, included in a block.
- Right after mining, every node's txPool.getTxsByHash is stubbed to
  return undefined for the tx hash, simulating data-withholding from
  every honest observer's point of view.
- After the tolerance window elapses, every DataWithholdingWatcher emits
  a slot-keyed DATA_WITHHOLDING offense for the checkpoint's attesters.
- Quorum is met (slashSelfAllowed), the slash executes on L1, and the
  committee is kicked.

The test asserts both the offense identity (slot-keyed, validators ==
committee) and the on-chain effect via awaitCommitteeKicked.

The parent's `work()` is already public; the override existed only to
make it accessible from the test file, but it added no behaviour and
tripped the require-await lint rule.
@PhilWindle PhilWindle added ci-draft Run CI on draft PRs. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure labels May 8, 2026
PhilWindle added 5 commits May 9, 2026 10:25
The DataWithholdingWatcher floors processing at its own bootSlot to
avoid acting on partial state, so the very slot a validator boots in is
never inspected. With the previous test setup, the validators booted at
~slot 8 and the tx happened to be included in a checkpoint at exactly
slot 8 — every validator silently skipped it, so no offense was emitted
and the test timed out waiting for offenses.

Advance to epoch 8 between committee formation and tx submission so the
tx lands several slots past the validators' boot floor.
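The restart-floor logic behind that failure can be sketched as follows; `shouldInspect` and its parameters are hypothetical names for the example, not the watcher's real method:

```typescript
// Decide whether the watcher should inspect a checkpoint yet.
// checkpointSlot: slot the checkpoint was published for.
// bootSlot: slot the watcher booted in (its restart floor).
// currentSlot: the slot being ticked.
function shouldInspect(
  checkpointSlot: number,
  bootSlot: number,
  currentSlot: number,
  toleranceSlots = 3,
): boolean {
  // Floor at bootSlot: the boot slot itself is never inspected,
  // which is exactly why a tx landing in the boot slot was skipped.
  if (checkpointSlot <= bootSlot) return false;
  // Only render a verdict once the tolerance window has elapsed.
  return currentSlot >= checkpointSlot + toleranceSlots;
}
```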
The PublishedCheckpoint only carries attestations from committee members
who actually attested — the slot's proposer signs the proposal but is
not present in the attestation list. So every DataWithholdingWatcher
emits exactly committee_size - 1 offenses (one per attester), and each
node's local offense store ends up with that many entries.

The previous assertion of committee_size offenses could never be met,
which caused awaitOffenseDetected to time out even though the watcher
was firing correctly on every node.

Previous fix was numerically right (3 of 4) but for the wrong reason. The
proposer DOES collect its own attestation (validator-3 was the slot 16
proposer in the failing run, and was correctly in the detected attesters
list). What actually happens is the proposer publishes the checkpoint as
soon as it has hit committee quorum, so any peer attestation that
arrives after that point is dropped on the floor.

PublishedCheckpoint.attestations therefore carries exactly `quorum`
entries — for committee 4 that's 3, but in general it's
`floor(committee*2/3) + 1`. Use computeQuorum from stdlib to compute the
expected count rather than hard-coding `committee_size - 1`.
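A minimal sketch of that quorum formula, matching the `floor(committee*2/3) + 1` expression quoted above (the signature of stdlib's actual `computeQuorum` may differ):

```typescript
// Quorum = floor(2/3 of the committee) + 1.
function computeQuorum(committeeSize: number): number {
  return Math.floor((committeeSize * 2) / 3) + 1;
}
```

For a committee of 4 this yields 3, which is why PublishedCheckpoint.attestations carried exactly 3 entries in the failing run.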
Per AZIP-7, every committee member who vouched for the proposal should
be slashed for withholding the data — not just the subset whose
attestations the proposer happened to include before publishing.

The watcher now:
1. Recovers attesters from the PublishedCheckpoint on L1 (as before).
2. Also queries the p2p attestation pool via P2PApi for the same slot
   and proposal payload hash, picking up honest committee members whose
   attestations arrived after the proposer hit committee quorum and
   were therefore dropped on the floor.
3. Unions and dedupes by address.

The p2p pool entries are pruned at finalization, but the watcher fires
at slot + tolerance which is well before that, so they're still there.

Wires the new P2PApi dep through aztec-node/server.ts. Updates the unit
test subclass to return Promise (extractAttesters is now async) and
relaxes the e2e expectation to a range [quorum, committeeSize] since
the actual count depends on attestation propagation timing.
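The union-and-dedupe step above (step 3) can be sketched as below; the `Attester` shape and helper name are illustrative assumptions, not the PR's actual types:

```typescript
// Hypothetical minimal attester shape for the example.
interface Attester {
  address: string;
}

// Union L1-recovered attesters with p2p-pool attesters for the same
// slot/payload, deduping by address so nobody is slashed twice.
function unionAttesters(fromL1: Attester[], fromP2p: Attester[]): Attester[] {
  const byAddress = new Map<string, Attester>();
  for (const a of [...fromL1, ...fromP2p]) {
    if (!byAddress.has(a.address)) byAddress.set(a.address, a);
  }
  return [...byAddress.values()];
}
```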
Now that the watcher unions L1 + p2p attesters, all 4 honest validators
that attest should end up in the slash. Tighten the e2e assertion to
match — exactly committee_size offenses, and the offended set must equal
the committee. If a propagation hiccup ever drops one, that's a real
issue the test should fail on rather than tolerate.
