feat(slasher): per-slot data-withholding watcher (A-523, A-525)#23116
Draft
PhilWindle wants to merge 9 commits into merge-train/spartan from
Conversation
Introduces a slasher watcher that fires the data-withholding check at slot `S + slashDataWithholdingToleranceSlots` (default 3) for every published checkpoint, rather than waiting for L1 to prune the epoch.

- New `SLASH_DATA_WITHHOLDING_TOLERANCE_SLOTS` config knob (slasher + network-defaults) wired through the `SlasherConfig`.
- `DATA_WITHHOLDING` flips to slot-keyed in `getTimeUnitForOffense`; the new watcher emits `epochOrSlot` = checkpoint slot.
- `P2PClient.collectingMissingTxs` anchors its tx-collection deadline to `slotStart(block.slot + tolerance)` so the collection effort runs to exactly the wall-clock instant the watcher renders its verdict. The ad-hoc `p2pMissingTxCollectionDeadlineMs` is removed.
- The legacy `EpochPruneWatcher` stays around to emit `VALID_EPOCH_PRUNED`; A-525 will remove it in a follow-on commit.

Ticking, restart-floor, and signature-context plumbing follow the existing Sentinel patterns.
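The two places the tolerance knob shows up can be sketched as follows. This is a hypothetical illustration, not the real Aztec API: `Checkpoint`, `isDue`, `slotStartMs`, and the `genesisMs`/`slotDurationMs` parameters are all stand-ins for whatever the actual code uses.

```typescript
// Illustrative sketch of the tolerance knob (names are hypothetical).
interface Checkpoint {
  slot: number;
}

// Watcher side: a published checkpoint is inspected only once the chain
// reaches slot S + toleranceSlots (default 3).
function isDue(checkpoint: Checkpoint, currentSlot: number, toleranceSlots = 3): boolean {
  return currentSlot >= checkpoint.slot + toleranceSlots;
}

// P2P side: the tx-collection deadline is the wall-clock start of that same
// slot, so collection ends exactly when the watcher renders its verdict.
function slotStartMs(slot: number, genesisMs: number, slotDurationMs: number): number {
  return genesisMs + slot * slotDurationMs;
}

function collectionDeadlineMs(
  blockSlot: number,
  toleranceSlots: number,
  genesisMs: number,
  slotDurationMs: number,
): number {
  return slotStartMs(blockSlot + toleranceSlots, genesisMs, slotDurationMs);
}
```

Anchoring both sides to the same instant is what removes the need for a separate `p2pMissingTxCollectionDeadlineMs` knob.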
…hPruneWatcher (A-525)

The `VALID_EPOCH_PRUNED` offense punished committees for failing to get their epoch proven; AZIP-7 reframes this responsibility: validators are on the hook for making data available, not for ensuring proofs land. With the per-slot `DataWithholdingWatcher` in place, the L1-prune-driven `EpochPruneWatcher` is no longer needed.

Removes:
- `OffenseType.VALID_EPOCH_PRUNED` + every reference (helpers, votes, serialization, README, slashing proposer doc).
- `slashPrunePenalty` config + `SLASH_PRUNE_PENALTY` env var + spartan deployment plumbing.
- `EpochPruneWatcher` class + tests.
- `e2e/valid_epoch_pruned_slash.test.ts`.
- Legacy `e2e/data_withholding_slash.test.ts` (rewritten in a follow-up).
Replaces the deleted test that exercised the old L1-prune path. The new scenario:

- 4 validators, all in the committee, `slashSelfAllowed`.
- A tx is sent normally, gossiped, and included in a block.
- Right after mining, every node's `txPool.getTxsByHash` is stubbed to return `undefined` for the tx hash, simulating data withholding from every honest observer's point of view.
- After the tolerance window elapses, every `DataWithholdingWatcher` emits a slot-keyed `DATA_WITHHOLDING` offense for the checkpoint's attesters.
- Quorum is met (`slashSelfAllowed`), the slash executes on L1, and the committee is kicked.

The test asserts both the offense identity (slot-keyed, validators == committee) and the on-chain effect via `awaitCommitteeKicked`.
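The stubbing trick in the scenario above can be sketched like this. `TxPool` here is a minimal stand-in, not the real `@aztec/p2p` interface; only the shape of `getTxsByHash` matters for the illustration.

```typescript
// Minimal stand-in tx pool (hypothetical; not the real @aztec/p2p TxPool).
type TxHash = string;

class TxPool {
  private txs = new Map<TxHash, { hash: TxHash }>();

  addTx(tx: { hash: TxHash }) {
    this.txs.set(tx.hash, tx);
  }

  getTxsByHash(hashes: TxHash[]): ({ hash: TxHash } | undefined)[] {
    return hashes.map(h => this.txs.get(h));
  }
}

const pool = new TxPool();
pool.addTx({ hash: '0xabc' }); // tx is mined and present, as in the test

// Simulate data withholding: the stub returns undefined regardless of what
// the pool actually holds, so every watcher sees the tx as missing.
pool.getTxsByHash = (hashes: TxHash[]) => hashes.map(() => undefined);
```

Because the tx really was gossiped and included first, the only thing the stub changes is the watcher's view at verdict time, which isolates the withholding path from the happy path.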
The parent's `work()` is already public; the override existed only to make it accessible from the test file, but it added no behaviour and tripped the `require-await` lint rule.
The `DataWithholdingWatcher` floors processing at its own `bootSlot` to avoid acting on partial state, so the very slot a validator boots in is never inspected. With the previous test setup, the validators booted at ~slot 8 and the tx happened to be included in a checkpoint at exactly slot 8: every validator silently skipped it, so no offense was emitted and the test timed out waiting for offenses. Advance to epoch 8 between committee formation and tx submission so the tx lands several slots past the validators' boot floor.
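The restart-floor rule above amounts to a strict inequality; a minimal sketch with hypothetical names:

```typescript
// Sketch of the restart floor (illustrative, not the real watcher code):
// a watcher booted at bootSlot never inspects the boot slot itself.
function shouldInspect(checkpointSlot: number, bootSlot: number): boolean {
  return checkpointSlot > bootSlot;
}
```

Under this rule a checkpoint at slot 8 is invisible to a validator that booted at slot 8, which is exactly the failure mode the old test setup hit.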
The `PublishedCheckpoint` only carries attestations from committee members who actually attested: the slot's proposer signs the proposal but is not present in the attestation list. So every `DataWithholdingWatcher` emits exactly `committee_size - 1` offenses (one per attester), and each node's local offense store ends up with that many entries. The previous assertion of `committee_size` offenses could never be met, which caused `awaitOffenseDetected` to time out even though the watcher was firing correctly on every node.
The previous fix was numerically right (3 of 4) but for the wrong reason. The proposer DOES collect its own attestation (validator-3 was the slot 16 proposer in the failing run, and was correctly in the detected attesters list). What actually happens is that the proposer publishes the checkpoint as soon as it has hit committee quorum, so any peer attestation that arrives after that point is dropped on the floor. `PublishedCheckpoint.attestations` therefore carries exactly `quorum` entries: for a committee of 4 that's 3, but in general it's `floor(committee * 2/3) + 1`. Use `computeQuorum` from stdlib to compute the expected count rather than hard-coding `committee_size - 1`.
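A local re-derivation of the quorum formula quoted above; the PR itself uses `computeQuorum` from `@aztec/stdlib`, so this standalone version is just to show why a committee of 4 yields 3 attestations.

```typescript
// quorum = floor(committee * 2/3) + 1, per the commit message above.
// Standalone re-derivation for illustration; the real helper lives in
// @aztec/stdlib.
function computeQuorum(committeeSize: number): number {
  return Math.floor((committeeSize * 2) / 3) + 1;
}
```

For a committee of 4 this gives `floor(8/3) + 1 = 3`, matching the "3 of 4" observed in the failing run.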
Per AZIP-7, every committee member who vouched for the proposal should be slashed for withholding the data, not just the subset whose attestations the proposer happened to include before publishing. The watcher now:

1. Recovers attesters from the `PublishedCheckpoint` on L1 (as before).
2. Also queries the p2p attestation pool via `P2PApi` for the same slot and proposal payload hash, picking up honest committee members whose attestations arrived after the proposer hit committee quorum and were therefore dropped on the floor.
3. Unions and dedupes by address.

The p2p pool entries are pruned at finalization, but the watcher fires at `slot + tolerance`, which is well before that, so they're still there. Wires the new `P2PApi` dep through `aztec-node/server.ts`. Updates the unit test subclass to return a `Promise` (`extractAttesters` is now async) and relaxes the e2e expectation to a range `[quorum, committeeSize]`, since the actual count depends on attestation propagation timing.
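Step 3 of the list above is a straightforward dedupe; a minimal sketch with addresses as plain strings (the real code presumably uses typed addresses):

```typescript
// Union attesters recovered from the L1 checkpoint with those found in the
// p2p attestation pool, deduplicating by address. Illustrative only.
function unionAttesters(fromL1: string[], fromP2p: string[]): string[] {
  return [...new Set([...fromL1, ...fromP2p])];
}
```

The union is what lets honest committee members whose attestations arrived late still be counted among the slashed attesters.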
Now that the watcher unions L1 + p2p attesters, all 4 honest validators that attest should end up in the slash. Tighten the e2e assertion to match: exactly `committee_size` offenses, and the offended set must equal the committee. If a propagation hiccup ever drops one, that's a real issue the test should fail on rather than tolerate.
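The tightened "offended set equals the committee" check is a set-equality assertion; a sketch with a hypothetical helper (order-insensitive comparison by address):

```typescript
// Hypothetical helper: true iff the offended validators are exactly the
// committee, regardless of order. Not the real test code.
function sameValidatorSet(offended: string[], committee: string[]): boolean {
  if (offended.length !== committee.length) return false;
  const a = [...offended].sort();
  const b = [...committee].sort();
  return a.every((addr, i) => addr === b[i]);
}
```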
Summary
Moves the data-withholding slash from the L1-prune path to a per-slot check at `slotStart(checkpoint.slot + slashDataWithholdingToleranceSlots)`, and removes the now-unnecessary `VALID_EPOCH_PRUNED` offense and `EpochPruneWatcher`.

Per AZIP-7: validators are responsible for making tx data available, not for ensuring proofs land. The new `DataWithholdingWatcher` ticks at quarter-eth-slot cadence and, for each published checkpoint older than `dataWithholdingToleranceSlots` (default 3), probes the local mempool for the txs in the checkpoint's blocks. Missing txs trigger a `DATA_WITHHOLDING` slash for the validators who actually attested to that checkpoint.

Highlights

- New `DataWithholdingWatcher` (`yarn-project/slasher/src/watchers/data_withholding_watcher.ts`) with full unit-test coverage. Sentinel-style tick + restart floor (no KV).
- `DATA_WITHHOLDING` moved from `'epoch'` to `'slot'` in `getTimeUnitForOffense`. Offense identity is now per-checkpoint, not per-epoch.
- `P2PClient.collectingMissingTxs` anchors its tx-collection deadline to `slotStart(block.slot + slashDataWithholdingToleranceSlots)` so the collection effort runs to exactly the wall-clock instant the watcher renders its verdict. The ad-hoc `p2pMissingTxCollectionDeadlineMs` is removed.
- Removed: `OffenseType.VALID_EPOCH_PRUNED`, `slashPrunePenalty` config + env var + spartan plumbing, `EpochPruneWatcher` class + tests, `valid_epoch_pruned_slash.test.ts`.
- Rewritten `data_withholding_slash.test.ts`: 4 validators, `slashSelfAllowed`, tx is mined normally then stubbed missing on every node, watcher fires, slash executes on-chain, committee is kicked. Asserts slot-keyed offense identity + on-chain effect.

Test plan

- `yarn workspace @aztec/slasher test`: 76 tests pass (incl. 9 new `DataWithholdingWatcher` tests).
- `yarn workspace @aztec/stdlib test src/slashing`: 55 tests pass after the keying flip.
- `yarn workspace @aztec/p2p test src/client/p2p_client.test.ts`: 20 tests pass with the new slot-anchored deadline.
- `yarn build` clean across the monorepo.
- The new e2e test (`e2e_p2p_data_withholding_slash`) is in place but only runs in CI.

Out of scope
- The `L2PruneUnproven` event/emitter is left in place; nothing subscribes to it after this PR, but the event itself stays available for future observers.

Closes A-523.
Closes A-525.