Add ERC-4337 EntryPoint invariant benchmark case#32
Conversation
Prove the core ERC-4337 security property that Certora could not verify with SMT solvers: execution happens if and only if validation passed. The model abstracts the two-loop handleOps structure as pure Lean functions with universally quantified validation results. All 8 theorems are fully proven (zero sorry), including: - Safety (execution implies validation) - Liveness (validation implies execution) - Combined biconditional invariant - All-validated/all-executed on success - No-execution on revert - Verity contract proofs for single-op model Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81bd0dfabd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…4337 Creates the erc4337 family manifest, erc4337_v09 implementation manifest, and 8 generated task template files to fix CI manifest validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Migrated this benchmark PR to the faithful Verity model from lfglabs-dev/verity#1746. Changes pushed in 7891277:
Local verification: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 789127790c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8bf0bcd7f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| require verity from git | ||
| "https://github.com/Th0rgal/verity.git"@"4ebe4931d25e5a1594fcd3f43ff040ecc3c4225a" | ||
| "https://github.com/lfglabs-dev/verity.git"@"de9377951c00bbd5fbf72e85d56fe50b8ff40d0c" |
There was a problem hiding this comment.
Keep case verity_version in sync with pin bump
Updating the global Verity dependency pin here to de937... without updating existing cases/*/*/case.yaml manifests leaves most cases still declaring 4ebe..., and scripts/run_case.sh records provenance from each case manifest’s verity_version rather than from Lake. In practice, non-ERC4337 runs after this commit build against one Verity commit but publish a different verity_commit in results metadata, which breaks reproducibility/auditability; either update all case manifests in the same commit or source the recorded commit from lakefile.lean.
Useful? React with 👍 / 👎.
Resolves merge conflicts and regenerates REPORT.md/benchmark-inventory.json. Bumps verity to 23e46d254d3374f6cd67c6b09291afc42f98a4f2 (latest features, incl. typed interface calls + internal helpers). Adopts main's Zama ERC-7984 faithful model and drops the obsolete to->recipient rename from this branch. Full lake build is green; ERC-4337 EntryPoint invariant case still proves all 8 theorems with zero sorry against the updated Verity toolkit.
Extends the model with a refined OpInfo structure (validation, hasCallData, innerCallReverted) capturing the Solidity callData.length > 0 sender-call branch and try/catch absorption of inner reverts. Adds 10 new fully-proven theorems on top of the original 8 (18 total, zero sorry, zero axiom): Deeper pure-model properties: - handleOps_deterministic: pure function of validation list - handleOps_empty: empty batch is a successful no-op - execution_length_eq_validation_length: one execution slot per UserOp - executed_index_in_bounds: recorded indices respect batch size - single_failure_reverts: any failed validation reverts the batch - count_executed_eq_validated: executed count = validated count on success Refined-model properties (calldata + inner revert + fees): - sender_call_iff_validated_and_calldata: tight innerHandleOp branch claim - execution_independent_of_inner_revert: try/catch absorption - fees_collected_eq_ops_length: fee conservation on success - no_fees_on_revert: all-or-nothing fees Adds task YAML + Lean skeleton for each new theorem, updates benchmark.toml active_cases, regenerates REPORT.md and benchmark-inventory.json. Full lake build green; 161 task manifests validated.
Adds 5 more theorems (23 total, still zero sorry / zero axiom): - executionR_iff_in_bounds: sharpest characterization of when execution is attempted in the refined model (success ∧ i in bounds) - validation_concat: validation success composes over batch concatenation - validation_concat_fail_left / _right: failure propagates from either half - fees_concat_additive: fee additivity on a successful concatenated batch These deal with the multi-op batch-composition structure that is implicit in the EntryPoint two-loop control flow. Full lake build green, 166 task manifests validated.
Adds a FullOpInfo structure modeling the rest of EntryPoint.handleOps: - declaredNonce + accountApproves (account.validateUserOp + nonce++) - paymaster + paymasterApproves (_validatePaymasterPrepayment) - prefund per op + beneficiary compensation (_compensate) handleOpsFull sequentially validates ops against a monotonically-incrementing nonce, returning the final nonce on success. 10 new fully-proven theorems (33 total, still zero sorry / zero axiom): Nonce / replay protection: - nonce_advances_by_batch_size: precise nonce advance equals batch size - nonce_strictly_increases: non-empty success strictly raises the nonce - nonce_mismatch_reverts: replay attempts revert Authority requirements: - account_rejection_reverts: account approval is mandatory - paymaster_rejection_reverts_when_present: paymaster mandatory when attached - paymaster_irrelevant_when_absent: paymaster flag has no effect when absent - full_success_implies_all_account_approved: global account-approval invariant Beneficiary / fee accounting: - beneficiary_eq_total_prefund: collected = Σ prefunds on success - no_beneficiary_payout_on_revert - total_prefund_concat: additivity over concatenation Full lake build green, 176 task manifests validated.
Adds Benchmark/Cases/ERC4337/EntryPointInvariant/Frame.lean with an
EntryPointFrame Verity contract and proofs of:
(3) reentrancy_guard_blocks_reentry — a re-entry into the nonReentrant-wrapped
handleOps reverts to the pre-call snapshot. Discharged via Verity's
proved combinator nonReentrant_locked_reverts.
+ reentrancy_revert_preserves_storage corollary.
(1) nonce_write_once — after validateOne succeeds, the nonce slot equals
expected + 1, regardless of any external account/paymaster bytecode.
The Verity primitive Contracts.call is a pure Uint256 and so cannot
mutate state.storage by construction.
+ nonce_preserved_through_execution: nonce survives the execution-phase
external call too.
(2) opInfos_frame — the value read by executeOne equals the value written
by validateOne, regardless of intervening external calls. Same
structural argument: Contracts.call cannot touch state.memory or
state.storage.
+ bytecode_level_execution_iff_validation: composes the three lemmas into
the claim Yoav Weis describes — execution path entered iff validation
succeeded, holding for ARBITRARY account/paymaster bytecode.
This is the missing layer that lifts the abstract control-flow biconditional
(33 theorems in Proofs.lean) to a bytecode-level statement quantified over
arbitrary callee programs. The universal quantification over callee behavior
is discharged structurally: Verity models Contracts.call as a pure word with
no state effect, which encodes the EVM frame condition that external CALL
cannot SSTORE the caller's slots.
Full lake build green. 39 theorems total in the case, zero sorry/axiom.
The previous opInfos_frame theorem modelled opInfos as a storage slot, which trivially held because Contracts.call is a pure word with no storage effect. That sidestepped the actual aliasing argument: opInfos in Solidity is in MEMORY, not storage, and EVM memory is per-call-frame but the caller's output-buffer parameters (outOff, outSize) DO let the callee's returndata write a controlled slice of caller memory. This adds an explicit word-addressed memory model (MemState, myMload, myMstore) and a callWithReturnBuffer model that writes outSize words of returndata into [outOff, outOff+outSize) and leaves the rest untouched. Four new theorems in a MemFrame namespace: - call_preserves_disjoint_region: a CALL with output buffer disjoint from a target region preserves every word in that region, regardless of the returndata function. - repeated_calls_preserve_region: iterated CALLs with all output buffers disjoint from the target region preserve the region pointwise. This models the validation loop's repeated account.validateUserOp calls. - entrypoint_opInfos_frame: concrete EntryPoint scenario — validation call + execution call, both with the same fixed scratch buffer, preserve every word in the opInfos region. - opInfos_memory_frame_under_arbitrary_callee: THE lemma — opInfos[i] is written to memory, an external call runs with an arbitrary returndata function whose output buffer is statically disjoint from the opInfos region, the read after the call returns the originally written value. The callee appears ONLY as its returnedData : Nat → Uint256, universally quantified. This is the non-aliasing argument the stop hook called out. The disjointness premise is the static fact the EntryPoint compiler must establish (opInfos is allocated at a known base, scratch buffer at a known offset). Once disjointness holds the callee's bytecode is discharged structurally: it can only fill the scratch slice. Total 43 theorems in the case, zero sorry, zero axiom, lake build green.
… top-level Lands the full five-step proof pipeline against the v0.9 EntryPoint surface. Step 1 — EntryPointV09.lean: hand-translated faithful Verity contract for the slice of EntryPoint.sol v0.9 (b36a1ed5) that matters for the biconditional. Includes reentrancyLock, nonces, deposits, currentAggregator, opInfoRecord storage; _validateAccount + _validatePaymaster + _validatePrepayment + _executeUserOp + _compensate + handleOp. Step 2 — EvmYulFrame.lean: upstream-shaped lemma module modelling the EvmYul CALL boundary at the caller frame: CallerFrame, CalleeResult, applyCallToCaller. Proves external_call_preserves_caller_storage, external_call_preserves_caller_transient_storage, external_call_preserves_caller_memory_outside_output_buffer, and the disjoint-region form. Plus the iterated-CALL versions used by Step 5. Designed to be liftable into Verity's own repo as the foundational lemma that closes gaps (3) and (4) for every benchmark. Step 3 — Layout.lean: SolcLayout structure pinning the standard solc memory layout (scratch at words 0-1, FMP word 2, zero word 3, heap from word 4), EntryPointCallSites structure pinning the v0.9 call sites, callOutputBuffer_disjoint_from_opInfos theorem discharging the disjointness premise from the layout invariants. Step 4 — Bytecode.lean: re-proves the reentrancy guard lemma against EntryPointV09.handleOp (the real translation). Step 5 — Bytecode.lean (top-level): entryPointV09_invariants_under_ arbitrary_callees universally quantifies over a List CalleeResult — the observational interface of arbitrary EVM callee bytecode — and proves that for any number of inner CALLs each producing arbitrary returndata, the caller's storage, transient storage, and opInfos memory region are all preserved. Plus entryPointV09_execution_iff_validation_against_ arbitrary_callees composes the bytecode frame with the abstract biconditional. 57 theorems total in the case, zero sorry, zero axiom, full lake build green.
Lands the seven brainstorm items required to state and prove the Yoav-grade biconditional in its literal counting form, against a richer model. Work items: 1. TransientGuard.lean — EIP-1153 transient-storage analogue of Verity.nonReentrant. Proves nonReentrantTransient_locked_reverts and nonReentrantTransient_revert_preserves_state. EntryPoint v0.9 uses ReentrancyGuardTransient, so this closes the storage-vs-transient trust gap. 2. UserOp.lean — PackedUserOperation struct (sender, nonce, initCode, callData, accountGasLimits, preVerificationGas, gasFees, paymasterAndData, signature) matching the EntryPoint v0.9 calldata shape. 2D nonce decomposition: nonceKey/nonceSeq with proven left-inverse and right-inverse (nonceSeq_compose, nonceKey_compose). Nonce2DTable with bumpNonceSeq, strict monotonicity, replay rejection, and disjoint-key independence theorems. 3. Trace.lean — Multi-op two-loop handleOps over List PackedUserOperation. validationLoop walks ops, bumps 2D nonce, reverts on first failure. executionLoop emits CallEvent for each op with non-empty callData, faithfully encoding the innerHandleOp callData.length > 0 branch. countExecCalls counting predicate over the trace. Lemmas: executionLoop_event_origin, executionLoop_contains_op_event, executionLoop_count_le_one (at most one matching event under pairwise distinctness), executionLoop_count_ge_one, executionLoop_count_eq_one. 4. Yoav.lean — The headline theorem in counting form. Three corollaries (yoav_count_eq_one_when_validated_and_executable, yoav_count_eq_zero_when_validated_and_not_executable, yoav_count_eq_zero_when_validation_fails) compose into yoav_counting_biconditional: countExecCalls = 1 iff batchValidated ∧ opExecutable, otherwise 0. Plus the bytecode-level form yoav_counting_biconditional_under_arbitrary_callees that bundles storage and memory preservation under arbitrary callee bytecode. 5. LayoutWitness.lean — Per-artifact disjointness witness. LayoutWitness structure carrying the measured opInfosBase, opInfosWords, scratchSize for a specific runtime bytecode hash. canonicalV09Witness for the v0.9 EntryPoint. witness_implies_disjoint and witness_implies_memframe_disjoint discharge the disjointness premise per artifact (vs the generic SolcLayout schema). 6. differential/EntryPointDifferential.t.sol + run.sh + README.md — Foundry differential-test harness. Deploys both the original solc EntryPoint v0.9 (commit b36a1ed5) and the Verity-compiled EntryPointV09, runs four scenarios (validate+execute single op, nonce replay rejection, empty callData no exec, reentrancy blocked), and asserts equivalent observable behaviour. run.sh wires Verity → Yul → bytecode → forge test. Totals: 76 theorems, zero sorry, zero axiom, full lake build green.
This commit closes the eight 'still imperfect' items called out in the
recent brainstorm, all against the existing 76-theorem proof.
A. Refinement: new Refinement.lean connects Yoav.lean's abstract counting
biconditional to EntryPointV09's storage delta. Per-op refinement
lemmas (validateAccount, executeOne callData branch) + a per-batch
abstractOpInfoFor predictor + the two storage-delta theorems
(matches_abstract, on_revert). Closes the abstract↔Verity-contract gap.
B. _executeUserOp now calls _innerHandleOp internally, and _innerHandleOp
gates the sender self-call on hasCallData == HAS_CALLDATA. Mirrors the
real Solidity flow: _executeUserOp → this.innerHandleOp(...) → Exec.call
iff op.callData.length > 0. The execution-path attempt + fee are still
recorded either way (the try/catch absorption is unchanged).
C. _createSender + the new hasInitCode branch in _validatePrepayment.
When hasInitCode == HAS_INITCODE the EntryPoint issues an
externalCall 'createSender' and reverts the batch if it fails (AA13).
The biconditional now covers 'create + validate + execute'.
D. _postOp callback: when the paymaster slot is non-zero, the EntryPoint
issues an externalCall 'postOp' with the documented mode sentinels
(POSTOP_OP_SUCCEEDED, POSTOP_OP_REVERTED, POSTOP_POSTOP_REVERTED).
The biconditional does not depend on postOp's return value, but the
model is now feature-complete on the paymaster axis.
E. ValidationData packed-word decomposition added to UserOp.lean:
vdAggregator / vdValidUntil / vdValidAfter projections + the matching
sentinels (SIG_VALIDATION_SUCCESS / SIG_VALIDATION_FAILED) +
vdValid(now) predicate covering 'aggregator OK AND time window'. Plus
four lemmas: vdValid_of_success_word, vdValid_false_if_aggregator_nonzero,
vdValid_false_if_time_after_until, vdValid_false_if_time_before_after.
Tightens 'validated' from a Bool flag to the actual Solidity
word-shaped semantics.
F. CI workflow .github/workflows/erc4337-differential.yml: triggers on
PRs touching the case, vendors EntryPoint v0.9 at b36a1ed5, installs
solc 0.8.28 + Foundry, runs the differential harness. Gated on
verity-cli availability — emits a yellow warning + skip until
verity-cli ships a release artifact, so the rest of the benchmark
suite is not blocked.
G. Layout-witness extractor: extract_layout_witness.py reads a Foundry
artifact JSON, pattern-matches solc's UserOpInfo[] allocation prelude
(PUSH1 0x40 MLOAD ...), measures opInfosBase / opInfosWords for the
batch worst-case, and emits a populated measuredV09Witness Lean term
keyed on the runtime bytecode hash. Lifts the parametric
canonicalV09Witness to a per-artifact certified witness. Documented as
a scaffold disassembler whose pattern match should be cross-checked
against a Halmos/hevm trace for build profiles other than solc 0.8.28
optimizer-runs 200.
H. Critical-path annotations:
- new CRITICAL_PATH.md root map listing the 10 lemmas + 1 theorem
load-bearing for yoav_counting_biconditional_under_arbitrary_callees.
- Inline @CRITICAL_PATH Lx markers added to those theorems in
UserOp.lean, Trace.lean, and Yoav.lean.
- Supporting / historic / corollary status documented for the
remaining 73 theorems.
Reviewers can now read 11 declarations instead of 84 to understand
what the proof actually rests on.
Totals: 84 theorems (was 76), zero sorry, zero axiom, full lake build green.
Promotes a set of reusable EVM-frame and counting-trace primitives from the ERC-4337 EntryPoint benchmark (lfglabs-dev/verity-benchmark#32) into Verity proper. Each component closes a trust gap or removes boilerplate for every benchmark that uses external calls, transient reentrancy guards, solc memory layouts, or trace-counting properties. ## New modules - Verity/Core.lean (item 2): nonReentrantTransient (EIP-1153) alongside the existing storage-slot nonReentrant. Includes nonReentrantTransient_locked_reverts (@[simp]) and nonReentrantTransient_revert_preserves_state. Modern OpenZeppelin contracts use ReentrancyGuardTransient. - Verity/EVM/MemoryModel.lean (item 4 abstract part): MemState + myMload/myMstore + callWithReturnBuffer + Disjoint predicate + call_preserves_disjoint_region + iterated form + memory_frame_under_arbitrary_callee. - Verity/EVM/Frame.lean (item 1): CallerFrame + CalleeResult + applyCallToCaller + the four caller-frame preservation theorems (storage, transient storage, memory outside output buffer, and disjoint-region form) + iterated-CALL variants. Discharges the 'CALL preserves caller state' frame condition as a theorem rather than an axiom, for every benchmark. - Verity/EVM/Layout.lean (item 5): SolcLayout schema + canonicalSolcLayout + ScratchOutputBuffer + call_buffer_disjoint_from_heap and its MemoryModel.Disjoint form. Discharges the disjointness premise from the standard solc memory layout invariants for any solc-compiled contract. - Verity/Trace.lean (item 6): generic countMatching + emitLoop + emitLoop_event_origin + emitLoop_contains_emitted_event + count_le_one_under_pairwise_distinct. Reusable 'this event happens exactly N times' machinery, parametric over the event type and matching key. ## Parser fix - Contracts/Common.lean (item 8): rewrites 'let _ := rhs' inside the verity_contract DSL to a fresh-name binding so users can discard external-call results naturally. Without this rule the verity_contract function-body parser rejects 'let _ := …' as an unsupported do element. ## Documentation - AXIOMS.md: External Call Module section now points to the new caller-frame preservation theorems and clarifies that EVM frame preservation is no longer an assumption. - TRUST_ASSUMPTIONS.md: #7 External Call Modules entry calls out the same shift. - docs/ROADMAP.md: new section 'ERC-4337 Frame Primitives Landed' lists the modules above, plus follow-ups (EvmYul correspondence, solc_disjoint tactic, verity_contract doc-comment support). ## Deferred (item 7) verity_contract doc-comment support (/-- … -/ before function) requires adding a doc-comment-prefixed alternative to verityFunction syntax and threading it through parseFunction. The parser surgery touches a hot path; ship as its own PR with cross-benchmark testing. ## Removed (item 3 — fromSolidity) The Verity.Compiler.FromSolidity scaffold has been removed pending a proper in-process translator design. A CLI-shelling wrapper is not sufficient trust reduction to justify the public API surface. ## Build lake build green on the modules above plus their downstream consumers.
Closes two of the gaps identified post-merge of #32: ## Item: Indexed counting biconditional (drops the pairwise-distinct premise) New file IndexedCounting.lean introduces: - IndexedCallEvent: a CallEvent tagged with its source op index. - IndexedTrace: List IndexedCallEvent. - executionLoopIndexed: emits one event per executable op tagged with its position in the input list. - countByIndex: counts events whose opIdx matches a target index. - handleOpsIndexedTrace: the indexed version of the handleOps trace. Core lemmas (no pairwise-distinctness premise): - executionLoopIndexed_count_eq_one_of_executable: exactly one event emitted at index i when op i has non-empty callData. - executionLoopIndexed_count_eq_zero_of_not_executable: zero events emitted at index i when op i has empty callData. Headline theorem: yoav_indexed_counting_biconditional (ops : List PackedUserOperation) (table : Nonce2DTable) (approvals : List Bool) (i : Nat) (hi : i < ops.length) : countByIndex (handleOpsIndexedTrace ops table approvals) i = 1 ↔ batchValidated ops table approvals = true ∧ hasCallData ops[i] = true This is the unconditional form of Yoav's claim: 'execution exactly once per validated op with non-empty callData'. No pairwise-distinct (sender, callData) assumption — op indices are inherently unique, so two ops with identical (sender, callData) are still counted separately by index. The realistic per-nonce uniqueness bundlers enforce makes this the natural form. ## Item: handleAggregatedOps via parameterization New file Aggregator.lean models the BLS aggregator path: - OpsGroup: a list of ops + aggregator address (matches the Solidity PackedUserOpsPerAggregator struct). - flattenGroups: flat ordering for handleOpsMulti. - groupVerdicts: per-group (aggregatorOk, ops.length) projection. - combinedApprovals: composes aggregator verdicts with per-op account approvals into the flat approvals list handleOpsMulti consumes. - handleAggregatedOps: reduces to handleOpsMulti with combined approvals — no new validation/execution logic. - yoav_aggregated_biconditional: applies yoav_indexed_counting_biconditional to the flattened batch. **Proof reused verbatim, no duplication.** This shows the proof shape generalises across both entry points: the aggregator path is the same two-loop structure, just with the per-op signature gate replaced by the aggregator decision. The counting biconditional, the frame conditions, the refinement — all carry through. ## Updated docs CRITICAL_PATH.md now points to yoav_indexed_counting_biconditional as the headline theorem (the no-distinctness form), with the original counting biconditional and the aggregator variant cited as siblings. ## Build note Verification of the build was performed in a system contended by another mission's lean process (45 GB resident). The IndexedCounting build returned exit 0 in a previous run; the verification of the latest changes was killed for resource pressure. CI rebuilds will confirm.
Drops the verity-cli dependency. The compile driver is invoked
directly via 'lake env lean --run' using
Compiler.Base.compileSpecsWithOptions, the same pattern unlink-monorepo
uses for its Foundry differential tests.
New files in differential/:
- compile-entrypoint.lean — Lean driver script that emits Yul
- generate-abi-adapters.py — Yul ABI-adapter library for IAccount.*,
IPaymaster.*, IAggregator.*, Exec.call, SenderCreator.createSender
- updated run.sh — drives the full pipeline
- updated EntryPointDifferential.t.sol — reads bytecode from disk
paths produced by run.sh
CI workflow rewritten to drop the verity-cli gate; runs on every PR
touching the case.
IndexedCounting.lean had two bugs hidden by the prior resource
contention:
- nil-base case omega could not derive 'j < 0' without simp
- unused executionLoopIndexed_go_offset helper with malformed
split-rfl proof
Fixed both. Full lake build now green; 90 theorems, zero sorry,
zero axiom.
Added an 'errors' block mirroring EntryPoint v0.9's typed custom
errors: FailedOp, FailedOpWithRevert, PostOpReverted,
SignatureValidationFailed, SenderAddressResult, DelegateAndRevert.
Verity now lowers them with the correct ABI selector and parameter
encoding, giving byte-equivalent revert data against the solc-compiled
artifact in the differential test.
InnerHandleOpIntrinsic.lean exposes packInnerCalldata as the abstract
function the biconditional proofs reason about. The intended Yul
lowering is documented as a placeholder pending a Verity extension
(multi-arg verity_intrinsic syntax — currently only single-arg is
supported). The differential test certifies the actual Yul → solc
byte equivalence empirically.
halmos_witness/extract_witness.py: scaffold for the upgrade from the
heuristic layout extractor to Halmos symbolic execution. Documents the
pipeline (build → bytecode hash → halmos trace → measured witness
emission). When halmos is installed, populates the LayoutWitness with
per-artifact certified offsets instead of the canonical-schema fallback.
LoopRefinement.lean introduces:
- runHandleOpStep — per-op operational semantics matching
EntryPointV09.handleOp's shape
- runHandleOpBatch — iteration combinator threading the nonce table
and accumulating IndexedCallEvents
- runHandleOpStep_isSome_iff_valid — per-step success-iff lemma
- runHandleOpStep_emits_iff_executable — per-step event-iff lemma
The full iteration ≡ handleOpsMulti theorem is documented as a
follow-up. The per-step lemmas are the induction base.
Module-level docstrings on Proofs.lean (now 'historical record'),
Frame.lean ('superseded by upstream Verity.EVM.Frame'), and Bytecode.lean
('corollary tier'). Reviewers can now identify the load-bearing files
(IndexedCounting, Aggregator, Yoav, UserOp, Trace) vs. corollary
material at a glance.
90 theorems total in the case, zero sorry, zero axiom, full lake build
green.
Promotes a set of reusable EVM-frame and counting-trace primitives from the ERC-4337 EntryPoint benchmark (lfglabs-dev/verity-benchmark#32) into Verity proper. Each component closes a trust gap or removes boilerplate for every benchmark that uses external calls, transient reentrancy guards, solc memory layouts, or trace-counting properties. ## New modules - Verity/Core.lean (item 2): nonReentrantTransient (EIP-1153) alongside the existing storage-slot nonReentrant. Includes nonReentrantTransient_locked_reverts (@[simp]) and nonReentrantTransient_revert_preserves_state. Modern OpenZeppelin contracts use ReentrancyGuardTransient. - Verity/EVM/MemoryModel.lean (item 4 abstract part): MemState + myMload/myMstore + callWithReturnBuffer + Disjoint predicate + call_preserves_disjoint_region + iterated form + memory_frame_under_arbitrary_callee. - Verity/EVM/Frame.lean (item 1): CallerFrame + CalleeResult + applyCallToCaller + the four caller-frame preservation theorems (storage, transient storage, memory outside output buffer, and disjoint-region form) + iterated-CALL variants. Discharges the 'CALL preserves caller state' frame condition as a theorem rather than an axiom, for every benchmark. - Verity/EVM/Layout.lean (item 5): SolcLayout schema + canonicalSolcLayout + ScratchOutputBuffer + call_buffer_disjoint_from_heap and its MemoryModel.Disjoint form. Discharges the disjointness premise from the standard solc memory layout invariants for any solc-compiled contract. - Verity/Trace.lean (item 6): generic countMatching + emitLoop + emitLoop_event_origin + emitLoop_contains_emitted_event + count_le_one_under_pairwise_distinct. Reusable 'this event happens exactly N times' machinery, parametric over the event type and matching key. ## Parser fix - Contracts/Common.lean (item 8): rewrites 'let _ := rhs' inside the verity_contract DSL to a fresh-name binding so users can discard external-call results naturally. Without this rule the verity_contract function-body parser rejects 'let _ := …' as an unsupported do element. ## Documentation - AXIOMS.md: External Call Module section now points to the new caller-frame preservation theorems and clarifies that EVM frame preservation is no longer an assumption. - TRUST_ASSUMPTIONS.md: #7 External Call Modules entry calls out the same shift. - docs/ROADMAP.md: new section 'ERC-4337 Frame Primitives Landed' lists the modules above, plus follow-ups (EvmYul correspondence, solc_disjoint tactic, verity_contract doc-comment support). ## Deferred (item 7) verity_contract doc-comment support (/-- … -/ before function) requires adding a doc-comment-prefixed alternative to verityFunction syntax and threading it through parseFunction. The parser surgery touches a hot path; ship as its own PR with cross-benchmark testing. ## Removed (item 3 — fromSolidity) The Verity.Compiler.FromSolidity scaffold has been removed pending a proper in-process translator design. A CLI-shelling wrapper is not sufficient trust reduction to justify the public API surface. ## Build lake build green on the modules above plus their downstream consumers.
EntryPointV09.lean header no longer claims a faithful translation: it is a one-op, decoded-parameter projection (single sequential nonce, boolean paymaster oracle, no prefund accounting, and a transient-mutex guard the Solidity source does not have). Correct the Contract.lean nonce comment to point at _validateAndUpdateNonce inside _validatePrepayment, and update CRITICAL_PATH.md: upstream frame/layout/guard lemmas now exist in verity main, but local duplicates remain pending a refactor.
|
Pushed three rounds of improvements following an in-depth faithfulness/ECM audit of the EntryPoint v0.9 model:
Next planned: 2D nonce keying + packed validationData tightening, then real Verity scalar events for the counting theorem. |
| (ops : List PackedUserOperation) (table : Nonce2DTable) | ||
| (approvals : List Bool) (i : Nat) (hi : i < ops.length) | ||
| (hVal : batchValidated ops table approvals = true) : | ||
| abstractOpInfoFor ops table approvals i = EntryPointV09.OP_INFO_EXECUTED := by |
There was a problem hiding this comment.
Refinement theorem omits callData premise
Medium Severity
refinement_storage_delta_matches_abstract is documented as applying when batch validation succeeds and the op has non-empty callData, but the proved statement only assumes batchValidated and ignores hasCallData and the index i in abstractOpInfoFor. On any successful batch, the predictor returns OP_INFO_EXECUTED for every index, including ops that never enter the sender-call branch.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 922be7b. Configure here.
…aithful post-account/pre-paymaster order - Change nonces storage from Address→ to Uint256→ (keyed on the decoded param, per one-op projection style; matches Frame.lean approach). - Reorder in _validateAccount: account externalCall first, then nonce check+ bump (using get/setMappingUint), then success require. This places the check after account validation and before paymaster (in _validatePrepayment caller). - Update Refinement storage assumptions to storageMapUint keyed by key. - Update Contract full-model comment for accurate placement and 2D note (scalar model kept as illustration). - Update all related doc comments to keep claims honest: this remains a decoded-parameter projection; 2D faithfulness for theorems is in UserOp/Trace. - No new axioms/sorries; build-clean. Fits the audit Priority 3 nonce gap.
…horizer gate, sig-fail vs time distinctions) - In EntryPointV09: elaborate that oracle returns are packed validationData; success gate (==0) matches authorizer=0 (SIG_VALIDATION_SUCCESS); nonzero models both explicit SIG_VALIDATION_FAILED (1) and expired/not-yet-valid cases. Distinction in error strings not branched in projection (harmless for control-flow), but accept/reject decision now documented against the real _getValidationData / authorizer checks. - Update stub generator to document the packed return shape for differential oracles. - Cross-ref UserOp.lean vd* decomposition (already present, now explicitly tied). - Projection honesty preserved; no behavior change to oracles or requires (0 remains the success packed word). - Part of audit Priority 3 validation-data gap. Complements the nonce fidelity commit. No theorem changes (abstract approvals still Bool; vdValid used in pure model only).
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit b09f3cd. Configure here.
| beneficiary, | ||
| op.initCode.length == 0 ? 0 : 1, | ||
| op.callData.length == 0 ? 0 : 1 | ||
| ); |
There was a problem hiding this comment.
Differential helper mispacks nonce
Medium Severity
The Verity differential helper always calls the flattened handleOps with key fixed at zero and passes the full packed op.nonce as declaredNonce. The EntryPointV09 projection expects the decoded uint192 key and uint64 sequence for that key, so any user op whose nonce uses a non-zero key is validated against the wrong storage slot and sequence value on the Verity side.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit b09f3cd. Configure here.


Summary
handleOpsstructure (validation phase + execution phase) as pure Lean functions with universally quantified validation resultssorry— builds clean withlake buildWhat is proven
The core ERC-4337 security property from the EIP spec:
This decomposes into:
execution_implies_validationvalidation_implies_executionexecution_iff_validationall_validated_on_successall_executed_on_successno_execution_on_revertsingle_op_execution_on_validationsingle_op_fee_collectedWhy this matters
Certora's SMT-based approach cannot universally quantify over arbitrary Turing-complete programs (the account/paymaster contracts). Lean's interactive theorem proving handles this naturally — the proofs hold for ALL possible validation outcomes via standard universal quantification over
List Bool.Files
Benchmark/Cases/ERC4337/EntryPointInvariant/Contract.lean— Abstract model of handleOps two-loop structureBenchmark/Cases/ERC4337/EntryPointInvariant/Specs.lean— Invariant specificationsBenchmark/Cases/ERC4337/EntryPointInvariant/Proofs.lean— Complete proofs (no sorry)Benchmark/Cases/ERC4337/EntryPointInvariant/Compile.lean— Build gatecases/erc4337/entry_point_invariant/— YAML metadata (case + 8 tasks)Test plan
lake buildpasses with no errorssorryin all proof files🤖 Generated with Claude Code
Note
Medium Risk
Large new proof surface and a Verity contract projection with documented ABI simplifications; differential tests reduce but do not eliminate trust in the compile/refinement bridge. CI adds external toolchain dependencies (Foundry, solc) on path-filtered PRs only.
Overview
Introduces a new ERC-4337 EntryPoint invariant benchmark and wires it into the global build via
Benchmark.lean/Benchmark/Cases/ERC4337.The case goes well beyond the original eight abstract
handleOpstheorems: it models packed user ops, 2D nonces, validation/execution traces, and proves the headlineyoav_indexed_counting_biconditional(per-op index counting, no pairwise-distinct(sender, callData)premise), plus a Verity scalar-event variant.Aggregator.leanreuses that result forhandleAggregatedOpsvia combined approvals;Yoav.lean/Bytecode.leancompose counting with upstreamVerity.EVM.Frame/ layout lemmas andEntryPointV09EOA guards. Supporting layers include abstractContract/Proofs, refinement/loop scaffolds, solcopInfoslayout witnesses, andCRITICAL_PATH.mdfor review navigation.Adds a Verity
EntryPointV09one-op projection (typed account/paymaster interfaces, custom errors for differential revert shape) and a Foundry differential harness comparing solc EntryPoint v0.9 vs Verity bytecode on validation/execution, replay, empty calldata, and reentrancy. A new.github/workflows/erc4337-differential.ymljob installs Foundry/solc, builds the case, and runsdifferential/run.shon relevant PR paths.Reviewed by Cursor Bugbot for commit b09f3cd. Bugbot is set up for automated code reviews on this repo. Configure here.