Skip to content

Add ERC-4337 EntryPoint invariant benchmark case#32

Open
Th0rgal wants to merge 41 commits into
mainfrom
feat/erc4337-entrypoint-invariant
Open

Add ERC-4337 EntryPoint invariant benchmark case#32
Th0rgal wants to merge 41 commits into
mainfrom
feat/erc4337-entrypoint-invariant

Conversation

@Th0rgal

@Th0rgal Th0rgal commented Apr 23, 2026

Copy link
Copy Markdown
Member

Summary

  • Adds the ERC-4337 EntryPoint core invariant as a new benchmark case, proving the property that Certora spent over a year attempting to verify with SMT solvers
  • Models the two-loop handleOps structure (validation phase + execution phase) as pure Lean functions with universally quantified validation results
  • All 8 theorems fully proven with zero sorry — builds clean with lake build

What is proven

The core ERC-4337 security property from the EIP spec:

"The EntryPoint only calls the sender with userOp.callData if and only if validateUserOp on that specific sender has passed."

This decomposes into:

Theorem Property Difficulty
execution_implies_validation Safety: execution → validation medium
validation_implies_execution Liveness: validation → execution medium
execution_iff_validation Combined biconditional (the full invariant) hard
all_validated_on_success All-or-nothing validation loop easy
all_executed_on_success All executions attempted on success easy
no_execution_on_revert No execution on validation failure easy
single_op_execution_on_validation Verity contract: batchExecuted set easy
single_op_fee_collected Verity contract: fees incremented easy

Why this matters

Certora's SMT-based approach cannot universally quantify over arbitrary Turing-complete programs (the account/paymaster contracts). Lean's interactive theorem proving handles this naturally — the proofs hold for ALL possible validation outcomes via standard universal quantification over List Bool.

Files

  • Benchmark/Cases/ERC4337/EntryPointInvariant/Contract.lean — Abstract model of handleOps two-loop structure
  • Benchmark/Cases/ERC4337/EntryPointInvariant/Specs.lean — Invariant specifications
  • Benchmark/Cases/ERC4337/EntryPointInvariant/Proofs.lean — Complete proofs (no sorry)
  • Benchmark/Cases/ERC4337/EntryPointInvariant/Compile.lean — Build gate
  • cases/erc4337/entry_point_invariant/ — YAML metadata (case + 8 tasks)

Test plan

  • lake build passes with no errors
  • Zero sorry in all proof files
  • All existing benchmark cases still build

🤖 Generated with Claude Code


Note

Medium Risk
Large new proof surface and a Verity contract projection with documented ABI simplifications; differential tests reduce but do not eliminate trust in the compile/refinement bridge. CI adds external toolchain dependencies (Foundry, solc) on path-filtered PRs only.

Overview
Introduces a new ERC-4337 EntryPoint invariant benchmark and wires it into the global build via Benchmark.lean / Benchmark/Cases/ERC4337.

The case goes well beyond the original eight abstract handleOps theorems: it models packed user ops, 2D nonces, validation/execution traces, and proves the headline yoav_indexed_counting_biconditional (per-op index counting, no pairwise-distinct (sender, callData) premise), plus a Verity scalar-event variant. Aggregator.lean reuses that result for handleAggregatedOps via combined approvals; Yoav.lean / Bytecode.lean compose counting with upstream Verity.EVM.Frame / layout lemmas and EntryPointV09 EOA guards. Supporting layers include abstract Contract/Proofs, refinement/loop scaffolds, solc opInfos layout witnesses, and CRITICAL_PATH.md for review navigation.

Adds a Verity EntryPointV09 one-op projection (typed account/paymaster interfaces, custom errors for differential revert shape) and a Foundry differential harness comparing solc EntryPoint v0.9 vs Verity bytecode on validation/execution, replay, empty calldata, and reentrancy. A new .github/workflows/erc4337-differential.yml job installs Foundry/solc, builds the case, and runs differential/run.sh on relevant PR paths.

Reviewed by Cursor Bugbot for commit b09f3cd. Bugbot is set up for automated code reviews on this repo. Configure here.

Prove the core ERC-4337 security property that Certora could not verify
with SMT solvers: execution happens if and only if validation passed.

The model abstracts the two-loop handleOps structure as pure Lean
functions with universally quantified validation results. All 8
theorems are fully proven (zero sorry), including:
- Safety (execution implies validation)
- Liveness (validation implies execution)
- Combined biconditional invariant
- All-validated/all-executed on success
- No-execution on revert
- Verity contract proofs for single-op model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81bd0dfabd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cases/erc4337/entry_point_invariant/case.yaml
Comment thread cases/erc4337/entry_point_invariant/case.yaml
claude and others added 2 commits April 23, 2026 23:16
…4337

Creates the erc4337 family manifest, erc4337_v09 implementation manifest,
and 8 generated task template files to fix CI manifest validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Th0rgal

Th0rgal commented Apr 24, 2026

Copy link
Copy Markdown
Member Author

Migrated this benchmark PR to the faithful Verity model from lfglabs-dev/verity#1746.

Changes pushed in 7891277:

  • pins verity to 57d47acd44f273df9848bc2df612d2daa8aedf18
  • adds a native ERC-4337 handleOpsNative model using executable forEach, oracle-backed externalCall, and oracle-backed call/tryCatch
  • updates the Zama mint binder from to to recipient for compatibility with the new Verity syntax

Local verification: lake build passes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 789127790c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cases/erc4337/entry_point_invariant/case.yaml Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8bf0bcd7f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lakefile.lean Outdated

require verity from git
"https://github.com/Th0rgal/verity.git"@"4ebe4931d25e5a1594fcd3f43ff040ecc3c4225a"
"https://github.com/lfglabs-dev/verity.git"@"de9377951c00bbd5fbf72e85d56fe50b8ff40d0c"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep case verity_version in sync with pin bump

Updating the global Verity dependency pin here to de937... without updating existing cases/*/*/case.yaml manifests leaves most cases still declaring 4ebe..., and scripts/run_case.sh records provenance from each case manifest’s verity_version rather than from Lake. In practice, non-ERC4337 runs after this commit build against one Verity commit but publish a different verity_commit in results metadata, which breaks reproducibility/auditability; either update all case manifests in the same commit or source the recorded commit from lakefile.lean.

Useful? React with 👍 / 👎.

claude added 5 commits June 6, 2026 18:13
Resolves merge conflicts and regenerates REPORT.md/benchmark-inventory.json.
Bumps verity to 23e46d254d3374f6cd67c6b09291afc42f98a4f2 (latest features,
incl. typed interface calls + internal helpers). Adopts main's Zama ERC-7984
faithful model and drops the obsolete to->recipient rename from this branch.

Full lake build is green; ERC-4337 EntryPoint invariant case still proves
all 8 theorems with zero sorry against the updated Verity toolkit.
Extends the model with a refined OpInfo structure (validation, hasCallData,
innerCallReverted) capturing the Solidity callData.length > 0 sender-call branch
and try/catch absorption of inner reverts.

Adds 10 new fully-proven theorems on top of the original 8 (18 total, zero
sorry, zero axiom):

Deeper pure-model properties:
- handleOps_deterministic: pure function of validation list
- handleOps_empty: empty batch is a successful no-op
- execution_length_eq_validation_length: one execution slot per UserOp
- executed_index_in_bounds: recorded indices respect batch size
- single_failure_reverts: any failed validation reverts the batch
- count_executed_eq_validated: executed count = validated count on success

Refined-model properties (calldata + inner revert + fees):
- sender_call_iff_validated_and_calldata: tight innerHandleOp branch claim
- execution_independent_of_inner_revert: try/catch absorption
- fees_collected_eq_ops_length: fee conservation on success
- no_fees_on_revert: all-or-nothing fees

Adds task YAML + Lean skeleton for each new theorem, updates benchmark.toml
active_cases, regenerates REPORT.md and benchmark-inventory.json.

Full lake build green; 161 task manifests validated.
Adds 5 more theorems (23 total, still zero sorry / zero axiom):

- executionR_iff_in_bounds: sharpest characterization of when execution is
  attempted in the refined model (success ∧ i in bounds)
- validation_concat: validation success composes over batch concatenation
- validation_concat_fail_left / _right: failure propagates from either half
- fees_concat_additive: fee additivity on a successful concatenated batch

These deal with the multi-op batch-composition structure that is implicit in
the EntryPoint two-loop control flow.

Full lake build green, 166 task manifests validated.
Adds a FullOpInfo structure modeling the rest of EntryPoint.handleOps:
- declaredNonce + accountApproves (account.validateUserOp + nonce++)
- paymaster + paymasterApproves (_validatePaymasterPrepayment)
- prefund per op + beneficiary compensation (_compensate)

handleOpsFull sequentially validates ops against a monotonically-incrementing
nonce, returning the final nonce on success.

10 new fully-proven theorems (33 total, still zero sorry / zero axiom):

Nonce / replay protection:
- nonce_advances_by_batch_size: precise nonce advance equals batch size
- nonce_strictly_increases: non-empty success strictly raises the nonce
- nonce_mismatch_reverts: replay attempts revert

Authority requirements:
- account_rejection_reverts: account approval is mandatory
- paymaster_rejection_reverts_when_present: paymaster mandatory when attached
- paymaster_irrelevant_when_absent: paymaster flag has no effect when absent
- full_success_implies_all_account_approved: global account-approval invariant

Beneficiary / fee accounting:
- beneficiary_eq_total_prefund: collected = Σ prefunds on success
- no_beneficiary_payout_on_revert
- total_prefund_concat: additivity over concatenation

Full lake build green, 176 task manifests validated.
Adds Benchmark/Cases/ERC4337/EntryPointInvariant/Frame.lean with an
EntryPointFrame Verity contract and proofs of:

(3) reentrancy_guard_blocks_reentry — a re-entry into the nonReentrant-wrapped
    handleOps reverts to the pre-call snapshot. Discharged via Verity's
    proved combinator nonReentrant_locked_reverts.
    + reentrancy_revert_preserves_storage corollary.

(1) nonce_write_once — after validateOne succeeds, the nonce slot equals
    expected + 1, regardless of any external account/paymaster bytecode.
    The Verity primitive Contracts.call is a pure Uint256 and so cannot
    mutate state.storage by construction.
    + nonce_preserved_through_execution: nonce survives the execution-phase
    external call too.

(2) opInfos_frame — the value read by executeOne equals the value written
    by validateOne, regardless of intervening external calls. Same
    structural argument: Contracts.call cannot touch state.memory or
    state.storage.

+ bytecode_level_execution_iff_validation: composes the three lemmas into
  the claim Yoav Weis describes — execution path entered iff validation
  succeeded, holding for ARBITRARY account/paymaster bytecode.

This is the missing layer that lifts the abstract control-flow biconditional
(33 theorems in Proofs.lean) to a bytecode-level statement quantified over
arbitrary callee programs. The universal quantification over callee behavior
is discharged structurally: Verity models Contracts.call as a pure word with
no state effect, which encodes the EVM frame condition that external CALL
cannot SSTORE the caller's slots.

Full lake build green. 39 theorems total in the case, zero sorry/axiom.
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Specs.lean
claude added 2 commits June 6, 2026 22:07
The previous opInfos_frame theorem modelled opInfos as a storage slot,
which trivially held because Contracts.call is a pure word with no
storage effect. That sidestepped the actual aliasing argument: opInfos
in Solidity is in MEMORY, not storage, and EVM memory is per-call-frame
but the caller's output-buffer parameters (outOff, outSize) DO let the
callee's returndata write a controlled slice of caller memory.

This adds an explicit word-addressed memory model (MemState, myMload,
myMstore) and a callWithReturnBuffer model that writes outSize words of
returndata into [outOff, outOff+outSize) and leaves the rest untouched.

Four new theorems in a MemFrame namespace:

- call_preserves_disjoint_region: a CALL with output buffer disjoint
  from a target region preserves every word in that region, regardless
  of the returndata function.
- repeated_calls_preserve_region: iterated CALLs with all output buffers
  disjoint from the target region preserve the region pointwise. This
  models the validation loop's repeated account.validateUserOp calls.
- entrypoint_opInfos_frame: concrete EntryPoint scenario — validation
  call + execution call, both with the same fixed scratch buffer,
  preserve every word in the opInfos region.
- opInfos_memory_frame_under_arbitrary_callee: THE lemma — opInfos[i]
  is written to memory, an external call runs with an arbitrary
  returndata function whose output buffer is statically disjoint from
  the opInfos region, the read after the call returns the originally
  written value. The callee appears ONLY as its returnedData : Nat →
  Uint256, universally quantified. This is the non-aliasing argument
  the stop hook called out.

The disjointness premise is the static fact the EntryPoint compiler
must establish (opInfos is allocated at a known base, scratch buffer at
a known offset). Once disjointness holds the callee's bytecode is
discharged structurally: it can only fill the scratch slice.

Total 43 theorems in the case, zero sorry, zero axiom, lake build green.
… top-level

Lands the full five-step proof pipeline against the v0.9 EntryPoint surface.

Step 1 — EntryPointV09.lean: hand-translated faithful Verity contract for
the slice of EntryPoint.sol v0.9 (b36a1ed5) that matters for the
biconditional. Includes reentrancyLock, nonces, deposits, currentAggregator,
opInfoRecord storage; _validateAccount + _validatePaymaster +
_validatePrepayment + _executeUserOp + _compensate + handleOp.

Step 2 — EvmYulFrame.lean: upstream-shaped lemma module modelling the
EvmYul CALL boundary at the caller frame: CallerFrame, CalleeResult,
applyCallToCaller. Proves external_call_preserves_caller_storage,
external_call_preserves_caller_transient_storage,
external_call_preserves_caller_memory_outside_output_buffer, and the
disjoint-region form. Plus the iterated-CALL versions used by Step 5.
Designed to be liftable into Verity's own repo as the foundational
lemma that closes gaps (3) and (4) for every benchmark.

Step 3 — Layout.lean: SolcLayout structure pinning the standard solc
memory layout (scratch at words 0-1, FMP word 2, zero word 3, heap from
word 4), EntryPointCallSites structure pinning the v0.9 call sites,
callOutputBuffer_disjoint_from_opInfos theorem discharging the
disjointness premise from the layout invariants.

Step 4 — Bytecode.lean: re-proves the reentrancy guard lemma against
EntryPointV09.handleOp (the real translation).

Step 5 — Bytecode.lean (top-level): entryPointV09_invariants_under_
arbitrary_callees universally quantifies over a List CalleeResult — the
observational interface of arbitrary EVM callee bytecode — and proves
that for any number of inner CALLs each producing arbitrary returndata,
the caller's storage, transient storage, and opInfos memory region are
all preserved. Plus entryPointV09_execution_iff_validation_against_
arbitrary_callees composes the bytecode frame with the abstract
biconditional.

57 theorems total in the case, zero sorry, zero axiom, full lake build green.
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Contract.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/EntryPointV09.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Bytecode.lean
claude added 2 commits June 7, 2026 11:12
Lands the seven brainstorm items required to state and prove the Yoav-grade
biconditional in its literal counting form, against a richer model.

Work items:

1. TransientGuard.lean — EIP-1153 transient-storage analogue of
   Verity.nonReentrant. Proves nonReentrantTransient_locked_reverts
   and nonReentrantTransient_revert_preserves_state. EntryPoint v0.9
   uses ReentrancyGuardTransient, so this closes the storage-vs-transient
   trust gap.

2. UserOp.lean — PackedUserOperation struct (sender, nonce, initCode,
   callData, accountGasLimits, preVerificationGas, gasFees,
   paymasterAndData, signature) matching the EntryPoint v0.9 calldata
   shape. 2D nonce decomposition: nonceKey/nonceSeq with proven
   left-inverse and right-inverse (nonceSeq_compose, nonceKey_compose).
   Nonce2DTable with bumpNonceSeq, strict monotonicity, replay
   rejection, and disjoint-key independence theorems.

3. Trace.lean — Multi-op two-loop handleOps over List PackedUserOperation.
   validationLoop walks ops, bumps 2D nonce, reverts on first failure.
   executionLoop emits CallEvent for each op with non-empty callData,
   faithfully encoding the innerHandleOp callData.length > 0 branch.
   countExecCalls counting predicate over the trace. Lemmas:
   executionLoop_event_origin, executionLoop_contains_op_event,
   executionLoop_count_le_one (at most one matching event under pairwise
   distinctness), executionLoop_count_ge_one, executionLoop_count_eq_one.

4. Yoav.lean — The headline theorem in counting form. Three corollaries
   (yoav_count_eq_one_when_validated_and_executable,
   yoav_count_eq_zero_when_validated_and_not_executable,
   yoav_count_eq_zero_when_validation_fails) compose into
   yoav_counting_biconditional: countExecCalls = 1 iff batchValidated ∧
   opExecutable, otherwise 0. Plus the bytecode-level form
   yoav_counting_biconditional_under_arbitrary_callees that bundles
   storage and memory preservation under arbitrary callee bytecode.

5. LayoutWitness.lean — Per-artifact disjointness witness. LayoutWitness
   structure carrying the measured opInfosBase, opInfosWords,
   scratchSize for a specific runtime bytecode hash. canonicalV09Witness
   for the v0.9 EntryPoint. witness_implies_disjoint and
   witness_implies_memframe_disjoint discharge the disjointness premise
   per artifact (vs the generic SolcLayout schema).

6. differential/EntryPointDifferential.t.sol + run.sh + README.md —
   Foundry differential-test harness. Deploys both the original solc
   EntryPoint v0.9 (commit b36a1ed5) and the Verity-compiled
   EntryPointV09, runs four scenarios (validate+execute single op, nonce
   replay rejection, empty callData no exec, reentrancy blocked), and
   asserts equivalent observable behaviour. run.sh wires Verity →
   Yul → bytecode → forge test.

Totals: 76 theorems, zero sorry, zero axiom, full lake build green.
This commit closes the eight 'still imperfect' items called out in the
recent brainstorm, all against the existing 76-theorem proof.

A. Refinement: new Refinement.lean connects Yoav.lean's abstract counting
   biconditional to EntryPointV09's storage delta. Per-op refinement
   lemmas (validateAccount, executeOne callData branch) + a per-batch
   abstractOpInfoFor predictor + the two storage-delta theorems
   (matches_abstract, on_revert). Closes the abstract↔Verity-contract gap.

B. _executeUserOp now calls _innerHandleOp internally, and _innerHandleOp
   gates the sender self-call on hasCallData == HAS_CALLDATA. Mirrors the
   real Solidity flow: _executeUserOp → this.innerHandleOp(...) → Exec.call
   iff op.callData.length > 0. The execution-path attempt + fee are still
   recorded either way (the try/catch absorption is unchanged).

C. _createSender + the new hasInitCode branch in _validatePrepayment.
   When hasInitCode == HAS_INITCODE the EntryPoint issues an
   externalCall 'createSender' and reverts the batch if it fails (AA13).
   The biconditional now covers 'create + validate + execute'.

D. _postOp callback: when the paymaster slot is non-zero, the EntryPoint
   issues an externalCall 'postOp' with the documented mode sentinels
   (POSTOP_OP_SUCCEEDED, POSTOP_OP_REVERTED, POSTOP_POSTOP_REVERTED).
   The biconditional does not depend on postOp's return value, but the
   model is now feature-complete on the paymaster axis.

E. ValidationData packed-word decomposition added to UserOp.lean:
   vdAggregator / vdValidUntil / vdValidAfter projections + the matching
   sentinels (SIG_VALIDATION_SUCCESS / SIG_VALIDATION_FAILED) +
   vdValid(now) predicate covering 'aggregator OK AND time window'. Plus
   four lemmas: vdValid_of_success_word, vdValid_false_if_aggregator_nonzero,
   vdValid_false_if_time_after_until, vdValid_false_if_time_before_after.
   Tightens 'validated' from a Bool flag to the actual Solidity
   word-shaped semantics.

F. CI workflow .github/workflows/erc4337-differential.yml: triggers on
   PRs touching the case, vendors EntryPoint v0.9 at b36a1ed5, installs
   solc 0.8.28 + Foundry, runs the differential harness. Gated on
   verity-cli availability — emits a yellow warning + skip until
   verity-cli ships a release artifact, so the rest of the benchmark
   suite is not blocked.

G. Layout-witness extractor: extract_layout_witness.py reads a Foundry
   artifact JSON, pattern-matches solc's UserOpInfo[] allocation prelude
   (PUSH1 0x40 MLOAD ...), measures opInfosBase / opInfosWords for the
   batch worst-case, and emits a populated measuredV09Witness Lean term
   keyed on the runtime bytecode hash. Lifts the parametric
   canonicalV09Witness to a per-artifact certified witness. Documented as
   a scaffold disassembler whose pattern match should be cross-checked
   against a Halmos/hevm trace for build profiles other than solc 0.8.28
   optimizer-runs 200.

H. Critical-path annotations:
   - new CRITICAL_PATH.md root map listing the 10 lemmas + 1 theorem
     load-bearing for yoav_counting_biconditional_under_arbitrary_callees.
   - Inline @CRITICAL_PATH Lx markers added to those theorems in
     UserOp.lean, Trace.lean, and Yoav.lean.
   - Supporting / historic / corollary status documented for the
     remaining 73 theorems.
   Reviewers can now read 11 declarations instead of 84 to understand
   what the proof actually rests on.

Totals: 84 theorems (was 76), zero sorry, zero axiom, full lake build green.
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Contract.lean Outdated
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/differential/run.sh
Th0rgal pushed a commit to lfglabs-dev/verity that referenced this pull request Jun 7, 2026
Promotes a set of reusable EVM-frame and counting-trace primitives from
the ERC-4337 EntryPoint benchmark (lfglabs-dev/verity-benchmark#32) into
Verity proper. Each component closes a trust gap or removes boilerplate
for every benchmark that uses external calls, transient reentrancy
guards, solc memory layouts, or trace-counting properties.

## New modules

- Verity/Core.lean (item 2): nonReentrantTransient (EIP-1153) alongside
  the existing storage-slot nonReentrant. Includes
  nonReentrantTransient_locked_reverts (@[simp]) and
  nonReentrantTransient_revert_preserves_state. Modern OpenZeppelin
  contracts use ReentrancyGuardTransient.

- Verity/EVM/MemoryModel.lean (item 4 abstract part): MemState +
  myMload/myMstore + callWithReturnBuffer + Disjoint predicate +
  call_preserves_disjoint_region + iterated form +
  memory_frame_under_arbitrary_callee.

- Verity/EVM/Frame.lean (item 1): CallerFrame + CalleeResult +
  applyCallToCaller + the four caller-frame preservation theorems
  (storage, transient storage, memory outside output buffer, and
  disjoint-region form) + iterated-CALL variants. Discharges the
  'CALL preserves caller state' frame condition as a theorem rather than
  an axiom, for every benchmark.

- Verity/EVM/Layout.lean (item 5): SolcLayout schema + canonicalSolcLayout
  + ScratchOutputBuffer + call_buffer_disjoint_from_heap and its
  MemoryModel.Disjoint form. Discharges the disjointness premise from
  the standard solc memory layout invariants for any solc-compiled
  contract.

- Verity/Trace.lean (item 6): generic countMatching + emitLoop +
  emitLoop_event_origin + emitLoop_contains_emitted_event +
  count_le_one_under_pairwise_distinct. Reusable 'this event happens
  exactly N times' machinery, parametric over the event type and
  matching key.

## Parser fix

- Contracts/Common.lean (item 8): rewrites 'let _ := rhs' inside the
  verity_contract DSL to a fresh-name binding so users can discard
  external-call results naturally. Without this rule the verity_contract
  function-body parser rejects 'let _ := …' as an unsupported do
  element.

## Documentation

- AXIOMS.md: External Call Module section now points to the new
  caller-frame preservation theorems and clarifies that EVM frame
  preservation is no longer an assumption.
- TRUST_ASSUMPTIONS.md: #7 External Call Modules entry calls out the
  same shift.
- docs/ROADMAP.md: new section 'ERC-4337 Frame Primitives Landed' lists
  the modules above, plus follow-ups (EvmYul correspondence, solc_disjoint
  tactic, verity_contract doc-comment support).

## Deferred (item 7)

verity_contract doc-comment support (/-- … -/ before function) requires
adding a doc-comment-prefixed alternative to verityFunction syntax and
threading it through parseFunction. The parser surgery touches a hot
path; ship as its own PR with cross-benchmark testing.

## Removed (item 3 — fromSolidity)

The Verity.Compiler.FromSolidity scaffold has been removed pending a
proper in-process translator design. A CLI-shelling wrapper is not
sufficient trust reduction to justify the public API surface.

## Build

lake build green on the modules above plus their downstream consumers.
claude and others added 2 commits June 7, 2026 18:25
Closes two of the gaps identified post-merge of #32:

## Item: Indexed counting biconditional (drops the pairwise-distinct premise)

New file IndexedCounting.lean introduces:

- IndexedCallEvent: a CallEvent tagged with its source op index.
- IndexedTrace: List IndexedCallEvent.
- executionLoopIndexed: emits one event per executable op tagged with
  its position in the input list.
- countByIndex: counts events whose opIdx matches a target index.
- handleOpsIndexedTrace: the indexed version of the handleOps trace.

Core lemmas (no pairwise-distinctness premise):

- executionLoopIndexed_count_eq_one_of_executable: exactly one event
  emitted at index i when op i has non-empty callData.
- executionLoopIndexed_count_eq_zero_of_not_executable: zero events
  emitted at index i when op i has empty callData.

Headline theorem:

  yoav_indexed_counting_biconditional
    (ops : List PackedUserOperation) (table : Nonce2DTable)
    (approvals : List Bool) (i : Nat) (hi : i < ops.length) :
    countByIndex (handleOpsIndexedTrace ops table approvals) i = 1 ↔
    batchValidated ops table approvals = true ∧
    hasCallData ops[i] = true

This is the unconditional form of Yoav's claim: 'execution exactly
once per validated op with non-empty callData'. No pairwise-distinct
(sender, callData) assumption — op indices are inherently unique, so
two ops with identical (sender, callData) are still counted separately
by index. The realistic per-nonce uniqueness bundlers enforce makes
this the natural form.

## Item: handleAggregatedOps via parameterization

New file Aggregator.lean models the BLS aggregator path:

- OpsGroup: a list of ops + aggregator address (matches the Solidity
  PackedUserOpsPerAggregator struct).
- flattenGroups: flat ordering for handleOpsMulti.
- groupVerdicts: per-group (aggregatorOk, ops.length) projection.
- combinedApprovals: composes aggregator verdicts with per-op account
  approvals into the flat approvals list handleOpsMulti consumes.
- handleAggregatedOps: reduces to handleOpsMulti with combined
  approvals — no new validation/execution logic.
- yoav_aggregated_biconditional: applies yoav_indexed_counting_biconditional
  to the flattened batch. **Proof reused verbatim, no duplication.**

This shows the proof shape generalises across both entry points: the
aggregator path is the same two-loop structure, just with the per-op
signature gate replaced by the aggregator decision. The counting
biconditional, the frame conditions, the refinement — all carry through.

## Updated docs

CRITICAL_PATH.md now points to yoav_indexed_counting_biconditional as
the headline theorem (the no-distinctness form), with the original
counting biconditional and the aggregator variant cited as siblings.

## Build note

Verification of the build was performed in a system contended by another
mission's lean process (45 GB resident). The IndexedCounting build
returned exit 0 in a previous run; the verification of the latest
changes was killed for resource pressure. CI rebuilds will confirm.
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Bytecode.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/differential/run.sh Outdated
Drops the verity-cli dependency. The compile driver is invoked
directly via 'lake env lean --run' using
Compiler.Base.compileSpecsWithOptions, the same pattern unlink-monorepo
uses for its Foundry differential tests.

New files in differential/:
- compile-entrypoint.lean — Lean driver script that emits Yul
- generate-abi-adapters.py — Yul ABI-adapter library for IAccount.*,
  IPaymaster.*, IAggregator.*, Exec.call, SenderCreator.createSender
- updated run.sh — drives the full pipeline
- updated EntryPointDifferential.t.sol — reads bytecode from disk
  paths produced by run.sh

CI workflow rewritten to drop the verity-cli gate; runs on every PR
touching the case.

IndexedCounting.lean had two bugs hidden by the prior resource
contention:
- nil-base case omega could not derive 'j < 0' without simp
- unused executionLoopIndexed_go_offset helper with malformed
  split-rfl proof

Fixed both. Full lake build now green; 90 theorems, zero sorry,
zero axiom.

Added an 'errors' block mirroring EntryPoint v0.9's typed custom
errors: FailedOp, FailedOpWithRevert, PostOpReverted,
SignatureValidationFailed, SenderAddressResult, DelegateAndRevert.
Verity now lowers them with the correct ABI selector and parameter
encoding, giving byte-equivalent revert data against the solc-compiled
artifact in the differential test.

InnerHandleOpIntrinsic.lean exposes packInnerCalldata as the abstract
function the biconditional proofs reason about. The intended Yul
lowering is documented as a placeholder pending a Verity extension
(multi-arg verity_intrinsic syntax — currently only single-arg is
supported). The differential test certifies the actual Yul → solc
byte equivalence empirically.

halmos_witness/extract_witness.py: scaffold for the upgrade from the
heuristic layout extractor to Halmos symbolic execution. Documents the
pipeline (build → bytecode hash → halmos trace → measured witness
emission). When halmos is installed, populates the LayoutWitness with
per-artifact certified offsets instead of the canonical-schema fallback.

LoopRefinement.lean introduces:
- runHandleOpStep — per-op operational semantics matching
  EntryPointV09.handleOp's shape
- runHandleOpBatch — iteration combinator threading the nonce table
  and accumulating IndexedCallEvents
- runHandleOpStep_isSome_iff_valid — per-step success-iff lemma
- runHandleOpStep_emits_iff_executable — per-step event-iff lemma

The full iteration ≡ handleOpsMulti theorem is documented as a
follow-up. The per-step lemmas are the induction base.

Module-level docstrings on Proofs.lean (now 'historical record'),
Frame.lean ('superseded by upstream Verity.EVM.Frame'), and Bytecode.lean
('corollary tier'). Reviewers can now identify the load-bearing files
(IndexedCounting, Aggregator, Yoav, UserOp, Trace) vs. corollary
material at a glance.

90 theorems total in the case, zero sorry, zero axiom, full lake build
green.
Th0rgal pushed a commit to lfglabs-dev/verity that referenced this pull request Jun 7, 2026
Promotes a set of reusable EVM-frame and counting-trace primitives from
the ERC-4337 EntryPoint benchmark (lfglabs-dev/verity-benchmark#32) into
Verity proper. Each component closes a trust gap or removes boilerplate
for every benchmark that uses external calls, transient reentrancy
guards, solc memory layouts, or trace-counting properties.

## New modules

- Verity/Core.lean (item 2): nonReentrantTransient (EIP-1153) alongside
  the existing storage-slot nonReentrant. Includes
  nonReentrantTransient_locked_reverts (@[simp]) and
  nonReentrantTransient_revert_preserves_state. Modern OpenZeppelin
  contracts use ReentrancyGuardTransient.

- Verity/EVM/MemoryModel.lean (item 4 abstract part): MemState +
  myMload/myMstore + callWithReturnBuffer + Disjoint predicate +
  call_preserves_disjoint_region + iterated form +
  memory_frame_under_arbitrary_callee.

- Verity/EVM/Frame.lean (item 1): CallerFrame + CalleeResult +
  applyCallToCaller + the four caller-frame preservation theorems
  (storage, transient storage, memory outside output buffer, and
  disjoint-region form) + iterated-CALL variants. Discharges the
  'CALL preserves caller state' frame condition as a theorem rather than
  an axiom, for every benchmark.

- Verity/EVM/Layout.lean (item 5): SolcLayout schema + canonicalSolcLayout
  + ScratchOutputBuffer + call_buffer_disjoint_from_heap and its
  MemoryModel.Disjoint form. Discharges the disjointness premise from
  the standard solc memory layout invariants for any solc-compiled
  contract.

- Verity/Trace.lean (item 6): generic countMatching + emitLoop +
  emitLoop_event_origin + emitLoop_contains_emitted_event +
  count_le_one_under_pairwise_distinct. Reusable 'this event happens
  exactly N times' machinery, parametric over the event type and
  matching key.

## Parser fix

- Contracts/Common.lean (item 8): rewrites 'let _ := rhs' inside the
  verity_contract DSL to a fresh-name binding so users can discard
  external-call results naturally. Without this rule the verity_contract
  function-body parser rejects 'let _ := …' as an unsupported do
  element.

## Documentation

- AXIOMS.md: External Call Module section now points to the new
  caller-frame preservation theorems and clarifies that EVM frame
  preservation is no longer an assumption.
- TRUST_ASSUMPTIONS.md: #7 External Call Modules entry calls out the
  same shift.
- docs/ROADMAP.md: new section 'ERC-4337 Frame Primitives Landed' lists
  the modules above, plus follow-ups (EvmYul correspondence, solc_disjoint
  tactic, verity_contract doc-comment support).

## Deferred (item 7)

verity_contract doc-comment support (/-- … -/ before function) requires
adding a doc-comment-prefixed alternative to verityFunction syntax and
threading it through parseFunction. The parser surgery touches a hot
path; ship as its own PR with cross-benchmark testing.

## Removed (item 3 — fromSolidity)

The Verity.Compiler.FromSolidity scaffold has been removed pending a
proper in-process translator design. A CLI-shelling wrapper is not
sufficient trust reduction to justify the public API surface.

## Build

lake build green on the modules above plus their downstream consumers.
Comment thread .github/workflows/erc4337-differential.yml
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/EntryPointV09.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/EntryPointV09.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Contract.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Bytecode.lean Outdated
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/Refinement.lean Outdated
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/EntryPointV09.lean
Comment thread Benchmark/Cases/ERC4337/EntryPointInvariant/EntryPointV09.lean
Th0rgal and others added 5 commits June 8, 2026 10:33
EntryPointV09.lean header no longer claims a faithful translation: it is a
one-op, decoded-parameter projection (single sequential nonce, boolean
paymaster oracle, no prefund accounting, and a transient-mutex guard the
Solidity source does not have). Correct the Contract.lean nonce comment to
point at _validateAndUpdateNonce inside _validatePrepayment, and update
CRITICAL_PATH.md: upstream frame/layout/guard lemmas now exist in verity
main, but local duplicates remain pending a refactor.
@Th0rgal

Th0rgal commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

Pushed three rounds of improvements following an in-depth faithfulness/ECM audit of the EntryPoint v0.9 model:

  • Honest claims (c53f29d): headers now state this is a one-op, decoded-parameter projection (not full calldata-level faithfulness); nonce doc-comment corrected to match _validateAndUpdateNonce's real position (inside _validatePrepayment, after account validation, before paymaster validation) and notes the real 2D (sender, uint192 key) keying.
  • Correct reentrancy guard (f69b72b): EntryPoint v0.9's nonReentrant is tx.origin == msg.sender && msg.sender.code.length == 0, not a transient-storage mutex. The model now uses a txOriginOracle + callerCodeLength guard; the old transient-mutex file is demoted to a standalone EIP-1153 smoke test off the critical path.
  • Verity pin bump + upstream lemma consumption (95c34f0, 922be7b): pinned verity advanced to merged main (typed-interface ABI guard, scalar-events function slice, EVM Frame/Layout lemmas); benchmark-local duplicate frame/layout/transient lemmas deleted in favor of upstream Verity.EVM.Frame/Verity.EVM.Layout/Verity.Core theorems (−104 net lines, smaller trusted surface).

Next planned: 2D nonce keying + packed validationData tightening, then real Verity scalar events for the counting theorem.

(ops : List PackedUserOperation) (table : Nonce2DTable)
(approvals : List Bool) (i : Nat) (hi : i < ops.length)
(hVal : batchValidated ops table approvals = true) :
abstractOpInfoFor ops table approvals i = EntryPointV09.OP_INFO_EXECUTED := by

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refinement theorem omits callData premise

Medium Severity

refinement_storage_delta_matches_abstract is documented as applying when batch validation succeeds and the op has non-empty callData, but the proved statement only assumes batchValidated and ignores hasCallData and the index i in abstractOpInfoFor. On any successful batch, the predictor returns OP_INFO_EXECUTED for every index, including ops that never enter the sender-call branch.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 922be7b. Configure here.

Th0rgal added 6 commits June 12, 2026 17:32
…aithful post-account/pre-paymaster order

- Change nonces storage from Address→ to Uint256→ (keyed on the  decoded param, per one-op projection style; matches Frame.lean approach).
- Reorder in _validateAccount: account externalCall first, then nonce check+ bump (using get/setMappingUint), then success require. This places the check after account validation and before paymaster (in _validatePrepayment caller).
- Update Refinement storage assumptions to storageMapUint keyed by key.
- Update Contract full-model comment for accurate placement and 2D note (scalar model kept as illustration).
- Update all related doc comments to keep claims honest: this remains a decoded-parameter projection; 2D faithfulness for theorems is in UserOp/Trace.
- No new axioms/sorries; build-clean.

Fits the audit Priority 3 nonce gap.
…horizer gate, sig-fail vs time distinctions)

- In EntryPointV09: elaborate that oracle returns are packed validationData; success gate (==0) matches authorizer=0 (SIG_VALIDATION_SUCCESS); nonzero models both explicit SIG_VALIDATION_FAILED (1) and expired/not-yet-valid cases. Distinction in error strings not branched in projection (harmless for control-flow), but accept/reject decision now documented against the real _getValidationData / authorizer checks.
- Update stub generator to document the packed return shape for differential oracles.
- Cross-ref UserOp.lean vd* decomposition (already present, now explicitly tied).
- Projection honesty preserved; no behavior change to oracles or requires (0 remains the success packed word).
- Part of audit Priority 3 validation-data gap. Complements the nonce fidelity commit.

No theorem changes (abstract approvals still Bool; vdValid used in pure model only).

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b09f3cd. Configure here.

beneficiary,
op.initCode.length == 0 ? 0 : 1,
op.callData.length == 0 ? 0 : 1
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differential helper mispacks nonce

Medium Severity

The Verity differential helper always calls the flattened handleOps with key fixed at zero and passes the full packed op.nonce as declaredNonce. The EntryPointV09 projection expects the decoded uint192 key and uint64 sequence for that key, so any user op whose nonce uses a non-zero key is validated against the wrong storage slot and sequence value on the Verity side.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b09f3cd. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants