Skip to content

Commit fa93e54

Browse files
johnchandlerburnhamarthurpaulinosamuelburnham
authored
Lazy env, anon-mode kernel check, and ZK-aggregation foundations (#415)
* Refactor Claim ADT + add merkle infrastructure for ZK aggregation Adds the format hooks needed before recursive verification lands in Aiur. Five-variant Claim ADT with explicit assumption commitments, a canonical Blake3 merkle module, a serializable AssumptionTree for recovering leaf sets, and a Contains claim for the discharge step. Claim ADT (5 variants): - Eval { input, output, assumptions: Option<Address> } - Check { const_addr, assumptions: Option<Address> } - CheckEnv { root, assumptions: Option<Address> } - Reveal { comm, info } -- unchanged, no assumptions - Contains { tree, const_addr } -- new, for inclusion proofs Tag4 reorganized to keep everything in single-byte tags: - 0xE for env, comm, AssumptionTree, and claims (slots 0-7) - 0xF for proofs (slots 0-4; 5-7 reserved) - Comm moved from variant 5 -> 1 Matches the "Variant (0-7)" constraint documented in docs/Ixon.md. Env serialization: - Every .ixe file now carries a canonical merkle root over its consts.keys() in the on-disk header (non-optional, 32 bytes; empty const sets use the zero-address sentinel). - Two envs with the same const set produce byte-identical roots regardless of insertion order. Verified on deserialize. New modules: - src/ix/ixon/merkle.rs + Ix/Merkle.lean: canonical sorted builder, free-form merkle_join composition, membership proofs, domain separation per RFC 6962. - src/ix/ixon/assumption_tree.rs + Ix/AssumptionTree.lean: serializable merkle tree with Leaf/Padding/Node variants. canonical() builds the same shape merkle_root_canonical hashes; join() is O(1) free-form composition. - src/ix/kernel/claim.rs: builders that compute transitive-dep assumptions from an env (build_check_claim, build_eval_claim, build_check_env_claim, env_merkle_root). Other: - Extracted shared BFS walker on Env::bfs_refs + Env::transitive_deps_excl; the inlined test-feature copy in lean_env.rs now calls into it. - Lean FFI export rs_env_merkle_root for cross-impl verification of the env root. - Proof bytes for all variants are uniform opaque ZK bytes; witness data (e.g., Contains merkle paths) is prover-side scratch consumed by the ZK circuit and not transmitted on the wire. - docs/Ixon.md Tag4 tables and env section updated; .ixe extension documented. Recursive verification (the ZK proof generation for Contains and the aggregation discharge transitions) is intentionally deferred to a follow-up. Tests: 993 Rust unit tests pass (was 953 pre-refactor), 813 Lean tests pass with no failures; cargo clippy clean. * Add lazy env deserialization and anon-mode kernel FFI Two related changes to reduce kernel memory + lay groundwork for metadata-isolated typechecking. **Lazy constant deserialization.** `Env::consts` now stores `LazyConstant` (`Arc<[u8]>` + `OnceLock<Arc<Constant>>`) instead of `Constant`. Constants are materialized on first access and the structured form is cached. Sparse access patterns (single-constant typecheck, transitive_deps walk, claim builders) only parse the closure they need; non-reachable constants stay as raw bytes. The `.ixe` section-2 layout gains a Tag0 length sidecar before each constant's Tag4 bytes: old: [addr:32] [Tag4 constant bytes] new: [addr:32] [Tag0 length] [Tag4 constant bytes] The length is section-level framing, not part of the constant's content hash. `Address::hash(raw_bytes) == addr` is preserved. `Constant::put`/`Constant::get` are unchanged. The Lean side (`Ix.Ixon.putEnv`/`getEnv`) reads/writes the new format. **Anon-mode kernel FFI.** New `rs_kernel_check_consts_anon(path, addrs, quiet)` exposes anonymous-mode typechecking by content address. The kernel runs as `KEnv<Anon>` / `TypeChecker<Anon>` with every `M::MField<T>` erased to `()`, so the typechecking logic structurally cannot read metadata. Useful for zkPCC verifiers that hold only addresses. Supporting pieces: - `KernelMode::HAS_META: bool` for future compile-time gating. - `AnonEnv<'a>` wrapper (`src/ix/kernel/anon_env.rs`) exposing only consts/blobs/transitive walks — no `named`/`names`/`comms`. - `Env::get_anon` reads header + blobs + consts, parse-and-drops metadata sections (3-5), returns an `Env` with empty `named`/ `names`/`comms`. Same merkle-root verification as `Env::get`. - `rs_de_env_anon` FFI + `Ix.Ixon.rsDeEnvAnon` Lean wrapper. - `Ix.KernelCheck.rsCheckConstsAnonFFI` Lean binding. Caveat: `ixon_ingress::<Anon>` still consults `Env::named` internally to enumerate work items. The resulting `KEnv<Anon>` is metadata-free so the typechecker is anon, but full ingress-level metadata isolation is a follow-up. Tests: 16 new (9 lazy + sparsity; 3 AnonEnv; 4 get_anon). All 1009 Rust unit tests pass; all 813 Lean tests pass; clippy clean. * Simplify ix check + add metadata-free anon mode - `ix check` becomes .ixe-only with a positional `<path>`. Drops the `--lean` compile-and-check flow and the `--env` flag; direct Lean → kernel checks remain available through `rsCheckConstsFFI` for tests. Removes the redundant `CheckIxonCmd.lean`. - New `--anon` flag runs a metadata-free kernel check: loads the .ixe via `Env::get_anon` (discards named/names/comms), enumerates work items from `env.consts` alone, and rebuilds member + ctor projection addresses deterministically via `Constant::commit`. Rejects `--consts`/`--ns`/`--consts-file` since it always checks everything. - Compiler: unwrap singleton non-inductive Muts blocks to standalone Defn/Recr. `Expr::Rec(0, univs)` already resolves correctly against a one-member block, so the wrapper was pure overhead; this keeps the env structurally uniform and matches `compile_single_def`. The anon ingress for standalones passes `mut_ctx_override = [self_id]` so self-recursive standalones still typecheck. - Kernel: anon parallel runner mirrors `run_checks_parallel_on_large _stacks` (32 workers, slow-detection, RSS tracking). Genericized `check_one_const<M>` / `check_consts_loop<M>` / `format_tc_error<M>` to share the runner. `KernelMode::meta_field_with` / `meta_field_try` gate metadata lookups in expression ingress at compile time. - Anon lazy ingress (`LazyAnonIngress` in tc.rs, helpers in ingress.rs) dedupes blocks via `kenv.blocks.contains_key(&KId<Anon>(B, ()))` before re-running `ingress_anon_block` (prevents N² re-ingestion when sibling projections fault separately within one worker check). - `Env::get_anon` now harvests `ReducibilityHints` from the otherwise-discarded Named section into a sidecar `Env::anon_hints: FxHashMap<Address, ReducibilityHints>`. Anon ingress threads these through a new `hints_override` parameter on `ingress_defn` so the lazy-delta tiebreak in `def_eq::def_rank_id` sees realistic heights instead of `Regular(0)`. Hints are performance advice, not correctness data — supplying them in anon mode preserves the metadata-free trust model. - Regenerate the canonical addresses in `PrimAddrs::new()`; singleton- unwrap changed the content hash of every self-recursive primitive (Nat.add, String, Nat.rec, …). End-to-end on compileinitstd.ixe (105,487 constants): meta: 105487/105487 in 20.6s, peak RSS 7.4GB anon: 89010/89010 in 17.8s, peak RSS 4.0GB * Clean up cargo clippy --all-targets - 11× `cloned_ref_to_slice_refs`: `&[x.clone()]` → `std::slice::from_ref(&x)` in `assumption_tree.rs` + `merkle.rs` test modules. Same allocator footprint, no clone, matches the lint's recommended idiom. - 2× `useless_vec`: `vec![0xE3, 0x00, 0x00]` → `[0xE3, 0x00, 0x00]` in serde-reject tests where the buffer is only borrowed. All 1011 unit tests pass. `cargo clippy --all-targets` is now clean. * ix check: print full content hashes + add --workers flag - The anon-mode progress / failure-log labels and the meta-mode hash-display fallback were truncating addresses to 16 hex chars (`@1f4b195aefa10e26`). Drop the `[..16]` slice so the full 64-char Blake3 hex is printed (`@1f4b195aefa10e2690d13c5b98b3d9124d3fbb5c…`), matching what tooling like `--fail-out` records and what the metadata-free workflow actually needs to identify a constant. - `--workers N` flag on `ix check` plumbs through the existing `IX_KERNEL_CHECK_WORKERS` env var that `resolve_kernel_check_workers` in `src/ffi/kernel.rs` reads. Useful for isolating per-worker memory cost (`--workers 1` lets you see the env's static footprint without per-worker overhead) and for capping concurrency in resource-tight contexts. * Mmap .ixe + cache-free LazyConstant for anon mode `lake exe ix check compilemathlib.ixe --anon` on a 3.2 GB env now peaks at ~11 GB RSS, down from ~40 GB; build_anon_work is 50× faster. Three intertwined changes: 1. Memory-map the .ixe (avoids ~3.2 GB heap copy of bytes) - Cargo: add memmap2 = "0.9". - `BytesSource` enum in `src/ix/ixon/lazy.rs` distinguishes heap-resident `Arc<[u8]>` from `(Arc<Mmap>, offset, len)` windows into a memory-mapped file. `LazyConstant::raw_bytes`, `verify_address`, `PartialEq` route through `BytesSource::as_slice` so every existing consumer is mmap-transparent. - `Env::get_anon_mmap(path)` in `src/ix/ixon/serialize.rs` opens the file, mmaps it, and stores Section-2 consts as mmap-backed `LazyConstant`s. Sections 3-4 (names + named) are still parsed transiently to harvest hints, then dropped before return. - `rs_kernel_check_anon` (`src/ffi/kernel.rs`) switches to `get_anon_mmap`; the old `std::fs::read` + `Env::get_anon` heap path is gone for the anon FFI. 2. Strip the persistent `LazyConstant` parsed-Constant cache - `LazyConstant.cache` was `Arc<OnceLock<Arc<Constant>>>` and accumulated forever — for mathlib that meant ~30 GB of parsed `Arc<Expr>` trees pinned in the env across the entire run. - New shape: `cache: Option<Arc<Constant>>`. Populated only by `from_constant` (compile-side, where we already own the parsed value). `from_bytes`/`from_mmap_slice` set `None`, and their `get()` parses fresh on every call without storing. - Re-parse cost is bounded by the existing per-worker dedup: `kenv.consts` already addresses each ingressed constant once per work item, and `clear_releasing_memory()` drops the kenv between items. The kenv is now the only persistent materialization layer. 3. `LazyConstant::peek_variant` for `build_anon_work` - One-byte read of the outer Tag4 head to identify the `ConstantInfo` variant — no body parse, no allocation. - `build_anon_work` now dispatches on `peek_variant()`; only `Muts` blocks trigger `lc.get()` (we need the member list for projection-address enumeration), and that `Arc<Constant>` drops at the end of the match arm. - Previously every constant was fully materialized at startup just to read its variant tag. With ~95% of the env being standalone/projection, the work-enumeration pass now skims the env at near-IO speed. Test updates: - `mmap_slice_roundtrips` (already added with the earlier mmap scaffolding) exercises the mmap window roundtrip. - New `from_bytes_does_not_cache` and `from_constant_clones_share_cache` document the new caching contract. - `peek_variant_*` tests cover every variant + empty-bytes and unknown-flag error paths. - `lazy_sparsity_only_materializes_closure` in `src/ix/ixon/env.rs` reframed to assert BFS-of-closure correctness instead of cache side-effects (`is_materialized()` no longer fires for lazy loads). End-to-end on compilemathlib.ixe (--anon, 32 workers): before: 640658/640658 in 266s, peak RSS ~40 GB after: 640658/640658 in 223s, peak RSS 11.3 GB * Address review findings: correctness + small refactors Code-review pass over the anon-mode / mmap / cache-strip work. This commit lands the highest-value subset — the correctness fixes and the small refactors that prevent future drift — and leaves the larger items (format-version bump, sealed marker trait, AnonEnv audit) for separate follow-ups. Correctness: - Verify per-constant address on load (#1). `LazyConstant::verify_address` existed but was never invoked from `Env::get`, `get_anon`, or `get_anon_mmap`. The env-level merkle root catches missing/extra entries but not byte-tampering of a constant whose key is intact; without this check, corruption surfaced much later inside `LazyConstant::get` with a misleading parse error. Inline the `Address::hash(bytes) == addr` check in each loader's Section-2 loop. Added 3 corruption-detection tests (`env_const_bytes_tampering_*`). This required updating a handful of existing tests that stored constants under fake `Address::hash(b"a")` keys instead of their true content hashes — round-tripping such envs now correctly rejects. Added a `store_canonical(env, c) -> Address` test helper for the canonical pattern, and `*_discriminator(refs, n)` so tests can produce content-distinct constants when the same ref-set would otherwise collide. - Hard-error in `ixon_env_to_decoded` on parse failure (#6). `src/ffi/ixon/env.rs` used `filter_map` to silently drop any const whose bytes failed `LazyConstant::get` — the Lean caller had no signal of the lost entries. Switch to a Result-collecting loop and propagate the first parse error; update both FFI call sites (`rs_de_env`, `rs_de_env_anon`). - Doc fixes (#4, #5). `docs/Ixon.md`'s AssumptionTree section described only `Leaf=0x00` and `Node=0x01`, missing `Padding=0x01` (and the docs' `Node` tag collided with the real `Padding` tag — the real `Node` is `0x02`). Also: the env-section table cell said "opt-tagged merkle root" but the implementation writes a bare 32-byte address. Refactors that prevent future drift: - Canonical projection-address helpers (#9). Four production sites reconstructed `Constant::new(IxonCI::{D,I,R,C}Prj{...}).commit().0` by hand; the anon pipeline silently breaks if any one drifts. Extract `defn_proj_address` / `indc_proj_address` / `recr_proj_address` / `ctor_proj_address` (plus `_constant` variants) in `src/ix/ixon/constant.rs`. Update `compile.rs`, `compile/mutual.rs`, and the anon `*_proj_addr` helpers to call them. - `verify_proj_addr_in_env` helper (#21). `ingress_anon_block` had the same "computed-address not in env" check repeated four times (DPrj/RPrj/IPrj/CPrj). DRY into one helper that produces a consistent error format. UX: - Anon display labels switched to `#<hex>` everywhere (#18). Rust was emitting `@<hex>` in progress/fail-out; Lean's `runCheckAnon` was emitting `#{i}` (result index). Standardize on `#<hex>` so the CLI failure summary is joinable with the fail-out file. This required exposing addresses per result slot to Lean — `rsCheckAnonFFI` now returns `Array (String × Option CheckError)` pairing each result with its content-address hex string. Rust builds the pairs via a new `build_anon_result_array` helper. - Stale "check-ixon" header in FailureLog (#16). One-line rename in the doc comment + `# ix check-ixon failures` writeln to `# ix check failures`. Tests: - `testPrimitivesParity` (PrimAddrs regen-parity test). Catches silent drift between hardcoded primitive addresses in `PrimAddrs::new()` and what `rsCompileEnvFFI` produces from the live Lean primitives. Plumbing: new `PrimAddrs::lean_parity_table()` returns `(lean_name, hex)` pairs; new `rs_prim_addrs_canonical` FFI exposes them to Lean; new test in `BuildPrimitives.lean` iterates `kernelPrimitives`, compares against the hardcoded table, and fails with a printable diff on mismatch (with instructions to regenerate). Skipped: `eagerReduce` — synthetic kernel marker whose PrimAddrs value (`0xff…3`) intentionally diverges from its compiled content hash (which collides with `id`). - 3 new corruption-detection tests in `serialize.rs` exercise the Section-2 verify check for `Env::get`, `Env::get_anon`, `Env::get_anon_mmap`. Verification: `cargo test --lib` 1025 passing (3 new), `cargo clippy --lib --all-targets` clean, `lake build` clean, `lake exe ix check compileinitstd.ixe` 105487/105487 in ~21s, `--anon` 89010/89010 with peak RSS 1.4 GB, `.lake/build/bin/IxTests --ignored rust-kernel-build-primitives` shows both `build primitives dump` and `primitive address parity (PrimAddrs vs live compile)` pass. Out of scope (deferred to follow-ups): - #2 Format version bump (Env::FLAG) - #3 rs_kernel_check_consts_anon still uses Env::get on main - #8 Sealed marker trait for the lazy_anon transmute - #11/#12 AnonEnv audit (vestigial wrapper) - #13, #15, #17, #19, #20, #22-25 Various quality cleanups * Round-2 review fixes: mmap defense + small cleanup Six small items from the round-2 review of the anon-mode work: - N1: stat-at-open defense in `Env::get_anon_mmap` (`src/ix/ixon/serialize.rs`). Capture the file size before mmap; if the kernel's mapped length disagrees (file truncated between open and map) bail with a clear error instead of letting workers SIGBUS deep in `LazyConstant::get`. Truncate-in-place under a live mapping is still undefined per POSIX — documented as a caller contract — but the open-time check catches the common case. - New `get_anon_mmap_survives_file_unlink` test: load via mmap, unlink the path, then materialize both already-touched and not-yet-touched constants. Locks in the inode-retention invariant that the SIGBUS analysis depends on; a future refactor that switched to `mmap_anonymous` or copied bytes into a tmpfile would fail loudly here instead of letting workers SIGBUS in production. - #11: delete `AnonEnv::as_ixon_env_unchecked` (`src/ix/kernel/anon_env.rs`). Dead `pub(crate)` escape hatch marked `#[allow(dead_code)]` — confirmed zero callers and removed. - #13: `debug_assert!` on `OnceLock::set` for both result vectors (`src/ffi/kernel.rs`, meta + anon paths). The previous `let _ = results[idx].set(...)` silently dropped a re-set. If a future `build_*_work` dedup refactor breaks the one-write-per-slot invariant, debug builds now panic with the slot index instead of silently losing results. - #19: `allNames.contains` O(n²) preflight → `Std.HashSet` lookup (`Ix/Cli/CheckCmd.lean`). At mathlib scale (~700k env names × thousands of seed names) the previous linear scan-per-name spent measurable seconds on missing-name preflight alone. - N4: doc imprecision in `src/ix/ixon/lazy.rs` — the cache-policy preamble said the worker `KEnv` is "cleared between work items" but the actual cadence is `clear_every` items (`IX_KERNEL_CHECK_CLEAR_EVERY`, default 1 but tunable). Refined wording so the doc matches the code. Verification: `cargo test --lib` 1026 passing (1 new), `cargo clippy --lib --all-targets` clean, `lake build` clean, `lake exe ix check compileinitstd.ixe --anon` 89010/89010 in 15.2s peak RSS 1.4 GB (regression-clean). Out of scope, separate follow-ups: - #2 Format version bump - #3 rs_kernel_check_consts_anon zkPCC FFI (Env::get → mmap) - #8 Sealed marker trait for the lazy_anon transmute - #10 child_arena closure side-effect (widen MField to tuple) - #12 AnonEnv duplicates bfs_refs / transitive_deps_excl - #14 anon worker bypasses M-generic display helpers - #17 Anon --fail-out docstring claims --consts-file compat - #20 Singleton-unwrap duplicated across compile paths - #22-24 Lean `partial def`, `.get!` in tests, unguarded as-u64 casts - N2/N3 Lean is_materialized() callers + compute_const_size_breakdown - Test gaps: rs_kernel_check_anon integration, concurrent mmap, verify_address-on-mmap-corruption, etc. * rustfmt * Tests: derive RawConst addrs from content (verify_address fix) The `Address::hash(bytes) == addr` defense added in f792102 rejects `RawConst { addr, const }` pairs where `addr` is uncorrelated with `serConstant const`. The Rust-side tests in `serialize.rs` were updated at the time (`store_canonical` helper), but three Lean-side sites still produced mismatched pairs and surface now as: × ∃: Env serialization Lean==Rust × ∃: serde RawEnv with data × ∃: serde RawEnv roundtrip deserialization failed: rs_de_env: Env::get: const at idx 0 bytes hash to 6110b739… but stored under b177ec1b… Three fixes, all derive `addr := Address.blake3 (serConstant c)`: - `Tests/Gen/Ixon.lean::genRawConst` — property-test generator was `RawConst.mk <$> genAddress <*> genConstant` (uncorrelated). Now generates the const first, then derives addr from `serConstant`. - `Tests/FFI/Lifecycle.lean::serdeTests` (`withData` unit case) — was using one `testAddr := Address.blake3 #[1,2,3]` for both the const and unrelated blob/comm slots. Split into `testAddr` (still used for the content-hash-free blob/comm) and `testConstAddr := Address.blake3 (serConstant testConst)`. Added a name entry for the new canonical addr. - `Tests/FFI/Lifecycle.lean::genSerdeRawEnv` — the "pool of addresses" pattern picked const addrs from a random pool that couldn't possibly match content hashes. Restructured: consts derive canonical addrs from content, each gets its own name-table entry appended to the pool-derived entries so the serde pipeline's "all addresses resolvable" invariant still holds. * Regenerate Aiur primitive addresses for singleton-unwrap Compiler's singleton non-inductive Muts unwrap (12630aa) changed the content hashes of 48 primitives (`Nat.rec`, `Nat.add`, `Nat.mul`, …). The Aiur kernel's `Ix/IxVM/Kernel/Primitive.lean` still held the pre-unwrap blake3 bytes, so primitive dispatch missed every affected constant and fell through to structural `Nat.rec` unfolding. Tests like `IxVMPrim.nat_mul_big` then drove millions of Nat.succ unfoldings inside the Aiur trace, OOMing the prover. Addresses regenerated from `PrimAddrs::lean_parity_table()` (which the branch had already updated in `src/ix/kernel/primitive.rs`). * Fix standalone Recr ingress: typ-based inductive lookup `find_matching_block_addr` heuristic in the standalone-Recr branch of `build_convert_inputs_walk` picks the inductive block by matching ctor count to rule count. When multiple in-scope inductives share the same ctor count, it returns the wrong block — the stored rules' `ctor_idx` then points to a sibling's ctor, while `populate_rules` (canonical) derives `ctor_idx` from the right inductive's `ctor_indices`. The mismatch surfaces as `compare_rules`'s `assert_eq!(s_ctor, c_ctor)` panic (e.g. `38 != 40` for `IxVMPrim.nat_land_lit`). Switch the standalone-Recr path to the same typ-based resolution `build_aux_recr_ctor_idxs` already uses for aux Recr blocks: peel `params + motives + minors + indices` foralls of `recr.typ`, take the major's head Ref, look it up in `refs` to get the inductive's address, then resolve `IPrj.block` for the Muts wrapper. Slice ctor positions for the specific member via `extract_member_ctor_idxs`. Also store the resolved Muts `block_addr` in `CKRecr` (was passing the heuristic's wrong address through to `convert_recursor`). --------- Co-authored-by: Arthur Paulino <arthurleonardo.ap@gmail.com> Co-authored-by: Samuel Burnham <45365069+samuelburnham@users.noreply.github.com>
1 parent ac459a3 commit fa93e54

49 files changed

Lines changed: 7026 additions & 1332 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

Cargo.lock

Lines changed: 14 additions & 4 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ rayon = "1"
1919
rustc-hash = "2"
2020
tiny-keccak = { version = "2", features = ["keccak"] }
2121
dashmap = { version = "6.1.0", features = ["rayon"] }
22+
memmap2 = "0.9"
2223
sha2 = "0.10"
2324
# Iroh dependencies
2425
bytes = { version = "1.10.1", optional = true }

Ix.lean

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,8 @@ public import Ix.CompileM
1212
public import Ix.DecompileM
1313
public import Ix.KernelCheck
1414
public import Ix.Claim
15+
public import Ix.Merkle
16+
public import Ix.AssumptionTree
1517
public import Ix.Commit
1618
public import Ix.Benchmark.Bench
1719
public import Ix.Aiur

Ix/AssumptionTree.lean

Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
/-
2+
# AssumptionTree: serializable merkle tree over `Address` leaves
3+
4+
Used to recover the leaf set committed to by a conditional claim's
5+
`assumptions` root. The root alone tells the verifier *which* set
6+
was assumed; the AssumptionTree carries the actual leaves so the
7+
verifier can inspect them.
8+
9+
Two construction modes — both produce the same `node`-shaped trees,
10+
differ only in how leaves are arranged:
11+
12+
- `canonical leaves` builds the same shape that `Ix.Merkle.merkleRootCanonical`
13+
hashes, with `padding` nodes wherever odd-leaf padding occurs.
14+
- `join l r` is free-form O(1) composition; result root matches
15+
`Ix.Merkle.merkleJoin`.
16+
17+
## Serialization
18+
19+
Tag4 size 2 under flag 0xE:
20+
21+
```text
22+
[Tag4(0xE, 2) = 0xE2] [body]
23+
24+
body recursive:
25+
leaf(addr): [0x00] [addr:32]
26+
padding: [0x01]
27+
node(l, r): [0x02] [body l] [body r]
28+
```
29+
30+
`padding` represents the zero-sentinel slot used by the canonical
31+
builder to even out odd levels; its root is exactly `zeroAddress`,
32+
matching the bare 32-byte zero that `Ix.Merkle` mixes into odd-level
33+
hashing. Splitting it from `leaf` keeps `leaves` clean (returns only
34+
real leaves, not synthetic padding addresses).
35+
-/
36+
37+
module
38+
public import Ix.Address
39+
public import Ix.Merkle
40+
public import Ix.Ixon
41+
42+
public section
43+
44+
namespace Ix
45+
46+
open Ixon
47+
open Ix.Merkle (leafHash nodeHash zeroAddress merkleJoin)
48+
49+
/-- A merkle tree over `Address` leaves with explicit shape. -/
50+
inductive AssumptionTree where
51+
| leaf (addr : Address)
52+
| padding
53+
| node (left right : AssumptionTree)
54+
deriving BEq, Repr, Inhabited
55+
56+
namespace AssumptionTree
57+
58+
/-- Recursively compute the root hash. -/
59+
partial def root : AssumptionTree → Address
60+
| .leaf addr => leafHash addr
61+
| .padding => zeroAddress
62+
| .node l r => nodeHash l.root r.root
63+
64+
/-- In-order traversal of real leaves (skips `padding`). Iterative
65+
stack-based walk to avoid stack overflow on deep trees. -/
66+
partial def leaves (t : AssumptionTree) : Array Address := Id.run do
67+
let mut acc : Array Address := #[]
68+
let mut stack : Array AssumptionTree := #[t]
69+
while !stack.isEmpty do
70+
let top := stack.back!
71+
stack := stack.pop
72+
match top with
73+
| .leaf a => acc := acc.push a
74+
| .padding => continue
75+
| .node l r =>
76+
-- Push right first so left is processed first (in-order via LIFO).
77+
stack := stack.push r
78+
stack := stack.push l
79+
return acc
80+
81+
/-- True iff `target` appears as a `leaf` somewhere in the tree. -/
82+
partial def contains (t : AssumptionTree) (target : Address) : Bool :=
83+
match t with
84+
| .leaf a => a == target
85+
| .padding => false
86+
| .node l r => l.contains target || r.contains target
87+
88+
/-- Build the canonical sorted+padded merkle tree over a leaf set.
89+
Returns `none` for an empty (post-dedup) leaf set. Matches the
90+
shape committed to by `merkleRootCanonical`. -/
91+
partial def canonical (leaves : Array Address) : Option AssumptionTree :=
92+
let sorted := dedupSorted (leaves.qsort fun a b => compare a b == .lt)
93+
if sorted.isEmpty then
94+
none
95+
else if sorted.size == 1 then
96+
some (.leaf sorted[0]!)
97+
else
98+
some (reduce (sorted.map .leaf))
99+
where
100+
dedupSorted (xs : Array Address) : Array Address := Id.run do
101+
if xs.isEmpty then return #[]
102+
let mut acc : Array Address := #[xs[0]!]
103+
for i in [1:xs.size] do
104+
if !(xs[i]! == xs[i-1]!) then acc := acc.push xs[i]!
105+
return acc
106+
reduce (level : Array AssumptionTree) : AssumptionTree :=
107+
if level.size == 1 then level[0]!
108+
else reduce (pairLevel level)
109+
pairLevel (level : Array AssumptionTree) : Array AssumptionTree := Id.run do
110+
let mut next : Array AssumptionTree := #[]
111+
let mut i := 0
112+
while i < level.size do
113+
let l := level[i]!
114+
let r := if i + 1 < level.size then level[i+1]! else .padding
115+
next := next.push (.node l r)
116+
i := i + 2
117+
return next
118+
119+
/-- Combine two existing subtrees into a new free-form node in O(1). -/
120+
@[inline] def join (l r : AssumptionTree) : AssumptionTree := .node l r
121+
122+
/-- Recursive helper for `merkleProof`. Returns the leaf-to-root path
123+
if `target` is present, else `none`. -/
124+
partial def searchPath (t : AssumptionTree) (target : Address)
125+
: Option (Array (Address × Bool)) :=
126+
match t with
127+
| .leaf a => if a == target then some #[] else none
128+
| .padding => none
129+
| .node l r =>
130+
match l.searchPath target with
131+
| some p => some (p.push (r.root, false))
132+
| none =>
133+
match r.searchPath target with
134+
| some p => some (p.push (l.root, true))
135+
| none => none
136+
137+
/-- Produce a merkle membership path for `target`. Path is in
138+
leaf-to-root order (matches `verifyMerkleProof`). -/
139+
def merkleProof (t : AssumptionTree) (target : Address)
140+
: Option Ix.Merkle.MerklePath := searchPath t target
141+
142+
/-! ## Serialization -/
143+
144+
def FLAG : UInt8 := 0xE
145+
def VARIANT : UInt64 := 2
146+
147+
def BODY_LEAF : UInt8 := 0x00
148+
def BODY_PADDING : UInt8 := 0x01
149+
def BODY_NODE : UInt8 := 0x02
150+
151+
partial def putBody : AssumptionTree → PutM Unit
152+
| .leaf addr => do
153+
putU8 BODY_LEAF
154+
Serialize.put addr
155+
| .padding => do
156+
putU8 BODY_PADDING
157+
| .node l r => do
158+
putU8 BODY_NODE
159+
putBody l
160+
putBody r
161+
162+
def put (t : AssumptionTree) : PutM Unit := do
163+
putTag4 ⟨FLAG, VARIANT⟩
164+
putBody t
165+
166+
partial def getBody : GetM AssumptionTree := do
167+
let tag : UInt8 ← getU8
168+
if tag == BODY_LEAF then
169+
return .leaf (← Serialize.get)
170+
else if tag == BODY_PADDING then
171+
return .padding
172+
else if tag == BODY_NODE then
173+
let l ← getBody
174+
let r ← getBody
175+
return .node l r
176+
else
177+
throw s!"AssumptionTree.getBody: invalid body tag {tag.toNat}"
178+
179+
def get : GetM AssumptionTree := do
180+
let tag ← getTag4
181+
if tag.flag != FLAG || tag.size != VARIANT then
182+
throw s!"AssumptionTree.get: expected Tag4 0xE/2, got {tag.flag.toNat}/{tag.size}"
183+
getBody
184+
185+
def ser (t : AssumptionTree) : ByteArray := runPut (put t)
186+
def de (bytes : ByteArray) : Except String AssumptionTree :=
187+
runGet get bytes
188+
189+
end AssumptionTree
190+
191+
end Ix
192+
193+
end

0 commit comments

Comments
 (0)