Skip to content

Commit 096fe8e

Browse files
IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards (#438)
* IxVM: cheaper Expr.Share / universe lookups (list_drop + field index) Expr.Share resolution and universe lookups walked the sharing/univ lists with `list_lookup_u64` (per-step `u64_is_zero` + `relaxed_u64_pred`). Switch to: - universe lookups: `list_lookup(univs, flatten_u64(idx))` — cheap per-step field subtraction instead of the U64 predecessor. - Expr.Share (Convert convert_expr + Ingress deref_share): `let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));`. list_drop returns the sublist *pointer*; repeated Share lookups collapse to shared memo entries, so the walk is near-free. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10` (Init, owned=3, closure=676): total FFT 3.89G -> 2.11G. The merged pointer-list walk drops from 1.93G (list_lookup) to 5.8M (list_drop, 0.27%). (Measurement run had in-circuit blake3 verification disabled — not committed.) * IxVM: hot/cold split address_eq on a limb-0 prefilter The primitive-dispatch gauntlet in whnf compares each Const head against ~50 known primitive addresses, so address_eq is the single hottest circuit. The full 32-limb compare ran at width 109 on every call. Split it: `address_eq` now compares limb 0 only and rejects (the common case — a single differing limb proves inequality), delegating the remaining 31-limb compare to a separate `address_eq_tail`. Because Aiur charges a function's full width on every one of its rows, the cold path must be a separate function (a nested match in address_eq changes nothing — measured identical); as its own circuit, address_eq_tail's height is only the rare limb-0-match rows. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10`: - address_eq: width 109 -> 80, FFT 509.8M -> 374.1M - address_eq_tail (new): width 108, height 1683, FFT 1.9M - total shard FFT: 2.106G -> 1.973G Limb-0 is the sweet spot: hot height (259991) dominates tail height (1683), so pulling a second limb forward (N=2) widens the hot circuit by more than it saves in the tail (measured 1.976G, worse). Result is identical to a full 32-limb compare (sound). (Measurement run had in-circuit blake3 verification disabled — not committed.) * IxVM: hot/cold split whnf_with_spine (Const + Proj head dispatch) whnf_with_spine is the single hottest circuit on reduction-heavy consts: width 89 charged on every one of its 3.36M rows. The width was set by two arms that only a minority of rows reach — the Const-head dispatch (delta/iota/quot/prim gauntlet) and the Proj-head reduction. Factor both into their own functions (`whnf_const_head`, `whnf_proj_head`). Because Aiur charges a function's full width on every row, the ~76% of steps that are App/Lam/Let now run at the narrow residual width instead of 89; the wide dispatch only taxes its own (smaller) row count. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9` (1 reduction-heavy owned const): - whnf_with_spine: width 89 -> 34, FFT 6.48G -> 2.48G - whnf_const_head (new): width 77, height 821490, FFT 1.24G - whnf_proj_head (new): width 56, height 3452, FFT 0.002G - total shard FFT: 45.54G -> 42.78G (-6.1%) The Proj arm is rare (3452 rows) but was nearly half the width; extracting it dropped whnf_with_spine 65 -> 34 for ~zero relocated cost. Pure refactor. * IxVM: non-tail collect_spine, single shared definition Replace the tail-recursive `collect_spine_go(e, acc)` (and its verbatim copy `collect_spine_simple_go`) with one non-tail `collect_spine(e)` in the kernelTypes block. Repoint all callers (whnf, primitive, inductive_check, def_eq); delete both old copies. Keyed on `e` alone, it now dedups shared sub-spines (0 -> 2.19M cache hits). Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`: - collect_spine: 5.38M rows / 2.65G -> 2.17M rows / 0.96G - list_snoc.Ptr: +0.96G (new) - total shard FFT: 42.78G -> 41.67G * IxVM: keyed skip-set for sharded check (replace addr_in_list scan) check_all_skipping tested each closure const against the assumption-leaf list via addr_in_list, an O(N) linear address_eq scan — the single largest address_eq source on sharded checks (1.81M calls on Init shard 9, more than all primitive dispatch combined). Build an RBTreeMap skip-set once (keyed on the first 4 address bytes), then test membership with one rbtree lookup + one confirming address_eq. Key collisions overwrite: sound because a missed skip only rechecks a frontier const, and the confirming address_eq rules out false positives. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`: - address_eq: 2.07M rows / 3.48G -> 274K rows / 0.40G - rbtree build + lookups: +~0.46G - total shard FFT: 41.67G -> 37.84G * IxVM: collapse symbolic-base Nat.rec succ-tower to an offset try_nat_linear_rec only folded `Nat.rec base (λ_ ih => succ ih) (Lit n)` when the base was also a literal; a symbolic base declined and fell through to n iota steps that materialize succ^n(base). Reduce the symbolic-base case directly to the offset primitive `Nat.add base (Lit n)` instead. This keeps the value in the same compact `base + n` form a literal already has, so def-eq comparing it converges instead of descending n unary succ layers (the UTF-8 `x + 0xC0` reduction class). Offset magnitude stays KLimbs; only the nat_add top-position index is a (small) G. Measured on a `x + D = Nat.rec x succ-step D` benchmark: the unary def-eq descent collapses (nat_succ_of D -> ~constant), ~-15% total FFT at D=512. The remaining cost is the in-whnf chain reduction (substitution), unchanged here. * IxVM: keep Nat base+literal compact (stop succ-tower materialization) Reducing `Nat.add x (Lit n)` (symbolic x) delta-unfolded into a succ^n(x) tower, and the nat_succ dispatch then whnf-cascaded the whole chain — O(n) substitution per reduction, the dominant cost on Nat-arithmetic proofs (the UTF-8 codec class). Two coordinated changes: - whnf (try_nat_offset_stuck): leave `Nat.add base (Lit n)` (non-literal base, nonzero n) stuck as a compact offset instead of unfolding it. - def-eq (nat_offset_of + try_def_eq_nat): decompose each side to base + KLimbs offset (reading `Nat.add base (Lit m)` in O(1), peeling the few succs whnf exposes) and decide `base_a ≟ base_b` only when offsets are equal; otherwise fall back. All magnitudes stay KLimbs (no Goldilocks collapse). Measured on `x + D = Nat.rec x succ-step D` (depths 64..512): total FFT goes from O(D) (328M at D=512) to a depth-independent ~28.5M (-91% at D=512). Full `lake test -- --ignored ixvm` passes (297 tests; BAD cases still rejected); a recursor-on-symbolic-offset completeness test reduces correctly. * IxVM: keep symbolic Nat.div/mod stuck (stop division-tower blowup) `Nat.div x (Lit k)` / `Nat.mod x (Lit k)` with a symbolic base delta-unfolded the division algorithm and expanded hugely, even though the result is irreducible for a symbolic base. `Nat.shiftRight x k` unfolds to k nested `Nat.div _ 2`, so a symbolic `>>>` materialized a deep, super-linear tower — the dominant cost in the UTF-8 codec's `c >>> 6/12/18` encoding. Generalize `try_nat_offset_stuck` (which already keeps `Nat.add base (Lit n)` compact) to also leave `Nat.div`/`Nat.mod base (Lit k)` (k ≥ 2) stuck. Thresholds keep `x/1 = x` and `x/0 = 0` reducing. Both `x >>> 18` and `(x >>> 9) >>> 9` then reduce to the same nested-stuck-div form, so def-eq stays structural and the identity still holds. Measured on `x >>> 18 = (x >>> 9) >>> 9`: total FFT 15.65G -> 455M (-97%), now depth-independent (matches `x >>> 6`). No regression on the Nat.add reproducers; full `lake test -- --ignored ixvm` passes (297 tests). * IxVM: narrow Ixon expr deserialization (Let split + telescope by &Expr) Ingress deserialization (parsing every closure const's Ixon bytes into Expr trees) is the dominant cost when checking large-closure constants. Two width cuts on the hottest parsers: - get_expr: the Let arm (three recursive get_expr calls) set the function width though Let is rare. Factored into get_expr_let — width 102 -> 75. - get_app_telescope: passed `func` by value (the wide Expr union) on every recursion step. Pass it by `&Expr`, loaded only at the base case — width 95 -> 78. Measured by profiling ingress of the `ByteArray.utf8DecodeChar?…` closure (2000 consts): get_expr 766M -> 563M, get_app_telescope 477M -> 392M, total ingress FFT 3.13G -> 2.85G. Full `lake test -- --ignored ixvm` passes (297). * IxVM: lbr-trimmed context keys for whnf/infer/def-eq memoization Key whnf / k_infer / k_is_def_eq memoization on the loose-bvar-reachable suffix of the local context (ctx_trim / ctx_reachable), mirroring Rust's whnf_key / infer_key / def_eq_ctx_key. A closed subterm (lbr 0) keys to the empty context and shares its result across every binder depth; eager substitution is unchanged. Fast paths (closed -> Nil, full-reach -> as-is) keep the boundary cheap. FFT (lake exe ix check <const> --stats-out): Nat.add_comm 58.28M -> 57.90M (-0.66%); Int.emod_emod_of_dvd 5.090G -> 4.691G (-7.85%); Init shard 10 4.738G -> 4.721G (-0.37%). 297 ixvm tests pass. * ix CLI: resolve private Lean.Name args in check/prove parseName can't rebuild a private name (_private.M.0.foo) from its display string -- the marker / scope-index components don't round-trip through naive dot-splitting. Add toString fallbacks: resolveName for the compiled-in env and resolveIxeAddr for a .ixe env, so check and prove (which share forEachClaim) resolve names like _private.Init.Data.Vector.Extract.0.Vector.extract_append._proof_1 across both the compiled-in and --ixe paths. * IxVM: multi-arg beta with narrow substitution walk whnf_apply_beta substituted one spine arg per expr_inst1, re-walking the (shrinking) body for each, so a K-arg application `(λλλ.body) x y z` walked the body K times. Mirror Rust whnf.rs:541-567: peel the whole telescope (peel_beta) and substitute all consumed args in ONE expr_inst_many walk (simul_subst semantics). Single-arg betas stay on the cheap expr_inst1 path, so short telescopes don't pay the list overhead (why the earlier unconditional simul_subst regressed). Hot/cold split expr_inst_many_walk's BVar arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar so the hot App/Lam walk stays narrow. Substitution was 53% of FFT on omega-heavy proofs. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 56.64G -> 38.31G (-32.4%); Nat.add_comm 57.90M -> 56.17M; Init shard 10 4.721G -> 4.668G (-1.1%); shard 9 16.346G -> 16.314G. 297 ixvm tests pass. * IxVM: gate the primitive gauntlet behind a memoized prim_any_addr whnf_const_head ran 5 try_* dispatchers (nat / str / bitvec / native / decidable) in sequence on every Const head, each comparing the head address against its family's primitive addresses — wasted on the common non-primitive head, and worse, several try_* speculatively WHNF their args (to test for literals / decidables), cascading into deep reductions that are then thrown away. Factor the chain into try_address_primitives and gate it behind prim_any_addr: a memoized (per head address) sum of the 43 head-dispatched primitive address_eq checks — 1 only for a real primitive, so a 0 skips the whole gauntlet straight to projection-definition / delta. Validated by a differential assert (prim_any_addr must be 1 whenever the gauntlet matched) run over the 297-test suite + Int.emod_emod_of_dvd + Vector.extract_append._proof_1; it caught two missing addresses (string_utf8_byte_size, punit_size_of_1), then passed clean across millions of reductions. Assert removed after validation. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 38.31G -> 28.02G (-26.8%); Int.emod_emod_of_dvd 4.69G -> 3.97G (-15%); Nat.add_comm 56.17M -> 56.06M; Init shard 9 16.314G -> 16.279G; shard 10 4.668G -> 4.659G. 297 ixvm tests pass.
1 parent 623be4b commit 096fe8e

12 files changed

Lines changed: 646 additions & 216 deletions

Ix/Cli/CheckCmd.lean

Lines changed: 34 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,16 @@ def parseName (arg : String) : Lean.Name :=
4545
| some n => Lean.Name.mkNum acc n
4646
| none => Lean.Name.mkStr acc s
4747

48+
/-- Resolve a CLI name argument against the env. `parseName` can't rebuild
49+
private names (`_private.M.0.foo`) — the marker/scope-index components
50+
don't round-trip through naive dot-splitting. So if the parsed name
51+
isn't present, fall back to matching the arg against each constant's
52+
`toString` (the displayed form the user copied). -/
53+
def resolveName (env : Lean.Environment) (arg : String) : Option Lean.Name :=
54+
let parsed := parseName arg
55+
if env.constants.contains parsed then some parsed
56+
else (env.constants.toList.find? (fun (k, _) => toString k == arg)).map (·.1)
57+
4858
def addrOfHex! (label : String) (s : String) : IO Address := do
4959
match Address.fromString s with
5060
| some a => pure a
@@ -88,6 +98,16 @@ partial def ixNameToLeanName : Ix.Name → Lean.Name
8898
| .str p s _ => .str (ixNameToLeanName p) s
8999
| .num p n _ => .num (ixNameToLeanName p) n
90100

101+
/-- Resolve a CLI name argument against a `.ixe` env's named map. Like
102+
`resolveName` for the compiled-in env: `parseName` can't rebuild private
103+
names, so when the direct lookup misses, fall back to matching the arg
104+
against each named constant's displayed `toString`. -/
105+
def resolveIxeAddr (ixonEnv : Ixon.Env) (arg : String) : Option Address :=
106+
match ixonEnv.getAddr? (Ix.Name.fromLeanName (parseName arg)) with
107+
| some a => some a
108+
| none =>
109+
(ixonEnv.named.toList.find? (fun (k, _) => toString (ixNameToLeanName k) == arg)).map (·.2.addr)
110+
91111
/-- Build a `ClaimWitness` for the `verify_claim` entrypoint against
92112
`Ix.Claim.check addr none` (full transitive typecheck of the
93113
target's closure). -/
@@ -189,14 +209,13 @@ def forEachClaim
189209
if !keepGoing then break
190210
else
191211
for arg in names do
192-
let name := parseName arg
193-
match ixonEnv.getAddr? (Ix.Name.fromLeanName name) with
212+
match resolveIxeAddr ixonEnv arg with
194213
| none =>
195-
IO.eprintln s!"{name} not found in {path}"
196-
failures := failures.push (toString name)
214+
IO.eprintln s!"{arg} not found in {path}"
215+
failures := failures.push arg
197216
if !keepGoing then break
198217
| some addr =>
199-
let label := toString name
218+
let label := arg
200219
let claim := Ix.Claim.check addr none
201220
let witness ← mkWitness addr ixonEnv
202221
if (← runOne claim witness label) ≠ 0 then
@@ -224,17 +243,18 @@ def forEachClaim
224243
if !keepGoing then break
225244
else
226245
for arg in names do
227-
let name := parseName arg
228-
if !env.constants.contains name then
229-
IO.eprintln s!"{name} not found"
230-
failures := failures.push (toString name)
246+
match resolveName env arg with
247+
| none =>
248+
IO.eprintln s!"{arg} not found"
249+
failures := failures.push arg
231250
if !keepGoing then break
232251
else continue
233-
let label := toString name
234-
let (claim, witness) ← buildOne name
235-
if (← runOne claim witness label) ≠ 0 then
236-
failures := failures.push label
237-
if !keepGoing then break
252+
| some name =>
253+
let label := toString name
254+
let (claim, witness) ← buildOne name
255+
if (← runOne claim witness label) ≠ 0 then
256+
failures := failures.push label
257+
if !keepGoing then break
238258

239259
if failures.isEmpty then pure 0
240260
else

Ix/IxVM/Convert.lean

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -68,7 +68,9 @@ def convert := ⟦
6868
match load(idxs) {
6969
ListNode.Nil => store(ListNode.Nil),
7070
ListNode.Cons(idx, rest) =>
71-
let u = load(list_lookup_u64(univs, idx));
71+
-- universe indices are small; walk with a field index (cheap per-step
72+
-- field sub) instead of `list_lookup_u64`'s per-step U64 predecessor.
73+
let u = load(list_lookup(univs, flatten_u64(idx)));
7274
store(ListNode.Cons(store(convert_univ(u)), convert_univ_idxs(rest, univs))),
7375
}
7476
}
@@ -145,7 +147,9 @@ def convert := ⟦
145147
) -> KExpr {
146148
match load(e) {
147149
Expr.Srt(univ_idx) =>
148-
let u = load(list_lookup_u64(univs, univ_idx));
150+
-- field-indexed walk (see `convert_univ_idxs`): avoids the per-step
151+
-- U64 predecessor of `list_lookup_u64` on this hot universe lookup.
152+
let u = load(list_lookup(univs, flatten_u64(univ_idx)));
149153
store(KExprNode.Srt(store(convert_univ(u)))),
150154

151155
Expr.Var(idx) =>
@@ -199,7 +203,8 @@ def convert := ⟦
199203
convert_expr(body, sharing, ref_idxs, recur_idxs, lit_blobs, univs))),
200204

201205
Expr.Share(idx) =>
202-
convert_expr(list_lookup_u64(sharing, idx), sharing, ref_idxs, recur_idxs, lit_blobs, univs),
206+
let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));
207+
convert_expr(e, sharing, ref_idxs, recur_idxs, lit_blobs, univs),
203208
}
204209
}
205210

Ix/IxVM/Ingress.lean

Lines changed: 28 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -53,17 +53,23 @@ def ingress := ⟦
5353
bytes
5454
}
5555

56-
-- Compare two 32-byte addresses for equality
57-
fn address_eq(a: Addr, b: Addr) -> G {
58-
let [a0, a1, a2, a3, a4, a5, a6, a7,
56+
-- Compare two 32-byte addresses for equality.
57+
--
58+
-- Cold path: limb 0 already matched, compare the remaining 31 limbs.
59+
-- Factored into its own function so it forms a separate circuit whose height
60+
-- is only the (rare) limb-0-match rows; Aiur charges a function's full width
61+
-- on every one of its rows, so a nested match in `address_eq` would not save
62+
-- anything — the split must be a function boundary.
63+
fn address_eq_tail(a: Addr, b: Addr) -> G {
64+
let [_, a1, a2, a3, a4, a5, a6, a7,
5965
a8, a9, a10, a11, a12, a13, a14, a15,
6066
a16, a17, a18, a19, a20, a21, a22, a23,
6167
a24, a25, a26, a27, a28, a29, a30, a31] = load(a);
62-
let [b0, b1, b2, b3, b4, b5, b6, b7,
68+
let [_, b1, b2, b3, b4, b5, b6, b7,
6369
b8, b9, b10, b11, b12, b13, b14, b15,
6470
b16, b17, b18, b19, b20, b21, b22, b23,
6571
b24, b25, b26, b27, b28, b29, b30, b31] = load(b);
66-
match [to_field(a0) - to_field(b0), to_field(a1) - to_field(b1),
72+
match [to_field(a1) - to_field(b1),
6773
to_field(a2) - to_field(b2), to_field(a3) - to_field(b3),
6874
to_field(a4) - to_field(b4), to_field(a5) - to_field(b5),
6975
to_field(a6) - to_field(b6), to_field(a7) - to_field(b7),
@@ -80,7 +86,20 @@ def ingress := ⟦
8086
to_field(a28) - to_field(b28), to_field(a29) - to_field(b29),
8187
to_field(a30) - to_field(b30), to_field(a31) - to_field(b31)] {
8288
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
83-
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] => 1,
89+
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] => 1,
90+
_ => 0,
91+
}
92+
}
93+
94+
-- Limb-0 prefilter: a single differing limb proves inequality, and almost
95+
-- every comparison (the primitive-dispatch gauntlet in whnf) mismatches at
96+
-- limb 0. Hot rows reject here at narrow width; only limb-0 matches pay the
97+
-- wide `address_eq_tail` compare. Identical result to a full 32-limb compare.
98+
fn address_eq(a: Addr, b: Addr) -> G {
99+
let av = load(a);
100+
let bv = load(b);
101+
match to_field(av[0]) - to_field(bv[0]) {
102+
0 => address_eq_tail(a, b),
84103
_ => 0,
85104
}
86105
}
@@ -792,7 +811,9 @@ def ingress := ⟦
792811
-- Deref Expr.Share via the constant's sharing list.
793812
fn deref_share(e: Expr, sharing: List‹&Expr›) -> Expr {
794813
match e {
795-
Expr.Share(idx) => deref_share(load(list_lookup_u64(sharing, idx)), sharing),
814+
Expr.Share(idx) =>
815+
let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));
816+
deref_share(load(e), sharing),
796817
_ => e,
797818
}
798819
}

Ix/IxVM/IxonDeserialize.lean

Lines changed: 19 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -101,15 +101,17 @@ def ixonDeserialize := ⟦
101101
-- Expression deserialization
102102
-- ============================================================================
103103

104-
-- App telescope: read count args, wrapping func in App nodes
105-
fn get_app_telescope(func: Expr, stream: ByteStream, count: U64) -> (Expr, ByteStream) {
104+
-- App telescope: read count args, wrapping func in App nodes. `func` is passed
105+
-- by pointer (loaded only at the base case) so the recursion doesn't carry the
106+
-- wide `Expr` union by value on every row.
107+
fn get_app_telescope(func: &Expr, stream: ByteStream, count: U64) -> (Expr, ByteStream) {
106108
let is_zero = u64_is_zero(count);
107109
match is_zero {
108-
1 => (func, stream),
110+
1 => (load(func), stream),
109111
0 =>
110112
let (arg, s) = get_expr(stream);
111-
let app = Expr.App(store(func), store(arg));
112-
get_app_telescope(app, s, relaxed_u64_pred(count)),
113+
let app = Expr.App(func, store(arg));
114+
get_app_telescope(store(app), s, relaxed_u64_pred(count)),
113115
}
114116
}
115117

@@ -180,7 +182,7 @@ def ixonDeserialize := ⟦
180182
-- App: Tag4(0x7, count) + func + args...
181183
0x7 =>
182184
let (func, s2) = get_expr(s);
183-
get_app_telescope(func, s2, size),
185+
get_app_telescope(store(func), s2, size),
184186

185187
-- Lam: Tag4(0x8, count) + types... + body
186188
0x8 => get_lam_telescope(s, size),
@@ -189,17 +191,23 @@ def ixonDeserialize := ⟦
189191
0x9 => get_all_telescope(s, size),
190192

191193
-- Let: Tag4(0xA, non_dep) + expr(ty) + expr(val) + expr(body)
192-
0xA =>
193-
let (ty, s2) = get_expr(s);
194-
let (val, s3) = get_expr(s2);
195-
let (body, s4) = get_expr(s3);
196-
(Expr.Let(size, store(ty), store(val), store(body)), s4),
194+
0xA => get_expr_let(s, size),
197195

198196
-- Share: Tag4(0xB, idx)
199197
0xB => (Expr.Share(size), s),
200198
}
201199
}
202200

201+
-- Let arm of get_expr, split out: three recursive `get_expr` calls make it the
202+
-- widest (and a rare) arm, so inlined it taxes every get_expr row.
203+
fn get_expr_let(s: ByteStream, size: U64) -> (Expr, ByteStream) {
204+
let (ty, s2) = get_expr(s);
205+
let (val, s3) = get_expr(s2);
206+
let (body, s4) = get_expr(s3);
207+
(Expr.Let(size, store(ty), store(val), store(body)), s4)
208+
}
209+
210+
203211
-- ============================================================================
204212
-- Universe deserialization
205213
-- ============================================================================

Ix/IxVM/Kernel/Claim.lean

Lines changed: 45 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -804,6 +804,43 @@ def claim := ⟦
804804
}
805805
}
806806

807+
-- Pack the first 4 address bytes (LE) into a u32 key for the skip-set rbtree.
808+
--
809+
-- Capped at 4 bytes because `RBTreeMap` orders keys with `u32_less_than`, a
810+
-- 32-bit comparator gadget — a wider key (a single `G` could hold 7 bytes in
811+
-- Goldilocks) would overflow it and corrupt the tree ordering. A 32-bit key
812+
-- space means key collisions are possible (~N²/2^33 over N leaves), but they
813+
-- are harmless: a collision makes `addr_in_set`'s confirming `address_eq`
814+
-- fail, yielding a false negative (a frontier const gets rechecked) never a
815+
-- false positive (a wrong skip). See `build_skip_set`.
816+
fn addr_key(a: Addr) -> G {
817+
let arr = load(a);
818+
to_field(arr[0])
819+
+ to_field(arr[1]) * 256
820+
+ to_field(arr[2]) * 65536
821+
+ to_field(arr[3]) * 16777216
822+
}
823+
824+
-- Build an O(log N) membership set from the assumption-leaf list, keyed on
825+
-- `addr_key`. Key collisions overwrite — sound because the only consequence is
826+
-- a missed skip (a frontier const gets rechecked, never wrongly trusted); the
827+
-- confirming `address_eq` in `addr_in_set` rules out false positives.
828+
fn build_skip_set(leaves: List‹Addr›, acc: RBTreeMap‹Addr›) -> RBTreeMap‹Addr› {
829+
match load(leaves) {
830+
ListNode.Nil => acc,
831+
ListNode.Cons(a, rest) =>
832+
build_skip_set(rest, rbtree_map_insert(addr_key(a), a, acc)),
833+
}
834+
}
835+
836+
-- Membership via one rbtree lookup (cheap u32 key compares) + one confirming
837+
-- full `address_eq`, replacing the O(N) linear `addr_in_list` scan that
838+
-- dominated address_eq cost on sharded checks.
839+
fn addr_in_set(target: Addr, skip_set: RBTreeMap‹Addr›) -> G {
840+
let found = rbtree_map_lookup_or_default(addr_key(target), skip_set, store([0u8; 32]));
841+
address_eq(found, target)
842+
}
843+
807844
-- ============================================================================
808845
-- check_all variant that skips positions whose addr is in the
809846
-- supplied assumption-leaf list.
@@ -813,24 +850,27 @@ def claim := ⟦
813850
addrs: List‹Addr›,
814851
asm_leaves: List‹Addr›) {
815852
let _ = check_canonical_block_sort(top);
816-
check_all_skipping_iter(consts, top, addrs, asm_leaves, 0)
853+
-- Build the skip-set once (O(N log N)) instead of an O(N) linear scan per
854+
-- checked const.
855+
let skip_set = build_skip_set(asm_leaves, RBTreeMap.Nil);
856+
check_all_skipping_iter(consts, top, addrs, skip_set, 0)
817857
}
818858

819859
fn check_all_skipping_iter(consts: List‹&KConstantInfo›,
820860
top: List‹&KConstantInfo›,
821861
addrs: List‹Addr›,
822-
asm_leaves: List‹Addr›,
862+
skip_set: RBTreeMap‹Addr›,
823863
pos: G) {
824864
match load(consts) {
825865
ListNode.Nil => (),
826866
ListNode.Cons(&ci, rest) =>
827867
let addr = list_lookup(addrs, pos);
828-
match addr_in_list(addr, asm_leaves) {
868+
match addr_in_set(addr, skip_set) {
829869
1 =>
830-
check_all_skipping_iter(rest, top, addrs, asm_leaves, pos + 1),
870+
check_all_skipping_iter(rest, top, addrs, skip_set, pos + 1),
831871
_ =>
832872
let _ = check_const(ci, pos, top, addrs);
833-
check_all_skipping_iter(rest, top, addrs, asm_leaves, pos + 1),
873+
check_all_skipping_iter(rest, top, addrs, skip_set, pos + 1),
834874
},
835875
}
836876
}

0 commit comments

Comments
 (0)