Skip to content

Commit 806aa7e

Browse files
antiguruclaude
andcommitted
transform: add equality-saturation MIR optimizer (mz_transform::eqsat)
Introduce an equality-saturation optimizer that lowers a subset of MirRelationExpr into an e-graph, applies confluent rewrite rules plus congruence closure to a bounded fixpoint, and extracts the cheapest equivalent plan under a two-axis (memory-first, time-second) cost model. The pass registers in the logical and physical optimizers behind per-phase feature flags, equivalence- and type-preserving at the live boundary (adopt_if_type_preserving), with a MAX_PLAN_SIZE guard. It reuses the production CanonicalizeMfp, Demand, and ProjectionPushdown machinery after raise, and seeds index availability and literal-filter lookups from the oracle in the physical phase. Includes: the Rel IR with opaque bail, lower/raise round-trip, the saturation engine with budgets and scope peeling for Let/LetRec, lattice e-class analyses (NonNeg, Keys, Monotonic, Equivalences, ConstantColumns), greedy and ILP extractors, extraction-time CSE, a build-time rule DSL with per-rule codegen, a pluggable join orderer (DpSub default, Dpccp, DphypBushy), a Lean 4 soundness-spec emitter, and the architecture reference doc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019m58NEHNTg4NQ2YuQmrpzQ
1 parent 45a4691 commit 806aa7e

148 files changed

Lines changed: 29556 additions & 3130 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.superpowers/sdd/e1-report.md

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
# E-T1 Report: Wire CSE into the live path
2+
3+
## Changes
4+
5+
### raise.rs: scope-threaded raise
6+
7+
Refactored `raise(rel, commit_wcoj)` to delegate to `raise_inner(rel, commit_wcoj, scope: &mut BTreeMap<usize, ReprRelationType>)`.
8+
The `Rel::Let` arm now raises the value, calls `.typ()` on the result, inserts `(id, typ)` into scope, raises the body, and removes the id from scope on exit.
9+
The `Rel::LocalGet { get: None }` arm now looks up `scope[id]` and emits `MirRelationExpr::Get { id: Id::Local(LocalId::new(u64::cast_from(id))), typ: scope[id].clone(), access_strategy: AccessStrategy::UnknownOrLocal }`.
10+
The `Rel::LocalGet { get: Some(g) }` arm returns `g.clone()` unchanged (existing behavior preserved).
11+
Imports added: `std::collections::BTreeMap`, `mz_expr::{AccessStrategy, Id}`.
12+
13+
### cse.rs: fresh ids + closed-subtree guard
14+
15+
Fresh ids: added `max_local_id(rel)` walker that collects the maximum id from all `Rel::Let` and `Rel::LocalGet` nodes; CSE ids now start at `max + 1`.
16+
17+
Closed-subtree guard (the key bug fix): added `is_closed(rel)` which returns false if the subtree contains any `Rel::LocalGet { get: Some(_) }` or `Rel::Let` nodes.
18+
Updated `worth_binding` to require `is_closed`: CSE only binds subtrees that do not reference lower'd locals.
19+
Without this guard, CSE would hoist subtrees like `Project(Get l2)` OUTSIDE the scope where `l2` is defined, causing Typecheck to panic with "l2 is unbound".
20+
21+
### eqsat.rs: call CSE in live path
22+
23+
Added `let best = crate::eqsat::cse::eliminate_common_subexpressions(&best);` between extraction and raise in `optimize_inner`, with a comment explaining it subsumes RelationCSE.
24+
25+
## Datadriven witness
26+
27+
Case (j) in `eqsat.spec`: a Union of two identical `Filter(#0 = 1)(Get t0)` branches.
28+
The extracted plan has the filter as a shared subterm.
29+
After CSE, the output shows a `With` (Let) binding:
30+
31+
```
32+
With
33+
cte l1 =
34+
Filter (#0 = 1)
35+
Get t0
36+
Return
37+
Union
38+
Get l1
39+
Get l1
40+
```
41+
42+
A `Let` binding appears as expected.
43+
44+
## arithmetic.slt gate result
45+
46+
Command: `bin/sqllogictest --optimized -- test/sqllogictest/arithmetic.slt`
47+
Wall time: ~3:31
48+
Pass count: 206/206
49+
Typecheck panic: none
50+
Non-positive multiplicity: none
51+
52+
## Test summary
53+
54+
* `cargo test -p mz-transform --lib eqsat`: 53 passed (includes new `raise_cse_let_with_placeholder_local_get` test)
55+
* `cargo test -p mz-transform --test test_transforms`: 1 passed (datadriven, including CSE witness case (j))
56+
* `cargo test -p mz-transform --test wcoj_decision`: 6 passed
57+
* `cargo test -p mz-transform --test roundtrip`: 22 passed
58+
* `cargo test -p mz-transform --test compare_real`: 1 passed
59+
* `cargo clippy -p mz-transform`: clean
60+
* `bin/lint`: failures only for `buf`/`trufflehog` (infra tools not installed, expected)
61+
62+
## Concerns
63+
64+
The `is_closed` guard restricts CSE to subtrees that contain no lowered-local references.
65+
This is conservative: a shared subtree that references a lower'd local could in principle be bound within its defining scope.
66+
However, the simpler approach (no scope-aware hoisting) is correct and sufficient for the e-graph use case: the e-graph never produces Let/LocalGet nodes itself (it bails to opaque leaves for LetRec), so the only locals in the extracted tree come from original MIR Lets, which are preserved verbatim.
67+
The CSE still binds shared compound subterms that are "pure" (no local references), which covers the common case of a shared filtered or projected base table.

Cargo.lock

Lines changed: 93 additions & 11 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -319,6 +319,7 @@ cc = "1.2.62"
319319
cfg-if = "1.0.4"
320320
chrono = { version = "0.4.39", default-features = false, features = ["clock", "serde", "std"] }
321321
chrono-tz = { version = "0.8.1", features = ["case-insensitive", "serde"] }
322+
chumsky = "0.13.0"
322323
clap = { version = "4.6.1", features = ["derive"] }
323324
clap_complete = "4.6.2"
324325
columnar = "0.13.0"
@@ -363,6 +364,7 @@ futures-task = "0.3.32"
363364
futures-util = "0.3.32"
364365
gcp_auth = "0.12.6"
365366
glob = "0.3.3"
367+
good_lp = { version = "1.15", default-features = false, features = ["microlp"] }
366368
globset = "0.4.18"
367369
governor = "0.10.1"
368370
half = "2"

ci/test/lint-main/checks/check-copyright.sh

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ copyright_files=$(grep -vE \
3232
-e '(^|/)Cargo\.lock$' \
3333
-e '(^|/)types\.lock$' \
3434
-e '(^|/)\.mzprofile$' \
35+
-e '(^|/)lean-toolchain$' \
3536
-e '^src/mz-deploy/src/cli/scaffold/gitignore$' \
3637
-e '^test/mz-deploy/projects/multi-profile/v1/models/app/core/ambiguous\.sql$' \
3738
-e '^about\.toml$' \

deny.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -206,6 +206,7 @@ wrappers = [
206206
"globset",
207207
"launchdarkly-server-sdk",
208208
"launchdarkly-server-sdk-evaluation",
209+
"microlp",
209210
"native-tls",
210211
"opendal",
211212
"os_info",

0 commit comments

Comments
 (0)