Skip to content

Commit 24965e0

Browse files
Rename top level constant and add better comment that its for performance
1 parent 050e5fd commit 24965e0

5 files changed

Lines changed: 609 additions & 342 deletions

File tree

python/egglog/exp/param_eq/PERFORMANCE_DEBUG_LOG.md

Lines changed: 282 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -657,3 +657,285 @@ encoding is slower even when the final e-graph is smaller.
657657
python/egglog/exp/param_eq/test_pipeline.py::test_container_preserves_repeated_polynomial_products -q`
658658
passed.
659659
`uv run ruff check python/egglog/exp/param_eq/pipeline.py` passed.
660+
661+
## 2026-05-04: remove `POLYNOMIALS` analysis table
662+
663+
- Status:
664+
accepted local fix candidate.
665+
- Question:
666+
can the remaining `POLYNOMIALS: Map[Num, ContainerPolynomial]` analysis
667+
table be removed without losing the parameter-quality parity that the
668+
repeated-product and outer-scale container rules need?
669+
- Baseline dependency:
670+
the table was used by two rules in `container_basic_rules`: the
671+
repeated-polynomial product rule
672+
`P + c*M*P + R -> P * (1 + c*M) + R`, and the outer-constant scale rule
673+
`(c*x + d*y + k) * M - c -> c * (M * (...) - 1)`.
674+
An integer-residual guard also used it only to reject nested polynomial
675+
terms.
676+
- Failed probe 1:
677+
replaced the two table lookups with `POLYNOMIAL_MONOMIALS`, keyed by
678+
`{polynomial(P): 1}`. The focused repeated-product regression with the
679+
`log(abs(...))` expression regressed from the expected `after_params <= 8`
680+
to `after_params=9`.
681+
- Failed probe 2:
682+
changed `POLYNOMIAL_MONOMIALS` to prefer the newer value. The same
683+
repeated-product regression still extracted with `after_params=9`, so this
684+
was not just a stale merge choice for that table.
685+
- Falsification probe:
686+
restored `POLYNOMIALS` only for the outer-scale rule. The focused regression
687+
still failed with `after_params=9`, proving the repeated-product rule was the
688+
first missing dependency.
689+
- Accepted probe:
690+
removed the table lookup entirely and bound the polynomial relation directly:
691+
after collecting candidate keys, the repeated-product rule now uses
692+
`counts.count(n) > 0` and `polynomial(poly1) == n`; the outer-scale rule now
693+
uses `mono[n] == 1` and `polynomial(poly1) == n`. This gives the rule an
694+
explicit relation dependency without storing a global `Num -> polynomial`
695+
map.
696+
- Code removed:
697+
deleted the `POLYNOMIALS` constant, its default analysis initialization, and
698+
its population rule.
699+
- Focused validation:
700+
`uv run pytest
701+
python/egglog/exp/param_eq/test_pipeline.py::test_container_uses_preferred_singleton_polynomial_for_inverse_scale
702+
python/egglog/exp/param_eq/test_pipeline.py::test_container_preserves_repeated_polynomial_products
703+
python/egglog/exp/param_eq/test_pipeline.py::test_container_preserves_repeated_numeric_factor_before_like_term_merge
704+
python/egglog/exp/param_eq/test_pipeline.py::test_container_splits_integer_residual_after_like_term_merge
705+
python/egglog/exp/param_eq/test_pipeline.py::test_container_matches_binary_coefficient_factoring_choices
706+
-q` passed with `10 passed`.
707+
`uv run ruff check python/egglog/exp/param_eq/pipeline.py` passed.
708+
- Broader validation:
709+
full `test_pipeline.py` still has unrelated existing failures: snapshot
710+
drift and a test calling the removed `_new_rewrite_scheduler`. The targeted
711+
POLYNOMIALS regressions are green.
712+
- Sample comparison:
713+
`make -C python/egglog/exp/param_eq compare-binary-container-sample
714+
SAMPLE_LIMIT_PER_DATASET=50` compared 100 shared rows. After params were
715+
`9 better / 0 worse / 91 same` for containers. Extracted cost was
716+
`46 better / 7 worse / 47 same`; total e-graph size was
717+
`97 better / 3 worse / 0 same`.
718+
- Direct A/B against the old table:
719+
temporarily reversed only the `pipeline.py` removal patch and ran
720+
`run_egglog_corpus --variant container --limit-per-dataset 20` to
721+
`/tmp/egglog_container_with_polynomials_first20.csv`, then restored the
722+
patch and reran to `/tmp/egglog_container_no_polynomials_first20.csv`.
723+
On the 40 shared rows, `after_params` was identical for all rows. Median
724+
e-graph total size changed from `37` to `36`, with `39 / 1 / 0`
725+
better / worse / same under the table-free version. Median runtime changed
726+
from `189.5ms` to `191.8ms`, with `15 / 25 / 0` faster / slower / same,
727+
so this sample supports a state-size win but not a measured runtime win.
728+
- Current conclusion:
729+
`POLYNOMIALS` was a workaround for making polynomial-representation
730+
dependencies visible through higher-order map-derived keys. The direct
731+
`polynomial(poly1) == n` constraints now make those dependencies explicit
732+
enough for the relevant rules, so the extra table can be removed.
733+
734+
## 2026-05-04: keep `POLYNOMIAL_MONOMIALS` as a singleton-key index
735+
736+
- Status:
737+
rejected full removal.
738+
- Question:
739+
can `POLYNOMIAL_MONOMIALS: Map[ContainerMonomial, ContainerPolynomial]` be
740+
removed after `POLYNOMIALS` was removed?
741+
- Purpose of the table:
742+
it maps the concrete singleton monomial key `{polynomial(P): 1}` back to `P`.
743+
The remaining users are not asking for every polynomial representation of a
744+
`Num`; they are asking whether an already-present monomial key is exactly a
745+
nested polynomial term.
746+
- Direct replacement tested:
747+
replaced lookups such as `polynomial_monomials[candidate_mono]` with
748+
constraints like `mono == {n: 1}` and `polynomial(poly1) == n`.
749+
- Sample result:
750+
on a first-20-per-dataset container sample, after params were identical on
751+
all 40 rows compared with the version that kept `POLYNOMIAL_MONOMIALS`.
752+
Median runtime looked better in that small sample (`172.5ms` vs `191.8ms`),
753+
but this did not cover the worst focused canary.
754+
- Falsifying canary:
755+
`test_container_matches_binary_coefficient_factoring_choices` for
756+
`0.103875 + 0.01063 * (exp(x0) - exp(x1) + 2.824*x0 + 6.894*x1 +
757+
7.894*(x0 + x1 - x0*x0))`.
758+
With direct constraints, the test still passed but the single case took
759+
about `120.8s`.
760+
- Isolation:
761+
disabling all three direct singleton-polynomial rules made the same canary
762+
fast but lost the expected node-count improvement. Re-enabling only the
763+
exact nested polynomial flattening rule reproduced the slowdown, so the
764+
expensive pattern is:
765+
`a*polynomial(P) + R -> a*P + R` with direct
766+
`mono == {n: 1}; polynomial(P) == n` matching.
767+
- Restored result:
768+
restoring `POLYNOMIAL_MONOMIALS` reduced that canary to about `9.1s`, and the
769+
focused 16-test bundle completed in `14.79s`.
770+
- Current conclusion:
771+
`POLYNOMIAL_MONOMIALS` is still needed as a performance index. The direct
772+
relation form is semantically valid, but it causes the matcher to enumerate
773+
many polynomial e-classes before proving that the singleton monomial key is
774+
present. The table keeps the join keyed by existing monomial keys.
775+
776+
## 2026-05-04: `POLYNOMIALS` removal slowdown root cause
777+
778+
- Status:
779+
diagnosis only. Temporary guards were reverted.
780+
- Question:
781+
why did some examples get slower after removing `POLYNOMIALS`, even though
782+
params stayed equal and e-graph size usually got slightly smaller?
783+
- A/B command:
784+
temporarily reversed only the current `pipeline.py` patch and ran
785+
`run_egglog_corpus --variant container --limit-per-dataset 50` to
786+
`/tmp/egglog_container_with_polynomials_first50.csv`, restored the patch, and
787+
reran to `/tmp/egglog_container_no_polynomials_first50.csv`.
788+
- A/B result:
789+
on 100 shared rows, median runtime was similar
790+
(`169.9ms` with `POLYNOMIALS`, `167.9ms` without), but 51 rows were slower.
791+
The largest slowdown was `pagie/Bingo/raw=17/algo_row=17/original`, from
792+
`464ms` to `1077ms`, with after params unchanged at `8` and e-graph total
793+
size changing from `546` to `545`.
794+
- Canary expression:
795+
the repeated-product/log expression beginning
796+
`(-2.0 * -0.16807062259009387 + -0.00015170797954567304*x1) *
797+
log(abs(...))`.
798+
- Phase split on the canary:
799+
with `POLYNOMIALS`, total was `0.353s`: analysis `0.063s`, rewrite
800+
`0.170s`, extraction `0.112s`.
801+
without `POLYNOMIALS`, total was `0.879s`: analysis `0.057s`, rewrite
802+
`0.694s`, extraction `0.120s`.
803+
The graph was not bigger; the slowdown is rewrite matching near convergence.
804+
- Rule attribution:
805+
old repeated-product rule using `%POLYNOMIALS` took `0.009s` with
806+
`67` matches.
807+
new direct repeated-product rule using `counts.count(n) > 0` and
808+
`polynomial(poly1) == n` took `0.465s` with `93` matches.
809+
new direct outer-scale rule also took `0.073s` with `0` matches.
810+
- Falsification probe:
811+
temporarily disabled only the direct repeated-product rule with an impossible
812+
guard. The canary dropped to `0.191s` total and `0.136s` rewrite time, but
813+
quality regressed from `after_params=8` to `after_params=10`.
814+
- Current conclusion:
815+
removing `POLYNOMIALS` changed the join shape of the repeated-product rule.
816+
The old table made the candidate polynomial keys available through
817+
`map_restrict_keys(counts, polynomials)`, so the expensive relation join
818+
happened inside a map primitive over terms already present in the polynomial.
819+
The direct replacement exposes `polynomial(poly1) == n` as a normal e-matcher
820+
join for each counted term, which is semantically correct but much more
821+
expensive on examples with many polynomial presentations. The main slowdown
822+
is therefore matcher join order/cardinality, not extraction, analysis, final
823+
e-graph size, or parameter-quality work after extraction.
824+
825+
## 2026-05-04: restore polynomial body cache with explicit index naming
826+
827+
- Status:
828+
accepted local fix candidate.
829+
- Change:
830+
restored the old `Num -> ContainerPolynomial` cache as
831+
`POLYNOMIAL_BODIES`, and documented it as a join-shaping index rather than
832+
as independent semantic state. `SOLE_MONOMIALS` remains the preferred
833+
singleton-alias cache, and `POLYNOMIAL_MONOMIALS` remains the singleton
834+
monomial-key index.
835+
- Concrete comments added:
836+
`POLYNOMIAL_BODIES` documents the repeated-product example
837+
`P + c*M*P + R -> P * (1 + c*M) + R`, where candidates should be terms
838+
already present in the polynomial rather than every `polynomial(body) == n`
839+
relation in the e-graph.
840+
`POLYNOMIAL_MONOMIALS` documents the exact nested-flatten example
841+
`a*polynomial(P) + R -> a*P + R`, where candidates should be existing
842+
singleton monomial keys rather than constructed `{n: 1}` keys for every
843+
polynomial relation.
844+
- Canary result:
845+
`pagie/Bingo/raw=17/algo_row=17/original` is back to the old shape:
846+
total `0.355s`, rewrite `0.173s`, extraction `0.110s`,
847+
e-graph total size `546`, `after_params=8`.
848+
This matches the prior `POLYNOMIALS` behavior and removes the direct-rule
849+
slowdown (`0.879s`, rewrite `0.694s`).
850+
- Focused validation:
851+
`uv run pytest ... -q --durations=8` over the 16 container regressions passed
852+
in `4.44s`; the slowest case was `1.56s`.
853+
- Sample comparison:
854+
first-50-per-dataset container sample matched the old `POLYNOMIALS` cache on
855+
all 100 `after_params` values. Median runtime was effectively similar
856+
(`169.9ms` old, `170.8ms` renamed cache); remaining per-row timing deltas
857+
look like benchmark noise because e-graph size and params are unchanged.
858+
859+
## 2026-05-04: slow container outlier with small final e-graph
860+
861+
- Status:
862+
diagnosis only. No rule changes kept.
863+
- Question:
864+
why does the row
865+
`0.103875 + 0.01063 * (exp(x0) - (exp(x1) + (x0 + x1 - x0*x0) * -7.894 + (x1 * -7.894 - 2.824*x0 + x1)))`
866+
take much longer in containers even though the container e-graph is smaller?
867+
- Direct run result:
868+
binary ran in about `0.56s`, with `passes=2`, e-graph size `9168`,
869+
`after_nodes=27`, and `after_params=4`.
870+
containers ran in about `13.2s` to `14.0s`, with `passes=2`, e-graph size
871+
`4017`, `after_nodes=23`, and `after_params=4`.
872+
- Phase split:
873+
binary total was `0.570s`: analysis `0.010s`, rewrite `0.191s`,
874+
extraction `0.359s`.
875+
container total was `13.282s`: analysis `2.641s`, rewrite `9.357s`,
876+
extraction `1.260s`.
877+
- Boundary evidence:
878+
the slowest container phases were late rewrite rounds, not construction or
879+
extraction: pass 1 rounds 26 through 30 each took about `0.79s` to `0.87s`
880+
while the total size moved only from about `3633` to `4017`.
881+
- Rule attribution from the container run:
882+
repeated scalar coefficient-magnitude factoring at `pipeline.py:518` took
883+
`1.93s` with `8870` matches.
884+
`SOLE_MONOMIALS` flattening at `pipeline.py:696` took `1.32s` with `6004`
885+
matches.
886+
Horner factoring at `pipeline.py:773` took `1.27s` with `4747` matches.
887+
representative coefficient factoring at `pipeline.py:425` took `1.22s` with
888+
`8128` matches.
889+
integer-ratio coefficient factoring at `pipeline.py:442` took `1.17s` with
890+
`3081` matches.
891+
- Falsified hypotheses:
892+
the slowdown is not caused by a larger final e-graph: containers end smaller
893+
(`4017` vs `9168`).
894+
it is not primarily extraction: container extraction is large (`1.26s`) but
895+
much smaller than container rewrite time (`9.36s`).
896+
it is not the restored `POLYNOMIAL_BODIES` lookup: the repeated-product rule
897+
using that table took only `0.35s` and the outer-scale lookup had `0`
898+
matches.
899+
- Current hypothesis:
900+
this row is slow because the container encoding exposes many equivalent
901+
coefficient-factor presentations inside one polynomial. The broad scalar
902+
coefficient factoring, sole-monomial flattening, Horner, and representative
903+
coefficient rules keep finding legal alternate presentations through late
904+
rounds. The result is smaller and reaches the same parameter count, but the
905+
schedule pays for repeated map scans and rematching near convergence.
906+
907+
## 2026-05-04: isolate unstaged hunk causing the slow container outlier
908+
909+
- Status:
910+
diagnosis only. Current `pipeline.py` was not changed by this probe.
911+
- Question:
912+
which unstaged change made the row above slow?
913+
- Method:
914+
used a detached `HEAD` worktree at `050e5fd`, copied the local compiled
915+
`egglog` extension artifacts into it, and selected Python source with
916+
`PYTHONPATH` so current and `HEAD` used the same runtime.
917+
- Results:
918+
`HEAD` ran the container case in `1.06s`, with e-graph size `862`,
919+
`after_nodes=23`, and `after_params=4`.
920+
Current dirty source ran it in `13.41s`, with e-graph size `4017`,
921+
`after_nodes=23`, and `after_params=4`.
922+
Copying the full current `pipeline.py` into the `HEAD` worktree reproduced
923+
the slow result: `13.35s`, size `4017`.
924+
Reverting only the post-normalization integer-residual split guard in that
925+
temp copy restored the fast result: `1.06s`, size `862`.
926+
- Isolated hunk:
927+
the slow hunk is in the rule
928+
`a*M + b*N + R -> a*polynomial(M + N) + (b - a)*N + R`.
929+
The unstaged version changed the suppression guard from the polynomial body
930+
cache to `SOLE_MONOMIALS`:
931+
`map_restrict_keys(counts, POLYNOMIALS).length() == 0`
932+
became
933+
`map_restrict_keys(counts, SOLE_MONOMIALS).length() == 0`.
934+
- Mechanism:
935+
the old guard suppresses integer-residual splitting when the current
936+
polynomial already contains a nested polynomial term. The new guard only
937+
suppresses when a term has a sole-monomial alias, so it lets the split rule
938+
run on nested-polynomial presentations. On this row that creates many extra
939+
coefficient-factor presentations, which then feed repeated scalar coefficient
940+
factoring, `SOLE_MONOMIALS` flattening, Horner, and representative
941+
coefficient factoring through late rounds.

0 commit comments

Comments
 (0)