Skip to content

Commit 050e5fd

Browse files
Remove duplicate rule and store sole monomials by cost
1 parent c6e4841 commit 050e5fd

9 files changed

Lines changed: 2645 additions & 3811 deletions

Cargo.lock

Lines changed: 14 additions & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

python/egglog/exp/param_eq/PERFORMANCE_DEBUG_LOG.md

Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -481,3 +481,179 @@ encoding is slower even when the final e-graph is smaller.
481481
keeps producing new equivalent binary presentations until the fixed
482482
`HASKELL_INNER_ITERATION_LIMIT=30` cap, not because Python declaration
483483
materialization or `.egg` logging dominates.
484+
485+
## 2026-05-02 follow-up: debug build was the wall-time regression
486+
487+
- Status:
488+
reran after the current changes were committed; rebuilt the Python extension
489+
with `uv run maturin develop --release`.
490+
- Initial state:
491+
the active `python/egglog/bindings.cpython-313-darwin.so` was `48M`, matching
492+
`target/debug/libegglog.dylib`. The existing release artifacts were `15M`.
493+
- Experiment 1:
494+
restored only `pipeline.py` from `HEAD~1` and reran
495+
`kotanchek/96/GOMEA/7/sympy`.
496+
This restored the old reported `egraph_total_size=398`, but wall time stayed
497+
around `10.0s` before rebuilding release. Disabling saved `.egg` logging did
498+
not materially change that.
499+
- Experiment 2:
500+
after `uv run maturin develop --release`, the same `HEAD~1` pipeline shape
501+
ran the canary in `0.837s`, `passes=2`, `egraph_total_size=398`,
502+
`after_params=4`.
503+
- Experiment 3:
504+
restored committed `HEAD` `pipeline.py` and reran with the release binding.
505+
The canary ran in `0.844s`, `passes=2`, `after_params=4`, with
506+
`total_size=8407`.
507+
- Top-five slow artifact rerun with committed `HEAD` plus release binding:
508+
`kotanchek/96/GOMEA/7/sympy`: old `1088.9ms`, new `831.3ms`;
509+
`kotanchek/180/SRjl/1/sympy`: old `971.1ms`, new `671.6ms`;
510+
`pagie/27/Bingo/27/sympy`: old `897.7ms`, new `622.9ms`;
511+
`kotanchek/181/SRjl/2/sympy`: old `855.8ms`, new `708.1ms`;
512+
`pagie/163/SBP/13/sympy`: old `835.7ms`, new `710.8ms`.
513+
All five preserved `after_params`.
514+
- Conclusion:
515+
no last-commit code revert is needed to recover binary wall-time
516+
performance; the wall-time regression was caused by running the debug
517+
extension build. The remaining difference in `egraph_total_size` is a
518+
reporting semantics change: committed `HEAD` tracks max size across passes,
519+
while the old pipeline overwrote the value with the final pass size.
520+
521+
## 2026-05-03: container inverse-factor normalization outlier
522+
523+
- Status:
524+
diagnosis only. Temporary rule edits used for falsification were reverted.
525+
- Canary:
526+
large expression from the notebook/dashboard request beginning
527+
`((((((((0.11064466475608078 + -0.010036545250561161 * ...`.
528+
This is the case the user described as taking about `50x` longer on
529+
containers.
530+
- Source-of-truth timing, direct production entrypoints:
531+
`run_paper_pipeline(parse_expression(EXPR))` took about `0.86s` to `0.95s`,
532+
`passes=2`, `egraph_total_size=10725`, `before_params=49`,
533+
`after_params=37`.
534+
`run_paper_pipeline_container(parse_expression_container(EXPR))` took about
535+
`31s` to `33s`, `passes=2`, `egraph_total_size=4586`,
536+
`before_params=49`, `after_params=32`.
537+
- Initial observation:
538+
the container result is semantically better by the param metric and has a
539+
smaller reported e-graph, so the slowdown is not explained by final graph
540+
size.
541+
- Phase split, binary:
542+
rewrite execution was about `0.28s`, extraction about `0.65s`, analysis
543+
about `0.02s`.
544+
- Phase split, container:
545+
rewrite execution was about `22.4s`, extraction about `2.3s`, analysis about
546+
`1.5s`, initial extraction about `0.08s`, and graph-size checks were
547+
effectively zero.
548+
- First meaningful divergence:
549+
container pass 1 rewrite rounds dominate. The graph grows from about `160` to
550+
about `4586`, but rewrite rounds remain expensive near convergence:
551+
roughly `2.2s` per rewrite round after the graph has almost stopped changing.
552+
Pass 2 is small.
553+
- Ruleset isolation:
554+
running only `shared_rewrite_ruleset` or `container_fun_rules` is fast.
555+
Running `container_basic_rules` alone reproduces the slowdown:
556+
about `32s`, `16` rounds, `rewrite=28.5s`, `extract=2.3s`,
557+
`after_params=32`.
558+
`shared_rewrite_ruleset | container_basic_rules` and the full container
559+
rewrite ruleset behave the same, so the issue is localized to
560+
`container_basic_rules`.
561+
- Rule-level aggregation from `RunReport.search_and_apply_time_per_rule` over
562+
the slow container pass:
563+
`pipeline.py:647` inverse-factor polynomial normalization:
564+
about `19.0s`, `328` matches.
565+
`pipeline.py:609` singleton/sole-monomial flattening:
566+
about `2.1s`, `2539` matches.
567+
`pipeline.py:737` greedy Horner factorization:
568+
about `1.5s`, `1951` matches.
569+
`pipeline.py:359` integer-ratio coefficient factoring:
570+
about `1.5s`, `847` matches.
571+
`pipeline.py:338` representative coefficient factoring:
572+
about `1.4s`, `7088` matches.
573+
`pipeline.py:440` repeated scalar coefficient factoring:
574+
about `0.9s`, `2016` matches.
575+
- Primary hypothesis:
576+
`pipeline.py:647` is the main regression source. The rule is useful for
577+
parameter quality, but operationally too broad. For every container
578+
polynomial candidate it folds all monomials, checks every term, and probes
579+
`POLYNOMIALS[term]` for reciprocal factors. The outlier combines enough
580+
reciprocal and singleton-polynomial presentations that this full-polynomial
581+
fold is expensive even with only hundreds of successful matches.
582+
- Falsification probe 1:
583+
temporarily disabled only the `pipeline.py:647` rule with an impossible
584+
guard. Container time dropped from about `26s` to about `12.2s`;
585+
rewrite time dropped from about `22.4s` to about `8.6s`; `after_params`
586+
stayed `32`. This confirms the rule is causal and contributes the largest
587+
share of the slowdown, but it is not the only cost.
588+
- Falsification probe 2:
589+
temporarily added a cheap eligibility guard so the rule only ran when the
590+
polynomial had a `-1` exponent term whose base appeared in `POLYNOMIALS`.
591+
Container time dropped to about `18.3s`; rewrite time dropped to about
592+
`14.6s`; `after_params` stayed `32`. This validates the guard direction, but
593+
the expression genuinely has many eligible reciprocal terms, so this guard is
594+
not sufficient.
595+
- Reduction probe:
596+
top-level fragments of the expression are individually fast. Several
597+
fragments run faster in containers than binary. The slowdown appears when the
598+
fragments are combined, consistent with cross-term accumulation of equivalent
599+
polynomial and reciprocal presentations rather than a single isolated
600+
arithmetic subexpression.
601+
- Current conclusion:
602+
the inverse-factor normalization rule is the primary hotspot. A likely fix
603+
should not remove it outright; it should narrow it from a whole-polynomial
604+
fold over all candidates to a one-reciprocal-factor-at-a-time rewrite, or add
605+
stronger eligibility/canonicality guards so it only runs when it can actually
606+
change a reciprocal term and not repeatedly rescan stable presentations.
607+
608+
## 2026-05-04: replace inverse-factor rule with scored `SOLE_MONOMIALS`
609+
610+
- Status:
611+
accepted local fix candidate.
612+
- Question:
613+
can the expensive inverse-factor normalization rule be removed if
614+
`SOLE_MONOMIALS` stops keeping a stale first alias?
615+
- Change tested:
616+
removed the broad reciprocal normalization rewrite from
617+
`container_basic_rules`. Changed `SOLE_MONOMIALS` from
618+
`Map[Num, ContainerMonomial]` to `Map[Num, Pair[i64, ContainerMonomial]]`.
619+
The merge now keeps the lower local score, with old-value tie breaking.
620+
- Witness after only removing the broad rule:
621+
`0.7096075864465768 / (-5.746100091872222*x0 +
622+
2.873050045936111*exp(x0) + 1.7208022848793882)` regressed to
623+
`-0.12349377405560805 / (-0.2994730786735579 + x0 - 0.5*exp(x0))`,
624+
`after_params=3`.
625+
- Alias evidence:
626+
the stale `SOLE_MONOMIALS` value for the denominator was
627+
`-5.746100091872222 * polynomial(-0.2994730786735579 + x0 -
628+
0.5*exp(x0))`. The better alias was
629+
`2.873050045936111 * polynomial(0.5989461573471158 - 2*x0 + exp(x0))`.
630+
The second form has one fewer non-integer coefficient after decoding.
631+
- Failed probe:
632+
latest-wins merge fixed the witness, but that is still scheduler-order based
633+
rather than a preference rule.
634+
- Accepted probe:
635+
a scored merge plus a specific one-level nested-polynomial score picked the
636+
better alias with score `2`; the stale alias scored worse. The focused
637+
witness now extracts
638+
`0.2469875481112161 / (0.5989461573471158 - 2.0*x0 + exp(x0))`,
639+
`after_params=2`.
640+
- Cycle regression:
641+
the self-factor cycle canary `-14.792753236262874*x1 + x0` stayed
642+
extractable through four container rewrite/analysis rounds.
643+
- Sample comparison:
644+
`make -C python/egglog/exp/param_eq compare-binary-container-sample
645+
SAMPLE_LIMIT_PER_DATASET=50` compared 100 shared rows. After params were
646+
`9 better / 0 worse / 91 same` for containers. Param-reduction still showed
647+
five worse rows because container lowering starts with fewer params on those
648+
rows; after-param float count did not regress.
649+
- Large outlier timing:
650+
rerunning `/tmp/param_eq_perf_expr.py` changed the prior container canary
651+
from about `31s` to about `13.0s` wall/report time, with
652+
`after_params=32` preserved and e-graph size about `4477`. Binary remained
653+
about `0.76s` to `0.85s`, now extracting `after_params=36` in this checkout.
654+
- Validation:
655+
`uv run pytest
656+
python/egglog/exp/param_eq/test_pipeline.py::test_container_uses_preferred_singleton_polynomial_for_inverse_scale
657+
python/egglog/exp/param_eq/test_pipeline.py::test_container_preserves_repeated_polynomial_products -q`
658+
passed.
659+
`uv run ruff check python/egglog/exp/param_eq/pipeline.py` passed.

0 commit comments

Comments
 (0)