@@ -481,3 +481,179 @@ encoding is slower even when the final e-graph is smaller.
481481 keeps producing new equivalent binary presentations until the fixed
482482 ` HASKELL_INNER_ITERATION_LIMIT=30 ` cap, not because Python declaration
483483 materialization or ` .egg ` logging dominates.
484+
485+ ## 2026-05-02 follow-up: debug build was the wall-time regression
486+
487+ - Status:
488+ reran after the current changes were committed; rebuilt the Python extension
489+ with ` uv run maturin develop --release ` .
490+ - Initial state:
491+ the active ` python/egglog/bindings.cpython-313-darwin.so ` was ` 48M ` , matching
492+ ` target/debug/libegglog.dylib ` . The existing release artifacts were ` 15M ` .
493+ - Experiment 1:
494+ restored only ` pipeline.py ` from ` HEAD~1 ` and reran
495+ ` kotanchek/96/GOMEA/7/sympy ` .
496+ This restored the old reported ` egraph_total_size=398 ` , but wall time stayed
497+ around ` 10.0s ` before rebuilding release. Disabling saved ` .egg ` logging did
498+ not materially change that.
499+ - Experiment 2:
500+ after ` uv run maturin develop --release ` , the same ` HEAD~1 ` pipeline shape
501+ ran the canary in ` 0.837s ` , ` passes=2 ` , ` egraph_total_size=398 ` ,
502+ ` after_params=4 ` .
503+ - Experiment 3:
504+ restored committed ` HEAD ` ` pipeline.py ` and reran with the release binding.
505+ The canary ran in ` 0.844s ` , ` passes=2 ` , ` after_params=4 ` , with
506+ ` total_size=8407 ` .
507+ - Top-five slow artifact rerun with committed ` HEAD ` plus release binding:
508+ ` kotanchek/96/GOMEA/7/sympy ` : old ` 1088.9ms ` , new ` 831.3ms ` ;
509+ ` kotanchek/180/SRjl/1/sympy ` : old ` 971.1ms ` , new ` 671.6ms ` ;
510+ ` pagie/27/Bingo/27/sympy ` : old ` 897.7ms ` , new ` 622.9ms ` ;
511+ ` kotanchek/181/SRjl/2/sympy ` : old ` 855.8ms ` , new ` 708.1ms ` ;
512+ ` pagie/163/SBP/13/sympy ` : old ` 835.7ms ` , new ` 710.8ms ` .
513+ All five preserved ` after_params ` .
514+ - Conclusion:
515+ no last-commit code revert is needed to recover binary wall-time
516+ performance; the wall-time regression was caused by running the debug
517+ extension build. The remaining difference in ` egraph_total_size ` is a
518+ reporting semantics change: committed ` HEAD ` tracks max size across passes,
519+ while the old pipeline overwrote the value with the final pass size.
520+
521+ ## 2026-05-03: container inverse-factor normalization outlier
522+
523+ - Status:
524+ diagnosis only. Temporary rule edits used for falsification were reverted.
525+ - Canary:
526+ large expression from the notebook/dashboard request beginning
527+ ` ((((((((0.11064466475608078 + -0.010036545250561161 * ... ` .
528+ This is the case the user described as taking about ` 50x ` longer on
529+ containers.
530+ - Source-of-truth timing, direct production entrypoints:
531+ ` run_paper_pipeline(parse_expression(EXPR)) ` took about ` 0.86s ` to ` 0.95s ` ,
532+ ` passes=2 ` , ` egraph_total_size=10725 ` , ` before_params=49 ` ,
533+ ` after_params=37 ` .
534+ ` run_paper_pipeline_container(parse_expression_container(EXPR)) ` took about
535+ ` 31s ` to ` 33s ` , ` passes=2 ` , ` egraph_total_size=4586 ` ,
536+ ` before_params=49 ` , ` after_params=32 ` .
537+ - Initial observation:
538+ the container result is semantically better by the param metric and has a
539+ smaller reported e-graph, so the slowdown is not explained by final graph
540+ size.
541+ - Phase split, binary:
542+ rewrite execution was about ` 0.28s ` , extraction about ` 0.65s ` , analysis
543+ about ` 0.02s ` .
544+ - Phase split, container:
545+ rewrite execution was about ` 22.4s ` , extraction about ` 2.3s ` , analysis about
546+ ` 1.5s ` , initial extraction about ` 0.08s ` , and graph-size checks were
547+ effectively zero.
548+ - First meaningful divergence:
549+ container pass 1 rewrite rounds dominate. The graph grows from about ` 160 ` to
550+ about ` 4586 ` , but rewrite rounds remain expensive near convergence:
551+ roughly ` 2.2s ` per rewrite round after the graph has almost stopped changing.
552+ Pass 2 is small.
553+ - Ruleset isolation:
554+ running only ` shared_rewrite_ruleset ` or ` container_fun_rules ` is fast.
555+ Running ` container_basic_rules ` alone reproduces the slowdown:
556+ about ` 32s ` , ` 16 ` rounds, ` rewrite=28.5s ` , ` extract=2.3s ` ,
557+ ` after_params=32 ` .
558+ ` shared_rewrite_ruleset | container_basic_rules ` and the full container
559+ rewrite ruleset behave the same, so the issue is localized to
560+ ` container_basic_rules ` .
561+ - Rule-level aggregation from ` RunReport.search_and_apply_time_per_rule ` over
562+ the slow container pass:
563+ ` pipeline.py:647 ` inverse-factor polynomial normalization:
564+ about ` 19.0s ` , ` 328 ` matches.
565+ ` pipeline.py:609 ` singleton/sole-monomial flattening:
566+ about ` 2.1s ` , ` 2539 ` matches.
567+ ` pipeline.py:737 ` greedy Horner factorization:
568+ about ` 1.5s ` , ` 1951 ` matches.
569+ ` pipeline.py:359 ` integer-ratio coefficient factoring:
570+ about ` 1.5s ` , ` 847 ` matches.
571+ ` pipeline.py:338 ` representative coefficient factoring:
572+ about ` 1.4s ` , ` 7088 ` matches.
573+ ` pipeline.py:440 ` repeated scalar coefficient factoring:
574+ about ` 0.9s ` , ` 2016 ` matches.
575+ - Primary hypothesis:
576+ ` pipeline.py:647 ` is the main regression source. The rule is useful for
577+ parameter quality, but operationally too broad. For every container
578+ polynomial candidate it folds all monomials, checks every term, and probes
579+ ` POLYNOMIALS[term] ` for reciprocal factors. The outlier combines enough
580+ reciprocal and singleton-polynomial presentations that this full-polynomial
581+ fold is expensive even with only hundreds of successful matches.
582+ - Falsification probe 1:
583+ temporarily disabled only the ` pipeline.py:647 ` rule with an impossible
584+ guard. Container time dropped from about ` 26s ` to about ` 12.2s ` ;
585+ rewrite time dropped from about ` 22.4s ` to about ` 8.6s ` ; ` after_params `
586+ stayed ` 32 ` . This confirms the rule is causal and contributes the largest
587+ share of the slowdown, but it is not the only cost.
588+ - Falsification probe 2:
589+ temporarily added a cheap eligibility guard so the rule only ran when the
590+ polynomial had a ` -1 ` exponent term whose base appeared in ` POLYNOMIALS ` .
591+ Container time dropped to about ` 18.3s ` ; rewrite time dropped to about
592+ ` 14.6s ` ; ` after_params ` stayed ` 32 ` . This validates the guard direction, but
593+ the expression genuinely has many eligible reciprocal terms, so this guard is
594+ not sufficient.
595+ - Reduction probe:
596+ top-level fragments of the expression are individually fast. Several
597+ fragments run faster in containers than binary. The slowdown appears when the
598+ fragments are combined, consistent with cross-term accumulation of equivalent
599+ polynomial and reciprocal presentations rather than a single isolated
600+ arithmetic subexpression.
601+ - Current conclusion:
602+ the inverse-factor normalization rule is the primary hotspot. A likely fix
603+ should not remove it outright; it should narrow it from a whole-polynomial
604+ fold over all candidates to a one-reciprocal-factor-at-a-time rewrite, or add
605+ stronger eligibility/canonicality guards so it only runs when it can actually
606+ change a reciprocal term and not repeatedly rescan stable presentations.
607+
608+ ## 2026-05-04: replace inverse-factor rule with scored ` SOLE_MONOMIALS `
609+
610+ - Status:
611+ accepted local fix candidate.
612+ - Question:
613+ can the expensive inverse-factor normalization rule be removed if
614+ ` SOLE_MONOMIALS ` stops keeping a stale first alias?
615+ - Change tested:
616+ removed the broad reciprocal normalization rewrite from
617+ ` container_basic_rules ` . Changed ` SOLE_MONOMIALS ` from
618+ ` Map[Num, ContainerMonomial] ` to ` Map[Num, Pair[i64, ContainerMonomial]] ` .
619+ The merge now keeps the lower local score, with old-value tie breaking.
620+ - Witness after only removing the broad rule:
621+ `0.7096075864465768 / (-5.746100091872222* x0 +
622+ 2.873050045936111* exp(x0) + 1.7208022848793882)` regressed to
623+ ` -0.12349377405560805 / (-0.2994730786735579 + x0 - 0.5*exp(x0)) ` ,
624+ ` after_params=3 ` .
625+ - Alias evidence:
626+ the stale ` SOLE_MONOMIALS ` value for the denominator was
627+ `-5.746100091872222 * polynomial(-0.2994730786735579 + x0 -
628+ 0.5* exp(x0))`. The better alias was
629+ ` 2.873050045936111 * polynomial(0.5989461573471158 - 2*x0 + exp(x0)) ` .
630+ The second form has one fewer non-integer coefficient after decoding.
631+ - Failed probe:
632+ latest-wins merge fixed the witness, but that is still scheduler-order based
633+ rather than a preference rule.
634+ - Accepted probe:
635+ a scored merge plus a specific one-level nested-polynomial score picked the
636+ better alias with score ` 2 ` ; the stale alias scored worse. The focused
637+ witness now extracts
638+ ` 0.2469875481112161 / (0.5989461573471158 - 2.0*x0 + exp(x0)) ` ,
639+ ` after_params=2 ` .
640+ - Cycle regression:
641+ the self-factor cycle canary ` -14.792753236262874*x1 + x0 ` stayed
642+ extractable through four container rewrite/analysis rounds.
643+ - Sample comparison:
644+ `make -C python/egglog/exp/param_eq compare-binary-container-sample
645+ SAMPLE_LIMIT_PER_DATASET=50` compared 100 shared rows. After params were
646+ ` 9 better / 0 worse / 91 same ` for containers. Param-reduction still showed
647+ five worse rows because container lowering starts with fewer params on those
648+ rows; after-param float count did not regress.
649+ - Large outlier timing:
650+ rerunning ` /tmp/param_eq_perf_expr.py ` changed the prior container canary
651+ from about ` 31s ` to about ` 13.0s ` wall/report time, with
652+ ` after_params=32 ` preserved and e-graph size about ` 4477 ` . Binary remained
653+ about ` 0.76s ` to ` 0.85s ` , now extracting ` after_params=36 ` in this checkout.
654+ - Validation:
655+ `uv run pytest
656+ python/egglog/exp/param_eq/test_pipeline.py::test_container_uses_preferred_singleton_polynomial_for_inverse_scale
657+ python/egglog/exp/param_eq/test_pipeline.py::test_container_preserves_repeated_polynomial_products -q`
658+ passed.
659+ ` uv run ruff check python/egglog/exp/param_eq/pipeline.py ` passed.
0 commit comments