
Commit 37e3a3f

Some perf improvements
1 parent 24965e0 commit 37e3a3f

4 files changed

Lines changed: 312 additions & 96 deletions


python/egglog/exp/param_eq/PERFORMANCE_DEBUG_LOG.md

Lines changed: 101 additions & 0 deletions
@@ -939,3 +939,104 @@ encoding is slower even when the final e-graph is smaller.
coefficient-factor presentations, which then feed repeated scalar coefficient
factoring, `SOLE_MONOMIALS` flattening, Horner, and representative
coefficient factoring through late rounds.

## 2026-05-04: coefficient-ratio outlier with large container e-graph

- Status:
  accepted local fix candidate.
- Full expression:
  `0.02889 * x0^3 - 0.18081288 * x0^2 + 0.18081288 * x0 +
  0.14418036 * x1 - 0.00963 * exp(x1) + 0.179028 -
  0.00963 * exp(-15.767 * x0)`.
- Baseline observation:
  the binary encoding ran in about `0.20s`, e-graph size `3443`, `after_nodes=29`,
  `after_params=5`.
  The container encoding ran in about `1.35s`, e-graph size `1145`, `after_nodes=29`,
  `after_params=5`.
- Reduced reproducer:
  replacing the exponentials with variables (`x2` and `x3`) and dropping the
  unrelated `x1` term preserves the container slowdown:
  `0.02889*x0^3 - 0.18081288*x0^2 + 0.18081288*x0 +
  0.179028 - 0.00963*x2 - 0.00963*x3`.
  This ran in about `1.23s`, e-graph size `1035`, `after_nodes=21`,
  `after_params=3`.
- Rule attribution on the reduced reproducer:
  `SOLE_MONOMIALS` flattening took `0.267s` with `4464` matches;
  coefficient-count factoring took `0.172s` with `2914` matches;
  Horner took `0.133s` with `1897` matches;
  representative coefficient factoring took `0.064s` with `2456` matches.
- Failed probe:
  restoring the old `poly.length() <= 5` bound on `SOLE_MONOMIALS` flattening
  did not help. The same rule still fired on smaller equivalent presentations,
  and the full case got slightly slower (`1.35s -> 1.41s`).
- Accepted changes:
  tightened integer-ratio coefficient factoring so that a coefficient is counted
  only when it is a smaller exact integer divisor of another coefficient. This
  removes self-counting and equal-magnitude counting from that whole-polynomial
  scale rule; repeated and equal-and-opposite scalar subsets are still handled
  by the dedicated subset rule.
  Also bounded the generic representative coefficient rule to polynomials with
  at most four monomials, leaving larger polynomials to the targeted
  coefficient rules.
- Result:
  the reduced reproducer improved from `1.23s`, size `1035` to `0.79s`, size
  `703`, with unchanged `after_nodes=21`, `after_params=3`.
  The full expression improved from `1.35s`, size `1145` to `1.12s`, size `788`,
  with unchanged `after_nodes=29`, `after_params=5`.
- Full corpus check:
  a fresh 714-row container run saturated every row. Compared with the existing
  container artifact, `after_params` was `0` better, `714` same, `0` worse.
  Compared with the binary artifact, `after_params` was `81` better, `633`
  same, `0` worse. Median runtime dropped from `513.5ms` in the existing
  container artifact to `192.1ms`; median `after_params` stayed `5`.
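
A minimal Python sketch of the tightened counting criterion and the four-monomial guard follows. It assumes a polynomial can be modeled as a plain list of float coefficients. The names `is_smaller_integer_divisor`, `divisor_counts`, and `representative_rule_applies` are hypothetical and are not the egglog rule definitions. Exactness is modeled by reading each coefficient's decimal `repr` as a `Fraction`, which is an assumption about how "exact integer divisor" is decided.

```python
from fractions import Fraction

MAX_REPRESENTATIVE_MONOMIALS = 4  # bound described in the log entry


def exact(coeff: float) -> Fraction:
    # Assumption: treat the shortest decimal repr as the exact value,
    # so 0.02889 / 0.00963 is exactly 3.
    return Fraction(repr(coeff))


def is_smaller_integer_divisor(c: float, d: float) -> bool:
    """True when |c| < |d| and |d| / |c| is an exact integer."""
    fc, fd = abs(exact(c)), abs(exact(d))
    if fc == 0 or fc >= fd:
        # Rules out self-counting and equal-magnitude (including +a / -a) pairs.
        return False
    return (fd / fc).denominator == 1


def divisor_counts(coeffs: list[float]) -> dict[int, int]:
    """Map each coefficient index to how many other coefficients it divides."""
    counts = {}
    for i, c in enumerate(coeffs):
        n = sum(is_smaller_integer_divisor(c, d) for j, d in enumerate(coeffs) if j != i)
        if n:
            counts[i] = n
    return counts


def representative_rule_applies(coeffs: list[float]) -> bool:
    """Hypothetical guard for the generic representative-coefficient rule."""
    return len(coeffs) <= MAX_REPRESENTATIVE_MONOMIALS


# Coefficients of the reduced reproducer, in the order written above.
reduced = [0.02889, -0.18081288, 0.18081288, 0.179028, -0.00963, -0.00963]
print(divisor_counts(reduced), representative_rule_applies(reduced))  # {4: 1, 5: 1} False
```

On the reduced reproducer, only the two `0.00963` terms count, because each divides `0.02889` exactly three times. The equal-magnitude `±0.18081288` pair and the duplicated `-0.00963` no longer count. The six-monomial polynomial is also left to the targeted rules.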
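
The better/same/worse tallies and median figures in the full corpus check can be computed from two runs with a short helper. This is a sketch under several assumptions. Each artifact is modeled as a hypothetical mapping from row ID to a `RowResult`, and lower `after_params` is treated as better. The toy rows are illustrative values, not corpus data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RowResult:
    """Hypothetical per-row record from one corpus artifact."""
    after_params: int
    runtime_ms: float


def compare_runs(new: dict[str, RowResult], old: dict[str, RowResult]) -> dict:
    """Tally after_params better/same/worse (lower is better) plus medians."""
    if new.keys() != old.keys():
        raise ValueError("runs must cover the same rows")
    better = sum(new[r].after_params < old[r].after_params for r in new)
    worse = sum(new[r].after_params > old[r].after_params for r in new)
    return {
        "better": better,
        "same": len(new) - better - worse,
        "worse": worse,
        # (old median, new median) pairs
        "median_runtime_ms": (
            median(x.runtime_ms for x in old.values()),
            median(x.runtime_ms for x in new.values()),
        ),
        "median_after_params": (
            median(x.after_params for x in old.values()),
            median(x.after_params for x in new.values()),
        ),
    }


# Illustrative toy rows only.
old_run = {"r1": RowResult(5, 500.0), "r2": RowResult(6, 520.0), "r3": RowResult(4, 510.0)}
new_run = {"r1": RowResult(5, 190.0), "r2": RowResult(5, 200.0), "r3": RowResult(4, 180.0)}
print(compare_runs(new_run, old_run))
```

Refusing mismatched row sets keeps `better + same + worse` equal to the corpus size, as in the `81 + 633 = 714` split above.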
## 2026-05-04: `pagie/144/Operon/25/original` exact-flatten slowdown

- Status:
  rejected local fix candidates and reverted the exact-flatten edits. The row is
  still slower than desired, but the only zero-regression variants found either
  increased corpus runtime or depended on a shape-specific singleton rule.
- Full expression:
  `-5.5456266637e-06 + 1.0000027418136597 * ((exp(-4.225025177001953*x1*(1.425803780555725*x1)) + exp(-1.171636939048767*x0*(5.220480918884277*x0))) * 0.2746707499027252 - -1.9792732000350952 - (exp(-1.7990413904190063*x1*(0.5140095949172974*x1)) + exp(1.8088867664337158*x0*(-0.5105684995651245*x0))) * 1.2522673606872559)`.
- Baseline observation:
  the current accepted container rules ran the row in about `1.19s` in the corpus
  artifact, e-graph size `699`, `after_nodes=33`, `after_params=7`.
  A direct trace measured about `0.76s` of report time: analysis `0.10s`,
  rewrite `0.48s`, and extraction `0.17s`.
- Reduced reproducer:
  replacing the four exponentials with fresh variables `x0` to `x3` preserved
  the slowdown:
  `-5.5456266637e-06 + 1.0000027418136597 *
  (0.2746707499027252*(x0 + x1) - -1.9792732000350952 -
  1.2522673606872559*(x2 + x3))`.
  The reduced case ran in about `0.74s`, e-graph size `676`,
  `after_nodes=13`, `after_params=3`.
- Hypothesis:
  the slowdown is caused by exact nested-polynomial flattening expanding
  isolated grouped sums, after which repeated scalar coefficient factoring
  recreates the groups. This creates a flatten/refactor cycle even though the
  extracted expression does not improve.
- Supporting probes:
  disabling the repeated scalar coefficient subset rule made the target fast
  (`~0.12s`, size `122`) but regressed focused parameter tests, so it was
  rejected.
  Disabling exact nested flattening made the target fast (`~0.09s`, size `72`)
  but regressed log-polynomial and coefficient canaries, so it was rejected.
  Limiting exact flattening to nested polynomials that contain constants made
  the full corpus faster (`153.7s -> 148.2s`, size sum `54051 -> 48683`) but
  introduced two `after_params` regressions:
  `pagie/134/Operon/15/original` and `pagie/163/SBP/13/original`.
- Rejected fix candidates:
  adding a generic single-nested-group helper preserved parameters but made the
  full corpus slower (`153.7s -> 158.1s`).
  Adding singleton exact flatten for all constant-bearing bodies preserved
  parameters but was much slower (`153.7s -> 176.8s`) and grew the e-graph
  size sum (`54051 -> 59022`).
  Bounding that singleton rule by body length preserved parameters only when
  the bound still admitted the shape needed by `pagie/163`, and it remained
  slower (`153.7s -> 171.9s`).
  A final `poly1.length() == 7` singleton probe fixed the known regression but
  was rejected as too example-shaped.
- Current decision:
  keep the accepted baseline rules and do not add a targeted exact-flatten rule
  for this row. A viable future fix needs a general criterion for when
  flattening a nested polynomial unlocks a downstream simplification, not a
  body-length or row-shape condition.
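
The hypothesized flatten/refactor cycle can be shown with a toy Python model. It uses the reduced reproducer's two grouped sums and ignores the constant and the outer scale. Both rewrites are hypothetical stand-ins, not the egglog rules. `flatten` distributes each scalar over its group, and `factor_repeated` regroups terms that share a coefficient. Applying one after the other returns the starting form. The flattened form also carries more coefficients, so under a parameter-count cost the grouped form still wins and the flattening work buys nothing.

```python
from collections import defaultdict

# Hypothetical toy encodings, not the egglog container encoding:
#   flat:    {var: coeff}         e.g. c*x0 + c*x1
#   grouped: {coeff: (var, ...)}  e.g. c*(x0 + x1)
Flat = dict[str, float]
Grouped = dict[float, tuple[str, ...]]


def flatten(grouped: Grouped) -> Flat:
    """Exact flattening: distribute each scalar over its grouped sum."""
    return {var: coeff for coeff, group in grouped.items() for var in group}


def factor_repeated(flat: Flat) -> Grouped:
    """Repeated-scalar factoring: group terms that share a coefficient."""
    by_coeff: defaultdict[float, list[str]] = defaultdict(list)
    for var, coeff in flat.items():
        by_coeff[coeff].append(var)
    return {coeff: tuple(sorted(vs)) for coeff, vs in by_coeff.items()}


def param_count(form: dict) -> int:
    """One fitted parameter per stored coefficient."""
    return len(form)


# Grouped sums from the reduced reproducer (constant and outer scale omitted).
grouped = {0.2746707499027252: ("x0", "x1"), -1.2522673606872559: ("x2", "x3")}
flat = flatten(grouped)

assert factor_repeated(flat) == grouped  # the two rewrites form a cycle
print(param_count(grouped), param_count(flat))  # 2 4
```

An e-graph keeps both presentations in the same e-class, so the cycle does not loop forever. Each round, however, still pays the matching cost of both rules without producing a cheaper extraction.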

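The rejected candidates can be read against a simple acceptance gate. As a hypothetical reading of the decision above (the log does not state it as a rule), the sketch accepts a candidate only if it has zero `after_params` regressions and does not slow the full corpus past the `153.7s` baseline. The runtimes and regression counts are the ones recorded above. The `poly1.length() == 7` probe is omitted because the log records no corpus runtime for it.

```python
from dataclasses import dataclass

BASELINE_CORPUS_S = 153.7  # full-corpus runtime of the accepted rules


@dataclass(frozen=True)
class Candidate:
    name: str
    corpus_s: float         # full-corpus runtime in seconds
    param_regressions: int  # rows whose after_params got worse


def pct_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return 100.0 * (new - old) / old


def accept(c: Candidate, baseline_s: float = BASELINE_CORPUS_S) -> bool:
    """Hypothetical gate: no after_params regressions and no corpus slowdown."""
    return c.param_regressions == 0 and c.corpus_s <= baseline_s


CANDIDATES = [
    Candidate("exact flatten only for constant-bearing nested polys", 148.2, 2),
    Candidate("generic single-nested-group helper", 158.1, 0),
    Candidate("singleton flatten, all constant-bearing bodies", 176.8, 0),
    Candidate("singleton flatten, bounded by body length", 171.9, 0),
]

for c in CANDIDATES:
    verdict = "accept" if accept(c) else "reject"
    print(f"{c.name}: {pct_change(c.corpus_s, BASELINE_CORPUS_S):+.1f}% -> {verdict}")
```

Every measured candidate fails this gate. The constants-only variant is about 3.6% faster but regresses two rows, and the zero-regression variants are roughly 2.9% to 15.0% slower.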