Skip to content

Commit 9501fc2

Browse files
authored
Bug: Fix Regression from Peak to Have Statistical Significance (#178)
1 parent c7bcb65 commit 9501fc2

18 files changed

Lines changed: 1131 additions & 3803 deletions

.github/workflows/benchmarks-report.yml

Lines changed: 12 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -51,15 +51,6 @@ jobs:
5151
git fetch origin main --depth=1
5252
git checkout origin/main -- tools/ci/bench/reporter/ 2>/dev/null || true
5353
54-
# Overlay main's bench-history.json onto the PR checkout so the
55-
# reporter's peak-attribution runs against the freshest history,
56-
# not whatever version was on the PR's branch point. `|| true` keeps
57-
# the step non-fatal if main has no file yet (first run after D3a).
58-
- name: Fetch latest bench-history.json from main
59-
run: |
60-
git fetch origin main --depth=1
61-
git show origin/main:tools/ci/bench/reporter/bench-history.json > tools/ci/bench/reporter/bench-history.json 2>/dev/null || true
62-
6354
- name: Download bench artifacts
6455
uses: dawidd6/action-download-artifact@v21
6556
with:
@@ -106,6 +97,9 @@ jobs:
10697
ENDED: ${{ github.event.workflow_run.updated_at }}
10798
run: |
10899
WALL_CLOCK=$(( $(date -d "$ENDED" +%s) - $(date -d "$STARTED" +%s) ))
100+
# --scope pr: peak attribution uses PR-iteration history only.
101+
# main-history is still loaded for drift quantification but excluded
102+
# from the comparison set.
109103
node tools/ci/bench/reporter/reporter.js \
110104
--results results \
111105
--sha '${{ github.event.workflow_run.head_sha }}' \
@@ -115,6 +109,7 @@ jobs:
115109
--base-ref 'main' \
116110
--repo '${{ github.repository }}' \
117111
--pr-history pr-history.json \
112+
--scope pr \
118113
--wall-clock "$WALL_CLOCK" \
119114
--out bench-report
120115
@@ -191,18 +186,21 @@ jobs:
191186

192187
- name: Append history entry
193188
run: |
194-
# The benched commit is the workflow_run's head_sha (the merge
195-
# commit on main). Parent comes from the git history we just
196-
# fetched. Timestamp is the bench run's completion time so the
197-
# history is ordered by when the measurement was taken, not
198-
# when the commit landed.
189+
# Benched commit is the workflow_run's head_sha; parent comes from
190+
# the depth-2 fetch above. Timestamp is the bench run's completion
191+
# so history is ordered by measurement, not commit-land time.
192+
# baseline-sha.txt is the sidecar uploaded next to each matrix
193+
# cell's tachometer JSON. Any cell's value works — all cells in
194+
# one workflow_run benched against the same baseline.
199195
BENCHED_SHA='${{ github.event.workflow_run.head_sha }}'
200196
PARENT_SHA=$(git rev-parse "$BENCHED_SHA^" 2>/dev/null || echo '')
197+
BASELINE_SHA=$(find results -name baseline-sha.txt -type f -exec cat {} \; -quit)
201198
node tools/ci/bench/reporter/append-history.js \
202199
--results results \
203200
--sha "$BENCHED_SHA" \
204201
--msg '${{ github.event.workflow_run.display_title }}' \
205202
--parent-sha "$PARENT_SHA" \
203+
--baseline-sha "$BASELINE_SHA" \
206204
--timestamp '${{ github.event.workflow_run.updated_at }}' \
207205
--history tools/ci/bench/reporter/bench-history.json
208206

.github/workflows/benchmarks.yml

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -127,19 +127,28 @@ jobs:
127127
# PR: baseline = base branch tip.
128128
# Push to main: baseline = this commit's parent (so the delta captures
129129
# the merged commit's effect; bench-history indexes the current
130-
# commit's absolute CI).
130+
# commit's absolute CI alongside the within-session percent-delta).
131+
#
132+
# Resolve baseline SHA inline so it can be written to the artifact as
133+
# a sidecar (baseline-sha.txt). The reporter pins each metric's
134+
# percent_delta_ci to that SHA — required for cross-iteration drift
135+
# detection (see tools/ci/bench/reporter/reporter.js:computeBaselineDrift).
131136
- name: Build baseline
132137
run: |
133138
if [ '${{ github.event_name }}' = 'push' ]; then
134139
# Fetch enough history to reach the parent commit locally.
135140
git fetch origin main --depth=2
141+
BASELINE_SHA=$(git rev-parse HEAD~1)
136142
git checkout HEAD~1 -- packages/*/src/
137143
else
138144
git fetch origin ${{ github.event.pull_request.base.ref }} --depth=1
145+
BASELINE_SHA=$(git rev-parse FETCH_HEAD)
139146
git checkout FETCH_HEAD -- packages/*/src/
140147
fi
141148
node packages/${{ matrix.entry.package }}/bench/tachometer/build-ci.js baseline
142149
git checkout HEAD -- packages/*/src/
150+
mkdir -p results
151+
echo "$BASELINE_SHA" > results/baseline-sha.txt
143152
144153
# Run just this matrix cell's single config.
145154
# Per-cell auto-sample tail is governed by the config's own `timeout`
@@ -156,4 +165,9 @@ jobs:
156165
uses: actions/upload-artifact@v7
157166
with:
158167
name: results-${{ matrix.entry.name }}
159-
path: results/*.json
168+
# Include baseline-sha.txt sidecar — the reporter and history
169+
# archiver read it to pin percent_delta_ci entries to their
170+
# baseline SHA.
171+
path: |
172+
results/*.json
173+
results/baseline-sha.txt

ai/plans/ROADMAP.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -83,6 +83,7 @@ Plans with an open PR or live pair work. Updated as ceremony when a PR opens; en
8383

8484
- [Release 0.18.0](active/release-0-18-0.md)[PR #122](https://github.com/Semantic-Org/Semantic-Next/pull/122) `docs/shippable` (menu trimming + audit pass pending). Ships the next tagged release; last was 0.17.0 in November.
8585
- [Signal Performance](active/signal-performance.md)[PR #150](https://github.com/Semantic-Org/Semantic-Next/pull/150) freeze-by-default. Perf story unresolved (see plan's Bench Results); release inclusion is the open call.
86+
- [Bench Peak Attribution](active/bench-peak-attribution.md)[PR #178](https://github.com/Semantic-Org/Semantic-Next/pull/178) methodology fix for cross-session absolute-ms comparisons. Eliminates phantom "Regressions from peak" on PRs.
8687

8788
---
8889

@@ -200,7 +201,7 @@ Slot in wherever there's a gap; not phase-gated.
200201
| P12 | [Template Spread Syntax](template-spread-syntax.md) | 4-8h | pair | scoped | `{>card ...friend}` — object spread in data passing. Ship when component templates demonstrate need. |
201202
| P13 | [Template Content Projection](template-wrapper-snippets.md) | 12-16h (1.5-2d) | pair | scoped | `{>content}` — content projection for snippets + subtemplates. Ship when component templates demonstrate need. |
202203
| P14 | [Template Let Bindings](template-let-bindings.md) | 10-14h (1-2d) | pair | scoped | `{#let}...{/let}` — snippet-for-vars. Ship when component templates demonstrate need. |
203-
| P15 | [Bench Reporter Overhaul](bench-reporter-overhaul.md) | 16-24h (2-3d) | pair | initial | Two coordinated tracks. **A — peak attribution correctness**: schema_v2 stores within-session percent-delta + tip-of-tree SHA; reporter peak compares same-session deltas; `--scope pr` drops main-history from PR comments. Fixes PR #174's 23 phantom regressions. **B — suite rationalization remainder** (from `icebox/tachometer-overhaul.md`): story-driven config reorg, triplet collapses, `wake-count-single-key` + `nested-mutation` micros, `timeout` final pass. Four PRs under `workflow_run` constraint. Supersedes the icebox plan. |
204+
| P15 | [Bench Peak Attribution](active/bench-peak-attribution.md) | 9-11h (1.5d) | pair | scoped | Fix the live peak-attribution bug. PR #174 (test-only, no perf changes) currently surfaces 25 phantom "Regressions from peak"; active perf PRs carry partial false-flagging too. Schema_v2 persists `percent_delta_ci` + `baseline_sha` per metric; reporter switches peak compare to same-session percent-delta; `--scope pr` drops main-history overlay on PR comments; drift flag with chain-of-percent-deltas when baselines differ. `bench-history.json` wiped to empty v2 (v1 entries fed the bug). Two PRs: methodology fix + suite cleanup (`toggle-{first,last}-10` + conditional `timeout` 3→2). |
204205

205206
---
206207

@@ -214,6 +215,6 @@ Plans drafted but not on the active roadmap. See `ai/plans/icebox/` for files.
214215
- [Signals TC39 Integration](icebox/signals-tc39-integration.md) — adopt native `Signal.State`/`Signal.Computed` as backing primitives when TC39 ships. Blocked on TC39 Stage 3+.
215216
- [Add Icon Stroke Width](icebox/add-icon-stroke-width.md) — power-user feature, post-1.0.
216217
- [Audit Fix Continuation](icebox/audit-fix-continuation.md) — process work for follow-up audits.
217-
- [Tachometer Overhaul — PR B remainder](icebox/tachometer-overhaul.md)suite rationalization + knob tuning + new benches. PR A (CI parallelization) and PR C (in-house Node reporter) shipped; PR B is the only outstanding piece.
218+
- [Bench Suite Expansion](icebox/bench-suite-expansion.md)file-scoped hot-path micros (`micro-expression-evaluator`, `micro-signal`, etc.) + new end-to-end benches (`wake-count-single-key`, `nested-mutation`, `hydrate-1000-card`). Surgical adds; lands when underlying perf work needs them.
218219
- [Contributing Surface](icebox/contributing-surface.md) — pre-1.0 stance + 1.0 graduation pass + post-1.0 triage flow (size + scope, GH-shaped vs md-shaped). Most icebox graduates at 1.0; the rest stays internal.
219220
- [Registry](icebox/registry.md) — community registry for components and behaviors, runtime + compile-time consumption from one source, author-namespaced publishing under `@sui-hub` with editorial canonical aliases above. Post-Phase 4.

0 commit comments

Comments
 (0)