Build: Improve Bench Robustness & Reporting #181

Merged
jlukic merged 26 commits into main from feat/bench-coverage-expansion on May 5, 2026

jlukic commented May 5, 2026

This PR improves tachometer bench coverage and makes the tachometer reports posted on PRs easier to use. It adds a description for every test and reports wins and losses between commits within a PR. It also adds missing benchmarks that round out the suite.

Changes

  • Adds new benchmark suites for reactivity, the renderer, and the compiler
  • Improves robustness and resilience of CI scripts, particularly when editing the comments of existing bench runs
  • Adds a glossary at the end of the bench comment that describes each bench
  • Adds win/loss/drift columns to intra-PR perf reporting
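
One common way to make comment editing robust is to tag each bench comment with a hidden marker and decide between creating and updating based on whether that marker is already present. The sketch below shows that idea only; it assumes one comment per commit SHA, and the marker format, the `IssueComment` shape, and `planBenchComment` are hypothetical names, not taken from this PR's CI scripts.

```typescript
// Hypothetical shapes: the PR's actual CI scripts are not shown in this conversation.
interface IssueComment {
  id: number;
  body: string;
}

type CommentPlan =
  | { action: "create"; body: string }
  | { action: "update"; commentId: number; body: string };

// Assumed hidden marker; an HTML comment is invisible in rendered Markdown.
function markerFor(sha: string): string {
  return `<!-- bench-report:${sha} -->`;
}

// Decide whether to create a new bench comment or edit the existing one for this commit.
function planBenchComment(
  existing: IssueComment[],
  sha: string,
  report: string,
): CommentPlan {
  const marker = markerFor(sha);
  const body = `${marker}\n${report}`;
  // Prefer the most recent comment that carries the marker, if any.
  const previous = [...existing].reverse().find((c) => c.body.includes(marker));
  return previous
    ? { action: "update", commentId: previous.id, body }
    : { action: "create", body };
}
```

Keeping the decision in a pure function means it can be tested without calling the GitHub API; the caller would then perform whichever create or update request the plan names.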

Risk

0/10: CI-only changes; the blast radius is limited to CI runs.

How to Test

  • Confirm the performance bot reports results correctly on the next PR that changes packages/

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| semantic-next | Ready | Preview, Comment | May 5, 2026 8:09pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| mcp | Ignored | Preview | May 5, 2026 8:09pm |

github-actions Bot added the Docs (Modifies documentation) label May 5, 2026
@semantic-performance-bot

⚪ No Meaningful Change for cce2347 on Benchmark Suite 📊

Base: main · Action: #25383773434 · Raw: bench-report.json

Harness/Build: Bench coverage plan and metric purposes

Note

This PR did not move any measured metrics.

✅ 0 faster · ❌ 0 slower · 🔍 10 unsure · ⚪ 30 no change


⚪ No Change (30)

Metrics where this PR measured within ±2% of main — no meaningful performance change detected.

| Metric | Change |
| --- | --- |
| add-20 | -0.1% – +0.0% |
| bulk-add-500 | -0.7% – +0.4% |
| clear-completed-250 | -1.9% – +0.7% |
| create-10k | -0.3% – +0.4% |
| create-1k | -0.4% – +1.6% |
| edit-cycle-5 | -0.4% – +0.1% |
| edit-start-10 | -1.9% – +0.6% |
| filter-cycle-20 | -0.4% – +0.7% |
| hydrate-each-100 | -1.1% – +0.6% |
| hydrate-each-100-mount | -0.7% – +0.6% |
| hydrate-helper-100-mount | -0.8% – +0.5% |
| remove-10-middle | -1.3% – +0.4% |
| remove-5-front | -1.7% – +0.9% |
| remove-first-10 | -1.0% – +0.9% |
| remove-last-10 | -0.1% – +0.4% |
| remove-middle-10 | +0.1% – +0.7% |
| remove-row-back-10 | -1.2% – +0.3% |
| remove-row-front-20 | -1.0% – +0.3% |
| remove-row-middle-20 | -2.0% – +0.5% |
| select-40 | -1.3% – +0.4% |
| signal-computed-chain-10x60k | -1.1% – +0.1% |
| signal-reactive-fanout-500x1200 | -0.6% – +1.5% |
| signal-reactive-list-replace-1000x1000 | -1.6% – +1.1% |
| signal-reactive-set-property-by-id-200 | -1.7% – +0.2% |
| swap-rows-20 | -1.1% – +0.3% |
| toggle-10 | -0.7% – +0.1% |
| toggle-all-20 | -0.0% – +0.2% |
| toggle-first-10 | -0.0% – +0.3% |
| toggle-last-10 | +0.0% – +0.6% |
| toggle-middle-10 | -0.1% – +0.4% |

🔍 Unsure (10)

Inconclusive (8)

The measured difference is small, and our sampling couldn't confidently place it above or below zero. Running more samples in a future run might settle these metrics.

| Metric | Change | Expected Noise |
| --- | --- | --- |
| append-1k | -1.7% – +3.1% | ±1% |
| clear-10k | -3.7% – +2.5% | ±1% |
| replace-1k | -1.3% – +2.2% | ±2% |
| signal-reactive-list-filter-1000x300 | -2.7% – -0.0% | ±1% |
| signal-reactive-multi-read-5x160k | -3.0% – +0.7% | ±1% |
| signal-reactive-push-2000x20 | -2.6% – +0.3% | ±1% |
| signal-reactive-set-index-300 | -3.7% – +0.5% | ±1% |
| update-10th-10 | -0.9% – +2.5% | ±1% |

Too Fast to Measure Precisely (2)

On benches this short, system jitter (scheduling, GC, JIT) masks sub-4% changes; larger deltas still resolve cleanly.

| Metric | Change | Test Time | Expected Noise |
| --- | --- | --- | --- |
| hydrate-helper-100-state-change | -4.9% – +12.7% | ~6ms | ±25% |
| remove-5-back | -2.4% – +0.4% | ~76ms | ±2% |

Sample size: 50 · Resolution floor: ±2% · Timeout: 3min · Wall-clock: 17m57s
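
The bucketing in the report above can be sketched as a small classifier over each metric's change range. This is a minimal sketch under stated assumptions: only the ±2% resolution floor comes from the report, while the `classify` and `tally` names and the exact faster/slower rule (the whole range must clear the floor) are guesses that happen to agree with the rows shown, not the bot's actual logic.

```typescript
type Verdict = "faster" | "slower" | "unsure" | "no-change";

// Percent change versus the base branch, e.g. { low: -1.7, high: 3.1 }.
interface ChangeRange {
  low: number;
  high: number;
}

// Assumed rule: a range fully inside ±floor is "no-change"; a range fully
// beyond the floor on one side is a win or a loss; everything else is "unsure".
function classify({ low, high }: ChangeRange, floor = 2): Verdict {
  if (low >= -floor && high <= floor) return "no-change";
  if (high <= -floor) return "faster"; // negative change means less time
  if (low >= floor) return "slower";
  return "unsure";
}

// Count verdicts across a set of metrics, mirroring the summary line.
function tally(ranges: ChangeRange[], floor = 2): Record<Verdict, number> {
  const counts: Record<Verdict, number> = {
    faster: 0,
    slower: 0,
    unsure: 0,
    "no-change": 0,
  };
  for (const range of ranges) {
    counts[classify(range, floor)] += 1;
  }
  return counts;
}
```

Under this rule, remove-row-middle-20 (-2.0% – +0.5%) lands in "no-change" and append-1k (-1.7% – +3.1%) lands in "unsure", matching where the report places them.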

semantic-performance-bot Bot commented May 5, 2026

⚪ No Meaningful Change for 28afa94 on Benchmark Suite 📊

Base: main · Action: #25399508379 · Raw: bench-report.json

Build: Improve Bench Robustness & Reporting

Note

This PR did not move any measured metrics.

✅ 0 faster · ❌ 0 slower · 🔍 5 unsure · ⚪ 58 no change


⚪ No Change (58)

Metrics where this PR measured within ±2% of main — no meaningful performance change detected.

| Metric | Change |
| --- | --- |
| active-indicator-200 | 0.0% – 0.0% |
| active-indicator-nested-200 | -0.0% – +0.0% |
| add-20 | -0.1% – +0.1% |
| bulk-add-500 | -0.6% – +0.5% |
| clear-completed-250 | -1.5% – +0.2% |
| create-10k | -0.5% – +0.5% |
| create-1k | -0.6% – +1.6% |
| edit-cycle-5 | -1.0% – +0.6% |
| edit-start-10 | -1.3% – +1.5% |
| filter-cycle-20 | -1.2% – +1.1% |
| hydrate-each-100 | -0.6% – +1.8% |
| hydrate-each-100-mount | -0.8% – +1.0% |
| hydrate-helper-100-mount | -1.2% – +1.3% |
| hydrate-helper-100-state-change | -1.1% – +0.6% |
| micro-build-html-string-10k | -0.4% – +1.8% |
| micro-compiler-ast-walk-5k | -1.6% – +1.8% |
| micro-compiler-parse-cold-complex-200 | -0.7% – +1.1% |
| micro-compiler-parse-cold-normal-500 | -1.9% – +0.7% |
| micro-compiler-snippet-args-5k | -0.8% – +0.7% |
| micro-dom-walker-1000x15 | -1.3% – +1.4% |
| micro-expr-js-10k | -1.6% – +1.0% |
| micro-expr-lisp-50k | -0.7% – +1.3% |
| micro-expr-simple-100k | -1.9% – -0.2% |
| reaction-coalesce-200x100 | -1.1% – +0.3% |
| reaction-dep-diff-30k | -0.9% – +1.3% |
| reaction-flush-noop-5m | -1.9% – +0.3% |
| remove-10-middle | -1.5% – +0.4% |
| remove-5-back | -1.2% – +1.2% |
| remove-first-10 | -1.4% – +1.5% |
| remove-last-10 | -0.3% – +0.2% |
| remove-middle-10 | -0.7% – +0.4% |
| remove-row-back-10 | -0.9% – +0.2% |
| remove-row-front-20 | -0.4% – +0.8% |
| remove-row-middle-20 | -1.1% – +0.9% |
| rename-50 | -0.1% – +0.1% |
| select-40 | -0.1% – +1.1% |
| signal-computed-chain-10x60k | -0.5% – +0.6% |
| signal-reactive-fanout-500x1200 | -0.8% – +1.1% |
| signal-reactive-list-filter-1000x300 | -1.6% – +0.9% |
| signal-reactive-list-replace-1000x1000 | -1.5% – +0.8% |
| signal-reactive-multi-read-5x160k | -0.6% – +0.9% |
| signal-reactive-set-index-300 | -0.6% – +1.3% |
| signal-reactive-set-property-by-id-200 | -1.0% – +0.3% |
| signal-set-same-10m | -1.2% – +0.7% |
| signal-sub-unsub-100k | -0.0% – +1.2% |
| snippet-args-per-key-100 | -0.0% – 0.0% |
| snippet-in-subtemplate-100 | 0.0% – 0.0% |
| stable-ref-mutate-500 | 0.0% – 0.0% |
| subtemplate-data-blob-100 | -0.0% – +0.0% |
| subtemplate-reactive-data-100 | -0.4% – +0.1% |
| subtemplate-shorthand-props-100 | -0.0% – +0.1% |
| swap-rows-20 | -0.9% – +0.5% |
| toggle-10 | -0.7% – +0.2% |
| toggle-all-20 | -0.1% – +0.2% |
| toggle-first-10 | -0.2% – +0.1% |
| toggle-last-10 | -0.3% – +0.4% |
| toggle-middle-10 | -0.1% – +0.7% |
| update-10th-10 | -1.3% – +1.5% |

🔍 Unsure (5)

Inconclusive (3)

The measured difference is small, and our sampling couldn't confidently place it above or below zero. Running more samples in a future run might settle these metrics.

| Metric | Change | Expected Noise |
| --- | --- | --- |
| append-1k | -0.4% – +4.3% | ±1% |
| clear-10k | -2.1% – +5.6% | ±1% |
| signal-reactive-push-2000x20 | -0.3% – +2.3% | ±1% |

Too Fast to Measure Precisely (2)

On benches this short, system jitter (scheduling, GC, JIT) masks sub-4% changes; larger deltas still resolve cleanly.

| Metric | Change | Test Time | Expected Noise |
| --- | --- | --- | --- |
| remove-5-front | -2.5% – -0.2% | ~87ms | ±2% |
| replace-1k | +0.3% – +2.4% | ~100ms | ±2% |

Sample size: 50 · Resolution floor: ±2% · Timeout: 3min · Wall-clock: 16m48s

… index, declared compiler devDep, dead-link fixes
jlukic changed the title from "Harness/Build: Bench coverage plan and metric purposes" to "Build: Bench Coverage Expansion" May 5, 2026
jlukic changed the title from "Build: Bench Coverage Expansion" to "Build: Improve Bench Robustness & Reporting" May 5, 2026
Workspace is gitignored, so any tracked file linking into it is dead by
definition. Inline the voice rules that were parked behind the link, and
drop the calibration-log-artifact pointer from the sessions table — the
substantive findings already follow.
jlukic added 2 commits May 5, 2026 15:36
Workload measured regex throughput more than reactive dispatch, and the
FGR plan it was speculatively designed for doesn't ship expression-eval
memoization. Drop pre-merge rather than ship a metric that won't move
under the work it's measuring.
snippet-in-subtemplate-100: 25 cards each invoking 4 inner snippets,
mutate parent prop's source. Tests dataDep pollution from receivesData:
true subtemplate into inner snippet bodies. Distinct from top-level
snippet-args-per-key-100.

active-indicator-nested-200: 5×10×4 nav-menu shape with external
currentUrl helper. First bench exercising cross-layer isolation across
three nested each blocks.

rename-50: pure setProperty(id, 'title') flood on the existing todoItem
subtemplate, no editingId co-fires. Tightens the partial coverage in
edit-cycle-5.
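
To make the 5×10×4 shape of active-indicator-nested-200 concrete, here is a rough sketch of the kind of data such a bench might render. It assumes the 200 in the name is the leaf-link count (5 × 10 × 4 = 200) and that at most one link matches the current URL. `buildNav`, `countLinks`, `countActive`, and all field names are hypothetical; the actual bench fixtures are not shown in this conversation.

```typescript
// Hypothetical data shapes for a three-level nav menu.
interface NavLink {
  id: string;
  url: string;
}
interface NavGroup {
  id: string;
  links: NavLink[];
}
interface NavSection {
  id: string;
  groups: NavGroup[];
}

// Build sections × groups × links; the defaults give 5 × 10 × 4 = 200 leaf links.
function buildNav(sections = 5, groups = 10, links = 4): NavSection[] {
  return Array.from({ length: sections }, (_, s) => ({
    id: `s${s}`,
    groups: Array.from({ length: groups }, (_, g) => ({
      id: `s${s}-g${g}`,
      links: Array.from({ length: links }, (_, l) => ({
        id: `s${s}-g${g}-l${l}`,
        url: `/s${s}/g${g}/l${l}`,
      })),
    })),
  }));
}

// Total number of leaf links across all three levels.
function countLinks(nav: NavSection[]): number {
  return nav.reduce(
    (total, section) =>
      total + section.groups.reduce((sum, group) => sum + group.links.length, 0),
    0,
  );
}

// Number of links an external currentUrl value would mark as active.
function countActive(nav: NavSection[], currentUrl: string): number {
  let active = 0;
  for (const section of nav) {
    for (const group of section.groups) {
      for (const link of group.links) {
        if (link.url === currentUrl) active += 1;
      }
    }
  }
  return active;
}
```

In a well-isolated renderer, changing the current URL should only touch the link that was active and the one that becomes active, rather than all 200 leaves; that is the behavior the commit message describes as cross-layer isolation.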
jlukic merged commit 1ba5e85 into main May 5, 2026
21 checks passed
jlukic deleted the feat/bench-coverage-expansion branch May 5, 2026 20:56

Labels

CI (modifies continuous integration) · Docs (modifies documentation) · Tests (modifies tests)
