openspec: define performance-teaching-hardening change

LessUp · Copilot · LessUp · commit 53edff413c34 · 2026-04-28T10:02:43.000+08:00
Add the first-wave hardening OpenSpec change covering:
- SIMD runtime dispatch closure (example + test)
- Vectorization diagnostics surfaced in reader docs
- Sanitizer preset workflow in docs site and README
- Benchmark regression comparison script

Spec deltas added for simd-vectorization, benchmark-framework,
documentation, and ci-quality-assurance capabilities.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/openspec/changes/performance-teaching-hardening/design.md b/openspec/changes/performance-teaching-hardening/design.md
@@ -0,0 +1,57 @@
+# Design: Performance Teaching Hardening
+
+## Overview
+
+This change makes four targeted improvements to existing teaching surfaces without introducing new modules. All work is bounded to `examples/04-simd-vectorization/`, `tests/`, `scripts/`, `docs/`, and README files (root and module).
+
+## Design decisions
+
+### 1. SIMD runtime dispatch
+
+**Goal**: Show readers how to select the fastest available instruction path at runtime rather than at compile time.
+
+**Approach**: Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with a `dispatch_add_arrays` function that uses `cpuid` (via `__builtin_cpu_supports` on GCC/Clang) to select AVX2, SSE2, or scalar at runtime. Export the function through the existing `simd_utils` interface library so the existing test runner can reach it.
+
+**Rationale**: `__builtin_cpu_supports` is available on GCC ≥ 4.8 and Clang ≥ 3.7, so any compiler that meets the project's C++17 baseline already provides it, and no platform-specific CPUID wrapper is needed. The function stays within the `hpc::simd` namespace. A companion `tests/` entry validates correctness against the scalar reference.
+
+**Trade-off**: Does not use `ifunc` or a separate DSO; runtime dispatch happens once, through a function pointer set at the call site. This is simpler and sufficient for a teaching example.
+
+### 2. Vectorization diagnostics workflow
+
+**Goal**: Make compiler vectorization reports reachable without reading CMakeLists.txt.
+
+**Approach**: Add a "Vectorization Diagnostics" section to `examples/04-simd-vectorization/README.md` that shows the exact build commands (`cmake --preset=debug -DCMAKE_CXX_FLAGS="-fopt-info-vec"` for GCC, or the same command with `-Rpass=loop-vectorize` for Clang) and explains how to read the output. Mirror a condensed version in `docs/` under the SIMD learning path entry.
+
+**Trade-off**: We do not add a new CMake preset for this; a reader-visible flag override is sufficient and avoids preset sprawl.
+
+### 3. Sanitizer workflow in reader-facing docs
+
+**Goal**: A reader can find and run ASan/TSan/UBSan without knowing the preset names in advance.
+
+**Approach**: Add a "Validation and Safety" page (or expand an existing section) in the VitePress docs site. Reference the three preset names (`asan`, `tsan`, `ubsan`) with copy-pasteable commands. Cross-link from the repository README quick-start.
+
+**Trade-off**: Keep this as documentation only. Do not add a new composite preset or script; the existing presets are complete.
+
+### 4. Benchmark regression comparison
+
+**Goal**: A maintainer can compare two benchmark JSON runs and see which benchmarks regressed.
+
+**Approach**: Add `scripts/compare_benchmarks.py` (Python 3, stdlib only — no third-party packages) that accepts two JSON files (baseline and candidate) and prints a table of benchmark name, baseline ns/iter, candidate ns/iter, and delta %. Exit code 1 if any benchmark regresses by more than a configurable threshold (default 10%). Add a "Regression Comparison" section to `examples/02-memory-cache/README.md` and the relevant benchmark docs entry showing the capture-and-compare workflow.
+
+**Rationale**: Restricting the script to the standard library means it runs without a virtualenv or any package installation. The threshold flag makes it usable in CI without hardcoding expected values.
+
+**Trade-off**: Does not publish results to a dashboard (out of scope). Does not integrate into GitHub Actions in this change (would be a follow-on if needed).
+
+## File surface
+
+| Path | Change |
+|------|--------|
+| `examples/04-simd-vectorization/src/runtime_dispatch.cpp` | New: runtime CPU dispatch example |
+| `examples/04-simd-vectorization/CMakeLists.txt` | Extend: wire `runtime_dispatch` target |
+| `tests/` (simd subdir) | New: correctness test for `dispatch_add_arrays` |
+| `examples/04-simd-vectorization/README.md` | Extend: vectorization diagnostics section |
+| `docs/` (SIMD learning path entry) | Extend: vectorization diagnostics, sanitizer link |
+| `docs/` (validation/safety page or section) | New or extend: sanitizer preset workflow |
+| `README.md` | Extend: cross-link to sanitizer docs |
+| `scripts/compare_benchmarks.py` | New: benchmark regression comparison script |
+| `benchmarks/` README or docs entry | Extend: capture-and-compare workflow |
diff --git a/openspec/changes/performance-teaching-hardening/proposal.md b/openspec/changes/performance-teaching-hardening/proposal.md
@@ -0,0 +1,36 @@
+# Proposal: Performance Teaching Hardening
+
+## Summary
+
+Harden the first wave of existing teaching surfaces: close the gap between the SIMD module's compile-time wrapper and runtime dispatch, surface vectorization and sanitizer workflows in reader-facing documentation, and establish a reproducible benchmark regression path.
+
+## Why
+
+The repository has solid example code and CI scaffolding but three gaps remain that reduce teaching value and maintainability:
+
+1. The SIMD module has a `FloatVec` alias that selects one instruction set at compile time. There is no runtime dispatch example, so readers who want portable SIMD code for heterogeneous deployments have no guide.
+2. Vectorization diagnostics (`-fopt-info-vec`, `-Rpass=loop-vectorize`) and sanitizer workflows (ASan/TSan/UBSan) are already reachable through compiler flags and CMake presets, but neither is surfaced in reader-facing documentation.
+3. The benchmark suite produces JSON output but there is no documented or scripted path for comparing runs across commits, making regression detection manual and fragile.
+
+## Scope
+
+### In scope
+
+- Runtime CPU dispatch for the SIMD module, closing the gap left by the compile-time wrapper (example + test)
+- Vectorization diagnostics workflow documented for readers
+- Sanitizer workflow surfaced in reader-facing docs and the docs site
+- Benchmark regression comparison: documented workflow and script
+
+### Out of scope
+
+- New teaching modules (concurrency, memory, modern-cpp extensions)
+- CI benchmark publishing to external dashboards
+- AVX-512 masking or gather/scatter intrinsics
+- Windows or macOS port validation
+
+## Success criteria
+
+- A `runtime_dispatch.cpp` example under `examples/04-simd-vectorization/src/` compiles and the corresponding test passes in the debug preset.
+- The example README and docs site entry for the SIMD module explain how to see compiler vectorization reports.
+- The docs site has a visible validation / sanitizer path that a reader can follow without reading CMakeLists.txt.
+- A `scripts/compare_benchmarks.py` script accepts two Google Benchmark JSON files and prints a human-readable regression report; the script is referenced from the benchmark module README.
diff --git a/openspec/changes/performance-teaching-hardening/specs/benchmark-framework/spec.md b/openspec/changes/performance-teaching-hardening/specs/benchmark-framework/spec.md
@@ -0,0 +1,22 @@
+# Benchmark Framework
+
+## ADDED Requirements
+
+### Requirement: Benchmark Regression Comparison
+
+THE HPC_Guide SHALL provide a script to compare two Google Benchmark JSON output files and identify regressions.
+
+#### Scenario: No regressions detected
+
+- **WHEN** `scripts/compare_benchmarks.py` is run with two JSON files where all benchmarks are within the threshold
+- **THEN** the script prints a comparison table and exits with code 0
+
+#### Scenario: Regression detected
+
+- **WHEN** `scripts/compare_benchmarks.py` is run with two JSON files where one or more benchmarks exceed the regression threshold
+- **THEN** the script prints the offending benchmarks and exits with code 1
+
+#### Scenario: Threshold configurable
+
+- **WHEN** the script is invoked with `--threshold N`
+- **THEN** the regression threshold is set to N percent rather than the default 10 percent
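Not part of the patch: a minimal sketch of the comparison logic the three scenarios above describe, using only the Python standard library. The function names (`load_times`, `compare`, `main`) and the table layout are illustrative assumptions, not the eventual contents of `scripts/compare_benchmarks.py`. The sketch also assumes both files report `real_time` in the same `time_unit` and that only `iteration` rows (not aggregates) should be compared.

```python
import argparse
import json


def load_times(path):
    """Map benchmark name -> real_time, skipping aggregate rows (mean, median, ...)."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return {
        b["name"]: float(b["real_time"])
        for b in doc.get("benchmarks", [])
        if b.get("run_type", "iteration") == "iteration"
    }


def compare(baseline, candidate, threshold_pct=10.0):
    """Return (rows, regressed) for benchmarks present in both runs.

    Each row is (name, baseline_time, candidate_time, delta_pct);
    a positive delta means the candidate is slower.
    """
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        delta = (cand - base) / base * 100.0 if base else 0.0
        rows.append((name, base, cand, delta))
    regressed = [row for row in rows if row[3] > threshold_pct]
    return rows, regressed


def main(argv=None):
    """Print the comparison table and return the intended process exit code."""
    parser = argparse.ArgumentParser(
        description="Compare two Google Benchmark JSON runs.")
    parser.add_argument("baseline")
    parser.add_argument("candidate")
    parser.add_argument("--threshold", type=float, default=10.0,
                        help="regression threshold in percent (default: 10)")
    args = parser.parse_args(argv)

    rows, regressed = compare(load_times(args.baseline),
                              load_times(args.candidate),
                              args.threshold)
    print(f"{'benchmark':<32}{'baseline':>12}{'candidate':>12}{'delta %':>10}")
    for name, base, cand, delta in rows:
        print(f"{name:<32}{base:>12.1f}{cand:>12.1f}{delta:>+10.1f}")
    return 1 if regressed else 0
```

In a real script, `main` would typically be passed to `sys.exit` under an `if __name__ == "__main__":` guard. Returning the code rather than exiting directly keeps the function easy to call from a test.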
diff --git a/openspec/changes/performance-teaching-hardening/specs/ci-quality-assurance/spec.md b/openspec/changes/performance-teaching-hardening/specs/ci-quality-assurance/spec.md
@@ -0,0 +1,12 @@
+# CI and Quality Assurance
+
+## ADDED Requirements
+
+### Requirement: Benchmark Regression Script Testable
+
+THE Build_System SHALL allow the benchmark regression comparison script to be smoke-tested without a full benchmark run.
+
+#### Scenario: Script smoke test passes
+
+- **WHEN** `scripts/compare_benchmarks.py` is invoked with two synthesised JSON input pairs (one stable, one regressed)
+- **THEN** it exits 0 for the stable case and exits 1 for the regressed case, confirming the script is functional
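Not part of the patch: one way the smoke test could synthesise its inputs, sketched with the standard library only. The benchmark names, the timings, and the `exit_code` stand-in are hypothetical. A real smoke test would replace `exit_code` with a call to the script itself, for example `subprocess.run([sys.executable, "scripts/compare_benchmarks.py", baseline, candidate]).returncode`.

```python
import json
import tempfile
from pathlib import Path


def synth_run(times_ns):
    """Build a minimal Google Benchmark-style JSON document from {name: ns}."""
    return {
        "benchmarks": [
            {"name": name, "run_type": "iteration", "iterations": 1000,
             "real_time": t, "cpu_time": t, "time_unit": "ns"}
            for name, t in times_ns.items()
        ]
    }


def exit_code(baseline_path, candidate_path, threshold_pct=10.0):
    """Stand-in for the real script: 1 if any shared benchmark slowed past the threshold."""
    def times(path):
        doc = json.loads(Path(path).read_text(encoding="utf-8"))
        return {b["name"]: b["real_time"] for b in doc["benchmarks"]}

    base, cand = times(baseline_path), times(candidate_path)
    worst = max(
        ((cand[n] - base[n]) / base[n] * 100.0 for n in base.keys() & cand.keys()),
        default=0.0,
    )
    return 1 if worst > threshold_pct else 0


def smoke_test():
    """Return (exit code for the stable pair, exit code for the regressed pair)."""
    runs = {
        "baseline": {"BM_AddArrays": 100.0, "BM_Stride": 250.0},
        "stable": {"BM_AddArrays": 103.0, "BM_Stride": 245.0},      # within 10 %
        "regressed": {"BM_AddArrays": 100.0, "BM_Stride": 300.0},   # +20 %
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for label, times_ns in runs.items():
            paths[label] = Path(tmp) / f"{label}.json"
            paths[label].write_text(json.dumps(synth_run(times_ns)), encoding="utf-8")
        return (exit_code(paths["baseline"], paths["stable"]),
                exit_code(paths["baseline"], paths["regressed"]))
```

Because the fixtures are generated rather than captured, the check needs no benchmark binary and runs in milliseconds, which is what makes it suitable as a CI smoke test.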
diff --git a/openspec/changes/performance-teaching-hardening/specs/documentation/spec.md b/openspec/changes/performance-teaching-hardening/specs/documentation/spec.md
@@ -0,0 +1,28 @@
+# Documentation
+
+## ADDED Requirements
+
+### Requirement: Sanitizer Workflow Visibility
+
+THE Documentation SHALL surface the sanitizer preset workflow so readers can find and run ASan, TSan, and UBSan without reading CMakeLists.txt or CMakePresets.json.
+
+#### Scenario: Reader finds sanitizer instructions
+
+- **WHEN** a reader opens the docs site validation section
+- **THEN** they find the `asan`, `tsan`, and `ubsan` preset names with copy-pasteable build-and-run commands
+
+#### Scenario: README cross-link present
+
+- **WHEN** a reader opens the root README quick-start
+- **THEN** there is a visible link to the sanitizer workflow documentation
+
+---
+
+### Requirement: Vectorization Diagnostics Reachable from Docs
+
+THE Documentation SHALL link readers from the docs site SIMD entry to the vectorization diagnostics workflow.
+
+#### Scenario: Docs site SIMD entry links diagnostics
+
+- **WHEN** a reader navigates to the SIMD module entry on the docs site
+- **THEN** they can reach instructions for enabling compiler vectorization reports
diff --git a/openspec/changes/performance-teaching-hardening/specs/simd-vectorization/spec.md b/openspec/changes/performance-teaching-hardening/specs/simd-vectorization/spec.md
@@ -0,0 +1,28 @@
+# SIMD Vectorization
+
+## ADDED Requirements
+
+### Requirement: Runtime CPU Dispatch
+
+THE Example_Module SHALL provide a runtime CPU dispatch example that selects the highest-available SIMD instruction set at runtime.
+
+#### Scenario: Runtime dispatch selects correct path
+
+- **WHEN** `dispatch_add_arrays` is called on a system with AVX2
+- **THEN** the AVX2 code path is selected and results match the scalar reference within floating-point tolerance
+
+#### Scenario: Runtime dispatch falls back gracefully
+
+- **WHEN** `dispatch_add_arrays` is called on a system that supports neither AVX2 nor SSE2
+- **THEN** the scalar fallback path is used and results are correct
+
+---
+
+### Requirement: Vectorization Diagnostics Workflow
+
+THE Documentation SHALL document how to obtain compiler vectorization reports for the SIMD examples.
+
+#### Scenario: Reader enables vectorization diagnostics
+
+- **WHEN** a reader builds the SIMD examples with GCC (`-fopt-info-vec`) or Clang (`-Rpass=loop-vectorize`)
+- **THEN** the module README provides the exact command and an explanation of the output
diff --git a/openspec/changes/performance-teaching-hardening/tasks.md b/openspec/changes/performance-teaching-hardening/tasks.md
@@ -0,0 +1,24 @@
+# Tasks: Performance Teaching Hardening
+
+## 1. SIMD runtime dispatch
+
+- [ ] 1.1 Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with `hpc::simd::dispatch_add_arrays` using `__builtin_cpu_supports` for AVX2/SSE2/scalar selection
+- [ ] 1.2 Register `runtime_dispatch` target in `examples/04-simd-vectorization/CMakeLists.txt` via `hpc_add_example`
+- [ ] 1.3 Add a correctness test under `tests/` that calls `dispatch_add_arrays` and validates results against the scalar reference
+- [ ] 1.4 Verify `cmake --preset=debug && cmake --build build/debug && ctest --preset=debug` passes with the new target and test
+
+## 2. Vectorization diagnostics documentation
+
+- [ ] 2.1 Add a "Vectorization Diagnostics" section to `examples/04-simd-vectorization/README.md` with GCC (`-fopt-info-vec`) and Clang (`-Rpass=loop-vectorize`) flag examples and sample output
+- [ ] 2.2 Add or extend a docs site page for the SIMD module to surface the vectorization diagnostics workflow for readers
+
+## 3. Sanitizer workflow documentation
+
+- [ ] 3.1 Add a "Validation and Safety" section to the VitePress docs site documenting the `asan`, `tsan`, and `ubsan` presets with copy-pasteable commands
+- [ ] 3.2 Cross-link the sanitizer section from the root `README.md` quick-start
+
+## 4. Benchmark regression comparison
+
+- [ ] 4.1 Add `scripts/compare_benchmarks.py`: accepts two Google Benchmark JSON files (baseline and candidate), prints a regression table (name, baseline, candidate, delta %), and exits 1 if any benchmark regresses by more than the threshold (default 10%, configurable via `--threshold`)
+- [ ] 4.2 Add a "Regression Comparison" section to the benchmarks docs entry or `benchmarks/` README showing the capture-and-compare workflow
+- [ ] 4.3 Smoke-test the script with two synthesised JSON input pairs to confirm it exits 0 on a stable run and 1 on a regressed run