Commit 53edff4

LessUp and Copilot committed
openspec: define performance-teaching-hardening change
Add the first-wave hardening OpenSpec change covering:

- SIMD runtime dispatch closure (example + test)
- Vectorization diagnostics surfaced in reader docs
- Sanitizer preset workflow in docs site and README
- Benchmark regression comparison script

Spec deltas added for simd-vectorization, benchmark-framework, documentation, and ci-quality-assurance capabilities.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent b878021 commit 53edff4

7 files changed

Lines changed: 207 additions & 0 deletions

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Design: Performance Teaching Hardening

## Overview

This change closes four teaching gaps on existing surfaces without introducing new modules. All work is confined to `examples/04-simd-vectorization/`, `scripts/`, `docs/`, and module README files.

## Design decisions

### 1. SIMD runtime dispatch

**Goal**: Show readers how to select the fastest available instruction path at runtime rather than at compile time.

**Approach**: Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with a `dispatch_add_arrays` function that queries `cpuid` (via `__builtin_cpu_supports` on GCC/Clang) to select an AVX2, SSE2, or scalar code path at runtime. Export the function through the existing `simd_utils` interface library so the existing test runner can reach it.

**Rationale**: `__builtin_cpu_supports` is available on GCC ≥ 4.8 and Clang ≥ 3.7, works within the C++17 baseline, and avoids a platform-specific CPUID wrapper. The function stays within the `hpc::simd` namespace. A companion `tests/` entry validates correctness against the scalar reference.

**Trade-off**: Does not use GNU `ifunc` or a separate DSO; runtime dispatch happens once, through a function pointer set at the call site. This is simpler and sufficient for a teaching example.
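
A minimal sketch of this dispatch scheme, assuming `float` arrays and an out-parameter signature for `dispatch_add_arrays` (the design does not fix one). The kernel names `add_arrays_scalar`, `add_arrays_sse2`, `add_arrays_avx2`, the `AddFn` alias, and `select_add_kernel` are hypothetical. Each SIMD kernel carries a GCC/Clang `target` attribute, so a single translation unit can hold all three paths without being compiled with `-mavx2`; the CPU query runs once, when a function-local static is first initialised. Non-x86 builds compile only the scalar path.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HPC_DISPATCH_X86 1
#endif

namespace hpc::simd {

// Hypothetical kernel signature: out[i] = a[i] + b[i] for i in [0, n).
using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Scalar reference: always compiled; serves as the fallback and as the test oracle.
void add_arrays_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

#ifdef HPC_DISPATCH_X86
// SSE2 path: 4 floats per step, then a scalar loop for the remainder.
__attribute__((target("sse2")))
void add_arrays_sse2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// AVX2 path: 8 floats per step. The target attribute allows 256-bit
// intrinsics here even though the file is not built with -mavx2.
__attribute__((target("avx2")))
void add_arrays_avx2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
#endif

// Ask the running CPU which instruction sets it supports and pick the widest path.
AddFn select_add_kernel() {
#ifdef HPC_DISPATCH_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_arrays_avx2;
    if (__builtin_cpu_supports("sse2")) return add_arrays_sse2;
#endif
    return add_arrays_scalar;
}

void dispatch_add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    // Initialised once, on the first call (thread-safe since C++11).
    static const AddFn kernel = select_add_kernel();
    kernel(a, b, out, n);
}

}  // namespace hpc::simd
```

Keeping the scalar kernel callable on its own gives the companion test a reference to compare against, whichever path the CPU selects.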

### 2. Vectorization diagnostics workflow

**Goal**: Make compiler vectorization reports reachable without reading CMakeLists.txt.

**Approach**: Add a "Vectorization Diagnostics" section to `examples/04-simd-vectorization/README.md` that shows the exact build commands (`cmake --preset=debug -DCMAKE_CXX_FLAGS="-fopt-info-vec"` for GCC, or `-Rpass=loop-vectorize` for Clang) and explains how to read the output. Mirror a condensed version in `docs/` under the SIMD learning path entry.

**Trade-off**: We do not add a new CMake preset for this; a reader-visible flag override is sufficient and avoids preset sprawl.
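
A pair of kernels can show readers what the report distinguishes; `saxpy` and `sum` below are illustrative names, not existing repository code. With optimization enabled, GCC and Clang usually vectorize the first loop. The second is a `float` reduction, which default IEEE semantics do not allow the compiler to reorder, so it is typically reported as missed unless reassociation is permitted (for example with `-ffast-math`). GCC's `-fopt-info-vec-missed` and Clang's `-Rpass-missed=loop-vectorize` list the missed loops. Note that the vectorizer generally does not run at `-O0`; if the `debug` preset builds without optimization, as CMake's default Debug configuration does for GCC and Clang, the report may show no vectorized loops at all.

```cpp
#include <cassert>
#include <cstddef>

// Usually vectorized with optimization on: unit stride, independent
// iterations, and __restrict (a GCC/Clang extension) promises no aliasing.
void saxpy(float a, const float* __restrict x, float* __restrict y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Usually reported as not vectorized under default floating-point rules:
// a vector reduction would reorder the additions, which changes rounding.
float sum(const float* x, std::size_t n) {
    assert(n == 0 || x != nullptr);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}
```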

### 3. Sanitizer workflow in reader-facing docs

**Goal**: A reader can find and run ASan/TSan/UBSan without knowing the preset names in advance.

**Approach**: Add a "Validation and Safety" page (or expand an existing section) in the VitePress docs site. Reference the three preset names (`asan`, `tsan`, `ubsan`) with copy-pasteable commands. Cross-link from the repository README quick-start.

**Trade-off**: Keep this as documentation only. Do not add a new composite preset or script; the existing presets are complete.

### 4. Benchmark regression comparison

**Goal**: A maintainer can compare two benchmark JSON runs and see which benchmarks regressed.

**Approach**: Add `scripts/compare_benchmarks.py` (Python 3, standard library only, no third-party packages) that accepts two JSON files (baseline and candidate) and prints a table of benchmark name, baseline ns/iter, candidate ns/iter, and delta %. The script exits with code 1 if any benchmark regresses by more than a configurable threshold (default 10%). Add a "Regression Comparison" section to `examples/02-memory-cache/README.md` and to the relevant benchmark docs entry, showing the capture-and-compare workflow.

**Rationale**: Using only the standard library means the script runs without a virtualenv. The threshold flag makes it usable in CI without hardcoding expected values.

**Trade-off**: Does not publish results to a dashboard (out of scope). Does not integrate with GitHub Actions in this change (that would be a follow-on if needed).
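
The design does not pin down the sign convention for the delta % column. One reading, assuming the script compares per-iteration times (for example the `real_time` or `cpu_time` fields that Google Benchmark writes, in the unit named by `time_unit`), where a larger value means slower:

$$
\Delta_{\%} = 100 \cdot \frac{t_{\mathrm{candidate}} - t_{\mathrm{baseline}}}{t_{\mathrm{baseline}}},
\qquad \text{regression} \iff \Delta_{\%} > \tau
$$

Here $\tau$ is the `--threshold` value (default 10). For example, a baseline of 200 ns/iter and a candidate of 230 ns/iter give $\Delta_{\%} = 100 \cdot 30/200 = 15 > 10$, so the script would exit with code 1; a candidate of 210 ns/iter gives $\Delta_{\%} = 5$ and passes.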

## File surface

| Path | Change |
|------|--------|
| `examples/04-simd-vectorization/src/runtime_dispatch.cpp` | New: runtime CPU dispatch example |
| `examples/04-simd-vectorization/CMakeLists.txt` | Extend: wire `runtime_dispatch` target |
| `tests/` (simd subdir) | New: correctness test for `dispatch_add_arrays` |
| `examples/04-simd-vectorization/README.md` | Extend: vectorization diagnostics section |
| `docs/` (SIMD learning path entry) | Extend: vectorization diagnostics, sanitizer link |
| `docs/` (validation/safety page or section) | New or extend: sanitizer preset workflow |
| `README.md` | Extend: cross-link to sanitizer docs |
| `scripts/compare_benchmarks.py` | New: benchmark regression comparison script |
| `benchmarks/` README or docs entry | Extend: capture-and-compare workflow |
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Proposal: Performance Teaching Hardening

## Summary

Harden the first wave of existing teaching surfaces: close the gap between the SIMD module's compile-time wrapper and runtime dispatch, surface vectorization and sanitizer workflows in reader-facing documentation, and establish a reproducible benchmark regression path.

## Why

The repository has solid example code and CI scaffolding, but three gaps remain that reduce teaching value and maintainability:

1. The SIMD module has a `FloatVec` alias that selects one instruction set at compile time. There is no runtime dispatch example, so readers who want portable SIMD code for heterogeneous deployments have no guide.
2. Vectorization diagnostics (`-fopt-info-vec`, `-Rpass=loop-vectorize`) and sanitizer workflows (ASan/TSan/UBSan) are reachable via CMake presets but are not surfaced in reader-facing documentation.
3. The benchmark suite produces JSON output, but there is no documented or scripted path for comparing runs across commits, which makes regression detection manual and fragile.

## Scope

### In scope

- Runtime CPU dispatch closure for the SIMD module (example + test)
- Vectorization diagnostics workflow documented for readers
- Sanitizer workflow surfaced in reader-facing docs and the docs site
- Benchmark regression comparison: documented workflow and script

### Out of scope

- New teaching modules (concurrency, memory, modern-cpp extensions)
- CI benchmark publishing to external dashboards
- AVX-512 masking or gather/scatter intrinsics
- Windows or macOS port validation

## Success criteria

- A `runtime_dispatch.cpp` example under `examples/04-simd-vectorization/src/` compiles and the corresponding test passes in the debug preset.
- The example README and docs site entry for the SIMD module explain how to see compiler vectorization reports.
- The docs site has a visible validation / sanitizer path that a reader can follow without reading CMakeLists.txt.
- A `scripts/compare_benchmarks.py` script accepts two Google Benchmark JSON files and prints a human-readable regression report; the script is referenced from the benchmark module README.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Benchmark Framework

## ADDED Requirements

### Requirement: Benchmark Regression Comparison

THE HPC_Guide SHALL provide a script to compare two Google Benchmark JSON output files and identify regressions.

#### Scenario: No regressions detected

- **WHEN** `scripts/compare_benchmarks.py` is run with two JSON files where all benchmarks are within the threshold
- **THEN** the script prints a comparison table and exits with code 0

#### Scenario: Regression detected

- **WHEN** `scripts/compare_benchmarks.py` is run with two JSON files where one or more benchmarks exceed the regression threshold
- **THEN** the script prints the offending benchmarks and exits with code 1

#### Scenario: Threshold configurable

- **WHEN** the script is invoked with `--threshold N`
- **THEN** the regression threshold is set to N percent rather than the default 10 percent
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# CI and Quality Assurance

## ADDED Requirements

### Requirement: Benchmark Regression Script Testable

THE Build_System SHALL allow the benchmark regression comparison script to be smoke-tested without a full benchmark run.

#### Scenario: Script smoke test passes

- **WHEN** `scripts/compare_benchmarks.py` is invoked on synthesised JSON inputs, once with a stable candidate and once with a regressed candidate
- **THEN** it exits 0 for the stable case and exits 1 for the regressed case, confirming the script is functional
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Documentation

## ADDED Requirements

### Requirement: Sanitizer Workflow Visibility

THE Documentation SHALL surface the sanitizer preset workflow so readers can find and run ASan, TSan, and UBSan without reading CMakeLists.txt or CMakePresets.json.

#### Scenario: Reader finds sanitizer instructions

- **WHEN** a reader opens the docs site validation section
- **THEN** they find the `asan`, `tsan`, and `ubsan` preset names with copy-pasteable build-and-run commands

#### Scenario: README cross-link present

- **WHEN** a reader opens the root README quick-start
- **THEN** there is a visible link to the sanitizer workflow documentation

---

### Requirement: Vectorization Diagnostics Reachable from Docs

THE Documentation SHALL link readers from the docs site SIMD entry to the vectorization diagnostics workflow.

#### Scenario: Docs site SIMD entry links diagnostics

- **WHEN** a reader navigates to the SIMD module entry on the docs site
- **THEN** they can reach instructions for enabling compiler vectorization reports
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# SIMD Vectorization

## ADDED Requirements

### Requirement: Runtime CPU Dispatch

THE Example_Module SHALL provide a runtime CPU dispatch example that selects the highest-available SIMD instruction set at runtime.

#### Scenario: Runtime dispatch selects correct path

- **WHEN** `dispatch_add_arrays` is called on a system with AVX2
- **THEN** the AVX2 code path is selected and results match the scalar reference within floating-point tolerance

#### Scenario: Runtime dispatch falls back gracefully

- **WHEN** `dispatch_add_arrays` is called on a system without AVX2 or SSE2
- **THEN** the scalar fallback path is used and results are correct
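
The "within floating-point tolerance" check could be expressed as a combined absolute/relative comparison. This is a minimal sketch assuming a hypothetical helper `matches_reference` with illustrative default tolerances. Elementwise addition is correctly rounded in both the SSE/AVX and scalar paths, so for this particular kernel exact equality should also hold; a tolerance matters mainly for kernels whose operation order differs between paths, such as reductions. Because SSE2 is part of the x86-64 baseline, the fallback scenario cannot be reached through CPU detection on an x86-64 machine, so a test may need to call the scalar path directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical test helper: true if every got[i] lies within
// abs_tol + rel_tol * |ref[i]| of ref[i].
bool matches_reference(const float* got, const float* ref, std::size_t n,
                       float rel_tol = 1e-6f, float abs_tol = 1e-7f) {
    assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        const float diff = std::fabs(got[i] - ref[i]);
        // Written as !(diff <= bound) so that a NaN difference fails the check.
        if (!(diff <= abs_tol + rel_tol * std::fabs(ref[i]))) {
            return false;
        }
    }
    return true;
}
```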

---

### Requirement: Vectorization Diagnostics Workflow

THE Documentation SHALL document how to obtain compiler vectorization reports for the SIMD examples.

#### Scenario: Reader enables vectorization diagnostics

- **WHEN** a reader builds the SIMD examples with GCC (`-fopt-info-vec`) or Clang (`-Rpass=loop-vectorize`)
- **THEN** the module README provides the exact command and an explanation of the output
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
# Tasks: Performance Teaching Hardening

## 1. SIMD runtime dispatch

- [ ] 1.1 Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with `hpc::simd::dispatch_add_arrays` using `__builtin_cpu_supports` for AVX2/SSE2/scalar selection
- [ ] 1.2 Register `runtime_dispatch` target in `examples/04-simd-vectorization/CMakeLists.txt` via `hpc_add_example`
- [ ] 1.3 Add a correctness test under `tests/` that calls `dispatch_add_arrays` and validates results against the scalar reference
- [ ] 1.4 Verify `cmake --preset=debug && cmake --build build/debug && ctest --preset=debug` passes with the new target and test

## 2. Vectorization diagnostics documentation

- [ ] 2.1 Add a "Vectorization Diagnostics" section to `examples/04-simd-vectorization/README.md` with GCC (`-fopt-info-vec`) and Clang (`-Rpass=loop-vectorize`) flag examples and sample output
- [ ] 2.2 Add or extend a docs site page for the SIMD module to surface the vectorization diagnostics workflow for readers

## 3. Sanitizer workflow documentation

- [ ] 3.1 Add a "Validation and Safety" section to the VitePress docs site documenting the `asan`, `tsan`, and `ubsan` presets with copy-pasteable commands
- [ ] 3.2 Cross-link the sanitizer section from the root `README.md` quick-start

## 4. Benchmark regression comparison

- [ ] 4.1 Add `scripts/compare_benchmarks.py`: accepts two Google Benchmark JSON files, prints a regression table (name, baseline, candidate, delta%), exits 1 if any benchmark exceeds the threshold (default 10%, configurable via `--threshold`)
- [ ] 4.2 Add a "Regression Comparison" section to the benchmarks docs entry or `benchmarks/` README showing the capture-and-compare workflow
- [ ] 4.3 Smoke-test the script with two synthesised JSON inputs to confirm it exits 0 on stable and 1 on a regressed run
