`openspec/changes/performance-teaching-hardening/design.md` (+25 -6: 25 additions, 6 deletions)
@@ -4,16 +4,35 @@
This change closes four teaching gaps on existing surfaces without introducing new modules. All work is bounded to `examples/04-simd-vectorization/`, `scripts/`, `docs/`, and module README files.
**Goal**: Show readers how to select the fastest available instruction path at runtime rather than at compile time.
-**Approach**: Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with a `dispatch_add_arrays` function that uses `cpuid` (via `__builtin_cpu_supports` on GCC/Clang) to select AVX2, SSE2, or scalar at runtime. Export the function through the existing `simd_utils` interface library so the existing test runner can reach it.
+**Approach**: Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with a `dispatch_add_arrays` function that uses `cpuid` (via `__builtin_cpu_supports` on GCC/Clang) to select AVX2, SSE2, or scalar at runtime. The function is compiled into a new **`STATIC` library target `simd_dispatch`** (not the existing `simd_utils` INTERFACE target) that links `simd_utils` for headers. The example executable and the corresponding test both link `simd_dispatch`. `simd_utils` remains header-only.
**Rationale**: `__builtin_cpu_supports` is available on GCC ≥ 4.8 and Clang ≥ 3.7, covers the C++17 baseline, and avoids a platform-specific CPUID wrapper. The function name stays within the `hpc::simd` namespace. A companion `tests/` entry validates correctness against the scalar reference.
+**Compiler guard**: `__builtin_cpu_supports` is a GCC/Clang extension. The implementation must wrap dispatch logic in `#if defined(__GNUC__) || defined(__clang__)`. On any other compiler (e.g., MSVC) the code falls through to the scalar path unconditionally. MSVC-specific CPUID dispatch is explicitly out of scope for this change.
**Trade-off**: Does not use `ifunc` or a separate DSO; runtime dispatch is done once via a function pointer set at call site. This is simpler and sufficient for a teaching example.
### 2. Vectorization diagnostics workflow
@@ -36,7 +55,7 @@ This change closes four teaching gaps on existing surfaces without introducing n
**Goal**: A maintainer can compare two benchmark JSON runs and see which benchmarks regressed.
-**Approach**: Add `scripts/compare_benchmarks.py` (Python 3, stdlib only, no third-party packages) that accepts two JSON files (baseline and candidate) and prints a table of benchmark name, baseline ns/iter, candidate ns/iter, and delta %. Exit code 1 if any benchmark regresses by more than a configurable threshold (default 10%). Add a "Regression Comparison" section to `examples/02-memory-cache/README.md` and the relevant benchmark docs entry showing the capture-and-compare workflow.
+**Approach**: Add `scripts/compare_benchmarks.py` (Python 3, stdlib only, no third-party packages) that accepts two JSON files (baseline and candidate) and prints a table of benchmark name, baseline ns/iter, candidate ns/iter, and delta %. Exit code 1 if any benchmark regresses by more than a configurable threshold (default 10%). Add a "Regression Comparison" section to `benchmarks/README.md` showing the capture-and-compare workflow.
**Rationale**: stdlib-only ensures the script works without a virtualenv. The threshold flag makes it usable in CI without hardcoding expected values.
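
As a rough illustration of the approach described above, the comparison logic could look something like the following sketch. It is not the script from this change. It assumes the inputs are Google Benchmark JSON files with a top-level `benchmarks` array whose entries carry `name` and `real_time` (taken here to be in nanoseconds), and the helper names `load_times`, `compare`, and `main` are hypothetical.

```python
import argparse
import json


def load_times(path):
    """Map benchmark name -> real_time (assumed to be ns) from one JSON run."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return {b["name"]: float(b["real_time"]) for b in data.get("benchmarks", [])}


def compare(baseline, candidate, threshold_pct=10.0):
    """Return (rows, regressed) for benchmarks present in both runs.

    Each row is (name, baseline_time, candidate_time, delta_percent).
    A positive delta means the candidate is slower than the baseline.
    """
    rows, regressed = [], False
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        if base <= 0:
            continue  # cannot compute a relative change against a zero baseline
        delta_pct = (cand - base) / base * 100.0
        rows.append((name, base, cand, delta_pct))
        regressed = regressed or delta_pct > threshold_pct
    return rows, regressed


def main(argv):
    """Parse arguments, print the table, and return the intended exit code."""
    parser = argparse.ArgumentParser(description="Compare two benchmark JSON runs")
    parser.add_argument("baseline")
    parser.add_argument("candidate")
    parser.add_argument("--threshold", type=float, default=10.0,
                        help="regression threshold in percent")
    args = parser.parse_args(argv)

    rows, regressed = compare(load_times(args.baseline),
                              load_times(args.candidate),
                              args.threshold)
    print(f"{'benchmark':<32}{'baseline':>14}{'candidate':>14}{'delta %':>10}")
    for name, base, cand, delta in rows:
        print(f"{name:<32}{base:>14.1f}{cand:>14.1f}{delta:>+10.1f}")
    return 1 if regressed else 0
```

A real script would end with `sys.exit(main(sys.argv[1:]))` so that CI sees the exit code. Keeping the threshold check inside `compare` means the pass/fail decision can be tested without touching the filesystem.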
@@ -47,11 +66,11 @@ This change closes four teaching gaps on existing surfaces without introducing n
| Path | Change |
|------|--------|
| `examples/04-simd-vectorization/src/runtime_dispatch.cpp` | New: runtime CPU dispatch example |
`openspec/changes/performance-teaching-hardening/tasks.md` (+4 -4: 4 additions, 4 deletions)
@@ -3,22 +3,22 @@
## 1. SIMD runtime dispatch
- [ ] 1.1 Add `examples/04-simd-vectorization/src/runtime_dispatch.cpp` with `hpc::simd::dispatch_add_arrays` using `__builtin_cpu_supports` for AVX2/SSE2/scalar selection
-- [ ] 1.2 Register `runtime_dispatch` target in `examples/04-simd-vectorization/CMakeLists.txt` via `hpc_add_example`
+- [ ] 1.2 Add a `simd_dispatch` STATIC library target in `examples/04-simd-vectorization/CMakeLists.txt` (separate from the INTERFACE `simd_utils` target) that compiles `runtime_dispatch.cpp` and links `simd_utils` for headers
- [ ] 1.3 Add a correctness test under `tests/` that calls `dispatch_add_arrays` and validates results against the scalar reference
- [ ] 1.4 Verify `cmake --preset=debug && cmake --build build/debug && ctest --preset=debug` passes with the new target and test
## 2. Vectorization diagnostics documentation
- [ ] 2.1 Add a "Vectorization Diagnostics" section to `examples/04-simd-vectorization/README.md` with GCC (`-fopt-info-vec`) and Clang (`-Rpass=loop-vectorize`) flag examples and sample output
-- [ ] 2.2 Add or extend a docs site page for the SIMD module to surface the vectorization diagnostics workflow for readers
+- [ ] 2.2 Extend `docs/en/guides/learning-path.md` (SIMD section) to surface the vectorization diagnostics workflow for readers
## 3. Sanitizer workflow documentation
-- [ ] 3.1 Add a "Validation and Safety" section to the VitePress docs site documenting the `asan`, `tsan`, and `ubsan` presets with copy-pasteable commands
+- [ ] 3.1 Add `docs/en/guides/validation.md` documenting the `asan`, `tsan`, and `ubsan` presets with copy-pasteable commands; add entry to VitePress nav
- [ ] 3.2 Cross-link the sanitizer section from the root `README.md` quick-start
## 4. Benchmark regression comparison
- [ ] 4.1 Add `scripts/compare_benchmarks.py`: accepts two Google Benchmark JSON files, prints a regression table (name, baseline, candidate, delta%), exits 1 if any benchmark exceeds the threshold (default 10%, configurable via `--threshold`)
-- [ ] 4.2 Add a "Regression Comparison" section to the benchmarks docs entry or `benchmarks/` README showing the capture-and-compare workflow
+- [ ] 4.2 Add a "Regression Comparison" section to `benchmarks/README.md` showing the capture-and-compare workflow
- [ ] 4.3 Smoke-test the script with two synthesised JSON inputs to confirm it exits 0 on stable and 1 on a regressed run