Commit c89e1be

feat(simd): add runtime dispatch and vectorization diagnostics
- Add SIMD runtime dispatch with AVX2/SSE2/scalar selection
- Add vectorization diagnostics documentation (GCC/Clang flags)
- Add sanitizer workflow documentation (validation.md)
- Add benchmark regression comparison script
- Update learning path with SIMD diagnostics workflow
- Cross-link sanitizer docs from README quick-start
1 parent c532ab5 commit c89e1be

17 files changed

Lines changed: 631 additions & 1 deletion

README.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -61,6 +61,9 @@ Run one benchmark:
6161
./build/release/examples/02-memory-cache/aos_soa_bench
6262
```
6363

64+
Need sanitizer-specific guidance after the quick start? See
65+
[`docs/en/guides/validation.md`](docs/en/guides/validation.md).
66+
6467
## Validation commands
6568

6669
```bash
@@ -78,6 +81,7 @@ cmake --preset=ubsan && cmake --build build/ubsan && ctest --preset=ubsan
7881
- **Quick start:** `docs/en/getting-started/quickstart.md`
7982
- **Learning path:** `docs/en/guides/learning-path.md`
8083
- **Profiling guide:** `docs/en/guides/profiling-guide.md`
84+
- **Validation & sanitizers:** `docs/en/guides/validation.md`
8185
- **Chinese entry:** `README.zh-CN.md` and `docs/zh/`
8286

8387
## Development workflow

README.zh-CN.md

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,9 @@ cmake --build build/release
6161
./build/release/examples/02-memory-cache/aos_soa_bench
6262
```
6363

64+
If you want to go straight to the sanitizers after the quick start, see
65+
[`docs/zh/guides/validation.md`](docs/zh/guides/validation.md).
66+
6467
## Common validation commands
6568

6669
```bash
@@ -78,6 +81,7 @@ cmake --preset=ubsan && cmake --build build/ubsan && ctest --preset=ubsan
7881
- **Quick start:** `docs/zh/getting-started/quickstart.md`
7982
- **Learning path:** `docs/zh/guides/learning-path.md`
8083
- **Profiling guide:** `docs/zh/guides/profiling-guide.md`
84+
- **Validation & sanitizers:** `docs/zh/guides/validation.md`
8185
- **English entry:** `README.md` and `docs/en/`
8286

8387
## Development workflow

benchmarks/README.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Benchmarks
2+
3+
This directory holds shared benchmark utilities plus the local workflow for
4+
capturing and comparing Google Benchmark JSON output.
5+
6+
---
7+
8+
## Regression Comparison
9+
10+
1. Build the optimized benchmark targets:
11+
12+
```bash
13+
cmake --preset=release
14+
cmake --build build/release
15+
```
16+
17+
2. Capture a baseline run:
18+
19+
```bash
20+
./build/release/examples/04-simd-vectorization/simd_bench \
21+
--benchmark_format=json \
22+
--benchmark_out=simd-baseline.json
23+
```
24+
25+
3. Capture a candidate run after your change:
26+
27+
```bash
28+
./build/release/examples/04-simd-vectorization/simd_bench \
29+
--benchmark_format=json \
30+
--benchmark_out=simd-candidate.json
31+
```
32+
33+
4. Compare the two runs:
34+
35+
```bash
36+
python3 scripts/compare_benchmarks.py simd-baseline.json simd-candidate.json --threshold 10
37+
```
38+
39+
The script prints a per-benchmark table with baseline time, candidate time, and
40+
delta percentage. It exits with code `1` when any benchmark regresses by more
41+
than the threshold, which makes it suitable for local smoke checks or future CI
42+
gating.
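
The script is Python and its source is not part of this diff, so the following C++ sketch only illustrates the comparison rule as described above. It assumes the delta is `(candidate - baseline) / baseline * 100` and that times are keyed by benchmark name; `Regression`, `delta_percent`, and `find_regressions` are hypothetical names, not identifiers from `scripts/compare_benchmarks.py`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not scripts/compare_benchmarks.py.
struct Regression {
    std::string name;
    double delta_percent;
};

// Assumed delta rule: a positive value means the candidate is slower.
double delta_percent(double baseline_time, double candidate_time) {
    assert(baseline_time > 0.0);
    return (candidate_time - baseline_time) / baseline_time * 100.0;
}

// Collect every benchmark present in both runs whose slowdown exceeds the threshold.
std::vector<Regression> find_regressions(const std::map<std::string, double>& baseline,
                                         const std::map<std::string, double>& candidate,
                                         double threshold_percent) {
    std::vector<Regression> regressions;
    for (const auto& entry : baseline) {
        const auto match = candidate.find(entry.first);
        if (match == candidate.end()) continue;  // benchmark missing from the candidate run
        const double delta = delta_percent(entry.second, match->second);
        if (delta > threshold_percent) regressions.push_back({entry.first, delta});
    }
    return regressions;
}
```

A caller that mirrors the described behavior would exit with code `1` whenever the returned vector is non-empty.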

docs/.vitepress/config.ts

Lines changed: 4 additions & 0 deletions
@@ -58,6 +58,7 @@ export default defineConfig({
5858
{ text: 'Learning Path', link: '/en/guides/learning-path' },
5959
{ text: 'Optimization Decision Tree', link: '/en/guides/optimization-decision-tree' },
6060
{ text: 'Profiling Guide', link: '/en/guides/profiling-guide' },
61+
{ text: 'Validation & Sanitizers', link: '/en/guides/validation' },
6162
{ text: 'Best Practices', link: '/en/guides/best-practices' },
6263
],
6364
},
@@ -94,6 +95,7 @@ export default defineConfig({
9495
{ text: 'Learning Path', link: '/en/guides/learning-path' },
9596
{ text: 'Optimization Decision Tree', link: '/en/guides/optimization-decision-tree' },
9697
{ text: 'Profiling Guide', link: '/en/guides/profiling-guide' },
98+
{ text: 'Validation & Sanitizers', link: '/en/guides/validation' },
9799
{ text: 'Best Practices', link: '/en/guides/best-practices' },
98100
],
99101
},
@@ -160,6 +162,7 @@ export default defineConfig({
160162
{ text: '学习路径', link: '/zh/guides/learning-path' },
161163
{ text: '优化决策树', link: '/zh/guides/optimization-decision-tree' },
162164
{ text: '性能分析指南', link: '/zh/guides/profiling-guide' },
165+
{ text: '验证与 Sanitizer', link: '/zh/guides/validation' },
163166
{ text: '最佳实践', link: '/zh/guides/best-practices' },
164167
],
165168
},
@@ -196,6 +199,7 @@ export default defineConfig({
196199
{ text: '学习路径', link: '/zh/guides/learning-path' },
197200
{ text: '优化决策树', link: '/zh/guides/optimization-decision-tree' },
198201
{ text: '性能分析指南', link: '/zh/guides/profiling-guide' },
202+
{ text: '验证与 Sanitizer', link: '/zh/guides/validation' },
199203
{ text: '最佳实践', link: '/zh/guides/best-practices' },
200204
],
201205
},

docs/en/guides/learning-path.md

Lines changed: 12 additions & 0 deletions
@@ -186,6 +186,17 @@ Let the compiler do the work.
186186
-Rpass=loop-vectorize
187187
```
188188

189+
**Repository workflow:**
190+
```bash
191+
cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
192+
cmake --build build/release --target auto_vectorize
193+
```
194+
195+
`HPC_VECTORIZE_REPORT` enables the same compiler-specific diagnostics for the
196+
example target while keeping the default preset list stable. For sanitizer-led
197+
verification after SIMD changes, see
198+
[Validation & Sanitizers](./validation.md).
199+
189200
### 4.2 SIMD Intrinsics
190201

191202
Manual vectorization for maximum control.
@@ -203,6 +214,7 @@ Readable SIMD code.
203214
- Abstracting intrinsics
204215
- Scalar fallback implementations
205216
- Type-safe SIMD operations
217+
- Runtime dispatch for mixed CPU fleets
206218

207219
---
208220

docs/en/guides/validation.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
# Validation & Sanitizers
2+
3+
Use the preset-driven validation path first, then pick the sanitizer that
4+
matches the failure mode you are investigating.
5+
6+
---
7+
8+
## Quick reference
9+
10+
| Preset | Best for | Notes |
11+
| --- | --- | --- |
12+
| `asan` | heap/stack overflows, use-after-free, double free | Benchmarks are disabled in this preset |
13+
| `tsan` | data races, unsafe synchronization | This preset switches to `clang` / `clang++` |
14+
| `ubsan` | undefined behavior, invalid shifts, signed overflow | Good follow-up after functional fixes |
15+
16+
---
17+
18+
## AddressSanitizer
19+
20+
```bash
21+
cmake --preset=asan
22+
cmake --build build/asan
23+
ctest --preset=asan
24+
```
25+
26+
Use `asan` when you suspect invalid memory access, lifetime bugs, or accidental
27+
buffer overruns.
28+
29+
## ThreadSanitizer
30+
31+
```bash
32+
cmake --preset=tsan
33+
cmake --build build/tsan
34+
ctest --preset=tsan
35+
```
36+
37+
Use `tsan` for concurrent code paths. The preset already selects `clang` /
38+
`clang++`, which is the supported toolchain in this repository.
39+
40+
## UndefinedBehaviorSanitizer
41+
42+
```bash
43+
cmake --preset=ubsan
44+
cmake --build build/ubsan
45+
ctest --preset=ubsan
46+
```
47+
48+
Use `ubsan` to surface undefined behavior that may stay invisible in normal
49+
debug or release builds.
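
As an illustration of the kind of bug this preset targets, consider a midpoint computation. This is a minimal sketch, not code from the repository, and `midpoint_sketch` is a hypothetical name. The naive form `(lo + hi) / 2` overflows a signed `int` for large inputs, which is undefined behavior that often appears to work in ordinary builds.

```cpp
#include <cassert>

// Overflow-prone form: (lo + hi) / 2 is undefined behavior when lo + hi
// exceeds INT_MAX, and is the kind of signed overflow ubsan reports.
// Safe form for 0 <= lo <= hi: the difference hi - lo cannot overflow.
int midpoint_sketch(int lo, int hi) {
    assert(0 <= lo && lo <= hi);  // precondition assumed by this sketch
    return lo + (hi - lo) / 2;
}
```

The precondition matters: with a negative `lo` and a large positive `hi`, the subtraction itself could overflow, so a more general helper would need a wider type or explicit range checks.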
50+
51+
---
52+
53+
## Suggested workflow
54+
55+
1. Start with `debug` or `release` to reproduce the issue normally.
56+
2. Run `asan` for memory-safety problems.
57+
3. Run `tsan` for concurrency changes or flaky parallel tests.
58+
4. Run `ubsan` before closing work that touches low-level arithmetic, casts, or
59+
layout assumptions.
60+
61+
The repository keeps these as separate presets on purpose: they stay easy to
62+
discover, easy to automate, and do not hide compiler-specific sanitizer
63+
constraints behind extra wrapper scripts.

docs/zh/guides/learning-path.md

Lines changed: 11 additions & 0 deletions
@@ -186,6 +186,16 @@ flowchart LR
186186
-Rpass=loop-vectorize
187187
```
188188

189+
**Recommended repository workflow:**
190+
```bash
191+
cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
192+
cmake --build build/release --target auto_vectorize
193+
```
194+
195+
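`HPC_VECTORIZE_REPORT` enables the same set of compiler vectorization diagnostics for the example target without adding a new
196+
default preset. To continue with sanitizer validation after SIMD changes, see
197+
[Validation & Sanitizers](./validation.md).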
198+
189199
### 4.2 SIMD Intrinsics
190200

191201
Manual vectorization for maximum control.
@@ -203,6 +213,7 @@ flowchart LR
203213
- Abstracting intrinsics
204214
- Scalar fallback implementations
205215
- Type-safe SIMD operations
216+
- Runtime dispatch for mixed CPU environments
206217

207218
---
208219

docs/zh/guides/validation.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Validation & Sanitizers
2+
3+
Start with the regular preset-driven validation path, then choose the sanitizer
4+
that matches the problem you are investigating.
5+
6+
---
7+
8+
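## Quick reference
9+
10+
| Preset | Best for | Notes |
11+
| --- | --- | --- |
12+
| `asan` | heap/stack overflows, use-after-free, double free | This preset disables benchmarks |
13+
| `tsan` | data races, synchronization errors | This preset switches to `clang` / `clang++` |
14+
| `ubsan` | undefined behavior, invalid shifts, signed overflow | A good follow-up check after functional fixes |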
15+
16+
---
17+
18+
## AddressSanitizer
19+
20+
```bash
21+
cmake --preset=asan
22+
cmake --build build/asan
23+
ctest --preset=asan
24+
```
25+
26+
Reach for `asan` first when you suspect invalid memory access, object lifetime
27+
bugs, or buffer overruns.
28+
29+
## ThreadSanitizer
30+
31+
```bash
32+
cmake --preset=tsan
33+
cmake --build build/tsan
34+
ctest --preset=tsan
35+
```
36+
37+
Use `tsan` when a change touches concurrent code paths. The preset already
38+
switches the repository to the supported `clang` / `clang++` toolchain.
39+
40+
## UndefinedBehaviorSanitizer
41+
42+
```bash
43+
cmake --preset=ubsan
44+
cmake --build build/ubsan
45+
ctest --preset=ubsan
46+
```
47+
48+
Use `ubsan` to find undefined behavior that rarely shows up in regular
49+
debug/release builds.
50+
51+
---
52+
53+
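## Suggested workflow
54+
55+
1. Reproduce the issue with `debug` or `release` first.
56+
2. Run `asan` first for memory-safety problems.
57+
3. Run `tsan` first for concurrency changes or intermittent parallel failures.
58+
4. Run `ubsan` before wrapping up changes that touch low-level arithmetic, type conversions, or layout assumptions.
59+
60+
The repository deliberately keeps these as separate presets: they are easier to discover, easier to automate, and
61+
do not hide compiler-specific sanitizer constraints behind extra scripts.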

examples/04-simd-vectorization/CMakeLists.txt

Lines changed: 13 additions & 0 deletions
@@ -4,6 +4,19 @@
44
add_library(simd_utils INTERFACE)
55
target_include_directories(simd_utils INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/include)
66

7+
add_library(simd_dispatch STATIC
8+
src/runtime_dispatch.cpp
9+
)
10+
target_link_libraries(simd_dispatch PUBLIC simd_utils)
11+
hpc_set_compiler_options(simd_dispatch)
12+
hpc_enable_sanitizers(simd_dispatch)
13+
14+
hpc_add_example(
15+
NAME dispatch_example
16+
SOURCES src/dispatch_example_main.cpp
17+
LIBRARIES simd_dispatch
18+
)
19+
720
# Auto-vectorization example
821
hpc_add_example(
922
NAME auto_vectorize

examples/04-simd-vectorization/README.md

Lines changed: 50 additions & 0 deletions
@@ -77,6 +77,8 @@ flowchart TD
7777
|------|-------|-------------|
7878
| `src/auto_vectorize.cpp` | Auto-Vectorization | Compiler-friendly patterns |
7979
| `src/intrinsics_intro.cpp` | SIMD Intrinsics | Manual SSE/AVX/AVX-512 |
80+
| `src/runtime_dispatch.cpp` | Runtime Dispatch | One binary, best available path |
81+
| `src/dispatch_example_main.cpp` | Dispatch Demo | Runtime-gated array addition |
8082
| `include/simd_wrapper.hpp` | SIMD Wrapper | Readable abstractions |
8183

8284
## Key Concepts
@@ -157,6 +159,20 @@ void add_wrapped(float* a, const float* b, const float* c, size_t n) {
157159
}
158160
```
159161

162+
### Runtime Dispatch
163+
164+
Keep one binary and pick the best available path at runtime:
165+
166+
```bash
167+
cmake --preset=release
168+
cmake --build build/release --target dispatch_example
169+
./build/release/examples/04-simd-vectorization/dispatch_example
170+
```
171+
172+
`dispatch_add_arrays()` selects AVX2, SSE2, or scalar code at runtime. The
173+
teaching goal is not to hide intrinsics, but to show how a small dispatch layer
174+
lets one executable stay portable across mixed x86 CPUs.
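
The contents of `src/runtime_dispatch.cpp` are not shown in this diff, so the following is only a minimal sketch of what such a dispatch layer could look like. It assumes GCC or Clang on x86 for the SIMD paths (via `__builtin_cpu_supports` and per-function `target` attributes) and falls back to scalar code everywhere else. The names `sketch_dispatch_add`, `select_add_kernel`, and the `add_*` kernels are hypothetical, and the real `dispatch_add_arrays()` signature may differ.

```cpp
#include <cassert>
#include <cstddef>

// SIMD paths are compiled only for GCC/Clang on x86 (an assumption of this sketch).
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>
#else
#define SKETCH_X86_DISPATCH 0
#endif

using AddFn = void (*)(float*, const float*, const float*, std::size_t);

// Portable fallback: out[i] = a[i] + b[i].
static void add_scalar(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if SKETCH_X86_DISPATCH
// 128-bit path: 4 floats per iteration, scalar loop for the tail.
__attribute__((target("sse2")))
static void add_sse2(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// 256-bit path: 8 floats per iteration, scalar loop for the tail.
__attribute__((target("avx2")))
static void add_avx2(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

// Query the CPU and return the widest kernel it supports.
static AddFn select_add_kernel() {
#if SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_avx2;
    if (__builtin_cpu_supports("sse2")) return add_sse2;
#endif
    return add_scalar;
}

// Public entry point: the kernel is chosen once, on first use.
void sketch_dispatch_add(float* out, const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (out && a && b));
    static const AddFn kernel = select_add_kernel();
    kernel(out, a, b, n);
}
```

Caching the function pointer in a function-local `static` keeps the CPU query out of the hot path, and the unaligned load/store intrinsics avoid placing alignment requirements on callers.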
175+
160176
## Instruction Sets
161177

162178
| ISA | Register Width | Floats/Op | Doubles/Op |
@@ -200,6 +216,40 @@ cat /proc/cpuinfo | grep flags
200216
# Look for: sse, sse2, sse4_1, avx, avx2, avx512f
201217
```
202218

219+
## Vectorization Diagnostics
220+
221+
Use the repository-native vectorization report toggle so optimized targets emit
222+
compiler feedback while keeping the default presets unchanged:
223+
224+
```bash
225+
cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
226+
cmake --build build/release --target auto_vectorize 2>&1 | tee build/release/vectorization.log
227+
```
228+
229+
`HPC_VECTORIZE_REPORT` expands to the compiler-specific flags used in the
230+
project:
231+
232+
```bash
233+
# GCC
234+
-fopt-info-vec-optimized
235+
236+
# Clang
237+
-Rpass=loop-vectorize
238+
```
239+
240+
Sample output you should expect while compiling:
241+
242+
```text
243+
# GCC
244+
auto_vectorize.cpp:37:26: optimized: loop vectorized using 32 byte vectors
245+
246+
# Clang
247+
auto_vectorize.cpp:37:5: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
248+
```
249+
250+
If you do not see vectorization remarks, confirm you are using an optimized
251+
preset (`release` or `relwithdebinfo`) rather than `debug`.
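
For reference, here is a loop shape that compilers commonly manage to vectorize. This is a hypothetical sketch rather than the contents of `auto_vectorize.cpp`, and `saxpy_sketch` is an illustrative name. Whether a remark is actually emitted depends on the compiler version, optimization level, and target flags.

```cpp
#include <cassert>
#include <cstddef>

// Unit-stride access, no loop-carried dependency, and __restrict (a compiler
// extension accepted by GCC, Clang, and MSVC) to promise that x and y do not alias.
void saxpy_sketch(float* __restrict y, const float* __restrict x, float alpha, std::size_t n) {
    assert(n == 0 || (x && y));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Without the no-alias promise the compiler may still vectorize, but it typically has to add a runtime overlap check first.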
252+
203253
## Further Reading
204254

205255
- [Intel Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/)
