LessUp
diff --git a/README.md b/README.md
Lines changed: 4 additions & 0 deletions
diff --git a/README.zh-CN.md b/README.zh-CN.md
Lines changed: 4 additions & 0 deletions
diff --git a/benchmarks/README.md b/benchmarks/README.md
Lines changed: 42 additions & 0 deletions
diff --git a/docs/.vitepress/config.ts b/docs/.vitepress/config.ts
Lines changed: 4 additions & 0 deletions
diff --git a/docs/en/guides/learning-path.md b/docs/en/guides/learning-path.md
Lines changed: 12 additions & 0 deletions
diff --git a/docs/en/guides/validation.md b/docs/en/guides/validation.md
Lines changed: 63 additions & 0 deletions
diff --git a/docs/zh/guides/learning-path.md b/docs/zh/guides/learning-path.md
Lines changed: 11 additions & 0 deletions
diff --git a/docs/zh/guides/validation.md b/docs/zh/guides/validation.md
Lines changed: 61 additions & 0 deletions
diff --git a/examples/04-simd-vectorization/CMakeLists.txt b/examples/04-simd-vectorization/CMakeLists.txt
Lines changed: 13 additions & 0 deletions
diff --git a/examples/04-simd-vectorization/README.md b/examples/04-simd-vectorization/README.md
Lines changed: 50 additions & 0 deletions
@@ -61,6 +61,9 @@ Run one benchmark:
 ./build/release/examples/02-memory-cache/aos_soa_bench
 ```
 
+Need sanitizer-specific guidance after the quick start? See
+[`docs/en/guides/validation.md`](docs/en/guides/validation.md).
+
 ## Validation commands
 
 ```bash
@@ -78,6 +81,7 @@ cmake --preset=ubsan && cmake --build build/ubsan && ctest --preset=ubsan
 - **Quick start:** `docs/en/getting-started/quickstart.md`
 - **Learning path:** `docs/en/guides/learning-path.md`
 - **Profiling guide:** `docs/en/guides/profiling-guide.md`
+- **Validation & sanitizers:** `docs/en/guides/validation.md`
 - **Chinese entry:** `README.zh-CN.md` and `docs/zh/`
 
 ## Development workflow
 
@@ -61,6 +61,9 @@ cmake --build build/release
 ./build/release/examples/02-memory-cache/aos_soa_bench
 ```
 
+To go straight to the sanitizers after the quick start, see
+[`docs/zh/guides/validation.md`](docs/zh/guides/validation.md).
+
 ## Common validation commands
 
 ```bash
@@ -78,6 +81,7 @@ cmake --preset=ubsan && cmake --build build/ubsan && ctest --preset=ubsan
 - **Quick start:** `docs/zh/getting-started/quickstart.md`
 - **Learning path:** `docs/zh/guides/learning-path.md`
 - **Profiling guide:** `docs/zh/guides/profiling-guide.md`
+- **Validation & sanitizers:** `docs/zh/guides/validation.md`
 - **English entry:** `README.md` and `docs/en/`
 
 ## Development workflow
 
@@ -0,0 +1,42 @@
+# Benchmarks
+
+This directory holds shared benchmark utilities plus the local workflow for
+capturing and comparing Google Benchmark JSON output.
+
+---
+
+## Regression Comparison
+
+1. Build the optimized benchmark targets:
+
+```bash
+cmake --preset=release
+cmake --build build/release
+```
+
+2. Capture a baseline run:
+
+```bash
+./build/release/examples/04-simd-vectorization/simd_bench \
+  --benchmark_format=json \
+  --benchmark_out=simd-baseline.json
+```
+
+3. Capture a candidate run after your change:
+
+```bash
+./build/release/examples/04-simd-vectorization/simd_bench \
+  --benchmark_format=json \
+  --benchmark_out=simd-candidate.json
+```
+
+4. Compare the two runs:
+
+```bash
+python3 scripts/compare_benchmarks.py simd-baseline.json simd-candidate.json --threshold 10
+```
+
+The script prints a per-benchmark table with baseline time, candidate time, and
+delta percentage. It exits with code `1` when any benchmark regresses by more
+than the threshold, which makes it suitable for local smoke checks or future CI
+gating.
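
The comparison rule described above (a per-benchmark delta percentage, with failure once any delta exceeds the threshold) can be sketched in C++ as follows. This is only an illustration under assumptions: `BenchRow`, `delta_percent`, and `has_regression` are hypothetical names, the timings are assumed to be already parsed from the two JSON files, and the actual logic in `scripts/compare_benchmarks.py` may differ.

```cpp
#include <string>
#include <vector>

// One row of the comparison table: a benchmark name plus both timings.
// Hypothetical type; field names are not taken from compare_benchmarks.py.
struct BenchRow {
    std::string name;
    double baseline_ns;   // assumed to be > 0
    double candidate_ns;
};

// Percentage change relative to the baseline; positive means slower.
double delta_percent(const BenchRow& row) {
    return (row.candidate_ns - row.baseline_ns) / row.baseline_ns * 100.0;
}

// True when any benchmark is slower than its baseline by more than
// threshold_percent. A caller could map this to exit code 1.
bool has_regression(const std::vector<BenchRow>& rows, double threshold_percent) {
    for (const BenchRow& row : rows) {
        if (delta_percent(row) > threshold_percent) {
            return true;
        }
    }
    return false;
}
```

With a threshold of `10`, a benchmark that goes from 200 ns to 230 ns (a 15% slowdown) would count as a regression, while one that goes from 100 ns to 105 ns would not.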
@@ -58,6 +58,7 @@ export default defineConfig({
               { text: 'Learning Path', link: '/en/guides/learning-path' },
               { text: 'Optimization Decision Tree', link: '/en/guides/optimization-decision-tree' },
               { text: 'Profiling Guide', link: '/en/guides/profiling-guide' },
+              { text: 'Validation & Sanitizers', link: '/en/guides/validation' },
               { text: 'Best Practices', link: '/en/guides/best-practices' },
             ],
           },
@@ -94,6 +95,7 @@ export default defineConfig({
                 { text: 'Learning Path', link: '/en/guides/learning-path' },
                 { text: 'Optimization Decision Tree', link: '/en/guides/optimization-decision-tree' },
                 { text: 'Profiling Guide', link: '/en/guides/profiling-guide' },
+                { text: 'Validation & Sanitizers', link: '/en/guides/validation' },
                 { text: 'Best Practices', link: '/en/guides/best-practices' },
               ],
             },
@@ -160,6 +162,7 @@ export default defineConfig({
               { text: '学习路径', link: '/zh/guides/learning-path' },
               { text: '优化决策树', link: '/zh/guides/optimization-decision-tree' },
               { text: '性能分析指南', link: '/zh/guides/profiling-guide' },
+              { text: '验证与 Sanitizer', link: '/zh/guides/validation' },
               { text: '最佳实践', link: '/zh/guides/best-practices' },
             ],
           },
@@ -196,6 +199,7 @@ export default defineConfig({
                 { text: '学习路径', link: '/zh/guides/learning-path' },
                 { text: '优化决策树', link: '/zh/guides/optimization-decision-tree' },
                 { text: '性能分析指南', link: '/zh/guides/profiling-guide' },
+                { text: '验证与 Sanitizer', link: '/zh/guides/validation' },
                 { text: '最佳实践', link: '/zh/guides/best-practices' },
               ],
             },
 
@@ -186,6 +186,17 @@ Let the compiler do the work.
 -Rpass=loop-vectorize
 ```
 
+**Repository workflow:**
+```bash
+cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
+cmake --build build/release --target auto_vectorize
+```
+
+`HPC_VECTORIZE_REPORT` enables the same compiler-specific diagnostics for the
+example target without adding a new preset to the default list. For sanitizer-led
+verification after SIMD changes, see
+[Validation & Sanitizers](./validation.md).
+
 ### 4.2 SIMD Intrinsics
 
 Manual vectorization for maximum control.
@@ -203,6 +214,7 @@ Readable SIMD code.
 - Abstracting intrinsics
 - Scalar fallback implementations
 - Type-safe SIMD operations
+- Runtime dispatch for mixed CPU fleets
 
 ---
 
 
@@ -0,0 +1,63 @@
+# Validation & Sanitizers
+
+Use the preset-driven validation path first, then pick the sanitizer that
+matches the failure mode you are investigating.
+
+---
+
+## Quick reference
+
+| Preset | Best for | Notes |
+| --- | --- | --- |
+| `asan` | heap/stack overflows, use-after-free, double free | Benchmarks are disabled in this preset |
+| `tsan` | data races, unsafe synchronization | This preset switches to `clang` / `clang++` |
+| `ubsan` | undefined behavior, invalid shifts, signed overflow | Good follow-up after functional fixes |
+
+---
+
+## AddressSanitizer
+
+```bash
+cmake --preset=asan
+cmake --build build/asan
+ctest --preset=asan
+```
+
+Use `asan` when you suspect invalid memory access, lifetime bugs, or accidental
+buffer overruns.
+
+## ThreadSanitizer
+
+```bash
+cmake --preset=tsan
+cmake --build build/tsan
+ctest --preset=tsan
+```
+
+Use `tsan` for concurrent code paths. The preset already selects `clang` /
+`clang++`, which is the supported toolchain in this repository.
+
+## UndefinedBehaviorSanitizer
+
+```bash
+cmake --preset=ubsan
+cmake --build build/ubsan
+ctest --preset=ubsan
+```
+
+Use `ubsan` to surface undefined behavior that may stay invisible in normal
+debug or release builds.
+
+---
+
+## Suggested workflow
+
+1. Start with `debug` or `release` to reproduce the issue normally.
+2. Run `asan` for memory-safety problems.
+3. Run `tsan` for concurrency changes or flaky parallel tests.
+4. Run `ubsan` before closing work that touches low-level arithmetic, casts, or
+   layout assumptions.
+
+The repository keeps these as separate presets on purpose: they stay easy to
+discover, easy to automate, and do not hide compiler-specific sanitizer
+constraints behind extra wrapper scripts.
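
To make the `ubsan` row of the quick reference concrete, here is a minimal sketch of the kind of low-level arithmetic the workflow has in mind. A plain `a + b` on signed integers is undefined behavior when the sum overflows, which is what a UBSan build would report. The helper below, `checked_add`, is a hypothetical name that does not come from this repository; it shows one way to avoid the overflow entirely (assuming C++17 for `std::optional`).

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns a + b, or std::nullopt when the sum would not fit in int32_t.
// The range checks run before the addition, so no signed overflow occurs.
// Hypothetical helper for illustration only.
std::optional<std::int32_t> checked_add(std::int32_t a, std::int32_t b) {
    constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
    constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
    if (b > 0 && a > kMax - b) {
        return std::nullopt;  // would overflow upward
    }
    if (b < 0 && a < kMin - b) {
        return std::nullopt;  // would overflow downward
    }
    return a + b;
}
```

Code written this way should stay quiet under the `ubsan` preset, whereas the unchecked addition is the sort of defect that can pass unnoticed in `debug` or `release`.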
@@ -186,6 +186,16 @@ flowchart LR
 -Rpass=loop-vectorize
 ```
 
+**Recommended in-repository workflow:**
+```bash
+cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
+cmake --build build/release --target auto_vectorize
+```
+
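+`HPC_VECTORIZE_REPORT` enables the same compiler vectorization diagnostics for the
+example target without adding a default preset. To continue with sanitizer validation after SIMD changes, see
+[Validation & Sanitizers](./validation.md).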
+
 ### 4.2 SIMD Intrinsics
 
 Manual vectorization for maximum control.
@@ -203,6 +213,7 @@ flowchart LR
 - Wrapping intrinsics
 - Scalar fallback implementations
 - Type-safe SIMD operations
+- Runtime dispatch for mixed CPU environments
 
 ---
 
 
@@ -0,0 +1,61 @@
+# Validation & Sanitizers
+
+Follow the preset-driven standard validation path first, then choose the sanitizer that
+matches the problem you are investigating.
+
+---
+
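+## Quick reference
+
+| Preset | Best for finding | Notes |
+| --- | --- | --- |
+| `asan` | heap/stack out-of-bounds access, use-after-free, double free | This preset disables benchmarks |
+| `tsan` | data races, synchronization errors | This preset switches to `clang` / `clang++` |
+| `ubsan` | undefined behavior, invalid shifts, signed overflow | A good supplementary check after functional fixes |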
+
+---
+
+## AddressSanitizer
+
+```bash
+cmake --preset=asan
+cmake --build build/asan
+ctest --preset=asan
+```
+
+Reach for `asan` first when you suspect invalid memory access, object lifetime errors,
+or buffer overruns.
+
+## ThreadSanitizer
+
+```bash
+cmake --preset=tsan
+cmake --build build/tsan
+ctest --preset=tsan
+```
+
+Use `tsan` when a change touches concurrent code paths. The preset already switches
+the repository to the supported `clang` / `clang++` toolchain.
+
+## UndefinedBehaviorSanitizer
+
+```bash
+cmake --preset=ubsan
+cmake --build build/ubsan
+ctest --preset=ubsan
+```
+
+Use `ubsan` when you want to find undefined behavior that rarely shows up in regular
+debug/release builds.
+
+---
+
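+## Suggested workflow
+
+1. Reproduce the issue with `debug` or `release` first.
+2. For memory-safety problems, run `asan` first.
+3. For concurrency changes or intermittent parallel failures, run `tsan` first.
+4. For changes that touch low-level arithmetic, type conversions, or layout assumptions, also run `ubsan` before wrapping up.
+
+The repository deliberately keeps these capabilities as separate presets: they are easier to discover, easier to wire
+into automation, and do not hide compiler-specific sanitizer constraints behind extra scripts.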
@@ -4,6 +4,19 @@
 add_library(simd_utils INTERFACE)
 target_include_directories(simd_utils INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/include)
 
+add_library(simd_dispatch STATIC
+    src/runtime_dispatch.cpp
+)
+target_link_libraries(simd_dispatch PUBLIC simd_utils)
+hpc_set_compiler_options(simd_dispatch)
+hpc_enable_sanitizers(simd_dispatch)
+
+hpc_add_example(
+    NAME dispatch_example
+    SOURCES src/dispatch_example_main.cpp
+    LIBRARIES simd_dispatch
+)
+
 # Auto-vectorization example
 hpc_add_example(
     NAME auto_vectorize
 
@@ -77,6 +77,8 @@ flowchart TD
 |------|-------|-------------|
 | `src/auto_vectorize.cpp` | Auto-Vectorization | Compiler-friendly patterns |
 | `src/intrinsics_intro.cpp` | SIMD Intrinsics | Manual SSE/AVX/AVX-512 |
+| `src/runtime_dispatch.cpp` | Runtime Dispatch | One binary, best available path |
+| `src/dispatch_example_main.cpp` | Dispatch Demo | Runtime-gated array addition |
 | `include/simd_wrapper.hpp` | SIMD Wrapper | Readable abstractions |
 
 ## Key Concepts
@@ -157,6 +159,20 @@ void add_wrapped(float* a, const float* b, const float* c, size_t n) {
 }
 ```
 
+### Runtime Dispatch
+
+Keep one binary and pick the best available path at runtime:
+
+```bash
+cmake --preset=release
+cmake --build build/release --target dispatch_example
+./build/release/examples/04-simd-vectorization/dispatch_example
+```
+
+`dispatch_add_arrays()` selects AVX2, SSE2, or scalar code at runtime. The
+teaching goal is not to hide intrinsics, but to show how a small dispatch layer
+lets one executable stay portable across mixed x86 CPUs.
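
The dispatch idea described above can be sketched as follows. This is not the code in `src/runtime_dispatch.cpp`: `dispatch_add_sketch` and its helpers are hypothetical names, the argument order is an assumption, and the sketch relies on the GCC/Clang `__builtin_cpu_supports` builtin and `target` function attribute. On other compilers or non-x86 targets it falls back to the scalar loop.

```cpp
#include <cstddef>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>

// AVX2 path: the target attribute enables AVX2 for this function only,
// so the rest of the binary still runs on older CPUs.
__attribute__((target("avx2")))
static void add_avx2(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// SSE2 path: four floats per iteration.
__attribute__((target("sse2")))
static void add_sse2(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
#endif

// Portable fallback used when no SIMD path is available.
static void add_scalar(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Picks the widest path the running CPU reports support for.
void dispatch_add_sketch(float* out, const float* a, const float* b, std::size_t n) {
#ifdef SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) { add_avx2(out, a, b, n); return; }
    if (__builtin_cpu_supports("sse2")) { add_sse2(out, a, b, n); return; }
#endif
    add_scalar(out, a, b, n);
}
```

Checking the CPU on every call keeps the sketch short; a production layer would more likely resolve the function pointer once and cache it.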
+
 ## Instruction Sets
 
 | ISA | Register Width | Floats/Op | Doubles/Op |
@@ -200,6 +216,40 @@ cat /proc/cpuinfo | grep flags
 # Look for: sse, sse2, sse4_1, avx, avx2, avx512f
 ```
 
+## Vectorization Diagnostics
+
+Turn on the repository's `HPC_VECTORIZE_REPORT` option so optimized targets emit
+compiler vectorization remarks while the default presets stay unchanged:
+
+```bash
+cmake --preset=release -DHPC_VECTORIZE_REPORT=ON
+cmake --build build/release --target auto_vectorize 2>&1 | tee build/release/vectorization.log
+```
+
+`HPC_VECTORIZE_REPORT` expands to the compiler-specific flags used in the
+project:
+
+```bash
+# GCC
+-fopt-info-vec-optimized
+
+# Clang
+-Rpass=loop-vectorize
+```
+
+Sample output to expect while compiling (line numbers and vector widths will vary):
+
+```text
+# GCC
+auto_vectorize.cpp:37:26: optimized: loop vectorized using 32 byte vectors
+
+# Clang
+auto_vectorize.cpp:37:5: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
+```
+
+If you do not see vectorization remarks, confirm you are using an optimized
+preset (`release` or `relwithdebinfo`) rather than `debug`.
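
For reference, a loop of the shape these reports typically describe looks like the sketch below. It is not taken from `src/auto_vectorize.cpp`; `scale_add` is a hypothetical function, and `__restrict` is a compiler extension (accepted by GCC, Clang, and MSVC) rather than standard C++. Whether a given compiler actually vectorizes it depends on its version and the optimization flags in effect.

```cpp
#include <cstddef>

// Compiler-friendly pattern: unit-stride access, a simple trip count, and
// __restrict to promise that the three arrays do not overlap.
// Hypothetical example, not the repository's code.
void scale_add(float* __restrict out,
               const float* __restrict a,
               const float* __restrict b,
               float k,
               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * k + b[i];
    }
}
```

Without the no-overlap promise, the compiler may have to emit runtime alias checks or skip vectorization, which is one common reason a remark goes missing.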
+
 ## Further Reading
 
 - [Intel Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/)