Commit 82976e7

updating benchmark images
1 parent d4878e0 commit 82976e7

3 files changed

Lines changed: 49 additions & 12 deletions

.codecov.yml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+codecov:
+  require_ci_to_pass: false
+
+coverage:
+  status:
+    project:
+      default:
+        target: auto
+        threshold: 1%
+    patch:
+      default:
+        target: 70%
+
+ignore:
+  - "tests/"
+  - "bench/"
+  - "docs/"
+  - "scripts/"
+  - "cmake/"
+  - "include/"
+  - "features/*/tests/"
+  - "features/*/benchmarks/"
+  - "src/dynemit_features.c"

.github/CONTRIBUTING.md

Lines changed: 6 additions & 6 deletions
@@ -135,7 +135,7 @@ When implementing new SIMD features:
 2. **Use target attributes**: `__attribute__((target("avx2")))`
 3. **Disable auto-vectorization for scalar**: Use `DYNEMIT_NO_AUTOVECTORIZE`
 4. **Use ifunc resolvers**: Implement runtime dispatch with `__attribute__((ifunc(...)))`
-5. **Follow existing patterns**: See `features/vector_add/` or `features/vector_mul/` as examples
+5. **Follow existing patterns**: See `features/add/` or `features/mul/` as examples
 
 See [docs/ADDING_FEATURES.md](../docs/ADDING_FEATURES.md) for detailed instructions.
 
@@ -183,12 +183,12 @@ ctest --verbose
 ### Running Benchmarks
 
 ```bash
-# Run benchmark
-cd build/bench
-./benchmark_vector_mul
+# Run a single feature benchmark
+./build/features/mul/bench_mul_f32
+./build/features/mul/bench_mul_f32 --auto-detect
 
-# Save results
-./benchmark_vector_mul --csv > results.csv
+# Run all benchmarks and generate charts
+sudo ./scripts/run_all_benchmarks.sh
 ```
 
 ## Pull Request Process
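
The SIMD guidelines in the first CONTRIBUTING.md hunk (a per-function target attribute plus an ifunc resolver) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the repository: every `example_*` name is hypothetical, `DYNEMIT_NO_AUTOVECTORIZE` is left out because its definition is not shown in this commit, and the ifunc path is only enabled on x86_64 Linux with glibc, falling back to a plain scalar function elsewhere.

```c
#include <stddef.h>
#include <stdlib.h> /* on glibc this also defines __GLIBC__, checked below */

/* Hypothetical signature for illustration; not libdynemit's real API. */
typedef void (*example_mul_fn)(const float *, const float *, float *, size_t);

/* Portable baseline, always compiled. */
static void example_mul_scalar(const float *a, const float *b,
                               float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && defined(__GNUC__)
#include <immintrin.h>

/* AVX2 code generation is enabled for this one function only. */
__attribute__((target("avx2")))
static void example_mul_avx2(const float *a, const float *b,
                             float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {          /* 8 floats per 256-bit register */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_mul_ps(va, vb));
    }
    for (; i < n; i++)                    /* scalar tail */
        out[i] = a[i] * b[i];
}

/* Resolver: run by the dynamic loader when the symbol is resolved. */
static example_mul_fn example_mul_resolve(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? example_mul_avx2
                                          : example_mul_scalar;
}

/* Public entry point, bound to whichever variant the resolver returns. */
void example_mul_f32(const float *a, const float *b, float *out, size_t n)
    __attribute__((ifunc("example_mul_resolve")));
#else
/* Fallback where ifunc is unavailable: plain scalar entry point. */
void example_mul_f32(const float *a, const float *b, float *out, size_t n)
{
    example_mul_scalar(a, b, out, n);
}
#endif
```

Callers simply invoke `example_mul_f32(a, b, out, n)`; the choice between the scalar and AVX2 variants happens once at symbol resolution rather than on every call.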

README.md

Lines changed: 20 additions & 6 deletions
@@ -7,6 +7,7 @@
 [![CMake](https://img.shields.io/badge/CMake-3.16+-green.svg)](https://cmake.org/)
 [![GCC](https://img.shields.io/badge/GCC-13%2B-green.svg)](https://gcc.gnu.org/)
 [![Clang](https://img.shields.io/badge/Clang-16%2B-green.svg)](https://clang.llvm.org/)
+[![arch](https://img.shields.io/badge/arch-x86__64%20%7C%20aarch64-orange.svg)](https://en.wikipedia.org/wiki/Comparison_of_instruction_set_architectures)
 [![License: Boost](https://img.shields.io/badge/License-Boost_1.0-lightblue.svg)](https://www.boost.org/LICENSE_1_0.txt)
 
 libdynemit leverages the ifunc resolver (supported by both GCC and Clang on Linux) to automatically select optimal SIMD implementations at program startup, delivering portable code without sacrificing performance. Thread-safe SIMD detection and dlopen-safe resolver utilities ensure robust operation in multi-threaded applications and dynamic library loading scenarios.
@@ -27,12 +28,25 @@ entropy_u32(data, n);
 
 ## Same build, best performance
 
-Benchmark charts are generated per-feature and per-CPU under `bench/`. After running benchmarks, you will find:
+![Vector Multiply Benchmark](docs/img/benchmark_vector_mul.png)
+*Benchmark comparing vector multiplication performance across different CPU architectures using the same build binary. The library automatically detected and utilized each CPU's highest supported SIMD instruction set (AVX-512F, AVX2, AVX or SSE4.2) at runtime. Lower execution time indicates better performance. Each data point represents the median of 10 trials, with error bars showing ±1 standard deviation.*
+
+## Forced SIMD instructions without dynamic dispatch
+
+<table>
+<tr>
+<td align="center"><b>x86_64</b> — AMD Ryzen 9 9950X3D</td>
+<td align="center"><b>aarch64</b> — ARM Neoverse V2</td>
+</tr>
+<tr>
+<td><img src="bench/cpus/x86_64/amd_ryzen_9_9950x3d/features/max_u32/timing.png" alt="max_u32 SIMD timings on x86_64" width="100%"></td>
+<td><img src="bench/cpus/aarch64/arm_neoverse_v2/features/max_u32/timing.png" alt="max_u32 SIMD timings on aarch64" width="100%"></td>
+</tr>
+</table>
+
+*Performance scaling of `max_u32` across SIMD levels on two architectures, x86_64 (Scalar → SSE2 → SSE4.2 → AVX → AVX2 → AVX-512F) and aarch64 (Scalar → NEON → SVE → SVE2). Each implementation is compiled into the same binary and the ifunc resolver selects the best one at startup. Lower execution time is better, each point is the median of 3 trials with ±1 standard deviation error bars.*
 
-- **CPU comparison** charts at `bench/features/{variant}/timing.png` and `throughput.png`
-- **SIMD comparison** charts at `bench/cpus/{arch}/{cpu}/features/{variant}/timing.png` and `throughput.png`
 
-Run `sudo ./scripts/run_all_benchmarks.sh` to generate all data and charts. See [docs/BENCHMARKING.md](docs/BENCHMARKING.md) for details.
 
 ## Installation
 
@@ -222,7 +236,7 @@ sudo make install
 
 Currently the library ships SIMD-accelerated features organized into four categories. Every function automatically dispatches to the best available instruction set at program startup.
 
-<details open>
+<details>
 <summary><b>Vector Operations</b></summary>
 
 Element-wise operations on `float` arrays.
@@ -256,7 +270,7 @@ Convenience header `<dynemit/stats.h>` includes all of the above.
 
 </details>
 
-<details open>
+<details>
 <summary><b>Distribution & Diversity Metrics</b></summary>
 
 | Function | Description |
