Summary
Add SIMD implementations of kpm::freak::pyramid::downsample gated behind the project's existing feature flags, with criterion benchmarks proving the speedup over the scalar baseline.
Context
Follow-up to #130 (M8 step 1), which landed a clean scalar baseline. The downsample inner loop is a textbook SIMD target: for each output pixel it reads 4 source pixels and computes (sum + 2) >> 2.
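For reference, the rounding average described above can be sketched in scalar form. This is a minimal illustration, not the code from #130: the function name `downsample_scalar`, its signature, and the assumption that the 4 source pixels form a 2x2 block in a row-major 8-bit image are all hypothetical.

```rust
/// Hypothetical scalar baseline (name and signature are assumptions).
/// Halves a row-major 8-bit image by averaging each 2x2 block with rounding.
/// An odd trailing row or column is dropped in this sketch.
pub fn downsample_scalar(src: &[u8], src_w: usize, src_h: usize) -> Vec<u8> {
    assert!(src.len() >= src_w * src_h, "source buffer too small");
    let dst_w = src_w / 2;
    let dst_h = src_h / 2;
    let mut dst = vec![0u8; dst_w * dst_h];
    for y in 0..dst_h {
        // The two source rows that feed output row `y`.
        let top = &src[(2 * y) * src_w..][..src_w];
        let bot = &src[(2 * y + 1) * src_w..][..src_w];
        for x in 0..dst_w {
            // Widen to u16 so the sum of four bytes (at most 1020) cannot overflow.
            let sum = top[2 * x] as u16
                + top[2 * x + 1] as u16
                + bot[2 * x] as u16
                + bot[2 * x + 1] as u16;
            // Adding 2 before the shift rounds to nearest instead of truncating.
            dst[y * dst_w + x] = ((sum + 2) >> 2) as u8;
        }
    }
    dst
}
```

For a single 2x2 block with values 1, 2, 3, 4 the sum is 10, and (10 + 2) >> 2 gives 3, which is the exact mean 2.5 rounded up.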
The project already has feature flags ready for this:
- simd-x86-avx2
- simd-x86-sse41
- simd-wasm32
Canonical pattern from CLAUDE.md §3:
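The snippet itself does not appear in this issue text, and what follows is not the CLAUDE.md pattern. It is only an illustrative sketch of how a feature-gated SSE4.1 path with runtime detection and a scalar fallback could be laid out. The per-row helpers `downsample_row`, `downsample_row_sse41`, and `downsample_row_scalar`, their signatures, and the 2x2 box layout are assumptions; only the `simd-x86-sse41` flag name comes from the list above.

```rust
/// Hypothetical scalar fallback: dst[x] = (2x2 block sum + 2) >> 2.
fn downsample_row_scalar(top: &[u8], bot: &[u8], dst: &mut [u8]) {
    for (x, out) in dst.iter_mut().enumerate() {
        let sum = top[2 * x] as u16 + top[2 * x + 1] as u16
            + bot[2 * x] as u16 + bot[2 * x + 1] as u16;
        *out = ((sum + 2) >> 2) as u8;
    }
}

/// Hypothetical SSE4.1 path producing 16 output bytes per iteration.
/// Caller must ensure SSE4.1 is available and both rows hold >= 2 * dst.len() bytes.
#[cfg(all(feature = "simd-x86-sse41", any(target_arch = "x86", target_arch = "x86_64")))]
#[target_feature(enable = "sse4.1")]
unsafe fn downsample_row_sse41(top: &[u8], bot: &[u8], dst: &mut [u8]) {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::*;

    let n = dst.len();
    let mut x = 0;
    while x + 16 <= n {
        // SAFETY: the caller guarantees top.len() >= 2 * n and bot.len() >= 2 * n,
        // and x + 16 <= n, so the 32 source bytes at 2 * x and the 16 destination
        // bytes at x are in bounds. Only unaligned loads and stores are used.
        unsafe {
            let ones = _mm_set1_epi8(1);
            let two = _mm_set1_epi16(2);
            let t0 = _mm_loadu_si128(top.as_ptr().add(2 * x) as *const __m128i);
            let t1 = _mm_loadu_si128(top.as_ptr().add(2 * x + 16) as *const __m128i);
            let b0 = _mm_loadu_si128(bot.as_ptr().add(2 * x) as *const __m128i);
            let b1 = _mm_loadu_si128(bot.as_ptr().add(2 * x + 16) as *const __m128i);
            // maddubs with all-ones adds adjacent unsigned bytes into 16-bit lanes.
            let s0 = _mm_add_epi16(_mm_maddubs_epi16(t0, ones), _mm_maddubs_epi16(b0, ones));
            let s1 = _mm_add_epi16(_mm_maddubs_epi16(t1, ones), _mm_maddubs_epi16(b1, ones));
            let r0 = _mm_srli_epi16::<2>(_mm_add_epi16(s0, two));
            let r1 = _mm_srli_epi16::<2>(_mm_add_epi16(s1, two));
            // Results fit in 0..=255, so the saturating pack is lossless.
            let packed = _mm_packus_epi16(r0, r1);
            _mm_storeu_si128(dst.as_mut_ptr().add(x) as *mut __m128i, packed);
        }
        x += 16;
    }
    // Remaining outputs (fewer than 16) go through the scalar path.
    downsample_row_scalar(&top[2 * x..], &bot[2 * x..], &mut dst[x..]);
}

/// Hypothetical dispatcher: compile-time gating plus runtime detection.
pub fn downsample_row(top: &[u8], bot: &[u8], dst: &mut [u8]) {
    assert!(top.len() >= 2 * dst.len() && bot.len() >= 2 * dst.len());
    #[cfg(all(feature = "simd-x86-sse41", any(target_arch = "x86", target_arch = "x86_64")))]
    {
        if std::arch::is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 support was just confirmed at runtime, and the
            // length requirements were asserted above.
            unsafe { downsample_row_sse41(top, bot, dst) };
            return;
        }
    }
    downsample_row_scalar(top, bot, dst);
}
```

With the feature disabled, or on a non-x86 target, the SIMD function is compiled out entirely and `downsample_row` reduces to the scalar loop, so callers never need their own `cfg` checks.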
Depends on
Scope
- downsample_avx2 (process 32 output bytes per iteration via 256-bit lanes).
- downsample_sse41 (16 output bytes per iteration via 128-bit lanes).
- downsample_wasm32 using core::arch::wasm32 SIMD intrinsics.
- unsafe + cfg(target_arch) + cfg(feature = ...) + runtime detection (is_x86_feature_detected!).
- unsafe block prefixed with a // SAFETY: comment explaining the invariant.
Acceptance criteria
- kpm::freak::pyramid tests still pass.
- cargo clippy --all-targets --all-features clean.
Out of scope
- rayon parallelization (could be a separate issue or bundled here; open question).
References