
perf(kpm): SIMD path for Pyramid downsample (AVX2/SSE4.1/wasm32) #132

@kalwalt

Summary

Add SIMD implementations of kpm::freak::pyramid::downsample gated behind the project's existing feature flags, with criterion benchmarks proving the speedup over the scalar baseline.

Context

Follow-up to #130 (M8 step 1). That PR landed a clean scalar baseline. The downsample inner loop is a textbook SIMD target: for each output pixel it reads 4 source pixels (a 2x2 block) and computes (sum + 2) >> 2, i.e. the rounded average.
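For reference, the scalar kernel described above can be sketched roughly as follows. This is a minimal illustration, not the code from #130: the name `downsample_scalar`, the 8-bit grayscale layout, the row stride equal to `width`, and the choice to drop a trailing odd row or column are all assumptions.

```rust
/// Hypothetical scalar 2x2 box downsample for an 8-bit grayscale image.
/// Assumptions: row stride == width; output is (width / 2) x (height / 2),
/// so a trailing odd row or column is ignored.
fn downsample_scalar(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(src.len(), width * height);
    let (out_w, out_h) = (width / 2, height / 2);
    let mut dst = Vec::with_capacity(out_w * out_h);
    for y in 0..out_h {
        // The two source rows that feed output row `y`.
        let top = &src[2 * y * width..];
        let bot = &src[(2 * y + 1) * width..];
        for x in 0..out_w {
            // Widen to u16 first: four u8 values plus 2 can reach 1022.
            let sum = top[2 * x] as u16
                + top[2 * x + 1] as u16
                + bot[2 * x] as u16
                + bot[2 * x + 1] as u16;
            // Adding 2 before the shift rounds to nearest instead of truncating.
            dst.push(((sum + 2) >> 2) as u8);
        }
    }
    dst
}
```

The widening to `u16` is the detail any SIMD version has to preserve: the sum of four bytes does not fit in 8 bits.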

The project already has feature flags ready for this:

  • simd-x86-avx2
  • simd-x86-sse41
  • simd-wasm32

Canonical pattern from CLAUDE.md §3:

#[cfg(all(target_arch = "x86_64", feature = "simd-x86-avx2"))]
if is_x86_feature_detected!("avx2") {
    return unsafe { avx2_impl(input) };
}
// scalar fallback
scalar_impl(input)
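A compilable version of that pattern might look like the sketch below. The names `scalar_impl` and `avx2_impl` come from the snippet above; the `dispatch` wrapper and the byte-sum workload are stand-ins chosen only to keep the example short, and are not the project's kernel. It assumes the crate declares `simd-x86-avx2` under `[features]`; without that feature the SIMD function is compiled out and only the scalar path remains.

```rust
/// Scalar fallback (stand-in workload: sum of all bytes).
fn scalar_impl(input: &[u8]) -> u64 {
    input.iter().map(|&b| b as u64).sum()
}

/// AVX2 path. Compiled only on x86_64 with the cargo feature enabled.
#[cfg(all(target_arch = "x86_64", feature = "simd-x86-avx2"))]
#[target_feature(enable = "avx2")]
unsafe fn avx2_impl(input: &[u8]) -> u64 {
    use core::arch::x86_64::*;
    let mut chunks = input.chunks_exact(32);
    let mut lanes = [0u64; 4];
    // SAFETY: each chunk is exactly 32 bytes and the unaligned load/store
    // intrinsics have no alignment requirement; `lanes` is 32 bytes long.
    unsafe {
        let zero = _mm256_setzero_si256();
        let mut acc = zero;
        for chunk in &mut chunks {
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            // SAD against zero yields four 64-bit partial sums of 8 bytes each.
            acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, zero));
        }
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    }
    // The last (len % 32) bytes are handled by the scalar code.
    lanes.iter().sum::<u64>() + scalar_impl(chunks.remainder())
}

/// Entry point: compile-time gate first, runtime CPU check second.
fn dispatch(input: &[u8]) -> u64 {
    #[cfg(all(target_arch = "x86_64", feature = "simd-x86-avx2"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { avx2_impl(input) };
        }
    }
    // Scalar fallback for other targets, disabled features, or older CPUs.
    scalar_impl(input)
}
```

Putting the `#[cfg]` on a block rather than directly on the `if` keeps the early return and its gate together and is accepted by every compiler version that supports `is_x86_feature_detected!`.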

Depends on

Scope

  • Implement downsample_avx2 (32 output bytes per iteration via 256-bit lanes).
  • Implement downsample_sse41 (16 output bytes per iteration via 128-bit lanes).
  • Implement downsample_wasm32 using core::arch::wasm32 SIMD intrinsics.
  • Each path behind unsafe + cfg(target_arch) + cfg(feature = ...) + runtime detection on x86 (is_x86_feature_detected!). wasm32 has no runtime feature detection, so that path is selected at compile time.
  • Each unsafe block prefixed with a // SAFETY: comment explaining the invariant.
  • Output must be byte-identical to scalar — add a property test that compares scalar vs SIMD on random inputs.
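One possible shape for the SSE4.1 path and its equivalence check is sketched below. Everything here is illustrative: the function names, the 8-bit grayscale layout with stride equal to `width`, the dropped odd row/column, and the small xorshift generator (used instead of a property-testing crate so the sketch has no dependencies) are assumptions. The `feature = "simd-x86-sse41"` gate is left out so the example compiles standalone.

```rust
/// Scalar reference: rounded 2x2 average, `(sum + 2) >> 2`.
fn downsample_scalar(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(src.len(), width * height);
    let (out_w, out_h) = (width / 2, height / 2);
    let mut dst = Vec::with_capacity(out_w * out_h);
    for y in 0..out_h {
        let (top, bot) = (&src[2 * y * width..], &src[(2 * y + 1) * width..]);
        for x in 0..out_w {
            let sum = top[2 * x] as u16 + top[2 * x + 1] as u16
                + bot[2 * x] as u16 + bot[2 * x + 1] as u16;
            dst.push(((sum + 2) >> 2) as u8);
        }
    }
    dst
}

/// One output row, 16 output bytes per iteration, scalar tail for the rest.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn downsample_row_sse41(top: &[u8], bot: &[u8], out: &mut [u8]) {
    use core::arch::x86_64::*;
    debug_assert!(top.len() >= 2 * out.len() && bot.len() >= 2 * out.len());
    let mut x = 0;
    // SAFETY: inside the loop 2 * x + 32 <= 2 * out.len() <= top.len(), bot.len(),
    // so all four 16-byte loads are in bounds; the store writes out[x..x + 16].
    unsafe {
        let (ones, two) = (_mm_set1_epi8(1), _mm_set1_epi16(2));
        while x + 16 <= out.len() {
            let t0 = _mm_loadu_si128(top.as_ptr().add(2 * x) as *const __m128i);
            let t1 = _mm_loadu_si128(top.as_ptr().add(2 * x + 16) as *const __m128i);
            let b0 = _mm_loadu_si128(bot.as_ptr().add(2 * x) as *const __m128i);
            let b1 = _mm_loadu_si128(bot.as_ptr().add(2 * x + 16) as *const __m128i);
            // maddubs with all-ones adds adjacent unsigned bytes into 16-bit lanes.
            let s0 = _mm_add_epi16(_mm_maddubs_epi16(t0, ones), _mm_maddubs_epi16(b0, ones));
            let s1 = _mm_add_epi16(_mm_maddubs_epi16(t1, ones), _mm_maddubs_epi16(b1, ones));
            let r0 = _mm_srli_epi16::<2>(_mm_add_epi16(s0, two));
            let r1 = _mm_srli_epi16::<2>(_mm_add_epi16(s1, two));
            // Results are <= 255, so the saturating pack is lossless.
            _mm_storeu_si128(out.as_mut_ptr().add(x) as *mut __m128i, _mm_packus_epi16(r0, r1));
            x += 16;
        }
    }
    for i in x..out.len() {
        let sum = top[2 * i] as u16 + top[2 * i + 1] as u16
            + bot[2 * i] as u16 + bot[2 * i + 1] as u16;
        out[i] = ((sum + 2) >> 2) as u8;
    }
}

/// Dispatching entry point: SSE4.1 when the CPU has it, scalar otherwise.
fn downsample(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(src.len(), width * height);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            let (out_w, out_h) = (width / 2, height / 2);
            let mut dst = vec![0u8; out_w * out_h];
            for y in 0..out_h {
                let top = &src[2 * y * width..][..2 * out_w];
                let bot = &src[(2 * y + 1) * width..][..2 * out_w];
                // SAFETY: SSE4.1 was detected above; both rows hold 2 * out_w bytes.
                unsafe { downsample_row_sse41(top, bot, &mut dst[y * out_w..][..out_w]) };
            }
            return dst;
        }
    }
    downsample_scalar(src, width, height)
}

/// Compares both paths on pseudo-random images of every size up to `max_dim`.
fn simd_matches_scalar(max_dim: usize, seed: u64) -> bool {
    let mut s = seed | 1; // xorshift state must be non-zero
    let mut next = move || { s ^= s << 13; s ^= s >> 7; s ^= s << 17; (s >> 24) as u8 };
    for height in 0..=max_dim {
        for width in 0..=max_dim {
            let src: Vec<u8> = (0..width * height).map(|_| next()).collect();
            if downsample(&src, width, height) != downsample_scalar(&src, width, height) {
                return false;
            }
        }
    }
    true
}
```

A shortcut such as averaging with `_mm_avg_epu8` twice would not meet the byte-identical requirement, because it rounds at each step: for the block (1, 0, 0, 0) it yields 1, while `(sum + 2) >> 2` yields 0. Summing in 16-bit lanes and rounding once avoids that.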

Acceptance criteria

  • Output of each SIMD path is byte-identical to the scalar baseline for all tested input sizes (including odd dimensions and small images).
  • Each SIMD path shows a measurable speedup over scalar on its target architecture, reported in the PR description with concrete ns/iter numbers from the criterion benchmark added in #131 (perf(kpm): add criterion benchmark for Pyramid downsample).
  • All existing kpm::freak::pyramid tests still pass.
  • cargo clippy --all-targets --all-features clean.

Out of scope

References
