Commit 0c2ba68

kernels/custom: add on-device verifier for the NEON custom kernels

Standalone aarch64 binary that exercises both NEON kernels (grid_sampler_2d and sum.IntList_out) across fp32 and fp16 inputs on the shapes the polycam depth model actually uses. Opt-in via -DEXECUTORCH_BUILD_CUSTOM_VERIFY=ON so default builds (including the AAR) are not affected.

The reference for fp16 tests is portable run on up-cast fp32 inputs, then down-cast to fp16, so it is independent of whatever portable's fp16 path happens to do. That keeps the test meaningful whether or not the upstream portable-fp16 fix (pytorch#19117) has landed yet.

Pass/fail uses numpy.testing.assert_allclose semantics:

    |a - b| <= abs_tol + rel_tol * |b|

This avoids the "relative error explodes at zero crossings" trap for mean-zero reductions and bilinear samples near cancellation points.

Usage:

    cmake -DEXECUTORCH_BUILD_CUSTOM_VERIFY=ON ...
    cmake --build <out> --target verify_custom_kernels
    adb push <out>/kernels/optimized/verify_custom_kernels /data/local/tmp/
    adb shell /data/local/tmp/verify_custom_kernels
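
For illustration, the pass/fail criterion described in this message could be sketched as below. This is not the verifier's actual code: the helper name all_close and its signature are hypothetical, and the tolerances a caller passes are placeholders rather than the values the verifier uses. The sketch also simplifies NaN and infinity handling (any NaN fails, whereas numpy.testing.assert_allclose treats matching NaNs as equal by default).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the commit.
// Returns true when every element pair satisfies
//   |a - b| <= abs_tol + rel_tol * |b|
// where a is the kernel output and b is the reference value.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& reference,
               float rel_tol,
               float abs_tol) {
  if (actual.size() != reference.size()) {
    return false;  // a shape mismatch is always a failure
  }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const float diff = std::fabs(actual[i] - reference[i]);
    const float bound = abs_tol + rel_tol * std::fabs(reference[i]);
    // Written as !(diff <= bound) so that a NaN on either side fails:
    // every ordered comparison involving NaN evaluates to false.
    if (!(diff <= bound)) {
      return false;
    }
  }
  return true;
}
```

The abs_tol term is what keeps the check usable at zero crossings. With a reference value of exactly 0, the relative term rel_tol * |b| vanishes, so a purely relative check (abs_tol = 0) rejects any nonzero output, however small; a small absolute floor accepts outputs that differ from zero only by rounding noise.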
1 parent 995f6e3 commit 0c2ba68

2 files changed

Lines changed: 553 additions & 0 deletions
