Commit 0c2ba68
kernels/custom: add on-device verifier for the NEON custom kernels
Standalone aarch64 binary that exercises both NEON kernels (grid_sampler_2d
and sum.IntList_out) across fp32 and fp16 inputs on the shapes the polycam
depth model actually uses. Opt-in via -DEXECUTORCH_BUILD_CUSTOM_VERIFY=ON
so default builds (including the AAR) are not affected.
The fp16 reference is the portable kernel run on up-cast fp32 inputs,
with its output down-cast to fp16, so it does not depend on whatever
portable's fp16 path happens to do. That keeps the test meaningful
whether or not the upstream portable-fp16 fix (pytorch#19117) has
landed yet.
Pass/fail uses numpy.testing.assert_allclose semantics:
|a - b| <= abs_tol + rel_tol * |b|
Avoids the "relative error explodes at zero crossings" trap for
mean-zero reductions and bilinear samples near cancellation points.
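The tolerance rule above can be sketched in C++ as follows. This is a minimal illustration, not the verifier's actual code: the names `within_tolerance` and `count_mismatches` are hypothetical, and it assumes `a` is the kernel output under test and `b` is the reference value, matching the actual/desired roles in assert_allclose.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the commit): true when
//   |actual - expected| <= abs_tol + rel_tol * |expected|.
// A NaN on either side makes the comparison false, so NaNs count as failures.
inline bool within_tolerance(float actual, float expected,
                             float abs_tol, float rel_tol) {
  return std::fabs(actual - expected) <=
         abs_tol + rel_tol * std::fabs(expected);
}

// Hypothetical helper: number of elements that fail the check.
// A length mismatch is reported as every element of the longer buffer failing.
inline std::size_t count_mismatches(const std::vector<float>& actual,
                                    const std::vector<float>& expected,
                                    float abs_tol, float rel_tol) {
  if (actual.size() != expected.size()) {
    return actual.size() > expected.size() ? actual.size() : expected.size();
  }
  std::size_t bad = 0;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (!within_tolerance(actual[i], expected[i], abs_tol, rel_tol)) {
      ++bad;
    }
  }
  return bad;
}
```

With an expected value of exactly 0, the relative term vanishes and only abs_tol applies, so a tiny absolute error still passes even though its relative error is unbounded.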
Usage:
    cmake -DEXECUTORCH_BUILD_CUSTOM_VERIFY=ON ...
    cmake --build <out> --target verify_custom_kernels
    adb push <out>/kernels/optimized/verify_custom_kernels /data/local/tmp/
    adb shell /data/local/tmp/verify_custom_kernels
1 parent 995f6e3 commit 0c2ba68
2 files changed
Lines changed: 553 additions & 0 deletions