Commit 8721bfa

kernels/optimized: add on-device verify_optimized_kernels binary

Standalone aarch64 binary that cross-checks opt_grid_sampler_2d_out and
opt_sum_dim_out against an fp32 reference derived from the portable kernel
(portable run on up-cast fp32 inputs, then down-cast to fp16). Reference is
independent of portable's own fp16 path, so the test stays meaningful
regardless of pytorch#19117's merge state.

Pass/fail uses numpy.testing.assert_allclose semantics:

    |a - b| <= abs_tol + rel_tol * |b|

Avoids the "relative error explodes at zero crossings" trap for mean-zero
reductions and bilinear samples near cancellation points.

Opt-in via -DEXECUTORCH_BUILD_OPTIMIZED_VERIFY=ON so default builds are
unaffected.

Build + run:

    cmake -DEXECUTORCH_BUILD_OPTIMIZED_VERIFY=ON ...
    cmake --build <out> --target verify_optimized_kernels
    adb push <out>/kernels/optimized/verify_optimized_kernels /data/local/tmp/
    adb shell /data/local/tmp/verify_optimized_kernels

Exits 0 on all-pass; reports max_abs / max_rel(far) / near_zero / viol per
test case.

12 test cases across grid_sampler and sum, covering the shapes the polycam
depth model uses plus a few edge cases (odd channel count, align_corners=1,
multi-batch).

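
The tolerance rule in the commit message can be sketched in C++ roughly as follows. This is an illustration only, not the contents of verify.cpp: the names `CompareStats`, `within_tol`, `compare`, and the `near_zero_eps` threshold are assumptions. The stats fields are named after the reported columns, and their exact definitions here are guesses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-test summary. Field names follow the columns the commit
// message says are reported; how verify.cpp defines them is an assumption.
struct CompareStats {
  float max_abs = 0.0f;       // largest |actual - ref| over all elements
  float max_rel = 0.0f;       // largest relative error where ref is far from 0
  std::size_t near_zero = 0;  // elements with |ref| < near_zero_eps
  std::size_t viol = 0;       // elements that fail the tolerance check
};

// assert_allclose-style check: |a - b| <= abs_tol + rel_tol * |b|.
// A NaN on either side makes the comparison false, so it counts as a failure.
inline bool within_tol(float actual, float ref, float abs_tol, float rel_tol) {
  return std::fabs(actual - ref) <= abs_tol + rel_tol * std::fabs(ref);
}

inline CompareStats compare(const std::vector<float>& actual,
                            const std::vector<float>& ref,
                            float abs_tol, float rel_tol,
                            float near_zero_eps) {
  assert(actual.size() == ref.size());
  CompareStats s;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const float diff = std::fabs(actual[i] - ref[i]);
    const float mag = std::fabs(ref[i]);
    s.max_abs = std::max(s.max_abs, diff);
    if (mag < near_zero_eps) {
      // Relative error is meaningless this close to zero; only count it.
      ++s.near_zero;
    } else {
      s.max_rel = std::max(s.max_rel, diff / mag);
    }
    if (!within_tol(actual[i], ref[i], abs_tol, rel_tol)) {
      ++s.viol;
    }
  }
  return s;
}
```

With this shape, an element whose reference value sits at a zero crossing passes as long as its absolute error is within `abs_tol`, while elements far from zero are governed mostly by the `rel_tol * |b|` term.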
1 parent f4086e3 commit 8721bfa

2 files changed

Lines changed: 557 additions & 0 deletions

kernels/optimized/CMakeLists.txt

Lines changed: 24 additions & 0 deletions
@@ -98,6 +98,30 @@ gen_operators_lib(
   executorch_core
 )
 
+# On-device verifier for optimized grid_sampler_2d / sum.IntList_out.
+# Opt-in via -DEXECUTORCH_BUILD_OPTIMIZED_VERIFY=ON so it doesn't affect
+# default AAR / library builds. Cross-checks both ops against an fp32
+# reference derived from the portable kernel; non-zero exit on divergence.
+if(EXECUTORCH_BUILD_OPTIMIZED_VERIFY)
+  add_executable(
+    verify_optimized_kernels ${EXECUTORCH_ROOT}/kernels/optimized/verify.cpp
+  )
+  target_link_libraries(
+    verify_optimized_kernels
+    PRIVATE optimized_kernels portable_kernels executorch_core
+  )
+  target_compile_options(
+    verify_optimized_kernels PRIVATE ${_common_compile_options}
+  )
+  if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64"
+     OR ANDROID_ABI STREQUAL "arm64-v8a"
+  )
+    target_compile_options(
+      verify_optimized_kernels PRIVATE -march=armv8.2-a+fp16
+    )
+  endif()
+endif()
+
 install(
   # eigen_blas doesn't export itself, so we have to do our own install to export
   # it.