Commit 8721bfa
kernels/optimized: add on-device verify_optimized_kernels binary

Standalone aarch64 binary that cross-checks opt_grid_sampler_2d_out and
opt_sum_dim_out against an fp32 reference derived from the portable
kernel (portable run on up-cast fp32 inputs, then down-cast to fp16).
Reference is independent of portable's own fp16 path, so the test stays
meaningful regardless of pytorch#19117's merge state.
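The up-cast / compute-in-fp32 / down-cast pattern can be sketched on a host without ExecuTorch. This is a minimal sketch, assuming fp16 storage is emulated by rounding fp32 values to the nearest fp16-representable value. `round_to_fp16` and `reference_sum_last_dim` are hypothetical helpers, and the naive fp32 loop stands in for the portable kernel rather than reproducing it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: round an fp32 value to the nearest IEEE binary16
// value (round-half-to-even) and return it as float. Emulates fp16 storage
// on hosts without a native half type.
float round_to_fp16(float x) {
  if (std::isnan(x) || std::isinf(x)) return x;
  const double v = x;
  const double min_normal = std::ldexp(1.0, -14);  // smallest normal fp16
  double r;
  if (std::fabs(v) < min_normal) {
    // Subnormal fp16 values are integer multiples of 2^-24.
    r = std::ldexp(std::nearbyint(std::ldexp(v, 24)), -24);
  } else {
    int e = 0;
    const double m = std::frexp(v, &e);  // v = m * 2^e, 0.5 <= |m| < 1
    // fp16 keeps 11 significant bits (10 stored + implicit leading 1).
    r = std::ldexp(std::nearbyint(std::ldexp(m, 11)), e - 11);
  }
  if (std::fabs(r) > 65504.0) {  // above fp16 max: overflow to infinity
    return std::copysign(std::numeric_limits<float>::infinity(), x);
  }
  return static_cast<float>(r);
}

// Hypothetical stand-in for "portable kernel on up-cast fp32 inputs":
// inputs are fp16 values held as float, accumulation is fp32, and only the
// final per-row result is down-cast to fp16.
std::vector<float> reference_sum_last_dim(const std::vector<float>& in_fp16,
                                          std::size_t rows, std::size_t cols) {
  assert(in_fp16.size() == rows * cols);
  std::vector<float> out(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    float acc = 0.0f;  // fp32 accumulator, independent of any fp16 path
    for (std::size_t c = 0; c < cols; ++c) acc += in_fp16[r * cols + c];
    out[r] = round_to_fp16(acc);
  }
  return out;
}
```

Because only the final value is rounded, the reference carries no fp16 accumulation error of its own. For example, summing {2048, 1} gives 2049 in fp32, which rounds to 2048 in fp16: the spacing at 2048 is 2, and under the default FE_TONEAREST mode `std::nearbyint` breaks the tie toward the even significand.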

Pass/fail uses numpy.testing.assert_allclose semantics, with a the
optimized output and b the reference:

    |a - b| <= abs_tol + rel_tol * |b|

This avoids the "relative error explodes at zero crossings" trap for
mean-zero reductions and bilinear samples near cancellation points.
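The per-element rule translates directly into code. This is a minimal sketch: `allclose_elem` and `rel_error` are hypothetical names, and the tolerances used below are illustrative values, not the ones the binary uses.

```cpp
#include <cassert>
#include <cmath>

// Element-wise pass/fail in numpy.testing.assert_allclose style:
//   |a - b| <= abs_tol + rel_tol * |b|
// a = value under test, b = reference (numpy's "desired"). The abs_tol term
// keeps the bound from shrinking to zero as b approaches zero. A NaN in
// either value makes the comparison false, so NaN fails here (numpy's
// assert_allclose instead treats matching NaNs as equal by default).
bool allclose_elem(float a, float b, float abs_tol, float rel_tol) {
  return std::fabs(a - b) <= abs_tol + rel_tol * std::fabs(b);
}

// Pure relative error, for contrast: undefined at b == 0 and unbounded as
// b -> 0, which makes it a poor pass/fail criterion near zero crossings.
float rel_error(float a, float b) {
  assert(b != 0.0f && "relative error is undefined for a zero reference");
  return std::fabs(a - b) / std::fabs(b);
}
```

With a reference b = 5e-5 (a near-cancellation result) and a = 1.5e-4, the pure relative error is about 2.0 (200%), yet `allclose_elem(a, b, 1e-3f, 1e-2f)` passes because the abs_tol term dominates the bound. Far from zero the rel_tol term takes over: a = 1.1 against b = 1.0 fails the same check.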

Opt-in via -DEXECUTORCH_BUILD_OPTIMIZED_VERIFY=ON so default builds are
unaffected. Build + run:

    cmake -DEXECUTORCH_BUILD_OPTIMIZED_VERIFY=ON ...
    cmake --build <out> --target verify_optimized_kernels
    adb push <out>/kernels/optimized/verify_optimized_kernels /data/local/tmp/
    adb shell /data/local/tmp/verify_optimized_kernels
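On the device, each test case reduces to an element-wise comparison between the optimized kernel's output and the reference, summarized as max_abs / max_rel(far) / near_zero / viol, with the process exit code derived from the violation counts. The sketch below is hypothetical (`CaseStats`, `compare`, and `report` are illustrative names), and the split between "far" and "near zero" elements (here |b| > abs_tol versus |b| <= abs_tol) is an assumption about how those counters could be defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical per-case summary mirroring the reported counters.
struct CaseStats {
  float max_abs = 0.0f;       // largest |a - b|
  float max_rel_far = 0.0f;   // largest |a - b| / |b| where |b| > abs_tol
  std::size_t near_zero = 0;  // elements with |b| <= abs_tol (no rel error)
  std::size_t viol = 0;       // elements failing the allclose rule
};

CaseStats compare(const std::vector<float>& actual,
                  const std::vector<float>& expected,
                  float abs_tol, float rel_tol) {
  assert(actual.size() == expected.size());
  CaseStats s;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const float a = actual[i], b = expected[i];
    const float abs_err = std::fabs(a - b);
    s.max_abs = std::max(s.max_abs, abs_err);
    if (std::fabs(b) > abs_tol) {
      s.max_rel_far = std::max(s.max_rel_far, abs_err / std::fabs(b));
    } else {
      ++s.near_zero;
    }
    // Written as !(x <= bound) so that NaN also counts as a violation.
    if (!(abs_err <= abs_tol + rel_tol * std::fabs(b))) ++s.viol;
  }
  return s;
}

// Print one line per case; return 0 only if no case had a violation.
int report(const char* const names[], const CaseStats stats[], std::size_t n) {
  int rc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::printf("%-24s max_abs=%.3g max_rel(far)=%.3g near_zero=%zu viol=%zu %s\n",
                names[i], stats[i].max_abs, stats[i].max_rel_far,
                stats[i].near_zero, stats[i].viol,
                stats[i].viol ? "FAIL" : "PASS");
    if (stats[i].viol != 0) rc = 1;
  }
  return rc;
}
```

Reporting max_rel only over "far" elements keeps a single near-zero reference value from dominating the summary, while viol still checks every element against the full rule.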

Exits 0 on all-pass; reports max_abs / max_rel(far) / near_zero / viol
per test case. 12 test cases across grid_sampler and sum, covering the
shapes the polycam depth model uses plus a few edge cases (odd channel
count, align_corners=1, multi-batch).

1 parent f4086e3
2 files changed
Lines changed: 557 additions & 0 deletions
@@ -98,6 +98,30 @@ (24 lines added after original line 100)