Commit 93c93c1
optimized/grid_sampler_2d: address review feedback
Three changes consolidated for review:
1. Move the forward declaration of grid_sampler_2d_bilinear_fp16_hw out
of op_grid_sampler_2d.cpp into a new header
kernels/optimized/cpu/op_grid_sampler_2d_fp16_hw.h. The function has
external linkage (the dispatcher in op_grid_sampler_2d.cpp calls into
it across translation units), but until now no prototype was visible
at its definition site, which trips -Wmissing-prototypes on
build configurations that enable it. Both .cpp files now include the
shared header. The function body stays in op_grid_sampler_2d_fp16_hw.cpp
because that TU is the only one compiled with -march=armv8.2-a+fp16,
so it cannot be inlined into a header. The header itself uses void* for
input/output buffers and is fp16-free, so callers don't need the
+fp16 march flag just to declare or call it.
2. Split the fp16 HW path into its own CMake target. Previously the
-march=armv8.2-a+fp16 flag was scoped per-source-file via
set_source_files_properties on the sole TU inside the optimized_kernels
library. That works for a clean non-LTO build, but with ThinLTO or
cross-TU optimizations the per-file flag is no longer a hard boundary,
and the fallback path in op_grid_sampler_2d.cpp could in principle be
auto-vectorized into fp16 NEON instructions. That is exactly the SIGILL
hazard the runtime dispatch is meant to prevent. Build the file as an
OBJECT library (grid_sampler_2d_fp16_hw_impl) with a target-scoped
-march flag and link
it into optimized_kernels via $<BUILD_LOCAL_INTERFACE:...> so the
object code is baked into liboptimized_kernels.a at archive time and
the OBJECT target is kept out of the install EXPORT set. Mirrors the
existing buck `grid_sampler_2d_fp16_hw_impl` cxx_library.
3. Gate the optimized fast paths on input/grid/out dtype match. Each fast
path assumes a single dtype across all three tensors:
     fp32 NEON path:    data_ptr<float>() on all three
     fp16 HW path:      void* pointers reinterpret_cast<__fp16*> on all three
     fp16 SW NEON path: data_ptr<c10::Half>() on all three
Until now the dispatcher gated only on input.scalar_type(). The
reinterpret_casts in the fp16 HW kernel are the riskiest case, because
on a mismatched dtype they would silently corrupt data (reading
int64/double bytes at __fp16 stride). The data_ptr<T>() runtime check
exists but is not guaranteed in release builds. Add a dtypes_match
clause at the top of the fast-path eligibility check that requires all
three scalar types to be equal; fall back to the portable kernel
otherwise. The portable kernel handles arbitrary dtype combinations
correctly.
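For change 1, the shape of the shared header can be sketched roughly as follows. Only the function name and the use of void* buffers come from the message above; the parameter list, the `fp16_storage` stand-in type, and the copy-only body are assumptions made so the sketch builds on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ---- header side (would live in op_grid_sampler_2d_fp16_hw.h) ----
// Hypothetical signature. No fp16 type appears in it, so a TU built
// without +fp16 can include the header and call the function.
void grid_sampler_2d_bilinear_fp16_hw(
    const void* input, const void* grid, void* output, std::size_t numel);

// ---- definition side (would live in op_grid_sampler_2d_fp16_hw.cpp) ----
// Stand-in element type for this sketch; the real TU would use __fp16
// and be the only one compiled with -march=armv8.2-a+fp16.
using fp16_storage = std::uint16_t;

// The prototype above is visible here, so -Wmissing-prototypes is quiet.
void grid_sampler_2d_bilinear_fp16_hw(
    const void* input, const void* grid, void* output, std::size_t numel) {
  assert(input != nullptr && output != nullptr);
  (void)grid;  // placeholder body: copies input to output, no sampling
  const auto* in = static_cast<const fp16_storage*>(input);
  auto* out = static_cast<fp16_storage*>(output);
  for (std::size_t i = 0; i < numel; ++i) {
    out[i] = in[i];
  }
}
```

The point of the pattern is that the cast to the fp16 element type happens only inside the TU that owns the -march flag.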
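For change 2, the build split only helps if the fp16 kernel is reached exclusively behind a runtime capability check. A minimal sketch of such a check, assuming Linux on AArch64 and the kernel's HWCAP_ASIMDHP bit; the helper name `cpu_has_fp16_arith` is hypothetical, and this is not necessarily how the dispatcher in op_grid_sampler_2d.cpp detects the feature.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__aarch64__)
#include <asm/hwcap.h>  // HWCAP_ASIMDHP
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif

// Hypothetical helper: true only when the CPU advertises Advanced SIMD
// half-precision arithmetic. On any other platform it reports false, so
// callers stay on the fallback path.
bool cpu_has_fp16_arith() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMDHP) != 0;
#else
  return false;
#endif
}
```

This helper must itself live in a TU built without the +fp16 flag; otherwise the check could hit an illegal instruction before it returns.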
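For change 3, the eligibility gate can be sketched as below. The `ScalarType`, `TensorMeta`, and `Path` types and the `select_path` function are hypothetical stand-ins, not the runtime's real tensor API; only the idea of a dtypes_match clause evaluated before any fast path comes from the message above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the runtime's scalar type and tensor.
enum class ScalarType { Float, Half, Double, Long };

struct TensorMeta {
  ScalarType dtype;
};

enum class Path { Fp32Neon, Fp16, Portable };

// Picks a kernel path. Any dtype mismatch among the three tensors is
// rejected first, so no fast path ever reinterprets a foreign buffer.
Path select_path(
    const TensorMeta& input, const TensorMeta& grid, const TensorMeta& out) {
  const bool dtypes_match =
      input.dtype == grid.dtype && input.dtype == out.dtype;
  if (!dtypes_match) {
    return Path::Portable;
  }
  switch (input.dtype) {
    case ScalarType::Float:
      return Path::Fp32Neon;
    case ScalarType::Half:
      return Path::Fp16;  // HW or SW variant would be chosen further down
    default:
      return Path::Portable;
  }
}
```

Checking the match before switching on input.dtype keeps each fast path free of per-tensor dtype checks.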
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4e98f5a commit 93c93c1
5 files changed
Lines changed: 80 additions & 27 deletions
File tree
- kernels/optimized
- cpu
- shim_et/xplat/executorch/build