Commit b2bdc94
optimized: fallback to portable grid_sampler_2d for non-default layouts
The NEON fast path indexes input/grid/out directly, assuming a contiguous
NCHW default-dim-order layout; it never consults .strides() or .dim_order().
If the caller passes anything else (channels-last/NHWC, transposed, or
otherwise strided), we'd read the wrong memory and silently produce
garbage output.
Add the same check pattern op_sum.cpp already uses at L150-151:
tensor_is_default_dim_order + tensor_is_contiguous on input, grid, and
out. If any fails, delegate to the portable kernel (which handles
arbitrary strides / dim orders correctly via .strides()).
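The guard described above can be sketched roughly as follows. This is a self-contained illustration, not the code from this commit: `Layout`, `dense`, `channels_last`, `is_default_dim_order`, `is_contiguous`, and `choose_path` are hypothetical stand-ins for the real tensor type and the tensor_is_default_dim_order / tensor_is_contiguous helpers, and the contiguity test here is a strict row-major stride comparison that may differ in detail from the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor's layout metadata (not the real Tensor type).
struct Layout {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;    // in elements
  std::vector<uint8_t> dim_order;  // outermost to innermost dimension
};

// Build a dense row-major layout with the identity dim order.
Layout dense(std::vector<int64_t> sizes) {
  Layout t;
  t.strides.assign(sizes.size(), 1);
  for (size_t i = sizes.size(); i-- > 1;) {
    t.strides[i - 1] = t.strides[i] * sizes[i];
  }
  for (size_t i = 0; i < sizes.size(); ++i) {
    t.dim_order.push_back(static_cast<uint8_t>(i));
  }
  t.sizes = std::move(sizes);
  return t;
}

// Build a channels-last (NHWC memory order) layout for logical NCHW sizes.
Layout channels_last(int64_t n, int64_t c, int64_t h, int64_t w) {
  return Layout{{n, c, h, w}, {h * w * c, 1, w * c, c}, {0, 2, 3, 1}};
}

// True when dim_order is 0, 1, ..., n-1.
bool is_default_dim_order(const Layout& t) {
  for (size_t i = 0; i < t.dim_order.size(); ++i) {
    if (t.dim_order[i] != i) return false;
  }
  return true;
}

// True when strides exactly match a dense row-major packing of sizes.
bool is_contiguous(const Layout& t) {
  assert(t.sizes.size() == t.strides.size());
  int64_t expected = 1;
  for (size_t i = t.sizes.size(); i-- > 0;) {
    if (t.strides[i] != expected) return false;
    expected *= t.sizes[i];
  }
  return true;
}

enum class Path { kFast, kPortable };

// Take the fast path only if input, grid, and out all pass both checks;
// otherwise fall back to the stride-aware portable kernel.
Path choose_path(const Layout& input, const Layout& grid, const Layout& out) {
  for (const Layout* t : {&input, &grid, &out}) {
    if (!is_default_dim_order(*t) || !is_contiguous(*t)) {
      return Path::kPortable;
    }
  }
  return Path::kFast;
}
```

With this sketch, a channels-last input is rejected by the dim-order check, while a transposed view that keeps the default dim order is rejected by the stride check, so either one is routed to the portable kernel.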
No perf impact on the hot path — the checks are a handful of scalar
comparisons run once per call, and the common polycam depth model case
is already default-contiguous so the fast path is still taken.

1 parent 8721bfa commit b2bdc94
1 file changed
Lines changed: 13 additions & 2 deletions
[Diff body not captured. Hunk summary: new lines 297-307 added (11 lines); old lines 298-299 replaced by new lines 309-310.]