Commit 51ad43c
fix(npyiter): ForEach/ExecuteGeneric/ExecuteReducing read past end without EXTERNAL_LOOP
Three symptoms of one bug in NpyIter.Execution.cs. The driver loops —
ForEach, ExecuteGeneric(Single/Multi), and ExecuteReducing — pulled
their per-call count from `GetInnerLoopSizePtr()`, which always returns
`&_state->Shape[NDim - 1]` when the iterator isn't BUFFER'd. In EXLOOP
mode that's correct: `iternext` (via ExternalLoopNext) advances
`IterIndex` by `Shape[NDim - 1]` per call.
But in the default non-EXLOOP non-BUFFER mode, `iternext` (via
StandardNext) only advances by one element per call — `state.Advance()`
increments `IterIndex` by 1. The kernel was still told `count =
Shape[NDim - 1]`, so:
1. The kernel reads `Shape[NDim - 1]` elements starting at the current
data pointer. Once the pointer has moved past the start of the inner
dimension, that span overruns the row and, near the tail, the last
valid element of the source array.
2. The driver then calls iternext, which advances the pointer by one
element.
3. The next kernel call reads `Shape[NDim - 1]` elements starting one
element later — again past the end — and so on.
Net effect: an N-element 1-D array triggers N kernel invocations, each
reading N "elements" with heavy overlap, and every invocation after the
first reads past the end of the array into out-of-bounds memory. For
`np.array([1, 2, NaN, 4, 5])` the returned NanSum was 46 instead of 12.
The overlapping windows counted the valid element at index i a total of
i + 1 times (1*1 + 2*2 + 4*4 + 5*5 = 46), so in that run the
out-of-bounds slots evidently contributed nothing (zero, or NaN, which
NanSum skips).
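A minimal Python model of the pre-fix default-mode loop reproduces that number. `buggy_nansum_1d` and the flat `memory` list are hypothetical stand-ins for the driver and raw memory, and the four out-of-bounds slots are assumed to read as 0.0; real memory past the array is arbitrary.

```python
import math

def buggy_nansum_1d(data, n_valid):
    """Model of the pre-fix driver in default (non-EXLOOP, non-BUFFER) mode.

    `data` stands in for raw memory: the first `n_valid` slots are the array,
    anything after that is out-of-bounds memory the kernel should never touch.
    Hypothetical model, not the NpyIter code itself.
    """
    inner_size = n_valid              # count taken from Shape[NDim - 1]
    total = 0.0
    windows = []
    iter_index = 0
    while iter_index < n_valid:       # one driver step per element (StandardNext)
        start = iter_index
        stop = start + inner_size     # kernel is told count = Shape[NDim - 1]
        windows.append((start, stop))
        for x in data[start:stop]:
            if not math.isnan(x):     # NanSum skips NaN
                total += x
        iter_index += 1               # iternext advances by one element only
    return total, windows

# Five valid elements followed by four out-of-bounds slots (assumed 0.0 here).
memory = [1.0, 2.0, math.nan, 4.0, 5.0] + [0.0] * 4
total, windows = buggy_nansum_1d(memory, 5)
print(total)    # 46.0
print(windows)  # [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
```

Every window after the first ends beyond index 5, which is the out-of-bounds read described in the numbered steps.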
Discovered during the Phase 2 migration when wiring the NaN reduction
kernels into NpyIter. Worked around at the call sites by always passing
`NpyIterGlobalFlags.EXTERNAL_LOOP`, which keeps `iternext` and
GetInnerLoopSizePtr in agreement.
This commit fixes the bug at the source so future callers don't need
the workaround. Approach:
- New helper `ResolveInnerLoopCount()` returns the correct count for
  the current flag combination:
    BUFFER: `_state->BufIterEnd`
    EXLOOP: `_state->Shape[NDim - 1]`
    else:   1
- ForEach, ExecuteGenericSingle, ExecuteGenericMulti, and ExecuteReducing
  use ResolveInnerLoopCount instead of dereferencing
  GetInnerLoopSizePtr. BUFFER mode still reads the pointer on every
  iteration because buffer fills can shrink at the tail.
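The count rule can be sketched in Python as follows. `IterFlags`, `IterState`, and `resolve_inner_loop_count` are hypothetical stand-ins for the C# flags, the `_state` fields, and `ResolveInnerLoopCount()`; only the branch order (BUFFER, then EXLOOP, else 1) is taken from the description above.

```python
from dataclasses import dataclass
from enum import Flag

class IterFlags(Flag):
    # Hypothetical stand-ins for the iterator's BUFFER / EXLOOP state bits.
    NONE = 0
    BUFFER = 1
    EXLOOP = 2

@dataclass
class IterState:
    # Hypothetical mirror of the fields the helper reads.
    flags: IterFlags
    shape: tuple
    buf_iter_end: int = 0

def resolve_inner_loop_count(state: IterState) -> int:
    """Elements the kernel may process per driver step (sketch of ResolveInnerLoopCount)."""
    if IterFlags.BUFFER in state.flags:
        # Buffered: the current fill size; the last fill can be shorter.
        return state.buf_iter_end
    if IterFlags.EXLOOP in state.flags:
        # External loop: iternext advances IterIndex by the whole inner dimension.
        return state.shape[-1]
    # Default mode: iternext advances by exactly one element.
    return 1

print(resolve_inner_loop_count(IterState(IterFlags.NONE, (3, 4))))                    # 1
print(resolve_inner_loop_count(IterState(IterFlags.EXLOOP, (3, 4))))                  # 4
print(resolve_inner_loop_count(IterState(IterFlags.BUFFER, (3, 4), buf_iter_end=2)))  # 2
```

Checking BUFFER first means a buffered iterator that also has EXLOOP set reports the buffer size, not the raw inner dimension.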
Both EXLOOP and non-EXLOOP paths now produce correct results. The
existing Phase 2 call sites keep EXLOOP because it's the SIMD-optimal
mode (one call covers the whole inner dimension), but callers who omit
the flag no longer get silently-wrong output.
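To illustrate why both modes now agree while EXLOOP needs fewer kernel calls, here is a Python sketch of the fixed non-BUFFER loop over a small C-contiguous 2-D array. `drive_nansum` is hypothetical, and it assumes the iterator has not coalesced the two axes into one, so EXLOOP makes one call per row.

```python
import math

def drive_nansum(flat, shape, exloop):
    """Sketch of the fixed (non-BUFFER) driver loop over a C-contiguous array.

    `flat` holds the elements in row-major order; `shape` is (rows, cols).
    The kernel count and the iternext step always use the same value.
    """
    size = len(flat)
    inner = shape[-1]
    count = inner if exloop else 1    # EXLOOP: whole inner dim; default: 1
    total, calls, iter_index = 0.0, 0, 0
    while iter_index < size:
        calls += 1
        # Kernel call: process exactly `count` elements from the current position.
        for x in flat[iter_index:iter_index + count]:
            if not math.isnan(x):
                total += x
        iter_index += count           # iternext advances by the same amount
    return total, calls

data = [1.0, 2.0, math.nan, 4.0, 5.0, 6.0]        # shape (2, 3)
print(drive_nansum(data, (2, 3), exloop=True))    # (18.0, 2)
print(drive_nansum(data, (2, 3), exloop=False))   # (18.0, 6)
```

Both calls return the same sum; the difference is two kernel invocations versus six, which is the per-call overhead EXLOOP avoids.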
Test impact: 6,748 / 6,748 passing on net8.0 and net10.0. The
bug-repro smoke test (NanSum over a strided 1-D array without
EXTERNAL_LOOP) now returns the correct sum.

1 parent 7264173 commit 51ad43c
1 file changed
Lines changed: 75 additions & 9 deletions
Diff hunks in NpyIter.Execution.cs (line contents not captured): old 129-149 / new 129-193; old 170-176 / new 214-220; old 179-190 / new 223-244; old 216-234 / new 270-300.