Commit 5ba4ec3
perf: pre-allocate output in _apply_sos / _parallel_reduce_axis1
The previous dispatch had each parallel worker return ``(c0, c1, block)``
tuples; the calling thread then allocated the output array and copied each
block into place. That post-collection allocate-and-copy is wasted work
since the channel/time slices are non-overlapping — workers can write
directly into a pre-allocated output.
Measured on a (30000, 384) float32 chunk with sosfiltfilt and
n_workers=5:
pattern                               wall (ms)   speedup
E. sequential                            173.89     1.00×
A. submit + collect + alloc + copy        75.66     2.30×   (current)
B. pre-alloc, write in place              60.51     2.87×   (this PR)
C. pool.map, write in place               63.55     2.74×
D. manual threading.Thread                64.76     2.69×
So we save ~15 ms wall per `_apply_sos` call (likewise for
`_parallel_reduce_axis1`) by dropping the redundant copy. Ideal 5×
scaling would be 34.78 ms; the remaining gap to ideal is the GIL-held
Python wrapper inside scipy's sosfiltfilt — pattern doesn't matter there
(B/C/D are all within noise), so we keep the simpler submit/result form.
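
The figures quoted above follow directly from the measured wall times. A small check, using only the numbers from the table (the variable names and the efficiency figure are illustrative, not part of the benchmark):

```python
# Wall times in ms, copied from the benchmark table in this message.
timings_ms = {"E": 173.89, "A": 75.66, "B": 60.51, "C": 63.55, "D": 64.76}
n_workers = 5

# Speedup of each pattern relative to the sequential baseline E.
speedup = {name: timings_ms["E"] / ms for name, ms in timings_ms.items()}

# Wall time saved per call by moving from pattern A to pattern B.
saved_ms = timings_ms["A"] - timings_ms["B"]

# Wall time under ideal linear scaling across all workers.
ideal_ms = timings_ms["E"] / n_workers

# Fraction of ideal scaling that pattern B achieves.
efficiency_b = speedup["B"] / n_workers

print(f"saved {saved_ms:.2f} ms, ideal {ideal_ms:.2f} ms, "
      f"B at {efficiency_b:.0%} of ideal")
```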
Same pattern applied to common_reference._parallel_reduce_axis1.
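
A rough sketch of the same idea for an axis-1 reduction, assuming the work is split along the time axis and the result has one value per sample. The function name, the default `np.median` reducer, and the slicing scheme are assumptions for illustration, not the actual `common_reference` code.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_reduce_axis1_sketch(data, n_workers, reduce=np.median):
    # Hypothetical illustration: reduce across axis 1 in parallel time slices.
    n_samples = data.shape[0]
    out = np.empty(n_samples, dtype=data.dtype)  # pre-allocated 1-D output

    # Contiguous, non-overlapping (t0, t1) ranges along the time axis.
    edges = np.linspace(0, n_samples, n_workers + 1).astype(int)
    slices = [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:]) if b > a]

    def task(t0, t1):
        # Each worker writes only its own rows of the output.
        out[t0:t1] = reduce(data[t0:t1], axis=1)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(task, t0, t1) for t0, t1 in slices]
        for f in futures:
            f.result()  # surface worker exceptions in the caller
    return out
```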
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent b7788e4 commit 5ba4ec3
2 files changed
Lines changed: 26 additions & 14 deletions