Commit ce5c583
miranov25
feat(AliasDataFrame): Phase 9e - Arrow compute disabled, NumPy path optimized
Findings from micro-benchmark:
- PyArrow compute is 16x slower than NumPy for element-wise math
- NumPy: 2.6ms vs Arrow: 41.2ms (no conversion overhead)
- Root cause: Arrow lacks expression fusion, 4 kernel launches vs 1 fused NumPy loop
Changes:
- Add _analyze_expression() for unified AST analysis
- Disable Arrow compute path (returns None immediately)
- Keep Phase 9b Arrow scatter (pc.take) - still beneficial
- Code cleanup in _do_materialize()
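For illustration, a single-pass AST analysis in the spirit of `_analyze_expression()` might look like the sketch below. It is not the code from this commit: the `ExprInfo` result type, its fields, and the standalone function name `analyze_expression` are hypothetical, and the real method may collect different information. It relies on `ast.unparse`, which needs Python 3.9 or later.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ExprInfo:
    # Hypothetical result container, not the project's actual type.
    names: set = field(default_factory=set)  # referenced data names
    calls: set = field(default_factory=set)  # called functions, e.g. "np.sqrt"
    n_ops: int = 0                           # operator nodes in the expression


def analyze_expression(expr: str) -> ExprInfo:
    """Walk the expression's AST once and collect names, calls, and operators."""
    tree = ast.parse(expr, mode="eval")
    info = ExprInfo()
    call_targets = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            info.calls.add(ast.unparse(node.func))
            # Names inside the call target (e.g. "np" in np.sqrt) are not data.
            call_targets.update(
                n.id for n in ast.walk(node.func) if isinstance(n, ast.Name)
            )
        elif isinstance(node, ast.Name):
            info.names.add(node.id)
        elif isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp)):
            info.n_ops += 1
    # Drop module and function names so only data references remain.
    info.names -= call_targets
    return info


info = analyze_expression("np.sqrt(x*x + y*y) / z")
print(sorted(info.names), sorted(info.calls), info.n_ops)
```

Collecting everything in one walk lets later decisions (which columns to load, whether a fast path applies) share one parse instead of re-parsing the expression at each step.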
Performance (real data, 13.5M rows):
- Materialize time: 11.7s
- Memory: 2.78GB → 4.18GB
Strategy: Arrow for I/O + scatter, NumPy for compute
Known issue: cleanTemporary not removing intermediate columns (pre-existing bug)
Tests: 727+ passed
Reviewed-by: Claude (Architect), GPT, Gemini
1 parent a788a68 commit ce5c583
1 file changed
Lines changed: 16 additions & 0 deletions
(Diff body not captured; the 16 added lines are new-file lines 3384-3385 and 3387-3400.)