Commit 635246c
* perf(geotiff): batch _nvcomp_batch_compress allocations and D2H readback (#1712)
Two related anti-patterns in the GPU compress path remained after the
decode-side fix in #1552 and the device-pointer fix in #1659:
1. Per-tile cupy.empty allocations for the compressed-output buffers.
   For an N-tile write, this issued N memory-pool calls. Replace with
   one contiguous allocation of size n_tiles * max_cs plus per-tile
   slab views; the per-tile pointer array still lets nvCOMP write
   independent slabs in parallel.
2. Per-tile .get().tobytes() in the result-collection loop. Each
   .get() was a separate D2H transfer on the default stream, and the
   per-DMA setup cost dominated wall time for large-N writes (the
   exact pattern #1552 fixed on the decode side). Replace with a
   single cupy.concatenate of trimmed slabs and one .get(), then
   slice the host buffer by cumulative offsets to peel out per-tile
   payloads. The adler32 deflate-wrap step is unchanged.
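The two replacements above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it uses NumPy as a CPU stand-in so it runs without a GPU, and the helper names alloc_slabs and collect_payloads are hypothetical. With CuPy the same shape would apply, with the single host copy done by one .get() on the concatenated array.

```python
from itertools import accumulate

import numpy as np  # CPU stand-in (assumption); the commit itself uses cupy


def alloc_slabs(n_tiles, max_cs):
    """Hypothetical helper: one contiguous buffer plus per-tile views."""
    pool = np.empty(n_tiles * max_cs, dtype=np.uint8)  # single allocation
    # Each slab is a view into the pool, not a separate allocation.
    slabs = [pool[i * max_cs:(i + 1) * max_cs] for i in range(n_tiles)]
    return pool, slabs


def collect_payloads(slabs, comp_sizes):
    """Hypothetical helper: one staged copy, then slice by offsets."""
    if not slabs:
        return []
    # Trim every slab to its compressed size and pack them back to back.
    staged = np.concatenate([s[:n] for s, n in zip(slabs, comp_sizes)])
    # On a GPU this would be the only D2H transfer (staged.get()).
    host = staged.tobytes()
    # Cumulative offsets mark where each tile's payload starts and ends.
    offsets = [0, *accumulate(comp_sizes)]
    return [host[offsets[i]:offsets[i + 1]] for i in range(len(comp_sizes))]


# Fake "compressed" output: tile i writes comp_sizes[i] bytes of value i + 1.
pool, slabs = alloc_slabs(n_tiles=3, max_cs=8)
comp_sizes = [3, 5, 2]
for i, (slab, n) in enumerate(zip(slabs, comp_sizes)):
    slab[:n] = i + 1

payloads = collect_payloads(slabs, comp_sizes)
```

The number of copies to the host stays at one regardless of tile count; the cost is a temporary staging buffer whose size is the sum of the compressed sizes.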
Real-world benchmark on an Ampere-class GPU: 2048x2048 float32 zstd
GPU write at tile_size=64 (1024 tiles) drops median from 84.3ms to
54.7ms (~35% reduction).
Tests cover the structural change (regressions back to the per-tile
patterns fail loudly) plus end-to-end round-trip equality with
deflate and zstd compression. _check_gpu_memory now bounds the new
contiguous allocation in the same way the decode-side fix does.
* Address Copilot review feedback on #1729
- Add _check_gpu_memory guard before nvcomp staging-buffer concatenate
(mirrors _batched_d2h_to_bytes pattern)
- Tighten GPU skip to require cupy.cuda.is_available()
1 parent 09ac06a commit 635246c
2 files changed
Lines changed: 213 additions & 8 deletions
Lines changed: 157 additions & 0 deletions