Commit 33a8b7d
* perf(geotiff): batch _try_kvikio_read_tiles pread submissions (#1688)
Replaces the per-tile cupy.empty + blocking IOFuture.get() inside the
kvikio GDS path with a single contiguous device buffer, batched pread
submissions, and a _check_gpu_memory guard up front.
The old loop alternated submit -> wait -> submit -> wait, so the kvikio
worker pool only saw one outstanding pread at a time and the per-tile
cupy.empty() setup cost compounded across all tiles. The new pattern
allocates once, submits every pread before the first .get(), and lets
the worker pool overlap the reads.
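A minimal host-side sketch of the allocate-once, submit-all-then-get pattern described above. It illustrates the technique under stated assumptions and is not the xrspatial implementation: a NumPy array stands in for the CuPy device buffer, a ``ThreadPoolExecutor`` stands in for kvikio's worker pool, and ``read_tiles_batched`` is a hypothetical name. Each tile's slot in the single buffer is the exclusive prefix sum of the tile sizes.

```python
import numpy as np


def read_tiles_batched(path, offsets, sizes, pool):
    """Read every tile into one contiguous buffer; return per-tile views.

    Hypothetical host-side sketch: a NumPy array stands in for the CuPy
    device buffer and ``pool`` stands in for kvikio's worker pool.
    Returns None on a short read so the caller can take a fallback path.
    """
    sizes = np.asarray(sizes, dtype=np.int64)
    starts = np.cumsum(sizes) - sizes                  # exclusive prefix sum
    buf = np.empty(int(sizes.sum()), dtype=np.uint8)   # single allocation
    whole = memoryview(buf)

    def pread(dst_start, size, file_offset):
        # Each read fills a disjoint slice of the shared buffer.
        with open(path, "rb") as f:
            f.seek(file_offset)
            return f.readinto(whole[dst_start:dst_start + size])

    # Phase 1: queue every read before waiting on any of them.
    futures = [pool.submit(pread, int(s), int(n), int(o))
               for s, n, o in zip(starts, sizes, offsets)]
    # Phase 2: collect results; the pool has been overlapping the reads.
    nread = [f.result() for f in futures]
    if nread != sizes.tolist():
        return None
    return [buf[s:s + n] for s, n in zip(starts, sizes)]
```

Because every read is queued before the first ``.result()``, up to ``max_workers`` reads can be in flight at once, whereas a submit-then-wait loop keeps at most one busy. Zero-size tiles fall out naturally as zero-length views.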
Microbenchmark (simulated 8-worker pool, 256 tiles at 1 ms I/O latency
each): old 256 ms vs. new 38.7 ms (~6.6x). Single-thread simulation:
28.5 ms (~9x).
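The expected gain follows from simple arithmetic: the interleaved loop pays 256 × 1 ms ≈ 256 ms, while eight overlapping workers have a floor of 256 / 8 × 1 ms = 32 ms (8x), so 38.7 ms sits about 21% above that floor. The sketch below is a hypothetical harness in the same spirit, with ``time.sleep`` standing in for I/O latency; it is not the benchmark behind the numbers above, and absolute timings vary by host.

```python
import time
from concurrent.futures import ThreadPoolExecutor

N_TILES = 256
IO_LATENCY_S = 0.001  # simulated per-tile read latency
WORKERS = 8


def fake_pread(_tile_index):
    time.sleep(IO_LATENCY_S)  # stands in for one blocking read
    return 1


def run_interleaved(pool):
    # Old pattern: submit one read, wait for it, then submit the next.
    return [pool.submit(fake_pread, i).result() for i in range(N_TILES)]


def run_batched(pool):
    # New pattern: submit every read, then wait on all of them.
    futures = [pool.submit(fake_pread, i) for i in range(N_TILES)]
    return [f.result() for f in futures]


def timed(fn, pool):
    t0 = time.perf_counter()
    fn(pool)
    return time.perf_counter() - t0


with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    t_old = timed(run_interleaved, pool)
    t_new = timed(run_batched, pool)

print(f"interleaved: {t_old * 1e3:.1f} ms, batched: {t_new * 1e3:.1f} ms")
```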
Adds 9 unit tests covering the kvikio-absent path, single-buffer pointer
arithmetic, submit-before-get ordering, memory guard contract, partial-
read fallback, end-to-end data round-trip, and zero-size / all-sparse
tile edge cases. The fake CuFile lets the structural checks run on
hosts without a real GDS install.
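One way to build a fake CuFile of the kind mentioned above, so structural checks run without GDS. Everything here is hypothetical: ``FakeCuFile``, ``FakeIOFuture`` and ``read_tiles`` are illustrative names, the ``pread(buf, size, file_offset, dev_offset)`` shape is an assumption loosely modeled on kvikio's ``CuFile.pread``, and a host ``bytearray`` replaces device memory. The fake records every call so a test can check each tile's destination offset, and it can truncate one read to exercise the partial-read fallback.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class FakeIOFuture:
    """Stand-in for an IOFuture: .get() returns the number of bytes read."""

    def __init__(self, nbytes: int):
        self._nbytes = nbytes

    def get(self) -> int:
        return self._nbytes


@dataclass
class FakeCuFile:
    """Hypothetical in-memory CuFile double; ``buf`` is a host bytearray."""

    data: bytes
    short_read_call: Optional[int] = None  # call index to truncate by 1 byte
    calls: List[Tuple[int, int, int]] = field(default_factory=list)

    def pread(self, buf, size, file_offset=0, dev_offset=0):
        self.calls.append((size, file_offset, dev_offset))
        chunk = self.data[file_offset:file_offset + size]
        if self.short_read_call == len(self.calls) - 1:
            chunk = chunk[:-1]
        buf[dev_offset:dev_offset + len(chunk)] = chunk
        return FakeIOFuture(len(chunk))


def read_tiles(cufile, offsets, sizes):
    """Hypothetical reader under test: one buffer, submit all, then get."""
    buf = bytearray(sum(sizes))
    starts, pos = [], 0
    for n in sizes:  # exclusive prefix sum = each tile's offset in ``buf``
        starts.append(pos)
        pos += n
    futures = [cufile.pread(buf, n, off, s)
               for n, off, s in zip(sizes, offsets, starts)]
    if [f.get() for f in futures] != list(sizes):
        return None  # partial read: caller takes the fallback path
    view = memoryview(buf)
    return [view[s:s + n] for s, n in zip(starts, sizes)]
```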
* Address PR #1693 review: tighten batching test + flake8 + comment accuracy
* test_all_preads_submitted_before_any_get now records both submit and
get events into a single ordered timeline and asserts every submit
occurs before the first get. The prior version asserted on per-event
lists ([0,1,2,3] each), which the legacy interleaved
submit->get->submit->get loop also satisfies, so the test could not
catch a regression to that pattern. Verified by temporarily reverting
_try_kvikio_read_tiles to the interleaved pattern: new assertion
fails with a clear "preads and gets are interleaved" message showing
the [submit,get,submit,get,...] timeline.
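The difference between the two assertion styles can be shown with a recording double (hypothetical names, not the actual test file). Both read loops yield identical per-kind lists, ``[0, 1, 2, 3]`` for submits and for gets, so only a check over a single shared timeline tells the legacy interleaved loop apart from the batched one.

```python
class RecordingFuture:
    """Logs a 'get' event into the shared timeline when .get() is called."""

    def __init__(self, timeline, idx):
        self._timeline, self._idx = timeline, idx

    def get(self):
        self._timeline.append(("get", self._idx))
        return 0


class RecordingFile:
    """Hypothetical double: logs 'submit' on every pread call."""

    def __init__(self):
        self.timeline = []

    def pread(self, idx):
        self.timeline.append(("submit", idx))
        return RecordingFuture(self.timeline, idx)


def read_interleaved(f, n):
    # Legacy pattern: submit -> get -> submit -> get ...
    return [f.pread(i).get() for i in range(n)]


def read_batched(f, n):
    # Batched pattern: every submit first, then every get.
    futures = [f.pread(i) for i in range(n)]
    return [fut.get() for fut in futures]


def assert_submits_before_gets(timeline):
    kinds = [kind for kind, _ in timeline]
    first_get = kinds.index("get") if "get" in kinds else len(kinds)
    assert "submit" not in kinds[first_get:], (
        f"preads and gets are interleaved: {kinds}")
```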
* Removed the unused ``import sys`` and the no-op ``fake_mod_obj``
lines from test_all_zero_size_tiles_returns_zero_length_views.
flake8 now reports no F401/F841 on the test file.
* Reworded the MemoryError comment in _try_kvikio_read_tiles. The
previous wording claimed the CPU-mmap fallback "does not pre-allocate
the full compressed payload", but gpu_decode_tiles still calls
``d_comp = cupy.asarray(comp_buf_host)`` over ``total_comp`` bytes.
The new wording explains the fallback skips the GDS-specific
contiguous read buffer but still pays the bulk device allocation.
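A sketch of the guard-then-fallback shape, under assumptions: ``check_gpu_memory``, its ``headroom`` factor and the ``read_tiles`` dispatcher are hypothetical, and the real ``_check_gpu_memory`` contract may differ. On a real device the free-byte count could come from ``cupy.cuda.runtime.memGetInfo()``, which returns ``(free, total)``; here it is passed in so the logic runs without a GPU.

```python
def check_gpu_memory(required_bytes: int, free_bytes: int,
                     headroom: float = 0.9) -> None:
    """Hypothetical guard: refuse requests above a fraction of free memory."""
    if required_bytes > headroom * free_bytes:
        raise MemoryError(
            f"need {required_bytes} bytes, only {free_bytes} free "
            f"(headroom {headroom:.0%})")


def read_tiles(sizes, free_bytes, gds_read, mmap_read):
    """Hypothetical dispatcher: GDS path if the guard passes, else CPU mmap."""
    total = sum(sizes)
    try:
        # Check before allocating the single contiguous GDS read buffer.
        check_gpu_memory(total, free_bytes)
    except MemoryError:
        # The fallback skips the GDS-specific read buffer, but its later
        # bulk device copy of the compressed payload still needs memory.
        return mmap_read(sizes)
    return gds_read(sizes)
```

Taking the fallback therefore only avoids the GDS read buffer; the later ``cupy.asarray(comp_buf_host)`` still allocates ``total_comp`` bytes on the device, which is the point the reworded comment makes.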
Parent: 71d4f51
3 files changed: 478 additions, 10 deletions
Paths: .claude, xrspatial/geotiff, tests