     "GetResult",
     "PreparedWrite",
     "SupportsChunkCodec",
-    "SupportsChunkPacking",
+    "SupportsChunkMapping",
     "SupportsSyncCodec",
 ]
 
@@ -100,21 +100,26 @@ def encode_chunk(self, chunk_array: NDBuffer) -> Buffer | None: ...
 
 
 @runtime_checkable
-class SupportsChunkPacking(Protocol):
-    """Protocol for codecs that can pack/unpack inner chunks into a storage blob
-    and manage the prepare/finalize IO lifecycle.
-
-    `BytesCodec` and `ShardingCodec` implement this protocol. The pipeline
-    uses it to separate IO (prepare/finalize) from compute (encode/decode),
-    enabling the compute phase to run in a thread pool.
-
-    The lifecycle is:
-
-    1. **Prepare**: fetch existing bytes from the store (if partial write),
-       unpack into per-inner-chunk buffers → `PreparedWrite`
-    2. **Compute**: iterate `PreparedWrite.indexer`, decode each inner chunk,
-       merge new data, re-encode, update `PreparedWrite.chunk_dict`
-    3. **Finalize**: pack `chunk_dict` back into a blob and write to store
+class SupportsChunkMapping(Protocol):
+    """Protocol for codecs that expose their stored data as a mapping
+    from chunk coordinates to encoded buffers.
+
+    A single store key holds a blob. This protocol defines how to
+    interpret that blob as a ``dict[tuple[int, ...], Buffer | None]`` —
+    a mapping from inner-chunk coordinates to their encoded bytes.
+
+    For a non-sharded codec (``BytesCodec``), the mapping is trivial:
+    one entry at ``(0,)`` containing the entire blob. For a sharded
+    codec, the mapping has one entry per inner chunk, derived from the
+    shard index embedded in the blob. The pipeline doesn't need to know
+    which case it's dealing with — it operates on the mapping uniformly.
+
+    This abstraction enables the three-phase IO/compute/IO pattern:
+
+    1. **IO**: fetch the blob from the store.
+    2. **Compute**: unpack the blob into the chunk mapping, decode/merge/
+       re-encode entries, pack back into a blob. All pure compute.
+    3. **IO**: write the blob to the store.
     """
 
     @property
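To make the new docstring concrete, here is a minimal sketch of the IO/compute/IO pattern it describes. This is not the code from this PR: `ChunkMappingSketch`, `unpack`, `pack`, `WholeBlobCodec`, `ToyShardCodec`, and `update_chunk` are hypothetical names, plain `bytes` stands in for `Buffer`, a `dict` stands in for the store, and the toy shard splits the blob into fixed-length pieces rather than reading an embedded shard index.

```python
from typing import Optional, Protocol, runtime_checkable

# Stand-in for dict[tuple[int, ...], Buffer | None]; bytes replaces Buffer (assumption).
ChunkMap = dict[tuple[int, ...], Optional[bytes]]


@runtime_checkable
class ChunkMappingSketch(Protocol):
    """Hypothetical protocol shape; the method names are assumptions."""

    def unpack(self, blob: Optional[bytes]) -> ChunkMap: ...
    def pack(self, chunks: ChunkMap) -> Optional[bytes]: ...


class WholeBlobCodec:
    """Non-sharded case: a single entry at (0,) holding the entire blob."""

    def unpack(self, blob: Optional[bytes]) -> ChunkMap:
        return {(0,): blob}

    def pack(self, chunks: ChunkMap) -> Optional[bytes]:
        return chunks.get((0,))


class ToyShardCodec:
    """Toy sharded case: 1-D inner chunks of a fixed byte length.

    A real sharding codec would locate inner chunks through the shard
    index embedded in the blob; fixed-length slicing is a simplification.
    """

    def __init__(self, chunk_len: int) -> None:
        self.chunk_len = chunk_len

    def unpack(self, blob: Optional[bytes]) -> ChunkMap:
        if blob is None:
            return {}
        n = self.chunk_len
        return {(i // n,): blob[i : i + n] for i in range(0, len(blob), n)}

    def pack(self, chunks: ChunkMap) -> Optional[bytes]:
        # Concatenate present chunks in coordinate order; None entries are skipped.
        present = [chunks[k] for k in sorted(chunks) if chunks[k] is not None]
        return b"".join(present) if present else None


def update_chunk(
    store: dict[str, bytes],
    key: str,
    codec: ChunkMappingSketch,
    coord: tuple[int, ...],
    new_bytes: bytes,
) -> None:
    blob = store.get(key)          # 1. IO: fetch the blob
    chunks = codec.unpack(blob)    # 2. Compute: blob -> chunk mapping
    chunks[coord] = new_bytes      #    merge the new entry
    blob = codec.pack(chunks)      #    chunk mapping -> blob
    if blob is None:
        store.pop(key, None)
    else:
        store[key] = blob          # 3. IO: write the blob


store = {"c/0": b"aabbcc"}
update_chunk(store, "c/0", ToyShardCodec(chunk_len=2), (1,), b"XX")
update_chunk(store, "c/1", WholeBlobCodec(), (0,), b"zz")
```

`update_chunk` never branches on the codec type, which is the uniformity the docstring claims: both codecs present the same mapping interface, and only the middle step touches chunk contents.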