[vulkan] Add a buffer index offset during codegen to allow arbitrary crop offsets #8954
Conversation
…cessing buffers to avoid restrictive alignment constraints. CodeGen now adds one int32 buffer offset param for each buffer after all other scalar args The runtime packs these params into the uniform buffer for each storage buffer Crop device now computes an index offset (instead of a byte offset Copy to/from device recomputes a byte offset from this index offset
|
Can you explain what the difference exactly is between the |
Adjusted relative offsets to use indexing for buffer copies. Added RegionAllocation and RegionIndexing to clarify mapping. Updated all affected interfaces.
The refactoring cleanup should help. The original "offset" was a "byte offset" pointing to the reserved area within the "block" that the "region" occupied. This is the actual allocation, and we insure this is aligned to an appropriate address for the device memory pool. Previously, we also had a special "range" that was used for cropping regions, that also added a "head_offset" and "tail_offset" in bytes to the existing "byte offset" for where in the block the cropped region should occupy. This was overly complicated, and ultimately not compatible across all device memory pools. Some have specific alignment requirements, which are usually much bigger than a byte, so using a byte offset could trigger errors. This PR introduces an "index offset" shader parameter to support crops at arbitrary offsets. This is now defined in the MemoryRegion.indexing field. |
|
I still don't understand why there is even a sub-region in a "parent block"? Why not always reference the "parent" and just use a byte-granular offset into the shader? I understand that the shader-parameter will be necessary in any case. I don't understand why the way of referencing this memory seems to be: instead of: and remove this extra "layer" of subregioning. |
We can't do byte-granular offsets in the shader, since not every device supports 8-bit or 16-bit scalar types. So we have to use the natural indexing. The block base address is the address allocated by the device driver for a chunk of device memory. One allocation has to be mapped to multiple vk_buffers, each of which are used as backing storage for the halide_buffer_t. The region offset is managed by our vulkan allocator and simply adjusts offsets for mapping vk_buffer objects that point into the allocated block. So we need to maintain this offset everytime we bind a vk_buffer. |
This PR enables arbitrary offsets for accessing buffers and avoids restrictive Vulkan alignment constraints.
In particular, if the device memory requirements have a minimum alignment that is larger than the natural scalar size of a channel, we can encounter errors when trying to access arbitrary crops since we may bind a vk_buffer with an offset less than the minimum required alignment. To avoid this, we always provide an index offset to all storage buffers, which provides a mechanism to handle crops in a more generic manner.
Addresses #8944