Skip to content

[vulkan] Add a buffer index offset during codegen to allow arbitrary crop offsets #8954

Merged
derek-gerstmann merged 3 commits into
mainfrom
dg/vulkan_buffer_offset_param
Feb 20, 2026
Merged

[vulkan] Add a buffer index offset during codegen to allow arbitrary crop offsets #8954
derek-gerstmann merged 3 commits into
mainfrom
dg/vulkan_buffer_offset_param

Conversation

@derek-gerstmann
Copy link
Copy Markdown
Contributor

@derek-gerstmann derek-gerstmann commented Feb 18, 2026

This PR enables arbitrary offsets for accessing buffers and avoids restrictive Vulkan alignment constraints.

In particular, if the device memory requirements have a minimum alignment that is larger than the natural scalar size of a channel, we can encounter errors when trying to access arbitrary crops since we may bind a vk_buffer with an offset less than the minimum required alignment. To avoid this, we always provide an index offset to all storage buffers, which provides a mechanism to handle crops in a more generic manner.

  • CodeGen now adds one int32 buffer offset param for each buffer after all other scalar args.
  • The runtime packs these params into the uniform buffer for each storage buffer.
  • Crop device now computes an index offset (instead of a byte offset).
  • Copy to/from device recomputes a byte offset from this index offset.

Addresses #8944

…cessing buffers to avoid restrictive alignment constraints.

CodeGen now adds one int32 buffer offset param for each buffer after all other scalar args
The runtime packs these params into the uniform buffer for each storage buffer
Crop device now computes an index offset (instead of a byte offset
Copy to/from device recomputes a byte offset from this index offset
@mcourteaux
Copy link
Copy Markdown
Contributor

Can you explain what the difference exactly is between the offset field and the index_offset field in the MemoryRegion struct? Seems odd that'd you'd need two scalar offsets?

Adjusted relative offsets to use indexing for buffer copies.
Added RegionAllocation and RegionIndexing to clarify mapping.
Updated all affected interfaces.
@derek-gerstmann
Copy link
Copy Markdown
Contributor Author

Can you explain what the difference exactly is between the offset field and the index_offset field in the MemoryRegion struct? Seems odd that'd you'd need two scalar offsets?

The refactoring cleanup should help. The original "offset" was a "byte offset" pointing to the reserved area within the "block" that the "region" occupied. This is the actual allocation, and we insure this is aligned to an appropriate address for the device memory pool.

Previously, we also had a special "range" that was used for cropping regions, that also added a "head_offset" and "tail_offset" in bytes to the existing "byte offset" for where in the block the cropped region should occupy. This was overly complicated, and ultimately not compatible across all device memory pools. Some have specific alignment requirements, which are usually much bigger than a byte, so using a byte offset could trigger errors.

This PR introduces an "index offset" shader parameter to support crops at arbitrary offsets. This is now defined in the MemoryRegion.indexing field.

@mcourteaux
Copy link
Copy Markdown
Contributor

I still don't understand why there is even a sub-region in a "parent block"? Why not always reference the "parent" and just use a byte-granular offset into the shader? I understand that the shader-parameter will be necessary in any case. I don't understand why the way of referencing this memory seems to be:

block.base_address + region.offset + extra_index_offset_introduced_in_this_pr

instead of:

block.base_address + index_offset_introduced_in_this_pr

and remove this extra "layer" of subregioning.

@derek-gerstmann
Copy link
Copy Markdown
Contributor Author

derek-gerstmann commented Feb 20, 2026

I still don't understand why there is even a sub-region in a "parent block"? Why not always reference the "parent" and just use a byte-granular offset into the shader? I understand that the shader-parameter will be necessary in any case. I don't understand why the way of referencing this memory seems to be:

block.base_address + region.offset + extra_index_offset_introduced_in_this_pr

instead of:

block.base_address + index_offset_introduced_in_this_pr

and remove this extra "layer" of subregioning.

We can't do byte-granular offsets in the shader, since not every device supports 8-bit or 16-bit scalar types. So we have to use the natural indexing.

The block base address is the address allocated by the device driver for a chunk of device memory. One allocation has to be mapped to multiple vk_buffers, each of which are used as backing storage for the halide_buffer_t. The region offset is managed by our vulkan allocator and simply adjusts offsets for mapping vk_buffer objects that point into the allocated block. So we need to maintain this offset everytime we bind a vk_buffer.

@derek-gerstmann derek-gerstmann merged commit d5a30fb into main Feb 20, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants