
Commit 6ebaa05

SS-JIA authored and committed
[ET-VK] Fix missing memory barrier for first-use writes on aliased tensors
Pull Request resolved: #17309

Tensors sharing physical memory via SharedObject each track their own `last_access_` independently. When a tensor's first access is a write, `prev_stage` is `NO_STAGE`, causing `transition()` to use `TOP_OF_PIPE_BIT` as `srcStageMask` with no `srcAccessMask`, which is effectively a no-op barrier. If the same physical memory was previously written through a different aliased tensor handle, this creates a WAW hazard: the new write may execute before or concurrently with the prior write, producing non-deterministic results.

This was observed as non-deterministic q8ta_conv2d output in ResNet50: running the model twice with the same input produced slightly different quantized int8 values. Adding a debug print shader after each conv2d dispatch masked the issue, because the print node's read-after-write barrier serialized the GPU work.

The fix: when `prev_stage` is `NO_STAGE` and the current access is a write, use `COMPUTE_SHADER_BIT` with `SHADER_WRITE_BIT` instead of `TOP_OF_PIPE_BIT` with no access flags. This ensures all prior compute shader work completes and its writes are made visible before the new write begins.

Authored with Claude.

ghstack-source-id: 339884030
exported-using-ghexport

Differential Revision: [D92715369](https://our.internmc.facebook.com/intern/diff/D92715369/)
1 parent 90e6e4c commit 6ebaa05

1 file changed

Lines changed: 18 additions & 10 deletions

backends/vulkan/runtime/api/containers/Tensor.cpp
```diff
@@ -775,9 +775,22 @@ void vTensorStorage::transition(
   // RAR: no need for synchronization
   if (prev_written || cur_written || layout_changed) {
     VkPipelineStageFlags src_stage = vkapi::vk_stage(prev_stage);
+    VkAccessFlags src_access = vkapi::vk_access(prev_stage, prev_access);
+
     if (0u == src_stage) {
-      src_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
+      if (cur_written) {
+        // First access through this tensor handle, and it's a write. The
+        // underlying memory may have been previously written through a
+        // different aliased tensor handle (via SharedObject). Wait for all
+        // prior compute work and make those writes available to prevent WAW
+        // hazards on aliased memory.
+        src_stage = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
+        src_access = VK_ACCESS_SHADER_WRITE_BIT;
+      } else {
+        src_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
+      }
     }
+
     VkPipelineStageFlags dst_stage = vkapi::vk_stage(cur_stage);
     if (0u == dst_stage) {
       dst_stage = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
@@ -786,20 +799,15 @@ void vTensorStorage::transition(
     pipeline_barrier.stage.src |= src_stage;
     pipeline_barrier.stage.dst |= dst_stage;

+    VkAccessFlags dst_access = vkapi::vk_access(cur_stage, cur_access);
+
     if (image_) {
       pipeline_barrier.images.emplace_back(
-          vkapi::vk_access(prev_stage, prev_access),
-          vkapi::vk_access(cur_stage, cur_access),
-          cur_layout,
-          new_layout,
-          image_);
+          src_access, dst_access, cur_layout, new_layout, image_);

       image_.set_layout(new_layout);
     } else if (buffer_) {
-      pipeline_barrier.buffers.emplace_back(
-          vkapi::vk_access(prev_stage, prev_access),
-          vkapi::vk_access(cur_stage, cur_access),
-          buffer_);
+      pipeline_barrier.buffers.emplace_back(src_access, dst_access, buffer_);
     }
   }
```

0 commit comments
