Update on "[ET-VK] Fix use-after-free in PrepackNode when TensorRefs are shared"

ssjia · ssjia · commit 5fd2bae5f9b7 · 2026-04-15T10:09:12.000-07:00
When a model has shared/tied weights (e.g. tied embeddings in transformers), the serialization deduplicates them into a single TensorRef that multiple PrepackNodes reference. Previously, `PrepackNode::create_staging_buffer()` called `tref->free_buffer()` unconditionally after copying weight data to a GPU staging buffer. This meant the first PrepackNode to execute would free the underlying host memory, and subsequent PrepackNodes sharing the same TensorRef would read from a dangling pointer, producing garbage/NaN values in prepacked weight and bias tensors on the GPU.

The fix adds a `prepack_use_count` field to `TensorRef` that tracks how many PrepackNodes still need to read from it. Each PrepackNode increments the count in its constructor and decrements it after copying data. The buffer is only freed when the count reaches zero.

This preserves the original eager-free behavior for non-shared weights (freeing immediately after the single consumer copies) while correctly deferring the free for shared weights until the last consumer is done. That avoids both the use-after-free and an unnecessary increase in peak memory.

Differential Revision: [D101009402](https://our.internmc.facebook.com/intern/diff/D101009402/)

[ghstack-poisoned]
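The counting scheme described above can be sketched roughly as follows. This is a simplified illustration, not the actual ExecuTorch code: `TensorRefSketch`, `PrepackNodeSketch`, `has_buffer()`, and the `std::vector<float>` staging copy are hypothetical stand-ins, and only the `prepack_use_count` field, `free_buffer()`, and `create_staging_buffer()` names come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for TensorRef: owns the host-side weight data.
struct TensorRefSketch {
  std::unique_ptr<std::vector<float>> data;
  // Number of prepack consumers that still need to read the buffer.
  int32_t prepack_use_count{0};

  explicit TensorRefSketch(std::vector<float> values)
      : data(std::make_unique<std::vector<float>>(std::move(values))) {}

  bool has_buffer() const { return data != nullptr; }
  void free_buffer() { data.reset(); }
};

// Hypothetical stand-in for PrepackNode: copies weights into "staging".
struct PrepackNodeSketch {
  TensorRefSketch* tref;
  std::vector<float> staging;  // assumption: models the GPU staging buffer

  // Register as a consumer at construction time.
  explicit PrepackNodeSketch(TensorRefSketch* t) : tref(t) {
    ++tref->prepack_use_count;
  }

  void create_staging_buffer() {
    // The host buffer must still be alive for every registered consumer.
    assert(tref->has_buffer());
    staging = *tref->data;
    // Free only once the last consumer has copied its data.
    if (--tref->prepack_use_count == 0) {
      tref->free_buffer();
    }
  }
};
```

With a single consumer the count goes from 1 to 0 on the first copy, so the buffer is released immediately, matching the eager-free behavior. With two consumers the buffer survives the first copy and is released after the second.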
diff --git a/backends/vulkan/runtime/graph/containers/Constant.h b/backends/vulkan/runtime/graph/containers/Constant.h
@@ -33,7 +33,7 @@ struct TensorRef final {
   // this reaches 0, the buffer can be safely freed. This prevents
   // use-after-free when multiple PrepackNodes reference the same TensorRef
   // (e.g. shared/tied weights).
-  uint32_t prepack_use_count{0};
+  int32_t prepack_use_count{0};
 
   explicit TensorRef(
       const std::vector<int64_t>& t_sizes,