[ET-VK] Deduplicate transition clone nodes in TagMemoryMetaPass #18771
meta-codesync[bot] merged 2 commits into gh/SS-JIA/517/base
Conversation
When the same tensor is consumed by multiple ops that need a different storage representation, the pass previously inserted a separate clone transition for each consumer. Now it caches transition clones keyed by `(source_node, target_storage_type, target_layout)` and reuses an existing clone when the same transition is needed again.

For Qwen3 0.6B (8da4w fp16), the embedding output (BUFFER, because vocab_size exceeds texture limits) feeds both rms_norm and add, which need TEXTURE. Previously 2 clones were inserted; now 1 clone is shared.

Authored by Claude.

Differential Revision: [D100004700](https://our.internmc.facebook.com/intern/diff/D100004700/)
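The caching scheme described above can be sketched as follows. This is a minimal illustration, not the actual TagMemoryMetaPass code: the names `TransitionKey`, `ClonePass`, and `get_or_create_clone` are hypothetical, and real nodes would be torch.fx graph nodes rather than strings.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TransitionKey:
    # Illustrative cache key: (source node, target storage type, target layout).
    source: str
    storage_type: str   # e.g. "BUFFER" or "TEXTURE"
    layout: str

@dataclass
class ClonePass:
    # One clone per unique transition; later consumers reuse the cached clone.
    _clone_cache: dict = field(default_factory=dict)
    inserted: list = field(default_factory=list)

    def get_or_create_clone(self, source: str, storage_type: str, layout: str) -> str:
        key = TransitionKey(source, storage_type, layout)
        clone = self._clone_cache.get(key)
        if clone is None:
            # Insert a new clone node only the first time this exact
            # transition is needed.
            clone = f"clone({source} -> {storage_type}/{layout})"
            self.inserted.append(clone)
            self._clone_cache[key] = clone
        return clone

p = ClonePass()
# Both rms_norm and add consume the BUFFER embedding output but need TEXTURE:
a = p.get_or_create_clone("embedding_out", "TEXTURE", "default")
b = p.get_or_create_clone("embedding_out", "TEXTURE", "default")
assert a is b and len(p.inserted) == 1  # one shared clone instead of two
```

With the cache, the number of inserted clones is bounded by the number of distinct (source, storage, layout) transitions rather than the number of consumers.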
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18771.
Note: links to docs will display an error until the docs builds have been completed. As of commit 1710eed with merge base 4afd7f9: 1 pending, 3 unrelated failures. FLAKY: one job failed but was likely due to flakiness present on trunk. BROKEN TRUNK: some jobs failed but were already failing on the merge base; rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Merged as 5221944 into gh/SS-JIA/517/base.
Pull Request resolved: #18771. ghstack-source-id: 364280900. Differential Revision: [D100004700](https://our.internmc.facebook.com/intern/diff/D100004700/)