[ez][ET-VK][partitioner] Allow layout-agnostic ops to accept quantized layouts
Pull Request resolved: #19395
Two changes that together let the partitioner keep PACKED_INT8 layouts flowing through identity-like ops, eliminating spurious clone dispatches:
1. utils.py: ANY_STORAGE_INCL_PACKED_INT8 (renamed from ALL_STORAGES_REPSET) previously claimed every layout (including PACKED_INT8_*) on the texture side, but PACKED_INT8 is buffer-only by convention: the texture indexing helpers and required_image_extents don't know about quantized layouts. Narrow the texture side to all_memory_layouts (float-only). Every existing call site is either an intersection identity or a wildcard for non-tensor / not-yet-prepacked args, so this narrowing is non-breaking, and the repset can now act as a true universal set when intersected against quant-aware repsets. The new name slots cleanly next to ANY_STORAGE / ANY_BUFFER / ANY_TEXTURE and tells the reader exactly what is added: "like ANY_STORAGE, but also admits PACKED_INT8 (on the buffer side)".
2. op_registry.py: switch view_copy / clone / _clone_dim_order / alias_copy from inputs_storage=ANY_STORAGE to inputs_storage=ANY_STORAGE_INCL_PACKED_INT8. ANY_STORAGE is float-only, so when one of these no-op identity ops sits between two q8ta ops, the BFS in TagMemoryMetaPass.constrain_op_*_repset short-circuits (zero overlap with PACKED_INT8_BUFFER) and forces transitions on both sides. With ANY_STORAGE_INCL_PACKED_INT8, these ops admit both float and quantized layouts, and the redundant-op transform folds them away.
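The overlap argument in the two changes above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the real code in utils.py: ToyRepSet, its intersect / is_empty methods, and the layout names FLOAT_A, FLOAT_B, PACKED_INT8_X and PACKED_INT8_Y are all hypothetical placeholders. Only the three repset names are taken from this commit message.

```python
from dataclasses import dataclass

# Placeholder layout names (hypothetical); the real layout enums are not reproduced here.
FLOAT_LAYOUTS = frozenset({"FLOAT_A", "FLOAT_B"})
INT8_LAYOUTS = frozenset({"PACKED_INT8_X", "PACKED_INT8_Y"})


@dataclass(frozen=True)
class ToyRepSet:
    """Toy stand-in for a repset: allowed layouts per storage type."""
    buffer: frozenset
    texture: frozenset

    def intersect(self, other: "ToyRepSet") -> "ToyRepSet":
        # Layouts both sides accept, computed separately for each storage type.
        return ToyRepSet(self.buffer & other.buffer, self.texture & other.texture)

    def is_empty(self) -> bool:
        return not self.buffer and not self.texture


# Float-only on both storage types.
ANY_STORAGE = ToyRepSet(FLOAT_LAYOUTS, FLOAT_LAYOUTS)
# Same as ANY_STORAGE, plus quantized layouts on the buffer side only.
ANY_STORAGE_INCL_PACKED_INT8 = ToyRepSet(FLOAT_LAYOUTS | INT8_LAYOUTS, FLOAT_LAYOUTS)
# What a q8ta neighbour is assumed to require in this toy model.
PACKED_INT8_BUFFER = ToyRepSet(INT8_LAYOUTS, frozenset())

# Old registration: no overlap with the quantized neighbour, so a transition is forced.
old_overlap = ANY_STORAGE.intersect(PACKED_INT8_BUFFER)
# New registration: the quantized buffer layouts survive the intersection.
new_overlap = ANY_STORAGE_INCL_PACKED_INT8.intersect(PACKED_INT8_BUFFER)

print("old overlap empty:", old_overlap.is_empty())
print("new overlap empty:", new_overlap.is_empty())
```

In this model the widened repset still behaves as an intersection identity against float-only repsets, which is the property the non-breaking claim in item 1 relies on.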
The 31 other ops using ANY_STORAGE are real compute ops (binaryop, comparison, softmax, argreduce, permute_copy, etc.) whose float-only kernels do not accept quantized int8x4 layouts (q8ta_* are separate ops), so they are left unchanged.
On RefineNet 24feat (1x3x256x144) the 8 _clone_dim_order ops the partitioner had been inserting around the 4 fused q8ta_pixel_shuffle nodes are now folded by the delegate. Runtime q8ta_clone dispatches drop from 11 to 3 (the 3 residuals are unrelated, from the original model graph).
ghstack-source-id: 379519734
@exported-using-ghexport
Differential Revision: [D103770022](https://our.internmc.facebook.com/intern/diff/D103770022/)