
Commit 89b938a

SS-JIA authored and committed
[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing
Pull Request resolved: #18171

Replace all existing matmul/linear operator implementations with new ones built from the ground up using a tiled compute approach. Delete all legacy implementations (MatMulLegacy.cpp, LinearLegacy.cpp, addmm_optimized.glsl, addmm_naive_*.glsl).

New matmul (mm/bmm/addmm):
- A single matmul.glsl shader handles mm, bmm, and addmm using the FPInputTile, FPWeightTile, and FPOutTile infrastructure from SDPA
- Adaptive tile size selection (TILE_M = 4/2/1) based on GPU occupancy
- When mat2 is a constant tensor, automatically routes through the linear path for blocked weight packing

New linear:
- Custom 4OC×4IC blocked weight prepacking via pack_fp_linear_weight.glsl for optimal cache line utilization during the tiled matmul
- Supports both transposed [N, K] and non-transposed [K, N] weights, with batch dimension support
- Separate texture2d weight storage with automatic buffer fallback for large dimensions

Performance on Adreno 750 (fp16, vs. legacy):
- Linear [4096, 1024] x [256, 1024]: 1.33x faster (texture)
- Linear [4096, 64] x [128, 64]: 2.67x faster (texture)
- BMM [1, 4096, 256] x [1, 256, 1024]: 1.63x faster (texture)

ghstack-source-id: 352051371
exported-using-ghexport

Differential Revision: [D96488384](https://our.internmc.facebook.com/intern/diff/D96488384/)
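To make the tiled-compute and blocked-packing ideas concrete, here is a minimal GLSL compute-shader sketch of the inner loop. This is not the actual matmul.glsl: the buffer layouts, names, and fixed TILE_M are illustrative assumptions, and K/N are assumed pre-padded to multiples of 4. It shows the two points the commit message describes: each invocation accumulates a TILE_M×4 output tile so every weight fetch is amortized over TILE_M rows, and weights are read as contiguous 4×4 (4OC×4IC) blocks.

```glsl
#version 450
// Hypothetical sketch of the tiled matmul inner loop; not the real shader
// interface. In the actual implementation TILE_M is chosen adaptively
// (4/2/1) per dispatch; here it is fixed for brevity.
#define TILE_M 4

layout(std430) buffer;
layout(set = 0, binding = 0) writeonly buffer OutBuf    { vec4 out_data[]; };    // [M, N/4]
layout(set = 0, binding = 1) readonly  buffer InBuf     { vec4 in_data[]; };     // [M, K/4]
layout(set = 0, binding = 2) readonly  buffer WeightBuf { vec4 weight_data[]; }; // blocked 4OCx4IC

layout(push_constant) uniform Sizes { int M; int N; int K; }; // K, N padded to multiples of 4

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

void main() {
  const int n4 = int(gl_GlobalInvocationID.x);          // 4-wide output column block
  const int m  = int(gl_GlobalInvocationID.y) * TILE_M; // first output row of this tile
  if (m >= M || n4 * 4 >= N) {
    return;
  }

  vec4 out_tile[TILE_M];
  for (int i = 0; i < TILE_M; ++i) {
    out_tile[i] = vec4(0.0);
  }

  const int K4 = K / 4;
  for (int k4 = 0; k4 < K4; ++k4) {
    // With 4OCx4IC blocked packing, the four rows of this 4x4 weight
    // block are adjacent in memory: one cache-line-friendly fetch.
    vec4 w[4];
    for (int j = 0; j < 4; ++j) {
      w[j] = weight_data[(n4 * K4 + k4) * 4 + j];
    }
    // Reuse the same weight block across TILE_M output rows, so each
    // weight load is amortized over TILE_M fused multiply-adds.
    for (int i = 0; i < TILE_M && m + i < M; ++i) {
      const vec4 in_vec = in_data[(m + i) * K4 + k4];
      out_tile[i] += in_vec.x * w[0] + in_vec.y * w[1]
                   + in_vec.z * w[2] + in_vec.w * w[3];
    }
  }
  for (int i = 0; i < TILE_M && m + i < M; ++i) {
    out_data[(m + i) * (N / 4) + n4] = out_tile[i];
  }
}
```

A companion sketch of the 4OC×4IC blocked prepacking itself, again a hypothetical simplification of what pack_fp_linear_weight.glsl does (the real shader additionally handles transposed [N, K] weights, batch dims, and texture2d storage with buffer fallback):

```glsl
#version 450
// Hypothetical sketch of 4OCx4IC blocked weight packing: reorder a plain
// row-major [K, N] weight so each 4x4 (output-channel x input-channel)
// block is contiguous, zero-padding at the edges.
layout(std430) buffer;
layout(set = 0, binding = 0) writeonly buffer Packed { vec4 packed_data[]; };
layout(set = 0, binding = 1) readonly  buffer Plain  { float weight[]; }; // row-major [K, N]

layout(push_constant) uniform Sizes { int K; int N; };

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

void main() {
  const int n4 = int(gl_GlobalInvocationID.x); // 4-wide output-channel block
  const int k4 = int(gl_GlobalInvocationID.y); // 4-deep input-channel block
  if (n4 * 4 >= N || k4 * 4 >= K) {
    return;
  }
  const int K4 = (K + 3) / 4;
  // Gather a 4x4 tile: row j holds input channel k4*4+j for output
  // channels n4*4 .. n4*4+3, zero-padded past the tensor edges.
  for (int j = 0; j < 4; ++j) {
    const int k = k4 * 4 + j;
    vec4 row = vec4(0.0);
    for (int c = 0; c < 4; ++c) {
      const int n = n4 * 4 + c;
      if (k < K && n < N) {
        row[c] = weight[k * N + n];
      }
    }
    // All four rows of a block land adjacently in memory.
    packed_data[(n4 * K4 + k4) * 4 + j] = row;
  }
}
```

Note that the write index in the packing sketch matches the read index in the matmul sketch, which is the point of the layout: the matmul shader fetches a whole 4×4 weight block with four consecutive vec4 loads.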
1 parent 8bec69b · commit 89b938a

30 files changed

Lines changed: 2290 additions & 1406 deletions

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_buffer.glsl

Lines changed: 0 additions & 86 deletions
This file was deleted.

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_texture3d.glsl

Lines changed: 0 additions & 189 deletions
This file was deleted.

backends/vulkan/runtime/graph/ops/glsl/addmm_naive_texture3d.yaml

Lines changed: 0 additions & 24 deletions
This file was deleted.
