[TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope
Add TIR builtins for Metal cooperative_tensor operations (MetalPerformancePrimitives):
- cooperative_tensor_fill: fill a cooperative_tensor with a value
- cooperative_tensor_load: load from device/threadgroup memory
- cooperative_tensor_store: store to device/threadgroup memory
- cooperative_tensor_multiply_accumulate: matrix multiply-accumulate via matmul2d
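
For readers unfamiliar with these operations, the intended semantics of the four builtins can be sketched in plain Python. This is only a reference model under stated assumptions: the function names, argument order, and the row-major offset/stride addressing are hypothetical and are not the actual TIR builtin signatures; Python lists stand in for both cooperative_tensor registers and device/threadgroup memory.

```python
# Reference semantics only. All names and signatures here are hypothetical
# illustrations, not the TIR builtins added by this commit.

def ct_fill(rows, cols, value):
    """Return a rows x cols tile with every element set to value."""
    return [[value for _ in range(cols)] for _ in range(rows)]

def ct_load(memory, offset, stride, rows, cols):
    """Read a rows x cols tile from flat, row-major memory (assumed layout)."""
    return [[memory[offset + r * stride + c] for c in range(cols)]
            for r in range(rows)]

def ct_store(tile, memory, offset, stride):
    """Write a tile back into flat, row-major memory in place."""
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            memory[offset + r * stride + c] = value

def ct_multiply_accumulate(acc, a, b):
    """Return acc + a @ b for an MxK tile a and a KxN tile b."""
    m, k, n = len(a), len(b), len(b[0])
    return [[acc[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(n)]
            for i in range(m)]

# Usage: a 2x2 matmul expressed as fill, load, multiply-accumulate, store.
a_mem = [1.0, 2.0, 3.0, 4.0]   # 2x2 matrix A, row-major
b_mem = [5.0, 6.0, 7.0, 8.0]   # 2x2 matrix B, row-major
out = [0.0] * 4                # destination memory for C

acc = ct_fill(2, 2, 0.0)
a = ct_load(a_mem, 0, 2, 2, 2)
b = ct_load(b_mem, 0, 2, 2, 2)
acc = ct_multiply_accumulate(acc, a, b)
ct_store(acc, out, 0, 2)
```

On hardware the tile is distributed across the threads of a simdgroup or threadgroup rather than held in one list, so this model captures only the arithmetic result, not the data placement.
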
Add metal.cooperative_tensor storage scope (StorageRank::kMetalCooperativeTensor)
for buffers backed by MPP cooperative_tensor registers, analogous to the existing
metal.simdgroup scope but targeting the Metal 4 tensor operations API.
These primitives enable code generation for MetalPerformancePrimitives matmul2d,
which routes to NAX tensor cores on Apple M5 and falls back to simdgroup matrix
instructions on M1-M4.