Commit 266a791
[Codegen][CPU] Flatten contiguous trailing dims of transfers before unrolling.
`VectorTransferLoweringPass` runs the MLIR transfer-lowering patterns
with `maxTransferRank=1` plus full-unroll, fully unrolling any transfer
of rank N > 1 into one rank-1 transfer per outer index. For a packed tile
whose trailing dim is a tiny contiguous chunk, that turns a single wide
load into many narrow ones plus a shuffle chain to rebuild the wide
register. Concretely, a bf16xbf16->f32 inner_tiled matmul (N=16,
K_inner=2) loads each `<16x2xbf16>` RHS K-step as 16 separate
`<2xbf16>` loads + a `vpermt2d`/`vpermt2q` chain -- ~3 cycles of extra
work per K-step on top of the 29 dpbf16ps.
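To make the load-count arithmetic above concrete, here is a minimal
standalone sketch. It is not the pass's code: the helper names
`numRank1Transfers` and `bitsPerRank1Transfer` are hypothetical, and a
vector type is modeled as a plain shape plus an element bit width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a vector type is a static shape (outermost dim first)
// plus an element bit width. Neither helper exists in MLIR or IREE.

// Number of rank-1 transfers left after unrolling every outer dimension:
// the product of all dims except the trailing one.
std::int64_t numRank1Transfers(const std::vector<std::int64_t>& shape) {
  assert(!shape.empty() && "expected rank >= 1");
  std::int64_t count = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) count *= shape[i];
  return count;
}

// Width in bits of each of those rank-1 transfers: trailing dim * element bits.
std::int64_t bitsPerRank1Transfer(const std::vector<std::int64_t>& shape,
                                  unsigned elemBits) {
  assert(!shape.empty() && "expected rank >= 1");
  return shape.back() * static_cast<std::int64_t>(elemBits);
}
```

Under this model, `<16x2xbf16>` unrolls to 16 transfers of 32 bits each,
whereas the flattened `<32xbf16>` is a single 512-bit transfer.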
Apply `populateFlattenVectorTransferPatterns` *before* rank reduction,
gated on the target's natural word size (the pointer size, via
`DataLayout`): flatten only when the trailing dim is *sub-word*.
Sub-word loads in bulk are pathological; word-and-up trailing dims
(`<2xf32>` ... `<16xf32>`) are already good standalone loads, and
flattening *them* fuses register-sized rows into an oversized 1-D
transfer + a `vector.shape_cast` re-split, regressing whole-model
.vmfb size. (Not `native_vector_size`: that is the *widest* useful
vector, not the smallest non-pathological load.)
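The gating rule described above can be sketched as a small predicate.
This is an illustration under stated assumptions, not the code in this
commit: `shouldFlattenTrailingDims` is a hypothetical name, the word
size is passed in as a plain bit count rather than queried from
`DataLayout`, and "sub-word" is read as strictly narrower than one word.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical gate: flatten only when the trailing dimension of a
// rank > 1 vector is narrower than one machine word.
// `wordBits` stands in for the pointer size the real pass would obtain
// from the data layout (assumed 64 in the examples below).
bool shouldFlattenTrailingDims(const std::vector<std::int64_t>& shape,
                               unsigned elemBits, unsigned wordBits) {
  assert(wordBits > 0 && "word size must be positive");
  // Rank-0 and rank-1 transfers have nothing to flatten.
  if (shape.size() < 2) return false;
  const std::int64_t trailingBits =
      shape.back() * static_cast<std::int64_t>(elemBits);
  // Sub-word trailing dims are the pathological case; word-and-up rows
  // are left alone so they stay register-sized standalone loads.
  return trailingBits < static_cast<std::int64_t>(wordBits);
}
```

With a 64-bit word, `<16x2xbf16>` (32-bit rows) passes the gate, while a
vector with `<2xf32>` rows (exactly 64 bits) or `<16xf32>` rows does not.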
Measured: bf16 4096x4096 inner_tiled matmul on Zen 4, 80.8 -> 67.1 ms
per fragment; combined with the m_bcst-fold broadcast routing in a
sibling commit, the full matmul reaches ukernel parity (~50 ms). The
`sdxl/clip_compstat_cpu` size guard is unchanged at 583k bytes / 2130
dispatches (golden 650k / 2130).
Test fallout: `transpose_mask` in vector_lowering now writes a constant
`vector<4x2xi1>` mask as a single flat `vector<8xi1>` store; updated
the CHECK lines.
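For readers checking the updated CHECK lines, the correspondence between
the `4x2` mask and the flat `8`-element store is the usual row-major
linearization. The sketch below is illustrative only; `flatIndex` and
`flatSize` are hypothetical helpers, not MLIR APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Total element count of a contiguous shape once flattened to 1-D.
std::int64_t flatSize(const std::vector<std::int64_t>& shape) {
  std::int64_t n = 1;
  for (std::int64_t d : shape) n *= d;
  return n;
}

// Row-major position of multi-index `idx` inside `shape`
// (outermost dim first), i.e. where the element lands after flattening.
std::int64_t flatIndex(const std::vector<std::int64_t>& shape,
                       const std::vector<std::int64_t>& idx) {
  assert(shape.size() == idx.size() && "rank mismatch");
  std::int64_t linear = 0;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    assert(idx[i] >= 0 && idx[i] < shape[i] && "index out of bounds");
    linear = linear * shape[i] + idx[i];
  }
  return linear;
}
```

Element `(i, j)` of the `4x2` mask therefore sits at position `2*i + j`
of the flat 8-element vector.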
Progress towards #24515.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
1 parent 3936fb5 commit 266a791
3 files changed
Lines changed: 34 additions & 1 deletion
File tree
- compiler/src/iree/compiler/Codegen
- Common
- LLVMCPU/test
Lines changed: 28 additions & 0 deletions
Lines changed: 3 additions & 0 deletions
Lines changed: 3 additions & 1 deletion