Commit 0fb4f29
ggml-ve graph compiler: wildcard the attn-mask ne[0] in the cgraph signature

Companion to the mask-CPY skip. The cgraph signature already treats KV/mask
growing dims as wildcards, but only ne[1..2] -- correct for the KV cache
(growing dim ne[1], ne[0]=head_dim is stable) but NOT for the attention mask,
whose growing dim is ne[0] (= padded n_kv). So the decode signature still
changed at every mask-pad boundary, forcing a needless re-trace + .so reload
(no NCC recompile after the source fix, but still per-boundary churn).

Since the mask no longer affects the generated source at all (its CPY is
skipped; VE flash-attn masks causally via seq_len), wildcard ALL the mask
spatial dims. The decode graph is now traced + compiled ONCE and reused for
the whole generation regardless of context length.
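
A minimal sketch of the wildcarding idea, not the actual ggml-ve code: every name below (`sig_role`, `sig_tensor`, `sig_dim_is_wildcard`, `sig_hash_tensor`, `SIG_SEED`) is hypothetical, the hash is a plain FNV-1a chosen only for illustration, and "all mask spatial dims" is assumed here to mean ne[0..2]. A wildcarded dim is folded into the signature as a constant, so a growing n_kv no longer changes the signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_MAX_DIMS 4
#define SIG_SEED 14695981039346656037ULL /* FNV-1a 64-bit offset basis */

/* Hypothetical tensor roles, for illustration only. */
enum sig_role { SIG_ROLE_OTHER, SIG_ROLE_KV_CACHE, SIG_ROLE_ATTN_MASK };

struct sig_tensor {
    enum sig_role role;
    int64_t ne[SIG_MAX_DIMS]; /* element counts per dim, ggml-style */
};

/* Which dims are ignored when building the signature.
 * KV cache: ne[1..2] (ne[0] = head_dim is stable).
 * Attn mask: ne[0..2], assumed, since its growing dim is ne[0]. */
static bool sig_dim_is_wildcard(enum sig_role role, int dim) {
    assert(dim >= 0 && dim < SIG_MAX_DIMS);
    switch (role) {
    case SIG_ROLE_KV_CACHE:  return dim == 1 || dim == 2;
    case SIG_ROLE_ATTN_MASK: return dim <= 2;
    default:                 return false;
    }
}

/* FNV-1a over the 8 bytes of v, least significant byte first. */
static uint64_t sig_mix(uint64_t h, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 1099511628211ULL; /* FNV 64-bit prime */
    }
    return h;
}

/* Fold one tensor into the running signature; wildcard dims hash as 0. */
static uint64_t sig_hash_tensor(uint64_t h, const struct sig_tensor *t) {
    h = sig_mix(h, (uint64_t)t->role);
    for (int d = 0; d < SIG_MAX_DIMS; d++) {
        uint64_t v = sig_dim_is_wildcard(t->role, d) ? 0 : (uint64_t)t->ne[d];
        h = sig_mix(h, v);
    }
    return h;
}
```

Under this sketch, two masks padded to different n_kv hash to the same signature, while a KV tensor with a different ne[0] still produces a different one.
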
Verified: cold 250-token VEBP generation does 4 NCC compiles + 4 re-traces
total (the distinct with-/without-output-projection shapes + 2 trivial),
not growing with length. Was re-tracing at every pad boundary. Not pushed.

1 parent 1685dd2 commit 0fb4f29
1 file changed
Lines changed: 10 additions & 1 deletion