Commit 23c5236
ggml-ve graph compiler: allow cross-fragment inputs by default (VEBP 3.2x)
The self-containment gate refused any cgraph that reads a computed
intermediate produced by another subgraph. That was a defensive measure
added while the not-yet-implemented YaRN rope made fragmented compiles
produce garbled output. With YaRN + the chunked-codegen fixes in place,
cross-fragment staging is correct: execute() stages the cross-fragment
input host<->HBM, and the producing subgraph (interpreted, or a prior
compiled graph) writes the host tensor before this one runs.
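
The ordering described above can be pictured with a toy model. This is only a sketch: the struct and the three functions are hypothetical names, plain memory stands in for HBM, and nothing here is the backend's actual execute() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one cross-fragment input: a host copy that the
 * producing subgraph writes, and a device copy the compiled graph reads.
 * Plain memory stands in for HBM in this sketch. */
struct xfrag_input {
    float *host;
    float *hbm;
    size_t n;
};

/* Producer fragment (interpreted, or a prior compiled graph): writes the
 * host tensor. */
static void producer_run(struct xfrag_input *in, float value) {
    for (size_t i = 0; i < in->n; i++) {
        in->host[i] = value;
    }
}

/* Staging step: copy host -> "HBM" before the consumer runs. */
static void stage_to_hbm(struct xfrag_input *in) {
    memcpy(in->hbm, in->host, in->n * sizeof(float));
}

/* Consumer fragment: reads only the staged device copy. */
static float consumer_run(const struct xfrag_input *in) {
    float sum = 0.0f;
    for (size_t i = 0; i < in->n; i++) {
        sum += in->hbm[i];
    }
    return sum;
}
```

The consumer sees correct data only if the calls happen in the order producer, stage, consumer, which is the invariant the paragraph above relies on.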
This was the last thing keeping the VEBP ternary model out of the compiler:
its token_embd is F16, so GET_ROWS runs on CPU (VE GET_ROWS supports only
BF16/F32 src) and produces embd as a cross-fragment input. The gate refused
the whole 1266-node decode graph over that one input, so decode fell back
to the interpreter.
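
As an illustration of why embd ends up cross-fragment, the type restriction can be written as a small predicate. The enum and function names are hypothetical stand-ins, not the backend's real identifiers; only the BF16/F32 rule comes from this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the source tensor types named above. */
enum sketch_type { SKETCH_F32, SKETCH_F16, SKETCH_BF16 };

/* Per this commit message, VE GET_ROWS supports only BF16/F32 sources. */
static bool ve_get_rows_src_supported(enum sketch_type src) {
    return src == SKETCH_BF16 || src == SKETCH_F32;
}

/* An unsupported source means the op runs on CPU, so its output reaches
 * the compiled graph as a cross-fragment input. */
static bool get_rows_output_is_cross_fragment(enum sketch_type src) {
    return !ve_get_rows_src_supported(src);
}
```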
Flip the gate to allow-by-default; GGML_VE_GC_STRICT=1 restores the refusal.
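
A minimal sketch of an allow-by-default gate with a strict opt-out follows. The function names are hypothetical, and treating only the exact value "1" as strict is an assumption; the real parsing of GGML_VE_GC_STRICT may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Assumption: strict mode is on only when
 * GGML_VE_GC_STRICT is set to exactly "1". */
static bool ve_gc_strict_enabled(void) {
    const char *v = getenv("GGML_VE_GC_STRICT");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical gate: decide whether a cgraph may be compiled, given how
 * many of its inputs are intermediates computed by another subgraph. */
static bool ve_gc_accepts_graph(int n_cross_fragment_inputs) {
    assert(n_cross_fragment_inputs >= 0);
    if (n_cross_fragment_inputs == 0) {
        return true;  /* self-contained graph: always eligible */
    }
    /* Allow by default; refuse only when strict mode is requested. */
    return !ve_gc_strict_enabled();
}
```

With the variable unset, a graph with one cross-fragment input is accepted; with GGML_VE_GC_STRICT=1 it is refused, which matches the pre-commit behavior.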
Verified correct on Llama-3.2-3B, Bonsai-8B BF16, and Bonsai-8B VEBP.
Result (GGML_VE_HBM=1, -fa on, -ub 1, warm):
- Ternary-Bonsai-8B-VEBP: 10.6 (interp) -> 33.5 tok/s compiled (3.17x).
That completes ternary models running correctly AND fast through the graph
compiler. (A cleaner alternative is F16 GET_ROWS on VE so VEBP is fully
self-contained -- follow-up.) Opt-in (GGML_VE_COMPILE_GRAPH=1). Not pushed.
1 parent 3301c83 commit 23c5236
2 files changed
Lines changed: 14 additions & 5 deletions