Commit fb9264c
ggml-ve graph compiler: make the prompt-eval source size-independent
The N>1 codegen still baked the token count into the generated source in two
spots, so each distinct prompt length recompiled (~13s NCC) instead of reusing
one cached .so:
- the per-op debug comment embedded `n=`/`NB=` (the literal token count);
- the SWIGLU codegen emitted `int nc, nr;` with nr = ne[1]*ne[2]*ne[3] (= N),
left over (and (void)'d) after the loop bound moved to elem_n.
Both were N-dependent in the emitted text only: the actual computation already
used per-token constants plus the runtime n_tok arg. Dropped them. Two
different prompt lengths now produce byte-identical source (verified: the 2nd
length triggers 0 compiles), so one .so serves any prompt length, matching
decode's size-independence.
Output still token-for-token identical to the interpreter.
1 parent 3b176b1 commit fb9264c
1 file changed
Lines changed: 13 additions & 15 deletions
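To illustrate the idea in the commit message, here is a minimal sketch of a size-independent emitter. It is not the ggml-ve code: `emit_swiglu`, `bake_n`, `row_elems`, and the use of an FNV-1a hash as the cache key are all hypothetical, chosen only to show why removing the token count from the emitted text lets two prompt lengths share one compiled artifact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter (illustration only, not the real ggml-ve codegen).
 * Writes C source for one SWIGLU kernel into buf and returns its length.
 * row_elems is a per-token constant and may be baked into the text.
 * n_tok must NOT reach the text; bake_n=1 reproduces the old behaviour,
 * where the debug comment carried the literal token count. */
int emit_swiglu(char *buf, size_t cap, int row_elems, int n_tok, int bake_n)
{
    int used;
    if (bake_n)
        used = snprintf(buf, cap, "/* SWIGLU n=%d */\n", n_tok);
    else
        used = snprintf(buf, cap, "/* SWIGLU */\n");
    assert(used > 0 && (size_t)used < cap);

    /* The generated kernel takes the token count as a runtime argument,
     * so the loop bound is computed when the kernel runs, not at codegen. */
    int more = snprintf(buf + used, cap - (size_t)used,
        "#include <math.h>\n"
        "void swiglu(const float *g, const float *u, float *d, int n_tok) {\n"
        "    const long elem_n = (long)n_tok * %d;\n"
        "    for (long i = 0; i < elem_n; ++i)\n"
        "        d[i] = g[i] / (1.0f + expf(-g[i])) * u[i];\n"
        "}\n",
        row_elems);
    assert(more > 0 && (size_t)more < cap - (size_t)used);
    return used + more;
}

/* 64-bit FNV-1a over the source text. Assumption: the compile cache is
 * keyed on the source bytes, so identical text means a cache hit. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}
```

Calling `emit_swiglu` with `bake_n = 0` for two different `n_tok` values yields the same bytes and therefore the same key, while `bake_n = 1` yields different text for each length, which is the situation the commit removes.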
Diff (hunk bodies not preserved; ranges recovered from the line-number columns):
@@ -726,8 +726,12 @@ 5 additions, 1 deletion
@@ -1209,25 +1213,19 @@ 8 additions, 14 deletions