Commit cd8b065
fix(qwen35moe): size KV reservation by n_full + cache-type, not n_layer×f16 (#454)
Only n_full = n_layer/full_attention_interval layers carry a KV cache (the rest
are O(1)-state SSM/DeltaNet); honoring that plus the resolved q4_0 cache type
cuts the placement reservation ~14x (25 -> 1.76 GiB @131k), keeping experts
all-hot at deep context instead of forcing the slow hybrid spec path. Extract a
shared kv_reservation_bytes_per_token() helper (one source of truth for qwen35 +
qwen35moe) and add a unit test pinning n_full + cache-type vs the old form.

1 parent 55a205d, commit cd8b065
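
The arithmetic behind the ~14x figure can be sketched as follows. This is a minimal illustration, not the helper added by this commit: the function names, `N_LAYER = 40`, `INTERVAL = 4`, `KV_ELEMS = 2560` and a 131072-token context are hypothetical values, chosen only because they reproduce the 25 GiB and 1.76 GiB figures quoted above. The q4_0 cost of 18 bytes per 32 elements follows the ggml block layout (one f16 scale plus 16 bytes of packed 4-bit values).

```python
# Bytes per cached element for each cache type.
# f16 is 2 bytes; q4_0 stores 32 elements in an 18-byte block.
BYTES_PER_ELEMENT = {"f16": 2.0, "q4_0": 18 / 32}


def old_kv_bytes_per_token(n_layer, kv_elems_per_layer):
    """Old form: every layer is assumed to hold an f16 KV cache."""
    return n_layer * kv_elems_per_layer * BYTES_PER_ELEMENT["f16"]


def new_kv_bytes_per_token(n_layer, full_attention_interval,
                           kv_elems_per_layer, cache_type):
    """New form: only full-attention layers hold a KV cache, sized by cache type."""
    n_full = n_layer // full_attention_interval
    return n_full * kv_elems_per_layer * BYTES_PER_ELEMENT[cache_type]


GIB = 1024 ** 3

# Hypothetical model shape (assumption, not taken from the commit).
N_CTX = 131072      # "131k" context, assumed to mean 128 * 1024 tokens
N_LAYER = 40        # total layers
INTERVAL = 4        # full_attention_interval
KV_ELEMS = 2560     # K + V elements per token per full-attention layer

old_gib = old_kv_bytes_per_token(N_LAYER, KV_ELEMS) * N_CTX / GIB
new_gib = new_kv_bytes_per_token(N_LAYER, INTERVAL, KV_ELEMS, "q4_0") * N_CTX / GIB

print(f"old: {old_gib:.2f} GiB, new: {new_gib:.2f} GiB, "
      f"ratio: {old_gib / new_gib:.1f}x")
```

Under these assumptions the ratio does not depend on the per-layer element count or the context length: it is the interval (4) times the f16-to-q4_0 size ratio (2 / 0.5625 ≈ 3.56), which gives about 14.2.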
4 files changed
Lines changed: 65 additions & 7 deletions
File tree
- server
- src
- qwen35moe
- qwen35
- test