Commit 611d34c
fix(eval): clamp Kimi adapter max_tokens to per-task budget; preflight-only arg
KimiEvalService floored max_tokens at 8000 even for the 20-token
preflight call. OpenRouter reserves max_tokens*price of credit upfront,
so flooring tiny calls inflated the reservation and caused spurious 402s
on pricier models / low balances. Now clamps to the caller's real
per-task budget with 8000 as a CEILING (never a floor); truncation is
still counted via finish_reason=="length". provider_ab_runner gains a
`--preflight-only` arg (validate every slug/key for ~$0.001 then exit;
early-exit wiring still TODO — tracked in the parked eval plan).
Eval-scoped, not production-wired.
7 hermetic adapter tests green (tests/quality/test_kimi_eval_service.py).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dfb1ed8 commit 611d34c
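
The clamp described in the message can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names, the example price, and the assumption that the reservation is exactly `max_tokens * price` per call are all hypothetical, taken only from the wording of the commit message.

```python
from typing import Optional

MAX_TOKENS_CEILING = 8000  # the 8000 limit named in the commit message


def floored_max_tokens(task_budget: int, floor: int = MAX_TOKENS_CEILING) -> int:
    """Old behaviour as described: 8000 acted as a floor (hypothetical name)."""
    return max(task_budget, floor)


def clamped_max_tokens(task_budget: int, ceiling: int = MAX_TOKENS_CEILING) -> int:
    """New behaviour as described: use the caller's budget, capped at 8000."""
    if task_budget < 1:
        raise ValueError("task_budget must be a positive token count")
    return min(task_budget, ceiling)


def reserved_credit(max_tokens: int, price_per_token: float) -> float:
    """Upfront reservation, assuming it is simply max_tokens * price."""
    return max_tokens * price_per_token


def was_truncated(finish_reason: Optional[str]) -> bool:
    """Truncation is still detected from the provider's finish_reason."""
    return finish_reason == "length"


if __name__ == "__main__":
    price = 0.00001  # hypothetical per-token price, for illustration only
    preflight_budget = 20
    before = reserved_credit(floored_max_tokens(preflight_budget), price)
    after = reserved_credit(clamped_max_tokens(preflight_budget), price)
    print(f"reservation before: {before:.5f}, after: {after:.5f}")
```

Under these assumptions, the 20-token preflight call requests 20 tokens instead of 8000, so its reservation shrinks by a factor of 400, while a caller asking for more than 8000 tokens is still capped at 8000.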
2 files changed
Lines changed: 15 additions & 4 deletions
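
For orientation, a flag like `--preflight-only` is typically declared as below. This is a sketch under assumptions: `build_parser` and the help text are hypothetical, and the sketch only parses the flag, since the message states that the early-exit wiring is still TODO.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for provider_ab_runner; only the new flag is shown."""
    parser = argparse.ArgumentParser(prog="provider_ab_runner")
    parser.add_argument(
        "--preflight-only",
        action="store_true",  # boolean switch, defaults to False
        help="validate every slug/key with a tiny call, then exit",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes the flag as args.preflight_only (dash becomes underscore).
    print(f"preflight_only={args.preflight_only}")
```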
Diff (line content not preserved; hunk positions only):

| File | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 54-61 | 54-67 | removed 57-58; added 57-64 |
| 1 | 167-173 | 173-180 | removed 170; added 176-177 |
| 1 | 194-200 | 201-208 | removed 197; added 204-205 |
| 2 | 211-216 | 211-219 | added 214-216 |