Commit 02e9853
feat(eval): Slice 1J'' — reasoning_effort + run_json_prompt usage tracking
Two eval-adapter extensions needed for Slice 1K's 5-candidate slate:
1. **reasoning_effort thread-through**. `OpenRouterEvalService` and
`KimiEvalService` both accepted `reasoning_effort` as a kwarg for
signature compatibility with `OpenAIService` but never forwarded
it into the underlying `chat.completions.create` call. The Slice
1K slate has three reasoning-class candidates (gpt-5.4@medium,
gpt-5.4-mini@medium, o4-mini@high) that need the effort signal to
behave at the intended cost/latency tier — without forwarding it
they'd all run at their default effort, distorting the
comparison.
Forwarded conditionally (only when truthy). Anthropic slugs in
non-thinking mode and DeepSeek v4 leave the kwarg unset and would
400 if it were passed.
2. **Per-call usage accumulation in `run_json_prompt`**. Caught
during the Slice 1K smoke run: every cost column read $0.0000
because the single-shot path was never bumping the `_usage`
counters — only `run_tool_loop` was. Mirrored the accumulator into
the JSON-prompt path so the assistant / parser / structuring
suites also surface accurate per-call cost.
Plus pricing-table additions in `tests/quality/provider_pricing.py`
for the two Slice 1K newcomers: `openai/o4-mini`
($1.10 / $4.40 per Mtok — substituted for the non-existent
`openai/gpt-5.1-mini`) and `anthropic/claude-haiku-4.5`
($1.00 / $5.00 per Mtok).
Verification: 22 / 22 openrouter-adapter unit tests pass. Live
preflight against all 5 Slice 1K slugs returned valid JSON.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent a619cd8 commit 02e9853
3 files changed
Lines changed: 52 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
142 | 142 | | |
143 | 143 | | |
144 | 144 | | |
145 | | - | |
| 145 | + | |
146 | 146 | | |
147 | 147 | | |
148 | 148 | | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
149 | 159 | | |
150 | 160 | | |
151 | 161 | | |
152 | 162 | | |
153 | 163 | | |
154 | 164 | | |
155 | 165 | | |
| 166 | + | |
156 | 167 | | |
157 | 168 | | |
158 | 169 | | |
| |||
185 | 196 | | |
186 | 197 | | |
187 | 198 | | |
188 | | - | |
| 199 | + | |
| 200 | + | |
189 | 201 | | |
190 | 202 | | |
191 | 203 | | |
| |||
213 | 225 | | |
214 | 226 | | |
215 | 227 | | |
216 | | - | |
| 228 | + | |
| 229 | + | |
217 | 230 | | |
218 | 231 | | |
219 | 232 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
312 | 312 | | |
313 | 313 | | |
314 | 314 | | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
315 | 324 | | |
316 | 325 | | |
317 | 326 | | |
| |||
323 | 332 | | |
324 | 333 | | |
325 | 334 | | |
| 335 | + | |
326 | 336 | | |
327 | 337 | | |
328 | 338 | | |
| |||
452 | 462 | | |
453 | 463 | | |
454 | 464 | | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
455 | 470 | | |
456 | 471 | | |
457 | 472 | | |
| |||
461 | 476 | | |
462 | 477 | | |
463 | 478 | | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
464 | 491 | | |
| 492 | + | |
465 | 493 | | |
466 | 494 | | |
467 | 495 | | |
| |||
506 | 534 | | |
507 | 535 | | |
508 | 536 | | |
| 537 | + | |
509 | 538 | | |
510 | 539 | | |
511 | 540 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
| 43 | + | |
43 | 44 | | |
44 | 45 | | |
45 | 46 | | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
46 | 53 | | |
47 | 54 | | |
48 | 55 | | |
| |||
0 commit comments