Commit f1cb455
fix: treat agent timeouts as scored results with timed_out metadata flag
AgentTimeoutError tasks already have their partial work scored by the
verifier, but they were classified as "errored", which hid their reward
from analysis. Timeouts with a verifier reward are now classified as
completed_pass/completed_fail with a timed_out flag, so only real
infrastructure errors remain "errored".
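The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `classify_task_sketch`, its signature, the `pass_threshold` parameter, and the handling of a task with no exception and no reward are all assumptions made for the example.

```python
from typing import Optional, Tuple


def classify_task_sketch(
    exception_type: Optional[str],
    reward: Optional[float],
    pass_threshold: float = 1.0,  # assumed pass criterion, not from the commit
) -> Tuple[str, bool]:
    """Return (status, timed_out) for a single task result."""
    timed_out = exception_type == "AgentTimeoutError"
    scored = reward is not None

    # Any exception is an infrastructure error, except a timeout
    # that the verifier still managed to score.
    if exception_type is not None and not (timed_out and scored):
        return "errored", timed_out

    # Assumption: a task without any reward cannot be classified by reward.
    if not scored:
        return "errored", timed_out

    status = "completed_pass" if reward >= pass_threshold else "completed_fail"
    return status, timed_out
```

Under these assumptions, a timed-out task with a reward of 0.0 comes back as `("completed_fail", True)`, while a timeout without a reward stays `("errored", True)`.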
- aggregate_status.py: classify_task() checks exception_type before
marking errored; adds timed_out field and separate timed_out count
- extractors.py: AgentTimeoutError tasks classified by reward, not error
- models.py: add timed_out field to TaskMetrics dataclass
- compare_configs.py: pass through timed_out flag in comparison output
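As a rough illustration of the data-model side of these changes, the sketch below shows a dataclass carrying a `timed_out` flag and an aggregation that reports timeouts separately from the status counts. The class name `TaskMetricsSketch`, its other fields, and the `summarize` helper are hypothetical; the real `TaskMetrics` dataclass and the aggregation in aggregate_status.py may look quite different.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class TaskMetricsSketch:
    # Hypothetical stand-in for TaskMetrics; only timed_out comes from the commit.
    task_id: str
    status: str
    reward: Optional[float] = None
    timed_out: bool = False  # defaults to False so existing callers keep working


def summarize(tasks: Iterable[TaskMetricsSketch]) -> Dict[str, object]:
    """Count tasks per status and keep a separate timed_out count."""
    tasks = list(tasks)
    return {
        "by_status": dict(Counter(t.status for t in tasks)),
        # Timeouts are counted independently of pass/fail/errored status.
        "timed_out": sum(1 for t in tasks if t.timed_out),
    }
```

Keeping `timed_out` as a separate count rather than a fourth status means a timed-out task still contributes its reward to the pass/fail totals.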
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 9aa95a0 commit f1cb455
File tree
- scripts
- ccb_metrics
4 files changed, +52 -11 lines changed
Diff (line content not captured; only changed line numbers survive)
- File 1 of 4: +42 -9. Added at new lines 237, 255, 281-300, 303, 305-309, 381-382, 620, 627-629, 690-695, 700-701; removed original lines 279, 282-284, 286, 661-663, 668.
- File 2 of 4: +6 -2. Added at new lines 100-104, 166; removed original lines 100-101.
- File 3 of 4: +3 -0. Added at new lines 86-88.
- File 4 of 4: +1 -0. Added at new line 125.