Commit 6906779
coordinator: eval union of baselined + target detectors; graceful missing-baseline scoring
With a blank-slate upstream (ella/observer-blank) the agent invents detector
names freely. The old policy in relevant_detectors() ('fall back to all known
detectors when target doesn't intersect known') silently excluded the new
detector the candidate had just created and evaluated only stale ones. In
addition, score_against_baseline raised KeyError on any detector not yet
baselined.
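The failure mode of the old fallback can be sketched as below. This is a hypothetical reconstruction from the description above, not the actual relevant_detectors() code; the function name relevant_detectors_old and the detector names are made up for illustration.

```python
def relevant_detectors_old(known, target):
    """Hypothetical reconstruction of the old fallback policy."""
    overlap = set(known) & set(target)
    # Old behavior (assumed): if the target shares nothing with the known
    # detectors, fall back to evaluating every known detector instead.
    return overlap if overlap else set(known)


# Illustrative names only; none of these come from the real repository.
known = {"cusum", "zscore"}
target = {"novel-v1"}  # detector the candidate just invented

selected = relevant_detectors_old(known, target)
# The freshly created detector is dropped; only stale ones are evaluated.
assert "novel-v1" not in selected
```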
Behavior now (for every iteration):
detectors_to_eval = baseline.detectors.keys() ∪ candidate.target_components
- Every baselined detector is evaluated every iter → catches 'did this
candidate break an existing ship' across the whole admitted set, not
just the ones the candidate explicitly targets.
- The candidate's own target components are ALWAYS evaluated even if
not yet baselined → the new detector's progress is measured and
recorded per-iter.
- score_against_baseline returns a no-gate ScoringResult when the
detector is missing from baseline: raw F1/FPs populated,
strict_regressions=[], recall_floor_violations=[], baseline_mean_f1=0.
Gates only fire for baselined detectors. The FP-ceiling check is already
guarded by baseline_total_fps > 0, so it auto-skips for unbaselined detectors.
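A minimal sketch of the behavior described in the list above, under stated assumptions: the baseline is modeled as a plain dict mapping detector name to {"mean_f1": ...}, the ScoringResult fields beyond those named in this message are guesses, and the single F1 comparison stands in for whatever gates the real scorer applies.

```python
from dataclasses import dataclass, field


@dataclass
class ScoringResult:
    # Field names follow the commit message where given; the rest are assumed.
    detector: str
    mean_f1: float
    total_fps: int
    baseline_mean_f1: float = 0.0
    strict_regressions: list = field(default_factory=list)
    recall_floor_violations: list = field(default_factory=list)


def detectors_to_eval(baseline_detectors, target_components):
    """Union of baselined detectors and the candidate's own targets."""
    return sorted(set(baseline_detectors) | set(target_components))


def score_against_baseline(detector, mean_f1, total_fps, baseline_detectors):
    entry = baseline_detectors.get(detector)  # .get avoids the old KeyError
    if entry is None:
        # No-gate result: raw metrics recorded, no regressions reported.
        return ScoringResult(detector, mean_f1, total_fps)
    regressions = []
    if mean_f1 < entry["mean_f1"]:  # simplified stand-in for the real gates
        regressions.append(f"{detector}: mean F1 below baseline")
    return ScoringResult(detector, mean_f1, total_fps,
                         baseline_mean_f1=entry["mean_f1"],
                         strict_regressions=regressions)


# Illustrative data only.
baseline = {"cusum": {"mean_f1": 0.80}}
names = detectors_to_eval(baseline, ["novel-v1"])
novel = score_against_baseline("novel-v1", 0.42, 3, baseline)
stale = score_against_baseline("cusum", 0.75, 1, baseline)
```

Here the unbaselined novel-v1 is still scored and recorded but can never trip a gate, while the baselined cusum is gated as usual.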
Promotion flow: when iter N ships a good novel-vX detector, operator
runs import_baseline --detector novel-vX=<iter N report path> to admit
it. From then on future candidates are gated against novel-vX too.
No rolling auto-ratchet (anti-noise-promotion); promotion is always a
human decision.
1 parent 4863d2b commit 6906779
2 files changed
Lines changed: 46 additions & 25 deletions