Add H2 order-control seed policy #311
Code Review
This pull request introduces a --seed-offset-policy parameter to the H2 response strength validation script, enabling a 'shared-position' mode for order-control scouting. The changes include updates to the documentation, seed offset calculation logic, and the final summary output, along with new unit tests. Review feedback suggests refining the scout detection logic to be more explicit and enhancing the verdict string to include validation results for scout runs to improve debugging clarity.
    raw_best_simple_auc=raw_best_simple_auc,
)
validation_passed = bool(validation_gate["validation_passed"])
is_order_control_scout = args.seed_offset_policy != "class-ordered"
The logic for determining if a run is an order_control_scout is slightly brittle. Currently, it assumes any policy other than the default class-ordered is a scout. If additional policies are added in the future that are intended for admission (not just scouting), this logic will incorrectly mark them as scouts. It would be safer to explicitly check for the known scout policies.
- is_order_control_scout = args.seed_offset_policy != "class-ordered"
+ is_order_control_scout = args.seed_offset_policy == "shared-position"
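If more scout policies are expected later, the same explicit check could be written as set membership. The following is only a sketch under assumptions: the constant `SCOUT_SEED_OFFSET_POLICIES` and the helper `is_scout_policy` are hypothetical names that do not appear in this PR, and `"shared-position"` is the only scout policy the PR describes.

```python
# Hypothetical constant (not defined in the PR): the policies that mark a run
# as an order-control scout. Only "shared-position" is named in the PR.
SCOUT_SEED_OFFSET_POLICIES = frozenset({"shared-position"})


def is_scout_policy(seed_offset_policy: str) -> bool:
    """Return True only for policies explicitly listed as scout policies.

    Any unlisted policy, including the default "class-ordered" and any
    policy added later, is treated as a non-scout run.
    """
    return seed_offset_policy in SCOUT_SEED_OFFSET_POLICIES
```

With this shape, a new admission policy stays a non-scout by default, and a new scout policy has to be added to the set deliberately.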
"verdict": (
    "order-control scout generated"
    if is_order_control_scout
    else "positive but bounded validation"
    if validation_passed
    else "negative but useful"
),
The verdict string currently prioritizes the scout status over the validation result. While scouts are indeed not for admission, it might be useful for debugging and clarity to include whether the scout run actually passed the validation gate in the verdict string itself, rather than just in the validation_passed boolean field.
- "verdict": (
-     "order-control scout generated"
-     if is_order_control_scout
-     else "positive but bounded validation"
-     if validation_passed
-     else "negative but useful"
- ),
+ "verdict": (
+     f"order-control scout generated ({'passed' if validation_passed else 'failed'})"
+     if is_order_control_scout
+     else "positive but bounded validation"
+     if validation_passed
+     else "negative but useful"
+ ),
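The chained conditional expression is valid Python but easy to misread. One possible alternative, sketched here with a hypothetical helper name (`build_verdict` is not part of the PR), is to move the branching into a small function that can be unit tested on its own. The verdict strings are the ones from the suggestion.

```python
def build_verdict(is_order_control_scout: bool, validation_passed: bool) -> str:
    """Build the summary verdict string (hypothetical helper, not in the PR).

    Scout runs always report as scouts, with the validation gate result
    appended in parentheses; non-scout runs keep the two existing verdicts.
    """
    if is_order_control_scout:
        gate_result = "passed" if validation_passed else "failed"
        return f"order-control scout generated ({gate_result})"
    if validation_passed:
        return "positive but bounded validation"
    return "negative but useful"
```

The summary dictionary entry would then reduce to `"verdict": build_verdict(is_order_control_scout, validation_passed)`, assuming both variables are in scope as in the diff.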
Summary
Tests