feat: add TaskVerifierRegistry for custom task verification (#89)
Add a registry pattern for custom task verifiers that can inspect VM
state after task execution. This enables GoTo IT Autopilot (and other
integrators) to register domain-specific verification functions without
subclassing BenchmarkAdapter.
- TaskVerifierRegistry with decorator and programmatic registration
- VerificationResult dataclass with success/score/details
- WAALiveAdapter.run_powershell() for executing PowerShell on the VM
- Built-in clear_browsing_data reference verifier
- 33 tests covering registry operations and built-in verifiers
- Exports from evaluation package and main package __init__
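A minimal sketch of how the registry described above could look, not the code from this commit. Only the names `TaskVerifierRegistry`, `VerificationResult` (with `success`/`score`/`details`), `run_powershell`, and `clear_browsing_data` come from the commit message. The `register`/`verify` method names, the verifier signature, the `FakeAdapter` stand-in, and the PowerShell placeholder are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class VerificationResult:
    """Outcome of one verifier run; field names follow the commit summary."""
    success: bool
    score: float = 0.0
    details: Dict[str, Any] = field(default_factory=dict)


# Assumed signature: a verifier receives an adapter-like object and
# returns a VerificationResult.
Verifier = Callable[[Any], VerificationResult]


class TaskVerifierRegistry:
    """Maps task names to verifier callables (hypothetical API)."""

    def __init__(self) -> None:
        self._verifiers: Dict[str, Verifier] = {}

    def register(self, name: str, fn: Optional[Verifier] = None):
        """Register directly when fn is given, or act as a decorator when omitted."""
        def _add(f: Verifier) -> Verifier:
            self._verifiers[name] = f
            return f
        return _add(fn) if fn is not None else _add

    def verify(self, name: str, adapter: Any) -> VerificationResult:
        """Run the verifier registered under name, or report that none exists."""
        fn = self._verifiers.get(name)
        if fn is None:
            return VerificationResult(False, 0.0, {"error": f"no verifier for {name!r}"})
        return fn(adapter)


class FakeAdapter:
    """Stand-in for WAALiveAdapter; returns canned output instead of touching a VM."""

    def run_powershell(self, script: str) -> str:
        return "0"


registry = TaskVerifierRegistry()


# Decorator registration.
@registry.register("clear_browsing_data")
def verify_clear_browsing_data(adapter: Any) -> VerificationResult:
    # Placeholder script: the real check would query browser state on the VM.
    remaining = int(adapter.run_powershell("<count remaining history entries>").strip())
    ok = remaining == 0
    return VerificationResult(ok, 1.0 if ok else 0.0, {"remaining": remaining})


# Programmatic registration.
registry.register("always_pass", lambda adapter: VerificationResult(True, 1.0))

result = registry.verify("clear_browsing_data", FakeAdapter())
```

Keeping verifiers as plain callables keyed by task name is what lets an integrator add domain-specific checks without subclassing BenchmarkAdapter, which is the motivation stated above.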
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>