
Define engine semantics for evaluator fallback failures and timeouts #199


Summary

  • Move evaluator error-handling policy out of individual evaluator configs and into the engine / shared evaluator spec.
  • Evaluators should report that evaluation failed; the engine should decide whether that failure behaves as fail-open or fail-closed.

Motivation

  • PR #196 (feat(evaluators): add yelp.detect_secrets contrib evaluator) surfaced a broader engine contract gap, but the problem is not specific to detect-secrets.
  • Today, several evaluators expose evaluator-local on_error / fallback behavior, including:
    • cisco.ai_defense
    • galileo.luna2
    • yelp.detect_secrets
  • That forces evaluators to encode failure policy either as ordinary boolean results (matched=True/False) or as inconsistent uses of result.error.
  • The engine then loses the distinction between:
    • "the evaluator produced a normal boolean result"
    • "the evaluator failed, and policy says treat that failure as allow or deny"
  • This becomes especially problematic in composite condition trees and around timeout handling.
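
As a concrete illustration of the ambiguity, consider this minimal sketch (the EvaluatorResult shape here is assumed for illustration, not the real agent_control_models definition):

```python
from dataclasses import dataclass

# Assumed, simplified result shape for illustration only.
@dataclass
class EvaluatorResult:
    matched: bool
    error: str | None = None

# A genuine "no match" and a failure collapsed by evaluator-local
# on_error="allow" are identical by the time the engine sees them.
no_match = EvaluatorResult(matched=False)
fail_open = EvaluatorResult(matched=False)
assert no_match == fail_open  # the engine cannot tell these apart
```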

Current behavior

  • Evaluator-local on_error is part of evaluator-specific config today.
  • Evaluators currently handle failures inconsistently:
    • some return matched=True/False with error=None
    • some set result.error only on fail-open paths
    • some rely on generic engine error handling
  • In /engine/src/agent_control_engine/core.py, composite condition evaluation treats result.error as the only first-class failure signal.
  • As a result, evaluator-local fallback behavior gets collapsed into ordinary booleans inside not(...), and(...), and or(...).
  • Separately, the engine wraps evaluate() in asyncio.wait_for(...), so engine-level timeout handling can race with evaluator-local timeout / fallback behavior.
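
In rough terms, the current shape is something like this sketch (names are illustrative, not the actual core.py code):

```python
import asyncio

async def evaluate_leaf(evaluator, payload, timeout_s: float) -> bool:
    try:
        # The engine-level hard timeout wraps evaluate(); if the evaluator
        # also runs its own internal timeout/fallback, the two layers race.
        result = await asyncio.wait_for(evaluator.evaluate(payload),
                                        timeout=timeout_s)
    except asyncio.TimeoutError:
        raise  # surfaces as a generic hard error; no fallback policy applies
    if result.error is not None:
        raise RuntimeError(result.error)  # the only first-class failure signal
    # Evaluator-local fallbacks arrive here as ordinary booleans, so
    # not(...) / and(...) / or(...) cannot distinguish them from real results.
    return result.matched
```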

Expected behavior

  • Evaluators should report evaluation failure in a standard way.
  • The engine should own the policy for what to do with that failure.
  • That policy should be applied consistently across:
    • leaf evaluation
    • composite conditions
    • engine / SDK error reporting
    • confidence calculation
    • timeout handling
  • The platform should have one clear contract for fail-open / fail-closed evaluator failures instead of each evaluator inventing its own encoding.
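
One way to picture that contract, as a sketch under assumed names (ErrorPolicy, EvaluationFailed, and resolve are placeholders, not a committed design):

```python
from dataclasses import dataclass
from enum import Enum

class ErrorPolicy(Enum):
    ALLOW = "allow"  # fail-open
    DENY = "deny"    # fail-closed

@dataclass
class EvaluationFailed:
    reason: str  # the evaluator reports only *that* it failed, not what to do

def resolve(outcome: bool | EvaluationFailed, policy: ErrorPolicy) -> bool:
    # Engine-owned: the same resolution applies at leaves, in composites,
    # and on timeouts. (How allow/deny should map onto matched is itself
    # one of the open questions below.)
    if isinstance(outcome, EvaluationFailed):
        return policy is ErrorPolicy.ALLOW
    return outcome
```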

Reproduction (if bug)

  1. Configure an evaluator with evaluator-local on_error="allow" or on_error="deny".
  2. Use it inside a composite condition such as not(...), and(...), or or(...).
  3. Trigger an evaluator failure (runtime error or timeout).
  4. Observe that the engine treats the result either as an ordinary boolean or as a generic hard error, rather than as a first-class "evaluation failed, apply fallback policy" outcome.
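
A hypothetical control definition for the reproduction (the schema below is invented for this example, not the real control format):

```python
control = {
    "condition": {
        "not": {
            "evaluator": "yelp.detect_secrets",
            "config": {"on_error": "allow"},  # evaluator-local fallback (today)
        }
    }
}
# If detect_secrets raises or times out, on_error="allow" collapses the
# failure to matched=False before the engine sees it, so not(...) flips it
# to True as if the evaluator had genuinely found no secrets.
```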

Proposed solution (optional)

  • Move fallback policy out of evaluator-specific config and into the engine / shared evaluator spec.
  • A possible direction:
    • evaluators report failure in a standard way
    • engine applies a shared error_policy / on_error such as allow or deny
    • composite conditions operate on a first-class failure-with-policy state, not fake booleans
  • This would make evaluator behavior simpler and make the fallback contract consistent across evaluators.
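
Sketched with assumed names (whether error_policy lives on EvaluatorSpec at all is one of the open questions below):

```python
from dataclasses import dataclass

@dataclass
class EvaluatorSpec:
    id: str
    error_policy: str = "deny"  # engine-owned, shared across all evaluators

spec = EvaluatorSpec(id="yelp.detect_secrets", error_policy="allow")
# The evaluator itself only reports failure; the engine consults
# spec.error_policy when it sees that failure (including on timeout), and
# composites carry the failure as a distinct state rather than a fake boolean.
```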

Open questions

  • Where should engine-owned error policy live?
    • on EvaluatorSpec
    • on the condition leaf
    • somewhere else in control definition structure
  • Should we reuse the existing EvaluatorResult.error field, or introduce a more explicit failure representation?
    • reuse error plus new engine semantics
    • add a structured failure status / type
    • use a typed exception contract from evaluators
  • How should composites treat evaluator failures with policy? (see the three-valued sketch after this list)
    • what should not(...) do?
    • what should and(...) / or(...) do?
    • should failure short-circuit, or collapse immediately to allow/deny?
  • How should top-level engine results expose fallback-driven failures?
    • should they appear in errors
    • in matches
    • in a new category
    • how should confidence be computed?
  • What should the timeout contract be? (see the layered-timeout sketch after this list)
    • should evaluator-local/runtime timeouts map into engine-owned error policy first?
    • should the engine keep a larger hard timeout outside that as a kill switch?
    • should evaluator timeout config and engine timeout config be separated?
  • What is the migration path?
    • deprecate evaluator-local on_error
    • support both temporarily
    • how to preserve backward compatibility for existing contrib evaluators and users
  • Do we want all evaluators to support engine-owned fail-open/fail-closed behavior, or only evaluators that opt in?
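
For the composite question, one candidate answer is Kleene-style three-valued logic: failure propagates through not/and/or as a third truth value, and policy is applied only at a defined boundary (for example the root). A sketch, not a committed design:

```python
from enum import Enum

class Tri(Enum):
    TRUE = 1
    FALSE = 0
    FAILED = -1  # evaluation failed; policy not yet applied

def tri_not(a: Tri) -> Tri:
    return {Tri.TRUE: Tri.FALSE, Tri.FALSE: Tri.TRUE, Tri.FAILED: Tri.FAILED}[a]

def tri_and(a: Tri, b: Tri) -> Tri:
    if Tri.FALSE in (a, b):
        return Tri.FALSE   # a definite FALSE decides the result despite a failure
    if Tri.FAILED in (a, b):
        return Tri.FAILED  # otherwise the failure propagates upward
    return Tri.TRUE

def tri_or(a: Tri, b: Tri) -> Tri:
    if Tri.TRUE in (a, b):
        return Tri.TRUE    # a definite TRUE decides the result despite a failure
    if Tri.FAILED in (a, b):
        return Tri.FAILED
    return Tri.FALSE
```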
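For the timeout question, one possible layering keeps two budgets: the inner timeout maps into engine-owned error policy, and a deliberately larger outer timeout stays a hard kill switch. All names and values below are illustrative:

```python
import asyncio

FAILED = object()  # sentinel: a policy-eligible failure, not a hard error

async def run_with_layered_timeouts(evaluator, payload,
                                    evaluator_timeout_s: float = 2.0,
                                    engine_hard_timeout_s: float = 10.0):
    async def inner():
        try:
            return await asyncio.wait_for(evaluator.evaluate(payload),
                                          timeout=evaluator_timeout_s)
        except asyncio.TimeoutError:
            return FAILED  # hand the timeout to the engine's error policy

    # The outer budget is strictly larger than the inner one, so the two
    # layers cannot race; exhausting it is a kill switch, not a policy call.
    return await asyncio.wait_for(inner(), timeout=engine_hard_timeout_s)
```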

Additional context

  • PR that surfaced the issue: #196 (feat(evaluators): add yelp.detect_secrets contrib evaluator)
  • Relevant engine code:
    • /engine/src/agent_control_engine/core.py
  • Relevant shared model code:
    • /models/src/agent_control_models/controls.py
  • Relevant evaluator examples:
    • /evaluators/contrib/detect_secrets/src/agent_control_evaluator_detect_secrets/detect_secrets/evaluator.py
    • /evaluators/contrib/cisco/src/agent_control_evaluator_cisco/ai_defense/evaluator.py
    • /evaluators/contrib/galileo/src/agent_control_evaluator_galileo/luna2/evaluator.py
