
Define engine semantics for evaluator fallback failures and timeouts #199


Summary

  • Move evaluator error-handling policy out of individual evaluator configs and into the engine / shared evaluator spec.
  • Evaluators should report that evaluation failed; the engine should decide whether that failure behaves as fail-open or fail-closed.

Motivation

  • PR #196 (feat(evaluators): add yelp.detect_secrets contrib evaluator) surfaced a broader engine contract gap, but the problem is not specific to detect-secrets.
  • Today, several evaluators expose evaluator-local on_error / fallback behavior, including:
    • cisco.ai_defense
    • galileo.luna2
    • yelp.detect_secrets
  • That forces evaluators to encode failure policy either as ordinary boolean results (matched=True/False) or as inconsistent uses of result.error.
  • The engine then loses the distinction between:
    • "the evaluator produced a normal boolean result"
    • "the evaluator failed, and policy says treat that failure as allow or deny"
  • This becomes especially problematic in composite condition trees and around timeout handling.
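
As a concrete illustration of the ambiguity, consider this minimal sketch (the EvaluatorResult shape here is assumed for illustration, not the real agent_control_models definition):

```python
from dataclasses import dataclass

# Assumed, simplified result shape for illustration only.
@dataclass
class EvaluatorResult:
    matched: bool
    error: str | None = None

# A genuine "no match" and a failure collapsed by evaluator-local
# on_error="allow" are identical by the time the engine sees them.
no_match = EvaluatorResult(matched=False)
fail_open = EvaluatorResult(matched=False)
assert no_match == fail_open  # the engine cannot tell these apart
```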

Current behavior

  • Evaluator-local on_error is part of evaluator-specific config today.
  • Evaluators currently handle failures inconsistently:
    • some return matched=True/False with error=None
    • some set result.error only on fail-open paths
    • some rely on generic engine error handling
  • In /engine/src/agent_control_engine/core.py, composite condition evaluation treats result.error as the only first-class failure signal.
  • As a result, evaluator-local fallback behavior gets collapsed into ordinary booleans inside not(...), and(...), and or(...).
  • Separately, the engine wraps evaluate() in asyncio.wait_for(...), so engine-level timeout handling can race with evaluator-local timeout / fallback behavior.
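
In rough terms, the current shape is something like this sketch (names are illustrative, not the actual core.py code):

```python
import asyncio

async def evaluate_leaf(evaluator, payload, timeout_s: float) -> bool:
    try:
        # The engine-level hard timeout wraps evaluate(); if the evaluator
        # also runs its own internal timeout/fallback, the two layers race.
        result = await asyncio.wait_for(evaluator.evaluate(payload),
                                        timeout=timeout_s)
    except asyncio.TimeoutError:
        raise  # surfaces as a generic hard error; no fallback policy applies
    if result.error is not None:
        raise RuntimeError(result.error)  # the only first-class failure signal
    # Evaluator-local fallbacks arrive here as ordinary booleans, so
    # not(...) / and(...) / or(...) cannot distinguish them from real results.
    return result.matched
```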

Expected behavior

  • Evaluators should report evaluation failure in a standard way.
  • The engine should own the policy for what to do with that failure.
  • That policy should be applied consistently across:
    • leaf evaluation
    • composite conditions
    • engine / SDK error reporting
    • confidence calculation
    • timeout handling
  • The platform should have one clear contract for fail-open / fail-closed evaluator failures instead of each evaluator inventing its own encoding.
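
One way to picture that contract, as a sketch under assumed names (ErrorPolicy, EvaluationFailed, and resolve are placeholders, not a committed design):

```python
from dataclasses import dataclass
from enum import Enum

class ErrorPolicy(Enum):
    ALLOW = "allow"  # fail-open
    DENY = "deny"    # fail-closed

@dataclass
class EvaluationFailed:
    reason: str  # the evaluator reports only *that* it failed, not what to do

def resolve(outcome: bool | EvaluationFailed, policy: ErrorPolicy) -> bool:
    # Engine-owned: the same resolution applies at leaves, in composites,
    # and on timeouts. (How allow/deny should map onto matched is itself
    # one of the open questions below.)
    if isinstance(outcome, EvaluationFailed):
        return policy is ErrorPolicy.ALLOW
    return outcome
```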

Reproduction (if bug)

  1. Configure an evaluator with evaluator-local on_error="allow" or on_error="deny".
  2. Use it inside a composite condition such as not(...), and(...), or or(...).
  3. Trigger an evaluator failure (runtime error or timeout).
  4. Observe that the engine treats the result either as an ordinary boolean or as a generic hard error, rather than as a first-class "evaluation failed, apply fallback policy" outcome.
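
A hypothetical control definition for the reproduction (the schema below is invented for this example, not the real control format):

```python
control = {
    "condition": {
        "not": {
            "evaluator": "yelp.detect_secrets",
            "config": {"on_error": "allow"},  # evaluator-local fallback (today)
        }
    }
}
# If detect_secrets raises or times out, on_error="allow" collapses the
# failure to matched=False before the engine sees it, so not(...) flips it
# to True as if the evaluator had genuinely found no secrets.
```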

Proposed solution (optional)

  • Move fallback policy out of evaluator-specific config and into the engine / shared evaluator spec.
  • A possible direction:
    • evaluators report failure in a standard way
    • engine applies a shared error_policy / on_error such as allow or deny
    • composite conditions operate on a first-class failure-with-policy state, not fake booleans
  • This would make evaluator behavior simpler and make the fallback contract consistent across evaluators.
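
Sketched with assumed names (whether error_policy lives on EvaluatorSpec at all is one of the open questions below):

```python
from dataclasses import dataclass

@dataclass
class EvaluatorSpec:
    id: str
    error_policy: str = "deny"  # engine-owned, shared across all evaluators

spec = EvaluatorSpec(id="yelp.detect_secrets", error_policy="allow")
# The evaluator itself only reports failure; the engine consults
# spec.error_policy when it sees that failure (including on timeout), and
# composites carry the failure as a distinct state rather than a fake boolean.
```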

Open questions

  • Where should engine-owned error policy live?
    • on EvaluatorSpec
    • on the condition leaf
    • somewhere else in control definition structure
  • Should we reuse the existing EvaluatorResult.error field, or introduce a more explicit failure representation?
    • reuse error plus new engine semantics
    • add a structured failure status / type
    • use a typed exception contract from evaluators
  • How should composites treat evaluator failures with policy? (see the three-valued sketch after this list)
    • what should not(...) do?
    • what should and(...) / or(...) do?
    • should failure short-circuit, or collapse immediately to allow/deny?
  • How should top-level engine results expose fallback-driven failures?
    • should they appear in errors
    • in matches
    • in a new category
    • how should confidence be computed?
  • What should the timeout contract be? (see the layered-timeout sketch after this list)
    • should evaluator-local/runtime timeouts map into engine-owned error policy first?
    • should the engine keep a larger hard timeout outside that as a kill switch?
    • should evaluator timeout config and engine timeout config be separated?
  • What is the migration path?
    • deprecate evaluator-local on_error
    • support both temporarily
    • how to preserve backward compatibility for existing contrib evaluators and users
  • Do we want all evaluators to support engine-owned fail-open/fail-closed behavior, or only evaluators that opt in?
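
For the composite question, one candidate answer is Kleene-style three-valued logic: failure propagates through not/and/or as a third truth value, and policy is applied only at a defined boundary (for example the root). A sketch, not a committed design:

```python
from enum import Enum

class Tri(Enum):
    TRUE = 1
    FALSE = 0
    FAILED = -1  # evaluation failed; policy not yet applied

def tri_not(a: Tri) -> Tri:
    return {Tri.TRUE: Tri.FALSE, Tri.FALSE: Tri.TRUE, Tri.FAILED: Tri.FAILED}[a]

def tri_and(a: Tri, b: Tri) -> Tri:
    if Tri.FALSE in (a, b):
        return Tri.FALSE   # a definite FALSE decides the result despite a failure
    if Tri.FAILED in (a, b):
        return Tri.FAILED  # otherwise the failure propagates upward
    return Tri.TRUE

def tri_or(a: Tri, b: Tri) -> Tri:
    if Tri.TRUE in (a, b):
        return Tri.TRUE    # a definite TRUE decides the result despite a failure
    if Tri.FAILED in (a, b):
        return Tri.FAILED
    return Tri.FALSE
```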
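For the timeout question, one possible layering keeps two budgets: the inner timeout maps into engine-owned error policy, and a deliberately larger outer timeout stays a hard kill switch. All names and values below are illustrative:

```python
import asyncio

FAILED = object()  # sentinel: a policy-eligible failure, not a hard error

async def run_with_layered_timeouts(evaluator, payload,
                                    evaluator_timeout_s: float = 2.0,
                                    engine_hard_timeout_s: float = 10.0):
    async def inner():
        try:
            return await asyncio.wait_for(evaluator.evaluate(payload),
                                          timeout=evaluator_timeout_s)
        except asyncio.TimeoutError:
            return FAILED  # hand the timeout to the engine's error policy

    # The outer budget is strictly larger than the inner one, so the two
    # layers cannot race; exhausting it is a kill switch, not a policy call.
    return await asyncio.wait_for(inner(), timeout=engine_hard_timeout_s)
```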

Additional context

  • PR that surfaced the issue: #196 (feat(evaluators): add yelp.detect_secrets contrib evaluator)
  • Relevant engine code:
    • /engine/src/agent_control_engine/core.py
  • Relevant shared model code:
    • /models/src/agent_control_models/controls.py
  • Relevant evaluator examples:
    • /evaluators/contrib/detect_secrets/src/agent_control_evaluator_detect_secrets/detect_secrets/evaluator.py
    • /evaluators/contrib/cisco/src/agent_control_evaluator_cisco/ai_defense/evaluator.py
    • /evaluators/contrib/galileo/src/agent_control_evaluator_galileo/luna2/evaluator.py
