Today, several evaluators expose evaluator-local on_error / fallback behavior, including:
cisco.ai_defense
galileo.luna2
yelp.detect_secrets
Because the engine has no first-class notion of evaluation failure, each evaluator is forced to encode failure policy as ordinary boolean results (matched=True/False) or as inconsistent uses of result.error.
The engine then loses the distinction between:
"the evaluator produced a normal boolean result"
"the evaluator failed, and policy says treat that failure as allow or deny"
This becomes especially problematic in composite condition trees and around timeout handling.
Current behavior
Evaluator-local on_error is part of evaluator-specific config today.
Evaluators currently handle failures inconsistently:
some return matched=True/False with error=None
some set result.error only on fail-open paths
some rely on generic engine error handling
In /engine/src/agent_control_engine/core.py, composite condition evaluation treats result.error as the only first-class failure signal.
As a result, evaluator-local fallback behavior gets collapsed into ordinary booleans inside not(...), and(...), and or(...).
Separately, the engine wraps evaluate() in asyncio.wait_for(...), so engine-level timeout handling can race with evaluator-local timeout / fallback behavior.
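The race can be sketched with plain asyncio. The names and timings below are illustrative only, not the actual engine or evaluator code:

```python
import asyncio

async def evaluator_with_local_timeout(delay: float, local_timeout: float) -> bool:
    """Simulated evaluator: enforces its own timeout and collapses the
    failure into a boolean, as several contrib evaluators do today."""
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout=local_timeout)
        return True
    except asyncio.TimeoutError:
        return False  # evaluator-local fail-open: failure encoded as "no match"

async def engine_call(delay: float, engine_timeout: float, local_timeout: float) -> str:
    """Simulated engine: wraps the evaluator call in its own wait_for()."""
    try:
        result = await asyncio.wait_for(
            evaluator_with_local_timeout(delay, local_timeout),
            timeout=engine_timeout,
        )
        return f"boolean:{result}"
    except asyncio.TimeoutError:
        return "hard-error"  # engine deadline won the race; fallback never ran

# Engine deadline shorter than the evaluator's: the local fallback is skipped.
print(asyncio.run(engine_call(delay=0.2, engine_timeout=0.05, local_timeout=0.1)))   # hard-error
# Evaluator deadline shorter: the failure is silently collapsed to False.
print(asyncio.run(engine_call(delay=0.2, engine_timeout=0.5, local_timeout=0.05)))   # boolean:False
```

Whichever deadline fires first decides whether the outcome is a hard error or a fake boolean, which is exactly the race described above.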
Expected behavior
Evaluators should report evaluation failure in a standard way.
The engine should own the policy for what to do with that failure.
That policy should be applied consistently across:
leaf evaluation
composite conditions
engine / SDK error reporting
confidence calculation
timeout handling
The platform should have one clear contract for fail-open / fail-closed evaluator failures instead of each evaluator inventing its own encoding.
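One way to picture that contract, using illustrative names rather than the real agent-control types:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class OnError(Enum):
    ALLOW = "allow"
    DENY = "deny"

@dataclass
class EvalOutcome:
    matched: Optional[bool]      # None means "evaluation failed"
    error: Optional[str] = None

def apply_policy(outcome: EvalOutcome, policy: OnError) -> bool:
    """Engine-owned resolution: only here does a failure become a decision."""
    if outcome.matched is not None:
        return outcome.matched   # normal boolean result; policy is irrelevant
    # Failure: "allow" means the failed check does not match (fail-open),
    # "deny" means it is treated as matched (fail-closed).
    return policy is OnError.DENY

print(apply_policy(EvalOutcome(matched=None, error="timeout"), OnError.ALLOW))  # False
print(apply_policy(EvalOutcome(matched=None, error="timeout"), OnError.DENY))   # True
```

The key property is that the evaluator only reports "failed", and the engine alone turns that into allow or deny.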
Reproduction (if bug)
Configure an evaluator with evaluator-local on_error="allow" or on_error="deny".
Use it inside a composite condition such as not(...), and(...), or or(...).
Trigger an evaluator failure (runtime error or timeout).
Observe that the engine treats the result as either:
an ordinary boolean, or
a generic hard error,
rather than a first-class "evaluation failed, apply fallback policy" outcome.
Proposed solution (optional)
Move fallback policy out of evaluator-specific config and into the engine / shared evaluator spec.
A possible direction:
evaluators report failure in a standard way
engine applies a shared error_policy / on_error such as allow or deny
composite conditions operate on a first-class failure-with-policy state, not fake booleans
This would make evaluator behavior simpler and make the fallback contract consistent across evaluators.
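The "first-class failure-with-policy state" could behave like Kleene's three-valued logic in composites. A minimal sketch with a hypothetical FAILED sentinel:

```python
FAILED = object()  # sentinel: "evaluation failed, policy not yet applied"

def tri_not(x):
    return FAILED if x is FAILED else (not x)

def tri_and(*xs):
    if any(x is False for x in xs):
        return False    # a definite False decides and(...) regardless of failures
    if any(x is FAILED for x in xs):
        return FAILED   # otherwise a failure keeps the composite undecided
    return True

def tri_or(*xs):
    if any(x is True for x in xs):
        return True     # a definite True decides or(...) regardless of failures
    if any(x is FAILED for x in xs):
        return FAILED
    return False

print(tri_not(FAILED) is FAILED)        # True: not(...) propagates failure
print(tri_and(True, FAILED) is FAILED)  # True: still undecided
print(tri_or(True, FAILED))             # True: decided without the failed leaf
```

Under this scheme the allow/deny policy is applied once, at the root of the condition tree, instead of being baked into leaf booleans.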
Open questions
Where should engine-owned error policy live?
on EvaluatorSpec
on the condition leaf
somewhere else in control definition structure
Should we reuse the existing EvaluatorResult.error field, or introduce a more explicit failure representation?
reuse error plus new engine semantics
add a structured failure status / type
use a typed exception contract from evaluators
How should composites treat evaluator failures with policy?
what should not(...) do?
what should and(...) / or(...) do?
should failure short-circuit, or collapse immediately to allow/deny?
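For not(...) specifically, where the policy is applied changes the answer. A toy sketch, assuming a failed leaf whose policy is on_error="allow":

```python
def decide_leaf_collapse() -> bool:
    # Policy applied at the leaf: failure -> matched=False, then negated.
    leaf = False       # "allow" already collapsed into an ordinary boolean
    return not leaf    # True: the failed check now counts as a match

def decide_root_collapse() -> bool:
    # Failure propagates through not(...); the policy is applied at the root.
    leaf_failed = True
    if leaf_failed:
        return False   # "allow" at the root: the condition does not match
    return True

print(decide_leaf_collapse(), decide_root_collapse())  # True False
```

Collapsing at the leaf lets not(...) invert the fallback policy itself, which is presumably never the intent of "allow".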
How should top-level engine results expose fallback-driven failures?
should they appear in errors
in matches
in a new category
how should confidence be computed?
What should the timeout contract be?
should evaluator-local/runtime timeouts map into engine-owned error policy first?
should the engine keep a larger hard timeout outside that as a kill switch?
should evaluator timeout config and engine timeout config be separated?
What is the migration path?
deprecate evaluator-local on_error
support both temporarily
how to preserve backward compatibility for existing contrib evaluators and users
Do we want all evaluators to support engine-owned fail-open/fail-closed behavior, or only evaluators that opt in?
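On the timeout question above, one possible layered contract can be sketched with illustrative helper names (not existing engine APIs): a soft deadline whose expiry becomes a policy-eligible failure, inside a larger hard deadline that stays a pure kill switch:

```python
import asyncio

async def _soft_deadline(coro_factory, soft: float):
    """Inner deadline: a timeout here becomes a policy-eligible failure."""
    try:
        return await asyncio.wait_for(coro_factory(), timeout=soft)
    except asyncio.TimeoutError:
        return "failed-apply-policy"   # routed into engine-owned error policy

async def run_with_layered_timeouts(coro_factory, soft: float, hard: float):
    """Outer deadline: a hard kill switch, outside any fallback policy."""
    try:
        return await asyncio.wait_for(
            _soft_deadline(coro_factory, soft), timeout=hard
        )
    except asyncio.TimeoutError:
        raise RuntimeError("hard timeout: evaluator killed")

async def slow_evaluator():
    await asyncio.sleep(0.2)
    return True

# The soft deadline fires first, so the failure stays policy-eligible:
print(asyncio.run(run_with_layered_timeouts(slow_evaluator, soft=0.05, hard=1.0)))
# -> failed-apply-policy
```

This keeps ordinary slowness inside the fail-open/fail-closed contract while still bounding truly runaway evaluators.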
Additional context
/Users/levneiman/code/agent-control-4/engine/src/agent_control_engine/core.py
/Users/levneiman/code/agent-control-4/models/src/agent_control_models/controls.py
/Users/levneiman/code/agent-control-4/evaluators/contrib/detect_secrets/src/agent_control_evaluator_detect_secrets/detect_secrets/evaluator.py
/Users/levneiman/code/agent-control-4/evaluators/contrib/cisco/src/agent_control_evaluator_cisco/ai_defense/evaluator.py
/Users/levneiman/code/agent-control-4/evaluators/contrib/galileo/src/agent_control_evaluator_galileo/luna2/evaluator.py