Validation is a two-stage pipeline around every tool call:
```mermaid
flowchart LR
  LLM[LLM tool_call] --> PRE[Pre-execution validation]
  PRE -->|valid| EXEC[Handler execution]
  PRE -->|invalid| REPAIR[Repair feedback to model]
  EXEC --> POST[Post-execution validation]
  POST --> STATE[State updates + coverage decisions]
```
Source:
- `tools/contracts.py`
- `tools/validators/pre_execution.py`
Checks:
- required/unknown params
- type/range/enum rules
- custom contract validators
- targeted semantic checks (action-dependent logic)
Current semantic examples:
- `home_assistant`: `action='call_tool'` requires `tool_name`.
- `lookup_contact`: action-specific `query`/`contact_id` requirements.
- `lookup_contact_places`: requires one of `contact_id`, `contact_query`, or `group_query`.
- `lookup_place_contacts`: requires either `place_id` or `place_query`.
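The checks and semantic examples above can be pictured with a minimal sketch. The `CONTRACTS` dictionary, its keys (`params`, `required`, `one_of`), and the `pre_validate` function are hypothetical stand-ins for illustration; the real contract shape lives in tools/contracts.py and is not shown here.

```python
from typing import Any

# Hypothetical contract shape, for illustration only.
CONTRACTS: dict[str, dict[str, Any]] = {
    "lookup_contact_places": {
        "params": {"contact_id", "contact_query", "group_query"},
        "required": set(),
        "one_of": [("contact_id", "contact_query", "group_query")],
    },
    "home_assistant": {
        "params": {"action", "tool_name"},
        "required": {"action"},
        "one_of": [],
    },
}


def pre_validate(tool: str, args: dict[str, Any]) -> dict[str, Any]:
    """Return a feedback payload; valid=True means the handler may run."""
    contract = CONTRACTS[tool]
    errors: list[str] = []
    suggestions: list[str] = []

    # Required / unknown parameter checks.
    for name in sorted(contract["required"] - set(args)):
        errors.append(f"missing required parameter: {name}")
    unknown = sorted(set(args) - contract["params"])
    for name in unknown:
        errors.append(f"unknown parameter: {name}")
    if unknown:
        suggestions.append(f"valid parameters: {sorted(contract['params'])}")

    # "Requires one of" semantic rule.
    for group in contract["one_of"]:
        if not any(args.get(name) for name in group):
            errors.append(f"provide one of: {', '.join(group)}")

    # Action-dependent semantic rule for home_assistant.
    if (
        tool == "home_assistant"
        and args.get("action") == "call_tool"
        and not args.get("tool_name")
    ):
        errors.append("action='call_tool' requires tool_name")
        suggestions.append("set tool_name to the tool that should be invoked")

    return {"valid": not errors, "errors": errors, "suggestions": suggestions}
```

Returning a payload rather than raising keeps the invalid path cheap to route back to the model as repair feedback.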
Repair feedback now includes:
- contract intent hints
- valid parameter guidance
- targeted semantic repair suggestions
Important runtime rules:
- Pre-validation feedback must remain visible to the model on the next turn. If a tool result is compacted for prompt-size reasons, do not compact validation/error payloads into empty success-shaped results. Preserve fields like `valid=false`, `error`, and `suggestions` verbatim so the model repairs the actual bad argument instead of retrying blindly.
- For action-scoped tools, normalize away irrelevant parameters before execution when safe. Example: `get_events(action=by_ids)` should not carry a `limit` value, because `limit` only applies to `by_time_span` retrieval.
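A rough sketch of both points follows. The function names, the `ACTION_PARAMS` table, and the parameter names `ids`, `start`, and `end` are assumptions made for illustration; only `valid`, `error`, `suggestions`, `limit`, `by_ids`, and `by_time_span` come from the text above.

```python
from typing import Any

# Fields that must survive compaction verbatim.
PRESERVED_FIELDS = ("valid", "error", "suggestions")

# Hypothetical table of which parameters each action accepts.
ACTION_PARAMS = {
    "get_events": {
        "by_ids": {"action", "ids"},
        "by_time_span": {"action", "start", "end", "limit"},
    }
}


def compact_result(result: dict[str, Any], max_chars: int = 200) -> dict[str, Any]:
    """Shrink a tool result without hiding validation or error payloads."""
    if result.get("valid") is False or "error" in result:
        # Keep the repair signal intact; drop only the bulky extras.
        return {k: result[k] for k in PRESERVED_FIELDS if k in result}
    compacted: dict[str, Any] = {}
    for key, value in result.items():
        size = len(value if isinstance(value, str) else repr(value))
        # Replace oversized fields with a marker instead of emptying the result.
        compacted[key] = value if size <= max_chars else f"<omitted {size} chars>"
    return compacted


def normalize_args(tool: str, args: dict[str, Any]) -> dict[str, Any]:
    """Drop parameters that do not apply to the selected action."""
    allowed = ACTION_PARAMS.get(tool, {}).get(args.get("action"))
    if allowed is None:
        return dict(args)  # unknown tool or action: leave arguments untouched
    return {k: v for k, v in args.items() if k in allowed}
```

Leaving arguments untouched for unknown tools or actions is one way to read "when safe": the normalizer only removes parameters it positively knows are irrelevant.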
Source: tools/validators/post_execution.py
Checks:
- explicit failure/error detection
- empty/no-signal result detection
- extracted facts for state
- goal coverage status:
  - `SATISFIED`
  - `NEEDS_MORE_TOOLS`
  - `NEED_USER_INPUT`
  - `FAILED`
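As a minimal sketch of how these checks could map onto the four statuses: the result keys (`error`, `success`, `items`, `need_user_input`) and the choice to map empty results to `NEEDS_MORE_TOOLS` are assumptions for illustration, not the behavior of tools/validators/post_execution.py.

```python
from enum import Enum
from typing import Any


class Coverage(str, Enum):
    SATISFIED = "SATISFIED"
    NEEDS_MORE_TOOLS = "NEEDS_MORE_TOOLS"
    NEED_USER_INPUT = "NEED_USER_INPUT"
    FAILED = "FAILED"


def post_validate(result: dict[str, Any]) -> dict[str, Any]:
    """Classify a handler result and extract facts for state."""
    # A clarification envelope asks the user; it is not a generic retry.
    if result.get("need_user_input"):
        return {"status": Coverage.NEED_USER_INPUT, "facts": {}}
    # Explicit failure/error detection.
    if result.get("error") or result.get("success") is False:
        return {"status": Coverage.FAILED, "facts": {}}
    # Empty/no-signal detection (assumed here to mean "try another tool").
    items = result.get("items") or []
    if not items:
        return {"status": Coverage.NEEDS_MORE_TOOLS, "facts": {}}
    # Extracted facts for state.
    facts = {"item_count": len(items), "ids": [item.get("id") for item in items]}
    return {"status": Coverage.SATISFIED, "facts": facts}
```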
Clarification-aware behavior:
- contact ambiguity and `need_user_input` envelopes are treated as `NEED_USER_INPUT`, not generic retries.
- Validator prompts should see high-fidelity inspected evidence whenever possible. When a result is too large, use field-aware budget compaction rather than blind string truncation, and preserve the currently inspected entity's key text before dropping broad candidate context.
- Goal completion checks must stay query-aligned. For evolving status questions, prefer the newest relevant event candidate over a higher-scoring but unrelated document result.
- Forced follow-up actions should include the exact candidate id and required tool arguments when available, so the model repairs/continues from controller-selected evidence instead of reconstructing ids from truncated context.
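The last two points can be sketched together. The `Candidate` fields, both function names, and the `ids` argument name are hypothetical; the sketch only illustrates preferring the newest relevant event and spelling out the exact id in the follow-up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Candidate:
    id: str
    kind: str         # "event" or "document"
    score: float      # retrieval score
    timestamp: float  # seconds since epoch
    relevant: bool    # whether it matches the topic of the query


def pick_status_candidate(candidates: list[Candidate]) -> Optional[Candidate]:
    """For evolving status questions, prefer the newest relevant event."""
    events = [c for c in candidates if c.kind == "event" and c.relevant]
    if events:
        return max(events, key=lambda c: c.timestamp)
    # Fall back to the best-scoring relevant candidate of any kind.
    relevant = [c for c in candidates if c.relevant]
    return max(relevant, key=lambda c: c.score, default=None)


def forced_follow_up(candidate: Candidate) -> dict[str, Any]:
    """Spell out the exact id and arguments so the model need not rebuild them."""
    return {
        "tool": "get_events",
        "arguments": {"action": "by_ids", "ids": [candidate.id]},
        "reason": f"inspect {candidate.kind} {candidate.id} selected by the controller",
    }
```

Note that the unrelated document is skipped even though it has the highest score, because relevance is filtered before recency or score is compared.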
Tool contracts in tools/registry.py now carry stronger usage guidance:
- when to use / when not to use
- parameter intent and anti-pattern hints
This reduces dependence on one giant global prompt.
Runtime LLM calls that expect machine-readable JSON should enforce the shape through
response_format built by llm_helpers.build_json_schema_response_format(). Shared response
schemas live in llm_json_schemas.py so evals and production flows can use the same contracts.
Prompts may still describe task behavior and field semantics, but they should not duplicate the full output JSON object as a schema copy. This keeps schema enforcement in the transport layer and prevents eval-only structured output behavior from drifting away from normal runtime flows.
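To make the idea concrete, here is a sketch under stated assumptions. The signature of `llm_helpers.build_json_schema_response_format()` is not shown in this document, so `response_format_sketch` is a hypothetical stand-in, and the schema below is invented for illustration. The payload shape follows the OpenAI-style `json_schema` response format; other providers may expect a different shape.

```python
import json
from typing import Any

# Hypothetical shared schema; in the project these live in llm_json_schemas.py.
COVERAGE_DECISION_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["SATISFIED", "NEEDS_MORE_TOOLS", "NEED_USER_INPUT", "FAILED"],
        },
        "reason": {"type": "string"},
    },
    "required": ["status", "reason"],
    "additionalProperties": False,
}


def response_format_sketch(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the project helper: wrap a schema for the transport layer."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


# The same object can be passed by production flows and by evals,
# so neither needs a copy of the schema inside its prompt.
fmt = response_format_sketch("coverage_decision", COVERAGE_DECISION_SCHEMA)
print(json.dumps(fmt, indent=2))
```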
Validation outcomes feed no-progress and escalation logic:
- repeated invalid/empty paths are detected by the limit checker and state
- restricted tool visibility can escalate to full tools when no-progress is reached
- when user phrasing explicitly requests full coverage (for example "all/everyone/entire"), query handlers may treat result limits as unbounded to avoid partial truncation
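One way to picture this logic is the sketch below. The `ProgressState` class, its threshold of two, the outcome labels, and `effective_limit` are all hypothetical; the real limit checker and state objects are not shown in this document.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Words taken from the example above that signal a full-coverage request.
FULL_COVERAGE_WORDS = {"all", "everyone", "entire"}


@dataclass
class ProgressState:
    max_no_progress: int = 2  # hypothetical threshold
    no_progress_count: int = 0
    full_tools_visible: bool = False

    def record(self, outcome: str) -> None:
        """Count consecutive invalid/empty outcomes; escalate at the threshold."""
        if outcome in ("invalid", "empty"):
            self.no_progress_count += 1
        else:
            self.no_progress_count = 0
        if self.no_progress_count >= self.max_no_progress:
            self.full_tools_visible = True


def effective_limit(user_text: str, default_limit: int) -> Optional[int]:
    """Return None (unbounded) when the user explicitly asks for full coverage."""
    words = set(re.findall(r"[a-z]+", user_text.lower()))
    return None if words & FULL_COVERAGE_WORDS else default_limit
```

Matching on whole words rather than substrings avoids treating "calls" as a request for "all".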
Recommended minimum tests when changing validation:
- contract schema tests (`tests/tools/test_contracts.py`)
- semantic pre-validation tests (`tests/tools/test_validators.py`)
- handler signature compatibility (`tests/tools/test_handlers/test_handler_signatures.py`)
- integration flows involving clarification and retries (`tests/integration/test_full_flow.py`)