
feat: state narrowing and transition verification for grounding cascade#257

Merged
abrichr merged 1 commit into main from feat/grounding-state-narrowing on Mar 31, 2026


@abrichr
Member

@abrichr commented Mar 31, 2026

Summary

  • Implements Phase 4 of the grounding cascade: state narrowing before grounding and transition verification after clicking
  • Adds check_state_preconditions() to verify that the window title, nearby text, and surrounding labels match expectations before grounding a click (pre-click state check)
  • Adds verify_transition() to check disappearance_text, appearance_text, and window_title_change against the screen after clicking (post-click transition verification)
  • Integrates both functions into DemoExecutor.run() as observational checks that log warnings (non-blocking in Phase 4; blocking behavior and recovery are deferred to later phases)
  • Both functions accept an optional ocr_fn callable for OCR integration (Phase 5) and skip gracefully when no OCR function is available
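A minimal sketch of the pre-click check as summarized above. It is a hypothetical illustration, not the code in grounding.py: the StateExpectations holder, the (ok, problems) return shape, the 0.5 match threshold for nearby text and surrounding labels, and an ocr_fn that maps a screenshot to one string of recognized text are all assumptions. Matching the window title against the OCR text (rather than a window-manager API) is also assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed OCR signature: screenshot in, all recognized text out.
OcrFn = Callable[[object], str]


@dataclass
class StateExpectations:
    """Hypothetical holder for the pre-click fields of a GroundingTarget."""
    window_title: Optional[str] = None
    nearby_text: list[str] = field(default_factory=list)
    surrounding_labels: list[str] = field(default_factory=list)


def _text_present(needle: str, haystack: str) -> bool:
    # Case-insensitive substring match.
    return needle.lower() in haystack.lower()


def _fraction_found(texts: list[str], ocr_text: str) -> float:
    # Share of expected strings that occur anywhere in the OCR output.
    return sum(_text_present(t, ocr_text) for t in texts) / len(texts)


def check_state_preconditions(
    expected: StateExpectations,
    screenshot: object,
    ocr_fn: Optional[OcrFn] = None,
    threshold: float = 0.5,  # hypothetical value
) -> tuple[bool, list[str]]:
    """Return (ok, problems); reports ok=True when nothing can be checked."""
    if ocr_fn is None:
        return True, ["skipped: no OCR function"]
    if not (expected.window_title or expected.nearby_text or expected.surrounding_labels):
        return True, ["skipped: no expectations"]

    ocr_text = ocr_fn(screenshot)
    problems: list[str] = []
    if expected.window_title and not _text_present(expected.window_title, ocr_text):
        problems.append(f"window title {expected.window_title!r} not found")
    for name, texts in (
        ("nearby_text", expected.nearby_text),
        ("surrounding_labels", expected.surrounding_labels),
    ):
        # Empty lists are skipped, so _fraction_found never divides by zero.
        if texts and _fraction_found(texts, ocr_text) < threshold:
            problems.append(f"{name}: fewer than {threshold:.0%} of expected strings found")
    return not problems, problems
```

For example, with a fake ocr_fn returning "Untitled - Notepad\nFile Edit View", expecting the window title "notepad" plus nearby text ["File", "Edit", "Help"] passes: the title matches case-insensitively, and two of the three nearby strings clear the assumed 50% threshold.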

Test plan

  • 26 new tests in tests/test_grounding.py covering:
    • check_state_preconditions: no-OCR skip, no-expectations skip, window title match/mismatch, nearby text threshold, surrounding labels threshold, case insensitivity, combined checks
    • verify_transition: no-expectations skip, no-OCR skip, appearance text found/missing, disappearance text gone/present, window title change, modal toggled skip, combined scenarios
    • GroundingTarget round-trip serialization (to_dict/from_dict, tuple conversion, defaults omission)
  • All 1542 existing tests pass (54 skipped, 17 deselected by filter)
  • Ran uv run --no-sources pytest tests/test_grounding.py -v: 26 passed
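The round-trip tests cover three behaviors: to_dict/from_dict symmetry, tuple fields surviving serialization, and omission of fields left at their defaults. A minimal sketch of a dataclass with those properties follows. GroundingTargetSketch, its target_bbox tuple field, and the other field names are hypothetical, since the actual GroundingTarget definition is not part of this PR description.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Optional


@dataclass
class GroundingTargetSketch:
    """Hypothetical subset of GroundingTarget fields, for illustration only."""
    description: str = ""
    target_bbox: Optional[tuple[int, int, int, int]] = None  # hypothetical tuple field
    window_title: Optional[str] = None
    nearby_text: list[str] = field(default_factory=list)
    appearance_text: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        """Serialize to JSON-friendly types, omitting fields left at their defaults."""
        out: dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            default = f.default_factory() if f.default_factory is not MISSING else f.default
            if value == default:
                continue
            # JSON has no tuple type, so tuples (and list copies) are written as lists.
            out[f.name] = list(value) if isinstance(value, (tuple, list)) else value
        return out

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "GroundingTargetSketch":
        """Rebuild from a dict; missing keys fall back to the dataclass defaults."""
        kwargs = dict(data)
        if kwargs.get("target_bbox") is not None:
            # Serialized form is a list; convert back so equality with the original holds.
            kwargs["target_bbox"] = tuple(kwargs["target_bbox"])
        return cls(**kwargs)
```

Omitting defaults keeps stored demo files small, and converting lists back to tuples in from_dict is what makes from_dict(to_dict(t)) == t hold for tuple-typed fields.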

🤖 Generated with Claude Code

Phase 4 of the grounding cascade: detect the "wrong screen" before
grounding and verify state changes after clicking.

Added to grounding.py:
- check_state_preconditions(): verifies window title, nearby text,
  and surrounding labels match expectations before grounding a click.
  Skips gracefully when no OCR function is provided (Phase 5).
- verify_transition(): checks disappearance_text, appearance_text,
  and window_title_change against the post-click screenshot via OCR.
  Modal detection deferred (logged, not enforced).
- _text_present(): case-insensitive substring matching helper.
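The post-click check described above can be sketched the same way. This is a hypothetical illustration, not the code in grounding.py: the TransitionExpectations holder, the modal_toggled flag name, the (ok, problems) return shape, and matching the new window title against OCR text are assumptions. The modal expectation is only logged, mirroring "logged, not enforced".

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Assumed OCR signature: screenshot in, all recognized text out.
OcrFn = Callable[[object], str]


@dataclass
class TransitionExpectations:
    """Hypothetical holder for the post-click fields of a GroundingTarget."""
    disappearance_text: list[str] = field(default_factory=list)
    appearance_text: list[str] = field(default_factory=list)
    window_title_change: Optional[str] = None  # expected title after the click
    modal_toggled: bool = False  # hypothetical name; never enforced here


def _text_present(needle: str, haystack: str) -> bool:
    # Case-insensitive substring match, same as the pre-click helper.
    return needle.lower() in haystack.lower()


def verify_transition(
    expected: TransitionExpectations,
    after_screenshot: object,
    ocr_fn: Optional[OcrFn] = None,
) -> tuple[bool, list[str]]:
    """Return (ok, problems) for the screen captured after the click."""
    if not (
        expected.disappearance_text
        or expected.appearance_text
        or expected.window_title_change
        or expected.modal_toggled
    ):
        return True, ["skipped: no expectations"]
    if ocr_fn is None:
        return True, ["skipped: no OCR function"]
    if expected.modal_toggled:
        # Modal detection is deferred: log it, but never fail on it.
        logger.info("modal toggle expected; not verified in this phase")

    ocr_text = ocr_fn(after_screenshot)
    problems: list[str] = []
    for text in expected.appearance_text:
        if not _text_present(text, ocr_text):
            problems.append(f"expected text {text!r} did not appear")
    for text in expected.disappearance_text:
        if _text_present(text, ocr_text):
            problems.append(f"text {text!r} is still visible")
    title = expected.window_title_change
    if title and not _text_present(title, ocr_text):
        problems.append(f"window title did not change to {title!r}")
    return not problems, problems
```

Checking disappearance by the absence of a substring is deliberately simple; it can report false failures when the same word legitimately appears elsewhere on the new screen, which is one reason to keep the check observational for now.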

Integrated into DemoExecutor.run():
- Pre-click: calls check_state_preconditions for click/double_click
  steps with a grounding_target. Observational only (warns, proceeds).
- Post-click: calls verify_transition after action dispatch.
  Observational only (warns, proceeds).
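The integration above amounts to a loop in which a failed check produces a warning and execution continues. A minimal sketch, assuming a hypothetical Step shape, injected screenshot and dispatch callables, and check functions that return (ok, problems); the real DemoExecutor.run() signature and control flow are not shown in this PR. Running the post-click check for every step that has a grounding target (not only clicks) is also an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger("demo_executor_sketch")

# Both checks are assumed to take (target, screenshot) and return (ok, problems).
CheckFn = Callable[[Any, Any], tuple[bool, list[str]]]


@dataclass
class Step:
    """Hypothetical step shape; the real step type is not shown in this PR."""
    action: str  # e.g. "click", "double_click", "type"
    grounding_target: Optional[Any] = None


class DemoExecutorSketch:
    """Hypothetical stand-in for DemoExecutor.run(): checks warn, never block."""

    def __init__(
        self,
        take_screenshot: Callable[[], Any],
        dispatch: Callable[[Step], None],
        precheck: CheckFn,
        verify: CheckFn,
    ) -> None:
        self.take_screenshot = take_screenshot
        self.dispatch = dispatch
        self.precheck = precheck
        self.verify = verify

    def _warn(self, warnings: list[str], msg: str) -> None:
        logger.warning(msg)
        warnings.append(msg)

    def run(self, steps: list[Step]) -> list[str]:
        warnings: list[str] = []
        for step in steps:
            target = step.grounding_target
            # Pre-click: only click/double_click steps that carry a grounding target.
            if step.action in ("click", "double_click") and target is not None:
                ok, problems = self.precheck(target, self.take_screenshot())
                if not ok:
                    self._warn(warnings, f"pre-click state mismatch: {problems}")
            # Observational only: the action is dispatched even after a mismatch.
            self.dispatch(step)
            # Post-click: verify the transition whenever there is a target.
            if target is not None:
                ok, problems = self.verify(target, self.take_screenshot())
                if not ok:
                    self._warn(warnings, f"transition not verified: {problems}")
        return warnings
```

Collecting the warnings as well as logging them keeps the later blocking/recovery phases simple: the same (ok, problems) results can later decide whether to stop, retry, or re-ground instead of just being reported.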

Tests (26 new):
- 11 tests for check_state_preconditions (no-OCR, no-expectations,
  window title match/mismatch, nearby text, surrounding labels,
  case insensitivity, combined checks)
- 11 tests for verify_transition (no-expectations, no-OCR,
  appearance/disappearance, window title change, modal skip,
  combined scenarios)
- 4 tests for GroundingTarget round-trip serialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr merged commit e22b404 into main on Mar 31, 2026
1 check passed