feat: state narrowing and transition verification for grounding cascade #257
Merged
Phase 4 of the grounding cascade: detect a "wrong screen" before grounding and verify state changes after clicking.

Added to grounding.py:
- check_state_preconditions(): verifies that the window title, nearby text, and surrounding labels match expectations before grounding a click. Skips gracefully when no OCR function is provided (Phase 5).
- verify_transition(): checks disappearance_text, appearance_text, and window_title_change against the post-click screenshot via OCR. Modal detection is deferred (logged, not enforced).
- _text_present(): case-insensitive substring matching helper.

Integrated into DemoExecutor.run():
- Pre-click: calls check_state_preconditions for click/double_click steps that have a grounding_target. Observational only (warns, then proceeds).
- Post-click: calls verify_transition after action dispatch. Observational only (warns, then proceeds).

Tests (26 new):
- 11 tests for check_state_preconditions (no-OCR, no-expectations, window title match/mismatch, nearby text, surrounding labels, case insensitivity, combined checks)
- 11 tests for verify_transition (no-expectations, no-OCR, appearance/disappearance, window title change, modal skip, combined scenarios)
- 4 tests for GroundingTarget round-trip serialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Add `check_state_preconditions()` to verify that the window title, nearby text, and surrounding labels match expectations before grounding a click (pre-click state check)
- Add `verify_transition()` to verify disappearance_text, appearance_text, and window_title_change after clicking (post-click transition verification)
- Integrate both into `DemoExecutor.run()` as observational warnings (non-blocking in Phase 4; blocking/recovery deferred to later phases)
- Take an `ocr_fn` callable for OCR integration (Phase 5); gracefully skip when no OCR is available

Test plan
- 26 new tests in `tests/test_grounding.py` covering:
  - `check_state_preconditions`: no-OCR skip, no-expectations skip, window title match/mismatch, nearby text threshold, surrounding labels threshold, case insensitivity, combined checks
  - `verify_transition`: no-expectations skip, no-OCR skip, appearance text found/missing, disappearance text gone/present, window title change, modal toggled skip, combined scenarios
  - `GroundingTarget` round-trip serialization (to_dict/from_dict, tuple conversion, defaults omission)
- `uv run --no-sources pytest tests/test_grounding.py -v`: 26 passed

🤖 Generated with Claude Code
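As a rough illustration of the post-click side and the "warn, then proceed" integration, here is a minimal sketch. It is not the merged implementation: the dict-shaped `expected` argument, the single-string values, the helper name `run_step_observationally`, and the treatment of window_title_change as "the new title appears in the OCR text" are all assumptions made for the example.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("grounding_sketch")


def verify_transition(
    expected: dict,
    screenshot_after: object,
    ocr_fn: Optional[Callable[[object], str]] = None,
) -> List[str]:
    """Return a list of transition problems; empty means the checks passed."""
    if not expected or ocr_fn is None:
        return []  # nothing to verify, or no OCR available: skip
    text = ocr_fn(screenshot_after).lower()
    problems: List[str] = []

    appear = expected.get("appearance_text")
    if appear and appear.lower() not in text:
        problems.append(f"expected text did not appear: {appear!r}")

    disappear = expected.get("disappearance_text")
    if disappear and disappear.lower() in text:
        problems.append(f"text still present: {disappear!r}")

    # Assumption: the value is the expected new title, looked up in OCR text.
    title = expected.get("window_title_change")
    if title and title.lower() not in text:
        problems.append(f"window title did not change to: {title!r}")
    return problems


def run_step_observationally(expected, screenshot_after, ocr_fn=None) -> bool:
    """Hypothetical executor hook: log problems but never block the step."""
    for problem in verify_transition(expected, screenshot_after, ocr_fn):
        logger.warning("transition check: %s", problem)
    return True  # Phase 4 behaviour: warn, then proceed
```

Keeping the verdict separate from the decision to proceed means a later phase could switch to blocking or recovery by changing only the executor hook.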