
feat: OCR text anchoring (Tier 1.5a) for grounding cascade #258

Closed
abrichr wants to merge 1 commit into main from feat/grounding-text-anchoring

abrichr commented Mar 31, 2026

Add run_ocr and ground_by_text to grounding.py. Integrate Tier 1.5a into DemoExecutor. 19 new tests, all passing. See commit message for details.

Add Phase 5 of the grounding cascade: OCR-based text anchoring as a
cheap grounding tier ($0, <100ms) that runs before VLM grounding.
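
The ordering described above (a cheap tier tried first, with VLM grounding used only on a miss) might be sketched roughly as follows. This is an illustration, not the code in this PR: `run_cascade`, `ocr_tier`, `vlm_tier`, and the `Grounding` tuple are hypothetical names, and the real `DemoExecutor` wiring may differ.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical result type: (x, y, name of the tier that produced it).
Grounding = Tuple[int, int, str]
Tier = Callable[[str], Optional[Grounding]]


def run_cascade(target: str, tiers: Sequence[Tier]) -> Optional[Grounding]:
    """Try each tier in order and return the first result that is not None."""
    for tier in tiers:
        result = tier(target)
        if result is not None:
            return result
    return None


def ocr_tier(target: str) -> Optional[Grounding]:
    # Stand-in for the cheap text-anchoring tier: a fixed lookup table
    # replaces real OCR output here.
    known = {"Save": (120, 40)}
    if target in known:
        x, y = known[target]
        return (x, y, "tier1.5a")
    return None


def vlm_tier(target: str) -> Optional[Grounding]:
    # Stand-in for the expensive VLM grounder: always answers.
    return (0, 0, "tier2")


hit = run_cascade("Save", [ocr_tier, vlm_tier])
miss = run_cascade("Open menu", [ocr_tier, vlm_tier])
```

In this sketch `hit` is resolved by the cheap tier, while `miss` falls through to the stand-in VLM tier.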

Changes:
- grounding.py: add run_ocr() and ground_by_text() functions
  - run_ocr: optional pytesseract-based text extraction
  - ground_by_text: match target description against OCR text with
    exact (0.95), case-insensitive (0.90), substring (0.70), and
    fuzzy (0.50) scoring tiers, plus nearby-text spatial boost
- demo_executor.py: integrate Tier 1.5a before VLM grounder
  - _try_text_anchoring: attempts OCR match, returns action if
    local_score > 0.85, otherwise falls through to Tier 2
  - telemetry: track tier15a_count separately
- telemetry.py: add tier15a_count parameter
- tests/test_text_anchoring.py: 19 tests covering all match types,
  sorting, nearby-text boost, pytesseract fallback, and DemoExecutor
  integration (all use mock OCR results, no pytesseract required)
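
A minimal sketch of the scoring tiers and the 0.85 gate listed above, using mock OCR boxes as the tests do. Only the four scores and the threshold come from the commit message. Everything else is an assumption made for illustration: the `OcrBox` type, the helper names `score_match`, `rank_boxes`, and `pick_action`, the two-way substring check, and the 0.8 fuzzy cutoff based on `difflib`. The nearby-text boost is omitted because its size is not stated here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class OcrBox:
    """Hypothetical OCR result: recognized text plus its bounding box."""
    text: str
    x: int
    y: int
    w: int
    h: int


def score_match(target: str, candidate: str, fuzzy_cutoff: float = 0.8) -> float:
    """Score one OCR string against the target description."""
    if not target or not candidate:
        return 0.0
    if candidate == target:
        return 0.95  # exact match
    t, c = target.lower(), candidate.lower()
    if t == c:
        return 0.90  # case-insensitive match
    if t in c or c in t:
        return 0.70  # substring match (direction is an assumption)
    if SequenceMatcher(None, t, c).ratio() >= fuzzy_cutoff:
        return 0.50  # fuzzy match (cutoff is an assumption)
    return 0.0


def rank_boxes(target: str, boxes: List[OcrBox]) -> List[Tuple[float, OcrBox]]:
    """Return (score, box) pairs with a non-zero score, best first."""
    scored = [(score_match(target, b.text), b) for b in boxes]
    scored = [pair for pair in scored if pair[0] > 0.0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def pick_action(target: str, boxes: List[OcrBox],
                threshold: float = 0.85) -> Optional[Tuple[int, int]]:
    """Return the center of the best box if its score beats the threshold."""
    ranked = rank_boxes(target, boxes)
    if not ranked or ranked[0][0] <= threshold:
        return None  # caller would fall through to the next tier
    _, best = ranked[0]
    return (best.x + best.w // 2, best.y + best.h // 2)


mock_boxes = [
    OcrBox("Save", 10, 10, 40, 20),
    OcrBox("save as", 60, 10, 60, 20),
    OcrBox("Cancel", 130, 10, 50, 20),
]
click = pick_action("Save", mock_boxes)
fallthrough = pick_action("save as file", mock_boxes)
```

With these mock boxes, `click` is the center of the exact-match box, while `fallthrough` is `None` because only substring matches (0.70) are available and those do not clear the 0.85 gate.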

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abrichr commented Mar 31, 2026

Closing — will rebase Phase 5 additions on top of merged Phase 4.

abrichr closed this Mar 31, 2026