
feat(recognition): expose Tesseract confidence in OCR matches #1327

Merged
mikahanninen merged 2 commits into robocorp:master from chelslava:feat/1303-ocr-confidence
May 3, 2026

Conversation

@chelslava
Contributor

Summary

  • preserve the existing OCR text-similarity confidence value for compatibility
  • add ocr_confidence to each match using the underlying Tesseract confidence data
  • add unit coverage for confidence parsing and aggregation

Closes #1303.

Test plan

  • python -m pytest packages/recognition/tests/python/test_ocr_confidence.py -q -p no:cacheprovider --no-cov

The OCR matcher already returned a text-similarity confidence but discarded
Tesseract's own per-word confidence data. Preserve the existing
confidence semantics for compatibility, attach an averaged
ocr_confidence to each match, and lock the parsing and aggregation
behavior with focused unit tests.
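
As a rough illustration of the parsing and aggregation described above (not the code merged in this PR), the sketch below averages per-word confidence values as they appear in Tesseract's TSV-style output, where `conf` is -1 for rows that are not recognized words. The helper names `parse_confidence` and `average_confidence` are hypothetical, and clamping to 100 and returning None for empty input are assumptions.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def parse_confidence(raw) -> Optional[float]:
    """Convert one Tesseract `conf` cell to a float in [0, 100], or None.

    Hypothetical helper: the real parsing rules in the PR may differ.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None  # non-numeric cell, e.g. an empty string
    # Tesseract reports -1 for rows that are not recognized words.
    if math.isnan(value) or value < 0:
        return None
    return min(value, 100.0)  # assumption: clamp out-of-range values


def average_confidence(raw_values: Sequence) -> Optional[float]:
    """Average the usable word confidences; None when there are none."""
    parsed = [c for c in map(parse_confidence, raw_values) if c is not None]
    return mean(parsed) if parsed else None


if __name__ == "__main__":
    # Mixed input: strings, numbers, a non-word row (-1) and junk.
    print(average_confidence(["96.5", "-1", 80, "abc"]))
```

Returning None rather than 0 for a match with no usable word confidences keeps "unknown" distinguishable from "very low"; whether the PR makes the same choice is not stated here.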

Constraint: existing callers already depend on confidence meaning text similarity
Rejected: Rename confidence to text_confidence | would break current match consumers
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep confidence mapped to text similarity unless a major-version API change is planned
Tested: python -m pytest packages/recognition/tests/python/test_ocr_confidence.py -q -p no:cacheprovider --no-cov
Not-tested: End-to-end OCR against a local Tesseract binary
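
To make the compatibility constraint concrete, here is a minimal sketch of a match that carries both values side by side: `confidence` stays a text-similarity score and `ocr_confidence` is added alongside it. The `build_match` function and the word dictionaries are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for whatever similarity measure the library actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def build_match(query: str, words: List[Dict]) -> Dict:
    """Build an illustrative match dict from recognized words.

    Each word is assumed to look like {"text": str, "conf": float},
    with conf < 0 meaning "no confidence available".
    """
    text = " ".join(w["text"] for w in words)
    # Existing semantics: how closely the recognized text matches the query.
    similarity = SequenceMatcher(None, query.lower(), text.lower()).ratio() * 100
    # New field: mean of the engine's own per-word confidences.
    confs = [w["conf"] for w in words if w["conf"] >= 0]
    ocr_confidence: Optional[float] = sum(confs) / len(confs) if confs else None
    return {
        "text": text,
        "confidence": similarity,          # unchanged meaning for callers
        "ocr_confidence": ocr_confidence,  # additive, safe to ignore
    }


if __name__ == "__main__":
    words = [{"text": "Hello", "conf": 90.0}, {"text": "world", "conf": 70.0}]
    print(build_match("Hello world", words))
```

Because the new key is purely additive, callers that only read `confidence` see no change in behavior.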
@CLAassistant

CLAassistant commented Apr 16, 2026

CLA assistant check
All committers have signed the CLA.

@mikahanninen mikahanninen merged commit 4123560 into robocorp:master May 3, 2026
8 checks passed
@chelslava chelslava deleted the feat/1303-ocr-confidence branch May 4, 2026 05:13

Development

Successfully merging this pull request may close these issues.

RPA Desktop OCR - Use confidence from Tesseract result

3 participants