Commit 2dedb62
feat: felt OCR — three approaches to character recognition by shape qualia
Three recognition methods compared on synthetic glyphs:
1. Base17/JL (34 bytes/glyph): golden-step projection to 17D, L1 codebook
→ B-n=38939 (closest: share vertical+bump shape), A-B=43001
2. Polar quantization (8 bytes/glyph): 16 angles × 4 radii, rotation-invariant
→ B-n=10 (lowest, tied with Q-I=10), m-n=11, m-z=11
3. BGZ17 palette (1 byte/glyph): 256×256 distance table, O(1) lookup
→ B-n=87 (lowest), A-B=96, O-z=104
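
As a rough illustration of approach 2, the sketch below packs a glyph into 16 × 4 = 64 occupancy bits (8 bytes) and compares two glyphs by Hamming distance. The commit does not say how rotation invariance is achieved; taking the minimum over cyclic shifts of the angular sectors is an assumption, and `polar_signature`, `rotate`, and `polar_distance` are hypothetical names, not the module's API.

```python
import math

N_ANGLES, N_RADII = 16, 4  # 16 x 4 = 64 bins -> 64 bits = 8 bytes per glyph

def polar_signature(pixels, size):
    """Pack a square binary glyph into a 64-bit polar occupancy mask.

    pixels: iterable of (x, y) ink coordinates; size: glyph side in pixels.
    """
    c = (size - 1) / 2.0                  # glyph centre
    max_r = math.hypot(c, c) or 1.0       # distance from centre to a corner
    sig = 0
    for x, y in pixels:
        dx, dy = x - c, y - c
        ring = min(int(math.hypot(dx, dy) / max_r * N_RADII), N_RADII - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(int(theta / (2 * math.pi) * N_ANGLES), N_ANGLES - 1)
        sig |= 1 << (ring * N_ANGLES + sector)
    return sig

def rotate(sig, k):
    """Cyclically shift every ring of the signature by k angular sectors."""
    mask = (1 << N_ANGLES) - 1
    out = 0
    for ring in range(N_RADII):
        bits = (sig >> (ring * N_ANGLES)) & mask
        bits = ((bits << k) | (bits >> (N_ANGLES - k))) & mask
        out |= bits << (ring * N_ANGLES)
    return out

def polar_distance(a, b):
    """Smallest Hamming distance over all 16 sector rotations (assumed scheme)."""
    return min(bin(a ^ rotate(b, k)).count("1") for k in range(N_ANGLES))
```

Under this scheme the distance is invariant only to rotations by whole sectors (multiples of 22.5°); finer rotations change which bins are occupied.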
All three agree: B feels like n (vertical stroke + bump). The system
discovers character relationships without being told.
Plus:
- Euler-γ fast skew: γ/(γ+1)≈0.366 signal floor, skip search for straight pages
- Indent-based paragraph detection: first-pixel margin analysis
- Synthetic glyph renderer for codebook bootstrapping
- CharCodebook: 256 entries, recognize() returns (char, distance, confidence)
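
A minimal sketch of what a `CharCodebook` with this return shape could look like, assuming nearest-neighbour search under L1 distance. The commit names only `CharCodebook`, the 256-entry limit, and the `(char, distance, confidence)` tuple; the `add` method, the `l1` helper, and the margin-based confidence formula are assumptions made for illustration.

```python
def l1(u, v):
    """L1 (Manhattan) distance between two equal-length integer vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

class CharCodebook:
    MAX_ENTRIES = 256  # capacity stated in the commit message

    def __init__(self):
        self.entries = []  # list of (char, shape_vector)

    def add(self, char, vector):
        # Hypothetical helper: the commit does not describe how entries are added.
        if len(self.entries) >= self.MAX_ENTRIES:
            raise ValueError("codebook is full")
        self.entries.append((char, tuple(vector)))

    def recognize(self, vector):
        """Return (char, distance, confidence) for the nearest entry."""
        if not self.entries:
            raise ValueError("codebook is empty")
        scored = sorted((l1(vector, v), ch) for ch, v in self.entries)
        best_d, best_ch = scored[0]
        if len(scored) == 1:
            confidence = 1.0
        else:
            runner_up = scored[1][0]
            # Assumed definition: relative margin over the second-best match.
            confidence = 1.0 - best_d / runner_up if runner_up > 0 else 0.0
        return best_ch, best_d, confidence
```

With this definition a confidence near 1 means the best match is far closer than the runner-up, and a confidence near 0 means the top two candidates are almost equally close.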
For production: use ocrs+rten (AdaWorldAPI/ocrs + AdaWorldAPI/rten).
This module is the felt-distance fast path: no neural net, pure lookup.
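
The "pure lookup" idea can be sketched as follows: once every glyph is reduced to a 1-byte palette index, all 256 × 256 = 65,536 pairwise distances are precomputed and comparing two glyphs becomes a single table read. The palette contents, the L1 metric, and the names `build_distance_table` and `felt_distance` are assumptions for illustration; the commit does not show how the BGZ17 table is built.

```python
def l1(u, v):
    """L1 (Manhattan) distance between two equal-length integer vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def build_distance_table(palette):
    """Precompute all pairwise distances between palette entries."""
    n = len(palette)
    return [[l1(palette[i], palette[j]) for j in range(n)] for i in range(n)]

# Hypothetical 256-entry palette of small integer shape vectors.
palette = [((i * 7) % 256, (i * 13) % 256, i) for i in range(256)]
table = build_distance_table(palette)

def felt_distance(a, b):
    """Distance between two glyphs given only their 1-byte palette indices."""
    return table[a][b]
```

The table costs one pass of 65,536 distance computations up front; every later comparison is a constant-time index operation with no model inference.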
10 tests passing.
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp1
parent bb1f9b8, commit 2dedb62
2 files changed
Lines changed: 531 additions & 0 deletions