Commit 2dedb62

feat: felt OCR — three approaches to character recognition by shape qualia
Three recognition methods compared on synthetic glyphs:

1. Base17/JL (34 bytes/glyph): golden-step projection to 17D, L1 codebook
   → B-n=38939 (closest: share vertical+bump shape), A-B=43001
2. Polar quantization (8 bytes/glyph): 16 angles × 4 radii, rotation-invariant
   → B-n=10 (lowest!), Q-I=10, m-n=11, m-z=11
3. BGZ17 palette (1 byte/glyph): 256×256 distance table, O(1) lookup
   → B-n=87 (lowest), A-B=96, O-z=104

All three agree: B feels like n (vertical stroke + bump).
The system discovers character relationships without being told.

Plus:
- Euler-γ fast skew: γ/(γ+1)≈0.366 signal floor, skip search for straight pages
- Indent-based paragraph detection: first-pixel margin analysis
- Synthetic glyph renderer for codebook bootstrapping
- CharCodebook: 256 entries, recognize() returns (char, distance, confidence)

For production: use ocrs+rten (AdaWorldAPI/ocrs + AdaWorldAPI/rten).
This module is the felt-distance fast path: no neural net, pure lookup.

10 tests passing.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
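For readers without the diff of src/hpc/ocr_felt.rs at hand, the codebook lookup described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: only the names CharCodebook and recognize() and the (char, distance, confidence) return shape come from the commit message. The Base17 alias (17 i16 values, which would account for 34 bytes per glyph), the insert() helper, and the margin-based confidence formula are all assumptions made for the example.

```rust
// Hypothetical sketch of an L1 nearest-neighbour codebook. Field names,
// `Base17`, `insert`, and the confidence formula are assumptions and are
// not taken from src/hpc/ocr_felt.rs.

/// Assumed layout: 17 dimensions x 2 bytes (i16) = 34 bytes per glyph.
pub type Base17 = [i16; 17];

/// L1 (Manhattan) distance between two 17D signatures.
pub fn l1(a: &Base17, b: &Base17) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (i32::from(*x) - i32::from(*y)).unsigned_abs())
        .sum()
}

pub struct CharCodebook {
    entries: Vec<(char, Base17)>,
}

impl CharCodebook {
    /// The commit message mentions 256 entries.
    pub const CAPACITY: usize = 256;

    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Adds a reference glyph; returns false once the codebook is full.
    pub fn insert(&mut self, ch: char, sig: Base17) -> bool {
        if self.entries.len() >= Self::CAPACITY {
            return false;
        }
        self.entries.push((ch, sig));
        true
    }

    /// Returns the nearest entry as (char, distance, confidence),
    /// or None if the codebook is empty.
    /// Confidence is an assumed margin ratio: 1 - best / second_best.
    pub fn recognize(&self, sig: &Base17) -> Option<(char, u32, f32)> {
        let mut best: Option<(char, u32)> = None;
        let mut second = u32::MAX;
        for (ch, entry) in &self.entries {
            let d = l1(sig, entry);
            match best {
                // Not better than the current best: candidate for runner-up.
                Some((_, bd)) if d >= bd => second = second.min(d),
                // New best: the previous best (if any) becomes the runner-up.
                _ => {
                    if let Some((_, bd)) = best {
                        second = bd;
                    }
                    best = Some((*ch, d));
                }
            }
        }
        best.map(|(ch, d)| {
            let conf = if second == u32::MAX {
                1.0 // only one entry, nothing to compare against
            } else if second == 0 {
                0.0 // two exact matches, fully ambiguous
            } else {
                1.0 - d as f32 / second as f32
            };
            (ch, d, conf)
        })
    }
}
```

A linear scan over 256 entries of 17 values each is only a few thousand integer operations per glyph, which is consistent with the "no neural net, pure lookup" framing. The pairwise figures in the commit message (for example B-n=38939) would correspond to calling l1() on two codebook entries under this sketch.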
1 parent bb1f9b8 commit 2dedb62

2 files changed

Lines changed: 531 additions & 0 deletions

src/hpc/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -225,6 +225,7 @@ pub mod jitson;
 #[allow(missing_docs)]
 pub mod jitson_cranelift;
 pub mod ocr_simd;
+pub mod ocr_felt;
 
 #[cfg(test)]
 mod e2e_tests {
