Merged
45 commits
30c453e
feat: port ocr detector
JakubGonera Jun 12, 2025
70f74ff
feat: add OCR scaffolding
JakubGonera Jun 12, 2025
c5cb9c5
port Recognizer to cpp (unclean version)
mlodyjesienin Jul 15, 2025
d07fbbd
fix Detector implementation
mlodyjesienin Jul 15, 2025
c3dc7cc
Recognition Handler port to cpp (unclean version)
mlodyjesienin Jul 15, 2025
b21f4c1
port OCR module to cpp (unclean)
mlodyjesienin Jul 15, 2025
41eca7a
remove potentially dangerous files
mlodyjesienin Jul 15, 2025
d7f4f1c
feat: port ocr detector
JakubGonera Jun 12, 2025
76be20e
remove whole ios ocr directory
mlodyjesienin Jul 15, 2025
50210a0
add not working jsi bindings
mlodyjesienin Jul 15, 2025
10b88b6
fix jsi-binding
mlodyjesienin Jul 16, 2025
e228a35
feat: ported verticalOCR (#458)
NorbertKlockiewicz Jul 22, 2025
c9d2978
refactor: fix errors after rebase, remove unnecessary files
NorbertKlockiewicz Jul 22, 2025
4288350
feat: ported ocr
NorbertKlockiewicz Jul 22, 2025
a8e43af
refactor: improve readability and performance
NorbertKlockiewicz Jul 22, 2025
f766f60
formatting: format files with clang
NorbertKlockiewicz Jul 22, 2025
27da4dc
Apply suggestions from code review
mlodyjesienin Jul 28, 2025
cae91e7
fix: fix issues created by last commit :)
mlodyjesienin Jul 28, 2025
3966f7a
refactor: requested changes in cpp code refactored
mlodyjesienin Jul 31, 2025
665a80d
add final specifier to class definitions
mlodyjesienin Jul 31, 2025
06c436a
remove include Log.h file
mlodyjesienin Jul 31, 2025
967a253
requested changes added
mlodyjesienin Jul 31, 2025
8908ba2
remove include Log.h
mlodyjesienin Jul 31, 2025
96f568a
bug: fix issue where VERTICAL OCR crashed on some photos with text ne…
mlodyjesienin Aug 1, 2025
21c611d
requested changes added
mlodyjesienin Aug 1, 2025
12204d2
remove console logs in ts code
mlodyjesienin Aug 1, 2025
b35d7e3
add requested changes that i have missed lol
mlodyjesienin Aug 1, 2025
b9151c7
small requested changes
mlodyjesienin Aug 4, 2025
86f9a2f
refactor: DetectorUtils functions
mlodyjesienin Aug 4, 2025
4a7df6a
add floating point comparison functionality
mlodyjesienin Aug 4, 2025
1d302d8
requested changes added
mlodyjesienin Aug 4, 2025
c5a58cd
Apply suggestions from code review
mlodyjesienin Aug 4, 2025
fd4c186
make use of getDesiredWidth() function
mlodyjesienin Aug 4, 2025
b4b9d37
add descriptive comments, refactor VOCR, add support for all detector…
mlodyjesienin Aug 6, 2025
7fbdf83
Add requested changes + swap int to int32_t everywhere
mlodyjesienin Aug 6, 2025
62ad874
make non-static OCR module
mlodyjesienin Aug 6, 2025
b92205e
make non-static Vertical OCR module
mlodyjesienin Aug 6, 2025
8c12333
add delete method to OCRmodule and VOCRmodule
mlodyjesienin Aug 6, 2025
458e60f
docs: update docs to match the reality
mlodyjesienin Aug 6, 2025
4964653
add unload to OCR/Vertical OCR, add deletion to Modules, add on retur…
mlodyjesienin Aug 7, 2025
9bee188
Add requested changes on review
mlodyjesienin Aug 11, 2025
8e8f177
docs: update OCR and VOCR benchmarks
mlodyjesienin Aug 11, 2025
056a08d
Add requested changes. Refactor comments / namespace for types / func…
mlodyjesienin Aug 13, 2025
2f4ed99
Add changes requested in a review
mlodyjesienin Aug 13, 2025
0cb3af6
Add changes requested in review
mlodyjesienin Aug 14, 2025
1 change: 0 additions & 1 deletion apps/computer-vision/app/object_detection/index.tsx
@@ -43,7 +43,6 @@ export default function ObjectDetectionScreen() {
if (imageUri) {
try {
const output = await ssdLite.forward(imageUri);
- console.log(output);
setResults(output);
} catch (e) {
console.error(e);
5 changes: 2 additions & 3 deletions apps/computer-vision/app/ocr/index.tsx
@@ -38,7 +38,6 @@ export default function OCRScreen() {
try {
const output = await model.forward(imageUri);
setResults(output);
- console.log(output);
} catch (e) {
console.error(e);
}
@@ -78,8 +77,8 @@ export default function OCRScreen() {
<View style={styles.results}>
<Text style={styles.resultHeader}>Results</Text>
<ScrollView style={styles.resultsList}>
- {results.map(({ text, score }) => (
- <View key={text} style={styles.resultRecord}>
+ {results.map(({ text, score }, index) => (
+ <View key={index} style={styles.resultRecord}>
<Text style={styles.resultLabel}>{text}</Text>
<Text>{score.toFixed(3)}</Text>
</View>
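The second hunk above swaps the list key from `text` to the array index. The PR does not state the reason. One plausible motivation is that the same string can be detected more than once in a single image, which would produce duplicate React keys. The sketch below illustrates that failure mode in plain TypeScript. The `OcrResult` shape is inferred from the `{ text, score }` destructuring in the hunk, and `duplicateKeys` and the sample data are hypothetical, not part of the library.

```typescript
// Hypothetical result shape, inferred from the `{ text, score }` destructuring in the hunk.
interface OcrResult {
  text: string;
  score: number;
}

// Returns every key that occurs more than once when `keyOf` derives the list keys.
function duplicateKeys<T>(
  items: T[],
  keyOf: (item: T, index: number) => string,
): string[] {
  const counts = new Map<string, number>();
  items.forEach((item, index) => {
    const key = keyOf(item, index);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  });
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}

// Made-up sample: the same word detected twice in one image.
const sample: OcrResult[] = [
  { text: 'SALE', score: 0.98 },
  { text: '50%', score: 0.91 },
  { text: 'SALE', score: 0.87 },
];

// Keying by text collides on the repeated word; keying by index cannot collide.
const collisionsByText = duplicateKeys(sample, (result) => result.text);
const collisionsByIndex = duplicateKeys(sample, (_result, index) => String(index));

console.log(collisionsByText); // ['SALE']
console.log(collisionsByIndex); // []
```

Index keys are a reasonable fit here because the results list is replaced wholesale after each `forward` call rather than reordered in place.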
5 changes: 2 additions & 3 deletions apps/computer-vision/app/ocr_vertical/index.tsx
@@ -40,7 +40,6 @@ export default function VerticalOCRScree() {
try {
const output = await model.forward(imageUri);
setResults(output);
- console.log(output);
} catch (e) {
console.error(e);
}
@@ -80,8 +79,8 @@ export default function VerticalOCRScree() {
<View style={styles.results}>
<Text style={styles.resultHeader}>Results</Text>
<ScrollView style={styles.resultsList}>
- {results.map(({ text, score }) => (
- <View key={text} style={styles.resultRecord}>
+ {results.map(({ text, score }, index) => (
+ <View key={index} style={styles.resultRecord}>
<Text style={styles.resultLabel}>{text}</Text>
<Text>{score.toFixed(3)}</Text>
</View>
2 changes: 1 addition & 1 deletion apps/computer-vision/ios/Podfile.lock
@@ -2454,7 +2454,7 @@ SPEC CHECKSUMS:
React-logger: 8edfcedc100544791cd82692ca5a574240a16219
React-Mapbuffer: c3f4b608e4a59dd2f6a416ef4d47a14400194468
React-microtasksnativemodule: 054f34e9b82f02bd40f09cebd4083828b5b2beb6
- react-native-executorch: 98a2d5c0fc2290d473db87f2d6f3bf9dc7b77ab1
+ react-native-executorch: d06ae11e5411f0cb798316c4e69cf7d8678da297
react-native-image-picker: 8a3f16000e794f5381a7fe47bb48fd8d06741e47
react-native-safe-area-context: 562163222d999b79a51577eda2ea8ad2c32b4d06
react-native-skia: b6cb66e99a953dae6880348c92cfb20a76d90b4f
29 changes: 22 additions & 7 deletions docs/docs/02-hooks/02-computer-vision/useOCR.md
@@ -301,19 +301,34 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------------------------------------------------------------------------- | :--------------------: | :----------------: |
- | Detector (CRAFT_800) + Recognizer (CRNN_512) + Recognizer (CRNN_256) + Recognizer (CRNN_128) | 2100 | 1782 |
+ | Detector (CRAFT_800) + Recognizer (CRNN_512) + Recognizer (CRNN_256) + Recognizer (CRNN_128) | 1600 | 1700 |

### Inference time

+ **Image Used for Benchmarking:**
+
+ | ![Alt text](../../../static/img/harvard.png) | ![Alt text](../../../static/img/harvard-boxes.png) |
+ | -------------------------------------------- | -------------------------------------------------- |
+ | Original Image | Image with detected Text Boxes |
+
:::warning warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

- | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
- | --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: |
- | Detector (CRAFT_800) | 2099 | 2227 | ❌ | 2245 | 7108 |
- | Recognizer (CRNN_512) | 70 | 252 | ❌ | 54 | 151 |
- | Recognizer (CRNN_256) | 39 | 123 | ❌ | 24 | 78 |
- | Recognizer (CRNN_128) | 17 | 83 | ❌ | 14 | 39 |
+ **Time measurements:**
+
+ | Metric | iPhone 14 Pro Max <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
+ | ------------------------- | ----------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
+ | **Total Inference Time** | 4330 | 2537 | ❌ | 6648 | 5993 |
+ | **Detector (CRAFT_800)** | 1945 | 1809 | ❌ | 2080 | 1961 |
+ | **Recognizer (CRNN_512)** | | | | | |
+ | ├─ Average Time | 273 | 76 | ❌ | 289 | 252 |
+ | ├─ Total Time (3 runs) | 820 | 229 | ❌ | 867 | 756 |
+ | **Recognizer (CRNN_256)** | | | | | |
+ | ├─ Average Time | 137 | 39 | ❌ | 260 | 229 |
+ | ├─ Total Time (7 runs) | 958 | 271 | ❌ | 1818 | 1601 |
+ | **Recognizer (CRNN_128)** | | | | | |
+ | ├─ Average Time | 68 | 18 | ❌ | 239 | 214 |
+ | ├─ Total Time (7 runs) | 478 | 124 | ❌ | 1673 | 1498 |

❌ - Insufficient RAM.
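
The new table reports an average and a total per recognizer. For the iPhone 14 Pro Max the CRNN_512 rows read 273 ms average and 820 ms over 3 runs, while 3 × 273 = 819, so the total was presumably summed from raw timings and the average rounded afterwards. The sketch below shows one way such rows could be derived. The raw timings are made up to reproduce that row, and `summarize` is a hypothetical helper, not something from this PR.

```typescript
interface StageSummary {
  runs: number;
  totalMs: number;
  averageMs: number;
}

// Collapses raw per-call timings into the "Average Time" / "Total Time (N runs)" pair.
function summarize(timingsMs: number[]): StageSummary {
  const runs = timingsMs.length;
  const totalMs = timingsMs.reduce((sum, t) => sum + t, 0);
  // Round only the average; the total stays the exact sum of the raw timings.
  const averageMs = runs === 0 ? 0 : Math.round(totalMs / runs);
  return { runs, totalMs, averageMs };
}

// Made-up raw timings chosen to match the iPhone 14 Pro Max CRNN_512 rows.
const crnn512 = summarize([270, 275, 275]);
console.log(crnn512); // { runs: 3, totalMs: 820, averageMs: 273 }

// Stage totals from the iPhone 14 Pro Max column: detector, CRNN_512, CRNN_256, CRNN_128.
const reportedTotalMs = 4330;
const stageTotalsMs = [1945, 820, 958, 478];
const unaccountedMs =
  reportedTotalMs - stageTotalsMs.reduce((sum, t) => sum + t, 0);
console.log(unaccountedMs); // 129
```

The 129 ms left over suggests that "Total Inference Time" also covers work outside the four model stages, such as pre- and post-processing, although the docs do not say so explicitly.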
31 changes: 23 additions & 8 deletions docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md
@@ -316,20 +316,35 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------------------------------------------------- | :--------------------: | :----------------: |
- | Detector (CRAFT_1280) + Detector (CRAFT_320) + Recognizer (CRNN_512) | 2770 | 3720 |
- | Detector(CRAFT_1280) + Detector(CRAFT_320) + Recognizer (CRNN_64) | 1770 | 2740 |
+ | Detector (CRAFT_1280) + Detector (CRAFT_320) + Recognizer (CRNN_512) | 2172 | 2214 |
+ | Detector(CRAFT_1280) + Detector(CRAFT_320) + Recognizer (CRNN_64) | 1774 | 1705 |

### Inference time

+ **Image Used for Benchmarking:**
+
+ | ![Alt text](../../../static/img/sales-vertical.jpeg) | ![Alt text](../../../static/img/sales-vertical-boxes.png) |
+ | ---------------------------------------------------- | --------------------------------------------------------- |
+ | Original Image | Image with detected Text Boxes |
+
:::warning warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

- | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
- | --------------------- | :--------------------------: | :------------------------------: | :------------------------: | :-------------------------------: | :-------------------------------: |
- | Detector (CRAFT_1280) | 5457 | 5833 | ❌ | 6296 | 14053 |
- | Detector (CRAFT_320) | 1351 | 1460 | ❌ | 1485 | 3101 |
- | Recognizer (CRNN_512) | 39 | 123 | ❌ | 24 | 78 |
- | Recognizer (CRNN_64) | 10 | 33 | ❌ | 7 | 18 |
+ **Time measurements:**
+
+ | Metric | iPhone 14 Pro Max <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
+ | -------------------------------------------------------------------------- | ----------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
+ | **Total Inference Time** | 9350 / 9620 | 8572 / 8621 | ❌ | 13737 / 10570 | 13436 / 9848 |
+ | **Detector (CRAFT_1280)** | 4895 | 4756 | ❌ | 5574 | 5016 |
+ | **Detector (CRAFT_320)** | | | | | |
+ | ├─ Average Time | 1247 | 1206 | ❌ | 1350 | 1356 |
+ | ├─ Total Time (3 runs) | 3741 | 3617 | ❌ | 4050 | 4069 |
+ | **Recognizer (CRNN_64)** <br /> (_With Flag `independentChars == true`_) | | | | | |
+ | ├─ Average Time | 31 | 9 | ❌ | 195 | 207 |
+ | ├─ Total Time (21 runs) | 649 | 191 | ❌ | 4092 | 4339 |
+ | **Recognizer (CRNN_512)** <br /> (_With Flag `independentChars == false`_) | | | | | |
+ | ├─ Average Time | 306 | 80 | ❌ | 308 | 250 |
+ | ├─ Total Time (3 runs) | 919 | 240 | ❌ | 925 | 751 |

❌ - Insufficient RAM.
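
The warning in this file says the published times come from consecutive runs and that the first run can take up to 2x longer. A benchmark loop consistent with that note would discard one or more warm-up calls before timing. The sketch below is an assumption about how such a loop could look, not the harness used for these numbers. `benchmark` and its parameters are hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```typescript
// Runs `run` a few times untimed, then returns the duration of each measured call.
async function benchmark(
  run: () => Promise<unknown>,
  measuredRuns: number,
  warmupRuns: number = 1,
  now: () => number = () => Date.now(),
): Promise<number[]> {
  // Warm-up calls absorb model loading and initialization and are not recorded.
  for (let i = 0; i < warmupRuns; i++) {
    await run();
  }
  const timingsMs: number[] = [];
  for (let i = 0; i < measuredRuns; i++) {
    const start = now();
    await run();
    timingsMs.push(now() - start);
  }
  return timingsMs;
}

// Deterministic demo with a fake clock: the first call "costs" 200 ms, later calls 100 ms.
let fakeTimeMs = 0;
let fakeCalls = 0;
const fakeForward = async (): Promise<void> => {
  fakeCalls += 1;
  fakeTimeMs += fakeCalls === 1 ? 200 : 100;
};

const demoTimings = benchmark(fakeForward, 3, 1, () => fakeTimeMs);
demoTimings.then((timingsMs) => console.log(timingsMs)); // [100, 100, 100]
```

In an app, `run` could wrap a call such as `model.forward(imageUri)` from the example screens, with the default wall-clock timer left in place.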