Commit 1fc8b98
Bobholamovic and claude authored
[Feat] PaddleOCR.js (#17861)
* chore: import paddleocr-js subproject skeleton Made-with: Cursor * refactor: rename internal paddleocr-js workspaces Made-with: Cursor * fix: remove stale paddleocr-js workspace lock entries Made-with: Cursor * docs: rewrite paddleocr-js subproject docs Made-with: Cursor * test: fix ocr shared lint regression Made-with: Cursor * chore: add low-coupling paddleocr-js host integration Made-with: Cursor * Update docs * Add langchain-paddleocr to skip gpu condition * Skip subprojects * Update lock * docs(paddleocr-js): add TypeScript migration design spec Made-with: Cursor * docs(paddleocr-js): add TypeScript migration implementation plan Made-with: Cursor * build(paddleocr-js): add TypeScript toolchain and build infrastructure Set up the TypeScript migration foundation: - Add TypeScript, vite-plugin-dts, and type definitions as devDependencies - Create tsconfig.json files (core, test, root project references) - Configure Vite library build with ES/CJS/UMD outputs and .d.ts generation - Update package.json entry points from src/ to dist/ with proper exports map - Replace plain JS ESLint config with typescript-eslint strict type-checked rules - Update Vitest coverage to target .ts files - Add typecheck script, engines, keywords across workspace packages Made-with: Cursor * fix(paddleocr-js): exclude config files from strict TS lint, reorder exports types-first Made-with: Cursor * refactor(paddleocr-js): add third-party type declarations for clipper-lib and opencv Minimal .d.ts files covering only the APIs actually used by the project. Made-with: Cursor * refactor(paddleocr-js): convert Layer 0-1 modules to TypeScript Converts utils/common, worker/protocol, pipelines/ocr/default-config, pipelines/ocr/runtime-params, runtime/opencv, runtime/ort, runtime/index, and models/common from .js to .ts with full type annotations. 
Made-with: Cursor * refactor(paddleocr-js): convert Layer 2 modules to TypeScript Converts models/det, models/rec, models/index, resources/tar, resources/cache, resources/registry, resources/standard-model, resources/index, platform/browser, and platform/worker from .js to .ts with interfaces for all model configs, results, and platform types. Made-with: Cursor * refactor(paddleocr-js): convert Layer 3-4 modules to TypeScript Converts worker/client, worker/entry, pipelines/ocr/config, pipelines/ocr/shared, pipelines/ocr/core, pipelines/ocr/worker-backed, and pipelines/ocr/worker-entry from .js to .ts with full type annotations for pipeline options, results, and initialization state. Made-with: Cursor * refactor(paddleocr-js): convert Layer 5 modules to TypeScript and add types/index.ts Converts pipelines/ocr/index, pipelines/index, and src/index from .js to .ts. Creates src/types/index.ts re-exporting all public-facing types. Fixes all TypeScript errors (strict mode + verbatimModuleSyntax clean). Made-with: Cursor * refactor(paddleocr-js): convert test files to TypeScript - Renamed all 29 test files from .js to .ts - Added type annotations to test helpers and mocks - Fixed barrel import in core.ts to maintain mock interceptability - Restored createWorker: null in resolveWorkerOptions for API contract - Updated public-api test to import from source directly All 154 tests pass. 
Made-with: Cursor * fix(paddleocr-js): fix Vite build worker format and demo alias - Add worker.format: 'es' to core vite config (fixes UMD build conflict) - Add resolve alias in demo vite config to resolve paddleocr-js to source Made-with: Cursor * fix(paddleocr-js): resolve all ESLint strict type-checked errors - Remove unnecessary optional chains, type assertions, and type conversions - Remove unused type imports - Fix floating promises, non-Error throws, and catch variable types - Widen SourceToMatFn to accept sync or async returns - Update tests to match sync dispose() and sourcePayloadToMat() 96 lint errors -> 0. All 154 tests pass. Made-with: Cursor * Update docs * Remove AI docs * Update node version bound * Polish typing * Fix bugs * Fix doc * Rename paddleocr-js to @paddleocr/paddleocr-js * FIx CI * Fix CI bug * Fix git workflow * Add viz utility design spec for paddleocr-js Design document for an optional visualization module exported via @paddleocr/paddleocr-js/viz subpath. Renders side-by-side composite images (source + detection boxes | white panel + recognized text) with custom font support and Blob export for browser download. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(viz): add implementation plan for viz utility 11-task TDD plan covering types, color generation, canvas factory, font management, box drawing, text drawing, side-by-side assembly, OcrVisualizer class, subpath entry, build config, and final check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add foundational types for viz utility Define RgbColor, FontConfig, BoxStyleOptions, and OcrVisualizerOptions interfaces that will be shared across all viz module files. Include type-level test coverage. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add deterministic color generation via LCG Extract the Linear Congruential Generator color function from the demo app into the core viz utility so all consumers produce visually consistent box colors for the same detection index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add canvas factory abstraction for viz module Provides createCanvas, getContext2D, and canvasToBlob that abstract over OffscreenCanvas vs HTMLCanvasElement differences, enabling the viz module to work in both main-thread and web worker contexts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add font management utilities for viz (Task 4) Implement loadFontFace/removeFontFace wrappers around the browser FontFace API, with full test coverage using jsdom mocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add draw-boxes panel for viz module (Task 5) Draws the left panel of the side-by-side OCR visualization: source image overlaid with colored detection box polygons. Uses save/restore for clean state management and supports custom color functions. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add right-panel text drawing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add side-by-side composite assembly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add OcrVisualizer class and renderOcrToBlob Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): add subpath entry point and public exports Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * build(viz): add viz subpath entry to Vite build and package.json exports Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore(viz): fix lint and formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(viz): update demo app and docs for viz module - Replace demo's local deterministicColor with viz module import - Add "Download Result" button to demo using OcrVisualizer.toBlob() - Document viz subpath in README.md and README_cn.md - Add viz to Package Layout and API sections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: improve viz module, core SDK internals, and test coverage - Refine draw-boxes rendering and font loading in viz module - Improve type exports and internal module organization across models, pipelines, resources, runtime, platform, and worker layers - Update test suite for cache, models, pipeline, runtime, worker-backed, and worker-client modules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: fix documentation accuracy across 6 files and add changeset - Move visualization section before API reference in SDK READMEs (EN/CN) - Add deterministicColor usage description to viz docs - Add src/viz, src/types, src/utils to package layout sections - Fix ESLint rule level: test files use recommendedTypeChecked - Remove non-existent index.umd.js; add viz.mjs/viz.cjs to build outputs - Correct worker WASM inflation size from ~78 MB to ~50 MB - Add changeset for 0.2.0 minor release Co-Authored-By: Claude Opus 4.6 
<noreply@anthropic.com> * chore: release @paddleocr/paddleocr-js v0.2.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add paths mapping to tsconfig.eslint.json and demo tsconfig Add @paddleocr/paddleocr-js and @paddleocr/paddleocr-js/viz paths to tsconfig.eslint.json so IDE ESLint can resolve SDK types without depending on workspace symlinks. Also add the viz subpath mapping to apps/demo/tsconfig.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add viz subpath alias to demo Vite dev config Vite resolve.alias needs an explicit entry for the viz subpath — tsconfig paths only affect TypeScript, not Vite's module resolution. The more specific path is listed first to avoid prefix matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(demo): replace canvas rendering with viz module side-by-side output Use OcrVisualizer.toBlob() to render detection boxes and recognized text as a composite image displayed via <img>, replacing the manual canvas drawing and removing the Download Result button. Configure PingFang SC font from CDN for proper CJK text rendering in visualizations. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix bugs * Remove superpowers docs * Fix bugs * Fix bug * Refactor * Fix bugs * refactor: move cropByPoly from models/det to pipelines/ocr/crop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: unify DetModel interface to predict(cv, mats, overrides) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: unify RecModel interface to predict(cv, mats, overrides) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: restructure getOcrRuntimeParams return into { det, rec, pipeline } Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: simplify OcrPipelineRunner.predict to pure orchestration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Refactor * Fix code style * Fix and update * Fix and update * Bump to 0.3.0 * Remove v6 * Bump to v0.3.1 * Fix default paramter * variant -> preset * Add to mkdocs * Update docs and add license headers * Remove unused --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 26d00a7 commit 1fc8b98

110 files changed

Lines changed: 17335 additions & 15 deletions

.github/workflows/test_gpu.yml

Lines changed: 30 additions & 11 deletions
@@ -22,23 +22,42 @@ jobs:
   detect-changes:
     runs-on: ubuntu-latest
     outputs:
-      docs_only: ${{ steps.filter.outputs.docs_only }}
+      skip_gpu: ${{ steps.compute_skip.outputs.skip_gpu }}
     steps:
       - uses: actions/checkout@v6
-      - id: filter
-        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3
-        with:
-          filters: |
-            docs_only:
-              - '**.md'
-              - '**.txt'
-              - '**.yml'
-              - '**.yaml'
+      - id: compute_skip
+        name: Compute skip_gpu (PR-only); push/workflow_dispatch always run GPU
+        shell: bash
+        run: |
+          if [ "${{ github.event_name }}" != "pull_request" ]; then
+            echo "skip_gpu=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          git fetch origin "${{ github.base_ref }}" --depth=1
+          changed_files="$(git diff --name-only "origin/${{ github.base_ref }}"..HEAD)"
+
+          if [ -z "$changed_files" ]; then
+            echo "skip_gpu=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          skip_gpu=true
+          while IFS= read -r file; do
+            [ -z "$file" ] && continue
+            if [[ "$file" == paddleocr-js/* ]] || [[ "$file" == langchain-paddleocr/* ]] || [[ "$file" == skills/* ]] || [[ "$file" == mcp_server/* ]] || [[ "$file" == deploy/* ]] || [[ "$file" == *.md ]] || [[ "$file" == *.txt ]] || [[ "$file" == *.yml ]] || [[ "$file" == *.yaml ]]; then
+              continue
+            fi
+            skip_gpu=false
+            break
+          done <<< "$changed_files"
+
+          echo "skip_gpu=$skip_gpu" >> "$GITHUB_OUTPUT"

   test-pr-gpu:
     runs-on: [self-hosted, GPU-2Card-OCR]
     needs: detect-changes
-    if: needs.detect-changes.outputs.docs_only != 'true'
+    if: needs.detect-changes.outputs.skip_gpu != 'true'
     steps:
       - name: run test
         env:
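
The decision made by the `compute_skip` step can be restated outside bash. The sketch below mirrors that logic in JavaScript (the language of the subproject this commit adds). `computeSkipGpu`, its arguments, and the two constant names are illustrative and do not appear in the workflow; the prefixes and suffixes are copied from the `if [[ ... ]]` line.

```javascript
// Path prefixes and suffixes that the workflow treats as not needing GPU tests.
const SKIP_PREFIXES = ["paddleocr-js/", "langchain-paddleocr/", "skills/", "mcp_server/", "deploy/"];
const SKIP_SUFFIXES = [".md", ".txt", ".yml", ".yaml"];

// Returns true only when GPU tests may be skipped.
function computeSkipGpu(eventName, changedFiles) {
  // Only pull requests may skip; push and workflow_dispatch always run GPU tests.
  if (eventName !== "pull_request") return false;

  // Ignore empty entries, as the bash loop does.
  const files = changedFiles.filter((file) => file.length > 0);

  // An empty diff is handled conservatively: run the GPU tests.
  if (files.length === 0) return false;

  // Skip only if every changed file matches a prefix or a suffix.
  return files.every(
    (file) =>
      SKIP_PREFIXES.some((prefix) => file.startsWith(prefix)) ||
      SKIP_SUFFIXES.some((suffix) => file.endsWith(suffix))
  );
}
```

A single changed file outside the listed prefixes and suffixes is enough to force the GPU job, which matches the `skip_gpu=false; break` branch of the shell loop.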

.github/workflows/tests.yaml

Lines changed: 5 additions & 0 deletions
@@ -27,6 +27,11 @@ jobs:
               - '**.txt'
               - '**.yml'
               - '**.yaml'
+              - 'paddleocr-js/**'
+              - 'langchain-paddleocr/**'
+              - 'skills/**'
+              - 'mcp_server/**'
+              - 'deploy/**'

   test-pr:
     runs-on: ubuntu-latest

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-exclude: ^(langchain-paddleocr/)
+exclude: ^(langchain-paddleocr/|paddleocr-js/)
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v5.0.0
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
---
comments: true
---

# PaddleOCR.js (browser deployment)

PaddleOCR provides **PaddleOCR.js**, a browser OCR SDK for running the PP-OCR pipeline in the browser. You can embed text detection and recognition in web apps and run inference on the client.

The npm package is **`@paddleocr/paddleocr-js`**. Source and demo live under [`paddleocr-js`](https://github.com/PaddlePaddle/PaddleOCR/tree/main/paddleocr-js) on GitHub.

## Install

```bash
npm install @paddleocr/paddleocr-js
```

## Quick start

```js
import { PaddleOCR } from "@paddleocr/paddleocr-js";

const ocr = await PaddleOCR.create({
  lang: "ch",
  ocrVersion: "PP-OCRv5",
  ortOptions: {
    backend: "auto"
  }
});

const [result] = await ocr.predict(fileOrBlob);
console.log(result.items);
```

`predict` resolves to an **array** of `OcrResult` (one per input image). A single `Blob` / `File` still produces a one-element array; use destructuring or `results[0]`.

## Construction options

Two styles: **direct parameters** to `PaddleOCR.create({ ... })`, or a **`pipelineConfig`** object.

### 1. Direct parameters

With direct parameters, you can specify models and set batch sizes, ORT options, and other runtime settings.

**Model selection: `lang` + `ocrVersion`:**

```js
await PaddleOCR.create({
  lang: "ch",
  ocrVersion: "PP-OCRv5"
});
```

**Model selection: built-in model names:**

```js
await PaddleOCR.create({
  textDetectionModelName: "PP-OCRv5_mobile_det",
  textRecognitionModelName: "PP-OCRv5_mobile_rec"
});
```

**Custom models:** provide a name and asset URL for each of detection and recognition:

```js
await PaddleOCR.create({
  textDetectionModelName: "my_det_model",
  textDetectionModelAsset: {
    url: "https://example.com/models/my_det_model.tar"
  },
  textRecognitionModelName: "my_rec_model",
  textRecognitionModelAsset: {
    url: "https://example.com/models/my_rec_model.tar"
  }
});
```

**Batch sizes, ORT options, and other runtime settings:**

```js
await PaddleOCR.create({
  lang: "ch",
  ocrVersion: "PP-OCRv5",
  textDetectionBatchSize: 2,
  textRecognitionBatchSize: 8,
  ortOptions: {
    backend: "wasm",
    wasmPaths: "/assets/"
  }
});
```

#### Custom model archive format and validation

The SDK downloads `textDetectionModelAsset.url` / `textRecognitionModelAsset.url` over HTTP(S) and parses the body as a **plain ustar tar (uncompressed)** archive. Ensure that:

| Requirement | Details |
|-------------|---------|
| Archive format | The response body must be an **uncompressed `.tar`**. The implementation does **not** gunzip **`.tar.gz`**; if you pass a gzip-compressed tarball, parsing will typically fail and an error will be thrown. |
| Required files | The tar must contain **`inference.onnx`** and **`inference.yml`** (they may live in a subdirectory; entries are matched by basename). |
| `model_name` | **`inference.yml`** must define a **`model_name`** that matches the `textDetectionModelName` / `textRecognitionModelName` you pass to `create`. This is checked after load during initialization. |

If you need to convert Paddle models into the ONNX model files used here, see [Obtaining ONNX models](obtaining_onnx_models.en.md). The standard model files produced by that workflow can then be packaged as a `.tar` following the rules above for use with PaddleOCR.js.

If the archive or model files do not meet these rules, initialization typically fails with an **`Error`** that describes the problem, for example: non-2xx download, missing `inference.onnx` / `inference.yml` in the tar, empty resources, missing or mismatched `model_name`, incomplete model config, or ONNX load failure. There is no silent fallback.

All selected OCR models must satisfy the `model_name` rules above.
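
Because a gzip-compressed tarball is rejected, it can help to check an archive before hosting it. The following is a minimal sketch of such a pre-flight check, assuming you already hold the archive bytes as a `Uint8Array`. `sniffArchive` is a hypothetical helper, not part of the SDK; it relies only on the standard gzip magic bytes and the `ustar` magic string at offset 257 of the first tar header block.

```javascript
// Hypothetical pre-flight check (not an SDK function): classify archive bytes.
// `bytes` is a Uint8Array, e.g. new Uint8Array(await response.arrayBuffer()).
function sniffArchive(bytes) {
  // gzip streams start with the magic bytes 0x1f 0x8b.
  if (bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b) {
    return "gzip";
  }
  // A ustar header stores the ASCII magic "ustar" at offset 257
  // of the first 512-byte block.
  if (bytes.length >= 512) {
    const magic = String.fromCharCode(...bytes.subarray(257, 262));
    if (magic === "ustar") return "ustar";
  }
  return "unknown";
}
```

A result of `"gzip"` suggests the file should be decompressed to a plain `.tar` before its URL is passed as a model asset. The check does not verify that `inference.onnx` and `inference.yml` are present.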

### 2. Pipeline config

```js
import { PaddleOCR } from "@paddleocr/paddleocr-js";

const pipelineConfig = `
pipeline_name: OCR
SubModules:
  TextDetection:
    model_name: PP-OCRv5_mobile_det
    batch_size: 2
  TextRecognition:
    model_name: PP-OCRv5_mobile_rec
    batch_size: 6
`;

const ocr = await PaddleOCR.create({ pipelineConfig });
```

`pipelineConfig` can be YAML text or a parsed object. In the browser, submodule `model_dir` must be **`null` or an asset object** (e.g. `{ url: "..." }`), not a local filesystem path string. If you want to start from a pipeline configuration exported by PaddleOCR / PaddleX, see the "Exporting Pipeline Configuration Files" section in [PaddleOCR and PaddleX](../paddleocr_and_paddlex.en.md); the exported YAML can be used as the basis for `pipelineConfig`, and any `model_dir` entries should then be adapted to browser-side asset objects.

If both direct parameters and `pipelineConfig` are provided, **direct parameters take precedence**.
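
Adapting `model_dir` entries in an exported configuration can be done mechanically on the parsed object. The sketch below is one way to do it under stated assumptions: `adaptModelDirs` is a hypothetical helper, `baseUrl` is a placeholder for wherever you host model archives, and the `<model_name>.tar` naming scheme is invented for illustration. It only walks the top-level `SubModules` map.

```javascript
// Hypothetical helper (not an SDK function): replace filesystem-path
// `model_dir` strings with browser asset objects of the form { url }.
// The "<baseUrl>/<model_name>.tar" layout is an assumption for illustration.
function adaptModelDirs(config, baseUrl) {
  const subModules = {};
  for (const [name, sub] of Object.entries(config.SubModules ?? {})) {
    const dir = sub.model_dir;
    subModules[name] = {
      ...sub,
      // Strings are local paths from the exporting machine: swap in a URL asset.
      // null and existing asset objects pass through; undefined becomes null.
      model_dir:
        typeof dir === "string"
          ? { url: `${baseUrl}/${sub.model_name}.tar` }
          : (dir ?? null)
    };
  }
  // Return a new object so the caller's parsed config is left untouched.
  return { ...config, SubModules: subModules };
}
```

The returned object can then be passed as `pipelineConfig`, since the text above states that a parsed object is accepted.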

## Prediction

### Params

`ocr.predict(image | images[], params?)` accepts both camelCase and PaddleOCR-style snake_case:

- `textDetLimitSideLen` or `text_det_limit_side_len`
- `textDetLimitType` or `text_det_limit_type`
- `textDetMaxSideLimit` or `text_det_max_side_limit`
- `textDetThresh` or `text_det_thresh`
- `textDetBoxThresh` or `text_det_box_thresh`
- `textDetUnclipRatio` or `text_det_unclip_ratio`
- `textRecScoreThresh` or `text_rec_score_thresh`
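
The two spellings in the list above differ only by a mechanical rename. Since `predict` accepts either form, no conversion is required; the sketch below simply makes the mapping explicit, for instance if an application wants to store settings in one style. `toCamelCaseParams` is a hypothetical helper, not an SDK export.

```javascript
// Hypothetical helper (not an SDK function): convert PaddleOCR-style
// snake_case keys to their camelCase equivalents.
function toCamelCaseParams(params) {
  const out = {};
  for (const [key, value] of Object.entries(params)) {
    // "_x" becomes "X"; keys that are already camelCase pass through unchanged.
    const camel = key.replace(/_([a-z0-9])/g, (_, ch) => ch.toUpperCase());
    out[camel] = value;
  }
  return out;
}
```

For example, `toCamelCaseParams({ text_det_thresh: 0.3 })` yields `{ textDetThresh: 0.3 }`.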

Supported `image` inputs include `Blob`, `ImageBitmap`, `ImageData`, `HTMLCanvasElement`, `HTMLImageElement`, and `cv.Mat`. Pass an array to run on multiple images in one call.

In **worker mode**, `cv.Mat` is not transferable and is not supported as input.

### Return value

Resolves to `Promise<OcrResult[]>`. Each `OcrResult` contains:

- `image`: `{ width, height }` for that source
- `items`: recognized lines (`poly`, `text`, `score`)
- `metrics`: `detMs`, `recMs`, `totalMs`, `detectedBoxes`, `recognizedCount`. Box and line counts are per image; `detMs`, `recMs`, and `totalMs` cover the **entire** `predict()` call (identical on every element when you pass multiple images)
- `runtime`: requested backend and provider metadata
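
To illustrate consuming this structure, here is a small sketch that works on plain objects shaped like the fields listed above. `extractText` and `callDurationMs` are hypothetical helpers, and the `0.5` default threshold is an arbitrary value chosen for the example.

```javascript
// Hypothetical helper: turn predict() output into one text string per image,
// keeping only lines whose score reaches `minScore`.
function extractText(results, minScore = 0.5) {
  return results.map((result) =>
    result.items
      .filter((item) => item.score >= minScore)
      .map((item) => item.text)
      .join("\n")
  );
}

// The timing fields describe the whole predict() call and repeat on every
// element, so read them from one result instead of summing across results.
function callDurationMs(results) {
  return results.length > 0 ? results[0].metrics.totalMs : 0;
}
```

Summing `totalMs` over a multi-image result would overcount, because each element carries the same whole-call value.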

## Worker mode

You can run the pipeline inside a dedicated Worker while keeping the same high-level API:

```js
import { PaddleOCR } from "@paddleocr/paddleocr-js";

const ocr = await PaddleOCR.create({
  lang: "ch",
  ocrVersion: "PP-OCRv5",
  worker: true,
  ortOptions: {
    backend: "wasm",
    wasmPaths: "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/",
    numThreads: 2,
    simd: true
  }
});
```

Summary:

- Worker mode uses the package worker entry, not ONNX Runtime Web `env.wasm.proxy`
- When `worker: true`, the package forces ORT wasm proxy off to avoid nested workers
- Browser inputs are normalized on the main thread, then transferred to the worker
- `cv.Mat` is only supported on the main-thread pipeline path


## Visualization

The optional **`@paddleocr/paddleocr-js/viz`** subpath renders OCR results to images.

```js
import { OcrVisualizer } from "@paddleocr/paddleocr-js/viz";

const viz = new OcrVisualizer({
  font: { family: "Noto Sans SC", source: "/fonts/NotoSansSC-Regular.ttf" }
});

const blob = await viz.toBlob(imageBitmap, result);
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "ocr_result.png";
a.click();
URL.revokeObjectURL(url);

viz.dispose();
```

`renderOcrToBlob` and `deterministicColor` are also exported. Visualization takes a **single** `OcrResult` (for one image, use the first element of the `predict` result array).
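
The commit message describes the box colors as generated deterministically with a linear congruential generator (LCG). The sketch below shows the general idea only: it is not the package's `deterministicColor`, whose constants and signature are not shown in this document. `lcgNext` and `colorForIndex` are hypothetical names, and the multiplier and increment are the widely published Numerical Recipes values.

```javascript
// One LCG step modulo 2^32. The constants are the Numerical Recipes values,
// chosen for illustration; the package may use different ones.
function lcgNext(state) {
  return (Math.imul(state, 1664525) + 1013904223) >>> 0;
}

// Map a detection index to an [r, g, b] triple with channels in 0..255.
// The same index always produces the same color.
function colorForIndex(index) {
  let state = (index + 1) >>> 0;
  const rgb = [];
  for (let i = 0; i < 3; i++) {
    state = lcgNext(state);
    // Take the high byte: low-order bits of a power-of-two LCG repeat quickly.
    rgb.push(state >>> 24);
  }
  return rgb;
}
```

Seeding from the index, with no random source involved, is what makes repeated renders of the same result use the same colors for the same boxes.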

## API summary

- `PaddleOCR.create(options)`
- `ocr.initialize()` / `ocr.getInitializationSummary()`
- `ocr.predict(image | images[], params?)` → `Promise<OcrResult[]>`
- `ocr.dispose()`
- `parseOcrPipelineConfigText(text)` / `normalizeOcrPipelineConfig(config)`
- `OcrVisualizer`, `renderOcrToBlob`, `deterministicColor` (from `@paddleocr/paddleocr-js/viz`)

## Host application responsibilities

The SDK manages OpenCV.js and ONNX Runtime internally. You still handle:

- **COOP/COEP** (and related headers) when enabling threaded WASM or WebGPU
- **ORT environment options** (e.g. `wasmPaths`, threads, SIMD)
- A bundler/runtime that can emit and load **module workers** when `worker: true`
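
The first two responsibilities interact: threaded WASM depends on `SharedArrayBuffer`, which browsers expose only on cross-origin isolated pages. A host application might therefore derive the thread count from the page's isolation state. The sketch below is one possible approach; `pickNumThreads` is a hypothetical helper and the cap of 4 is an arbitrary choice. It reads only the standard `crossOriginIsolated` and `navigator.hardwareConcurrency` globals.

```javascript
// Hypothetical helper (not an SDK function): choose a thread count for
// ortOptions.numThreads. `env` defaults to globalThis so the function can be
// exercised with a stub object outside the browser.
function pickNumThreads(env = globalThis, maxThreads = 4) {
  // Without cross-origin isolation (COOP + COEP headers), SharedArrayBuffer
  // is unavailable, so fall back to a single thread.
  if (!env.crossOriginIsolated) return 1;
  const cores = env.navigator?.hardwareConcurrency ?? 1;
  return Math.max(1, Math.min(cores, maxThreads));
}
```

The result could be passed as `numThreads` inside `ortOptions`, the same field used in the worker-mode example.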
