
[Feat] Add PP-OCRv6 iOS Demo#17933

Open
Bobholamovic wants to merge 126 commits into PaddlePaddle:main from Bobholamovic:feat/ios


@Bobholamovic
Member

No description provided.

Bobholamovic and others added 30 commits March 16, 2026 19:13
- Implement ClipperOffset class with addPath/execute API
- Support JT_ROUND + ET_CLOSEDPOLYGON for DB postprocessing
- Add static offsetPolygon() convenience for DBPostProcess
- Arc tolerance calculation matches pyclipper's Clipper 6.x
- Pure Swift, no external dependencies
- Add Yams ~> 5.0 pod to Podfile for inference.yml parsing
- Create InferenceConfig.swift with typed parsing of inference.yml
- Support TransformOp enum: DetResizeForTest, NormalizeImage, ToCHWImage, RecResizeImg
- PostProcessConfig handles both det (DBPostProcess) and rec (CTCLabelDecode) configs
- Python-style scale string '1./255.' parsed via string splitting (not eval)
- Register InferenceConfig.swift in Xcode project
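
The "string splitting (not eval)" handling of the scale value can be illustrated roughly as follows. This is a minimal Python sketch rather than the Swift code in this PR; the function name `parse_scale` and the accepted input forms are assumptions made for illustration.

```python
def parse_scale(value):
    """Parse a scale given as a number or a Python-style 'a/b' string.

    Hypothetical helper: splits on '/' instead of evaluating the string.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    if "/" in text:
        # '1./255.' -> numerator '1.', denominator '255.'
        numerator, denominator = text.split("/", 1)
        return float(numerator) / float(denominator)
    return float(text)


scale = parse_scale("1./255.")
```

Splitting keeps the parser limited to plain numbers and a single division, so arbitrary expressions in the YAML file are never executed.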
- Create Preprocessing.swift with DetPreprocessor and PreprocessResult
- Port DetResizeForTest: resize longest side to resize_long, ceil to 128 stride
- Port NormalizeImage: config-driven scale/mean/std normalization
- Port ToCHWImage: HWC-to-CHW layout conversion producing [1,3,H,W] tensor
- Image padding for tiny images (h+w < 64) matching Python reference
- Pure Swift using CoreGraphics + Accelerate (no OpenCV dependency)
- All transform parameters read from InferenceConfig (zero hardcoded values)
- Register Preprocessing.swift in Xcode project
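
The DetResizeForTest rule above (scale the longest side to `resize_long`, then round each side up to a multiple of 128) can be sketched in a few lines. This is an illustrative Python sketch, not the Swift port itself; the function name and the default `resize_long=960` are assumptions, since the PR reads the real value from inference.yml.

```python
def det_resize_target(h, w, resize_long=960, max_stride=128):
    """Return (resize_h, resize_w) for a detection input.

    Hypothetical helper; resize_long=960 is an assumed default.
    """
    ratio = float(resize_long) / max(h, w)
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # Round each side up to the next multiple of max_stride.
    resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
    resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
    return resize_h, resize_w


target = det_resize_target(480, 640)
```

For a 480x640 input this gives a ratio of 1.5, so the intermediate 720x960 size is rounded up to 768x1024.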
- Add DBPostProcessor with full pipeline: threshold -> contours -> minAreaRect -> score -> expand -> scale
- Implement Suzuki-Abe contour finding with CHAIN_APPROX_SIMPLE compression
- Implement rotating calipers minAreaRect + convex hull (Andrew's monotone chain)
- Add scanline polygon fill for box_score_fast computation
- Integrate ClipperOffset for polygon expansion (unclip)
- All parameters configurable via DBPostProcessConfigurable protocol
- Pure Swift, no OpenCV dependency
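
Andrew's monotone chain, named above as the convex hull step feeding minAreaRect, is short enough to sketch. The following is a generic Python version of the textbook algorithm, not the Swift code in this PR; the function names are illustrative.

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices, counter-clockwise in a y-up coordinate system."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (1, 1), (2, 0), (2, 2), (0, 2), (1, 0)])
```

Sorting dominates the cost, so the hull is built in O(n log n); rotating calipers then only has to test the hull edges as candidate rectangle orientations.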
- Add runDetection(inputData:shape:) for real inference with preprocessed data
- Returns output tensors as [String: (data: [Float], shape: [Int])] dictionary
- Includes NaN validation on output tensors
- All existing methods (loadModels, validateDetModel, validateRecModel) preserved unchanged
…line

- DetectionEngine wires DetPreprocessor -> ORTSessionManager.runDetection -> DBPostProcessor
- DetectionResult struct with boxes and per-stage timing metrics (preprocess/inference/postprocess)
- All parameters loaded from inference.yml via InferenceConfig.load()
- PostProcessConfig conforms to DBPostProcessConfigurable for type-safe init bridging
- DetectionEngineError for noOutputTensor and unexpectedOutputShape cases
- Registered in Xcode project pbxproj
- SUMMARY.md documenting DetectionEngine pipeline integration
- STATE.md updated with position, decisions, metrics
- REQUIREMENTS.md: POST-01, POST-02 marked complete
…rence method

- Extract private runInference() from runDetection to eliminate code duplication
- Add public runRecognition() method that uses recSession for recognition model inference
- Both runDetection and runRecognition guard their respective sessions and delegate to runInference
- Recognition model supports dynamic-width input tensors [1, 3, 48, W]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement OCRResizeNormImg algorithm matching PaddleX text_recognition/processors.py
- Read imgC/imgH/imgW from inference.yml RecResizeImg.image_shape (config-driven, not hardcoded)
- Aspect-ratio-aware resize with ceil() width computation matching Python math.ceil()
- Recognition normalization: pixel/127.5 - 1.0 mapping [0,255] to [-1,1] (not ImageNet mean/std)
- HWC-to-CHW transpose and right-pad with zeros to target width
- Bilinear interpolation via CGContext (pure Swift, no OpenCV)
- Register RecPreprocessor.swift in Xcode project (PBXBuildFile, PBXFileReference, PBXGroup, Sources)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
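
The recognition preprocessing arithmetic described in the commit above (ceil-based width, `pixel/127.5 - 1.0` normalization, zero right-padding) can be sketched as follows. This is a simplified Python illustration, not the Swift implementation; the helper names and the default `img_w=320` are assumptions, and the height of 48 is taken from the `[1, 3, 48, W]` shape mentioned earlier.

```python
import math


def rec_resized_width(h, w, img_h=48, img_w=320):
    """Width after an aspect-preserving resize to height img_h, capped at img_w.

    img_w=320 is an assumed default; the PR reads these from inference.yml.
    """
    ratio = w / float(h)
    return min(img_w, int(math.ceil(img_h * ratio)))


def normalize_pixel(value):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0


def pad_row(row, target_w):
    """Right-pad one normalized row with zeros up to target_w."""
    return row + [0.0] * (target_w - len(row))


width = rec_resized_width(32, 100)
row = pad_row([normalize_pixel(v) for v in (0, 255)], 4)
```

Here a 32x100 crop resizes to a width of 150, and the padded region stays at 0.0, which corresponds to mid-gray after normalization.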
- Create 03-01-SUMMARY.md documenting plan execution
- Update STATE.md: advance to Phase 3 Plan 1 complete, add decisions
- Update ROADMAP.md: Phase 3 progress 1/2
- Update REQUIREMENTS.md: mark PREP-04 and PREP-05 as complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…racter dictionary

- CTCDecoder struct ported from ppocr/postprocess/rec_postprocess.py CTCLabelDecode
- Reads character_dict from inference.yml PostProcess config
- Prepends blank token at index 0 (CTC convention)
- Decoding: argmax + consecutive duplicate removal + blank filtering + char mapping
- Confidence: mean probability of selected timesteps
- Registered in Xcode project (Engine group)
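
The decoding steps listed above (argmax, consecutive duplicate removal, blank filtering, character mapping, mean confidence) amount to greedy CTC decoding. A minimal Python sketch follows; it is not the Swift `CTCDecoder` from this PR, and the function name, the input layout (one probability list per timestep), and the 0.0 confidence for empty output are assumptions.

```python
def ctc_greedy_decode(probs, characters):
    """Greedy CTC decode. probs: one list of class probabilities per timestep.

    Index 0 is the blank token, prepended to the character dictionary.
    """
    vocab = ["<blank>"] + list(characters)
    text, confidences = [], []
    previous = None
    for step in probs:
        index = max(range(len(step)), key=step.__getitem__)  # argmax
        # Keep a symbol only if it is not blank and not a repeat of the previous step.
        if index != 0 and index != previous:
            text.append(vocab[index])
            confidences.append(step[index])
        previous = index
    # Assumption: report 0.0 confidence when nothing was decoded.
    confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return "".join(text), confidence


decoded, conf = ctc_greedy_decode(
    [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
     [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]],
    "ab",
)
```

The blank at the third timestep is what allows the repeated "a" to survive: without it, the second run of "a" would be merged into the first.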
…pipeline

- RecognitionEngine class mirrors DetectionEngine pattern
- Composes RecPreprocessor + ORTSessionManager.runRecognition + CTCDecoder
- Returns RecognitionEngineResult with text, confidence, and per-stage timing
- Config-driven: reads inference.yml for preprocessing dims and character dictionary
- Registered in Xcode project (Engine group)
@timminator
Contributor

Hi! Sorry for bothering. I saw the upcoming PRs regarding a PPOCRv6 model but there were no changes made to the general OCR pipeline yet. Will this also get an update or has PPOCRv6 a different use case distinct from the PPOCRv5 model?
Thank you for your time!

@Bobholamovic
Member Author

> Hi! Sorry for bothering. I saw the upcoming PRs regarding a PPOCRv6 model but there were no changes made to the general OCR pipeline yet. Will this also get an update or has PPOCRv6 a different use case distinct from the PPOCRv5 model? Thank you for your time!

PP-OCRv6 is still under development. It is expected to be used in the same way as PP-OCRv5, namely through the general OCR pipeline.

Bobholamovic and others added 28 commits April 20, 2026 10:04
Per review feedback from changdazhou on PR PaddlePaddle#17820 (L26), update the
CUDA 12.6 Docker GPU line to require driver >= 550.54.14, matching
the pip section already at L61 (both ZH and EN).

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Per follow-up review on PR PaddlePaddle#17820: from a completeness standpoint,
None belongs in the "Supports ..." enumeration rather than only in
the trailing clarification sentence. Move None into the list as the
default value and tighten the follow-on sentence accordingly.

- EN: "Supports None (the default), paddle, paddle_static,
  paddle_dynamic, and transformers. When left as None, PaddleOCR
  preserves the behavior of earlier versions..."
- ZH: "支持 None(默认值)、paddle、paddle_static、paddle_dynamic、
  transformers。保持为默认值 None 时..."

Applied to all three supported-value variants across the module_usage
and pipeline_usage pages — same 48 files / 66 rows as the previous
clarification commit.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Resolves conflict in docs/version3.x/pipeline_usage/PaddleOCR-VL.md:
- Accept upstream refactor of CLI and Python instantiation parameter
  tables from HTML to markdown pipe-table format.
- Preserve the {#流程导览} anchor on the "流程导览" heading (needed
  for mkdocs bilingual link check).
- Re-apply the engine-row clarification (None as default + legacy
  behavior note) to the two engine rows in the new pipe-tables.

Incoming commits:
- a874bcb Optimize docs
- 85275d4 Update docs

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
docs: release-review fixes for 3.5 docs
…ary (PaddlePaddle#17954) (PaddlePaddle#17961)

* ci,docs: align PaddleX install branch and document py3.8 extras boundary

- CI: derive the PaddleX install branch from the paddlex constraint in
  pyproject.toml (release/X.Y) so PR/GPU tests stay in sync as the
  paddleocr series advances; apply to both tests.yml and test_gpu.yml
- CI: install only paddleocr[doc2md] on py3.8 since several paddlex
  transitive deps require py3.9+; add a py38_incompatible pytest marker
  and gate affected tests behind it
- CI: pin paddlepaddle==3.0.0 on py3.8 / 3.1.0 on py3.9+ (match GPU CI)
- CI: standardize workflow filenames (.yaml -> .yml, dashes -> underscores)
- deps: pin lmdb<1.5 on py3.8 (newer lmdb wheels reference Py_SET_REFCNT,
  a py3.9 stdlib C API)
- docs: note Python 3.8+ for base paddleocr/doc2md; py3.9+ for
  doc-parser/ie/trans/all extras (safetensors>=0.7 dropped py3.8)
- docs: update PaddleOCR-VL manual-install Python range to 3.9-3.13
  across all hardware variants (VL pipelines use doc-parser)
- tests: tag tests that require py3.9+ extras/deps with py38_incompatible



* ci: mark /workspace as git safe.directory in GPU runner

setuptools_scm runs git to derive the package version during
pip install -e .; inside the GPU CI container /workspace is owned
by the host user, which trips git's dubious-ownership check and
aborts the paddleocr install.



---------


(cherry picked from commit 09e8700)

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
4 participants