[Feat] Add PP-OCRv6 iOS Demo #17933
Open
Bobholamovic wants to merge 126 commits into
- Implement ClipperOffset class with addPath/execute API
- Support JT_ROUND + ET_CLOSEDPOLYGON for DB postprocessing
- Add static offsetPolygon() convenience for DBPostProcess
- Arc tolerance calculation matches pyclipper's Clipper 6.x
- Pure Swift, no external dependencies
- Add Yams ~> 5.0 pod to Podfile for inference.yml parsing
- Create InferenceConfig.swift with typed parsing of inference.yml
- Support TransformOp enum: DetResizeForTest, NormalizeImage, ToCHWImage, RecResizeImg
- PostProcessConfig handles both det (DBPostProcess) and rec (CTCLabelDecode) configs
- Python-style scale string '1./255.' parsed via string splitting (not eval)
- Register InferenceConfig.swift in Xcode project
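The scale-string handling mentioned in this commit can be illustrated with a minimal Python sketch. It is not the PR's Swift code: the function name `parse_scale` and the decision to accept only a plain number or a single `a/b` quotient are assumptions made for illustration.

```python
def parse_scale(text: str) -> float:
    """Parse a Python-style scale such as '1./255.' without calling eval().

    Hypothetical helper: accepts either a plain number ('0.5') or a single
    quotient ('1./255.'); anything else is rejected.
    """
    parts = [part.strip() for part in text.split("/")]
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) == 2:
        numerator, denominator = float(parts[0]), float(parts[1])
        return numerator / denominator
    raise ValueError(f"unsupported scale expression: {text!r}")
```

Splitting on `/` keeps the parser closed over a tiny grammar, so a malformed or hostile config value fails with an error instead of being executed.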
- Create Preprocessing.swift with DetPreprocessor and PreprocessResult
- Port DetResizeForTest: resize longest side to resize_long, ceil to 128 stride
- Port NormalizeImage: config-driven scale/mean/std normalization
- Port ToCHWImage: HWC-to-CHW layout conversion producing [1,3,H,W] tensor
- Image padding for tiny images (h+w < 64) matching Python reference
- Pure Swift using CoreGraphics + Accelerate (no OpenCV dependency)
- All transform parameters read from InferenceConfig (zero hardcoded values)
- Register Preprocessing.swift in Xcode project
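The resize and normalization steps listed above can be sketched in plain Python as follows. This is an illustrative reading of the commit message, not the PR's Swift implementation: the function names, the truncation with `int()` before rounding up to the stride, and the example parameter values are assumptions.

```python
def det_target_size(height, width, resize_long, stride=128):
    """Scale the longest side to resize_long, then round each side up to a stride multiple."""
    ratio = resize_long / max(height, width)
    new_h = int(height * ratio)
    new_w = int(width * ratio)
    # Integer ceil to the next multiple of `stride`.
    new_h = (new_h + stride - 1) // stride * stride
    new_w = (new_w + stride - 1) // stride * stride
    return new_h, new_w


def normalize_to_chw(image_hwc, scale, mean, std):
    """Apply (pixel * scale - mean) / std per channel and return a CHW nested list."""
    channels = len(mean)
    return [
        [[(pixel[c] * scale - mean[c]) / std[c] for pixel in row] for row in image_hwc]
        for c in range(channels)
    ]


# Example values are assumptions, not taken from the PR's inference.yml.
print(det_target_size(720, 1280, resize_long=960))
```

Wrapping the returned CHW list in one more list gives the [1,3,H,W] batch layout the commit describes.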
- Add DBPostProcessor with full pipeline: threshold -> contours -> minAreaRect -> score -> expand -> scale
- Implement Suzuki-Abe contour finding with CHAIN_APPROX_SIMPLE compression
- Implement rotating calipers minAreaRect + convex hull (Andrew's monotone chain)
- Add scanline polygon fill for box_score_fast computation
- Integrate ClipperOffset for polygon expansion (unclip)
- All parameters configurable via DBPostProcessConfigurable protocol
- Pure Swift, no OpenCV dependency
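Of the steps above, the convex hull is the easiest to show in isolation. The following is a generic Python sketch of Andrew's monotone chain, included only to illustrate the technique the commit names; it is not the PR's Swift code, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order, dropping collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]
```

A rotating-calipers minAreaRect would then only need to test rectangles aligned with each hull edge, which is why the hull is computed first.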
…xproj UUID conflicts
- Add runDetection(inputData:shape:) for real inference with preprocessed data
- Returns output tensors as [String: (data: [Float], shape: [Int])] dictionary
- Includes NaN validation on output tensors
- All existing methods (loadModels, validateDetModel, validateRecModel) preserved unchanged
…line
- DetectionEngine wires DetPreprocessor -> ORTSessionManager.runDetection -> DBPostProcessor
- DetectionResult struct with boxes and per-stage timing metrics (preprocess/inference/postprocess)
- All parameters loaded from inference.yml via InferenceConfig.load()
- PostProcessConfig conforms to DBPostProcessConfigurable for type-safe init bridging
- DetectionEngineError for noOutputTensor and unexpectedOutputShape cases
- Registered in Xcode project pbxproj
- SUMMARY.md documenting DetectionEngine pipeline integration
- STATE.md updated with position, decisions, metrics
- REQUIREMENTS.md: POST-01, POST-02 marked complete
…rence method
- Extract private runInference() from runDetection to eliminate code duplication
- Add public runRecognition() method that uses recSession for recognition model inference
- Both runDetection and runRecognition guard their respective sessions and delegate to runInference
- Recognition model supports dynamic-width input tensors [1, 3, 48, W]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement OCRResizeNormImg algorithm matching PaddleX text_recognition/processors.py
- Read imgC/imgH/imgW from inference.yml RecResizeImg.image_shape (config-driven, not hardcoded)
- Aspect-ratio-aware resize with ceil() width computation matching Python math.ceil()
- Recognition normalization: pixel/127.5 - 1.0 mapping [0,255] to [-1,1] (not ImageNet mean/std)
- HWC-to-CHW transpose and right-pad with zeros to target width
- Bilinear interpolation via CGContext (pure Swift, no OpenCV)
- Register RecPreprocessor.swift in Xcode project (PBXBuildFile, PBXFileReference, PBXGroup, Sources)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
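The width computation, normalization, and padding described in this commit can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the PaddleX source or the PR's Swift port: the helper names are hypothetical, the width is simply capped at imgW, and the example shape (imgH = 48, imgW = 320) is assumed rather than read from any inference.yml.

```python
import math


def rec_resized_width(height, width, img_h, img_w):
    """Width after scaling to height img_h, rounded up and capped at img_w."""
    ratio = width / float(height)
    return min(img_w, int(math.ceil(img_h * ratio)))


def normalize_pixel(value):
    """Map a pixel in [0, 255] to [-1, 1]; equivalent to (value / 255 - 0.5) / 0.5."""
    return value / 127.5 - 1.0


def pad_row_right(row, target_width):
    """Right-pad one already-normalized row with zeros up to target_width."""
    return row + [0.0] * (target_width - len(row))


# A 32x100 crop resized to height 48 keeps its aspect ratio: 48 * 3.125 = 150.
print(rec_resized_width(32, 100, img_h=48, img_w=320))  # -> 150
```

Because padding happens after normalization, the padded region holds the value 0.0, which corresponds to mid-gray in the original pixel range rather than black.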
- Create 03-01-SUMMARY.md documenting plan execution
- Update STATE.md: advance to Phase 3 Plan 1 complete, add decisions
- Update ROADMAP.md: Phase 3 progress 1/2
- Update REQUIREMENTS.md: mark PREP-04 and PREP-05 as complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…racter dictionary
- CTCDecoder struct ported from ppocr/postprocess/rec_postprocess.py CTCLabelDecode
- Reads character_dict from inference.yml PostProcess config
- Prepends blank token at index 0 (CTC convention)
- Decoding: argmax + consecutive duplicate removal + blank filtering + char mapping
- Confidence: mean probability of selected timesteps
- Registered in Xcode project (Engine group)
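The decoding rule listed above (argmax, collapse consecutive duplicates, drop blanks, map to characters, average the kept probabilities) can be sketched in Python as follows. It illustrates greedy CTC decoding in general and is not the PR's Swift CTCDecoder; the function name, the `<blank>` placeholder, and the 0.0 confidence for empty output are assumptions.

```python
def ctc_greedy_decode(probs, characters, blank=0):
    """Greedy CTC decode.

    probs: list of timesteps, each a list of class probabilities.
    characters: dictionary characters WITHOUT the blank; the blank is
    prepended here so that it sits at index 0.
    """
    vocab = ["<blank>"] + list(characters)
    text, confidences = [], []
    previous = None
    for step in probs:
        index = max(range(len(step)), key=step.__getitem__)  # argmax
        # Keep a symbol only if it is not blank and differs from the previous timestep.
        if index != blank and index != previous:
            text.append(vocab[index])
            confidences.append(step[index])
        previous = index
    confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return "".join(text), confidence


steps = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (duplicate, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.6, 0.3],    # a (kept: separated by a blank)
    [0.1, 0.2, 0.7],    # b
]
print(ctc_greedy_decode(steps, "ab")[0])  # -> aab
```

The blank between the second and third "a" is what lets CTC emit a genuinely repeated character.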
…pipeline
- RecognitionEngine class mirrors DetectionEngine pattern
- Composes RecPreprocessor + ORTSessionManager.runRecognition + CTCDecoder
- Returns RecognitionEngineResult with text, confidence, and per-stage timing
- Config-driven: reads inference.yml for preprocessing dims and character dictionary
- Registered in Xcode project (Engine group)
Contributor:
Hi! Sorry to bother you. I saw the upcoming PRs for a PP-OCRv6 model, but no changes have been made to the general OCR pipeline yet. Will the pipeline also be updated, or does PP-OCRv6 have a different use case from PP-OCRv5?
Member (Author):
PP-OCRv6 is still under development. It is expected to be used in the same way as PP-OCRv5, namely through the general OCR pipeline.
Per review feedback from changdazhou on PR PaddlePaddle#17820 (L26), update the CUDA 12.6 Docker GPU line to require driver >= 550.54.14, matching the pip section already at L61 (both ZH and EN).

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Per follow-up review on PR PaddlePaddle#17820: from a completeness standpoint, None belongs in the "Supports ..." enumeration rather than only in the trailing clarification sentence. Move None into the list as the default value and tighten the follow-on sentence accordingly.
- EN: "Supports None (the default), paddle, paddle_static, paddle_dynamic, and transformers. When left as None, PaddleOCR preserves the behavior of earlier versions..."
- ZH: "支持 None(默认值)、paddle、paddle_static、paddle_dynamic、 transformers。保持为默认值 None 时..."

Applied to all three supported-value variants across the module_usage and pipeline_usage pages, covering the same 48 files / 66 rows as the previous clarification commit.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Resolves conflict in docs/version3.x/pipeline_usage/PaddleOCR-VL.md:
- Accept upstream refactor of CLI and Python instantiation parameter
tables from HTML to markdown pipe-table format.
- Preserve the {#流程导览} anchor on the "流程导览" heading (needed
for mkdocs bilingual link check).
- Re-apply the engine-row clarification (None as default + legacy
behavior note) to the two engine rows in the new pipe-tables.
Incoming commits:
- a874bcb Optimize docs
- 85275d4 Update docs
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
docs: release-review fixes for 3.5 docs
…ary (PaddlePaddle#17954) (PaddlePaddle#17961)

* ci,docs: align PaddleX install branch and document py3.8 extras boundary
- CI: derive the PaddleX install branch from the paddlex constraint in pyproject.toml (release/X.Y) so PR/GPU tests stay in sync as the paddleocr series advances; apply to both tests.yml and test_gpu.yml
- CI: install only paddleocr[doc2md] on py3.8 since several paddlex transitive deps require py3.9+; add a py38_incompatible pytest marker and gate affected tests behind it
- CI: pin paddlepaddle==3.0.0 on py3.8 / 3.1.0 on py3.9+ (match GPU CI)
- CI: standardize workflow filenames (.yaml -> .yml, dashes -> underscores)
- deps: pin lmdb<1.5 on py3.8 (newer lmdb wheels reference Py_SET_REFCNT, a py3.9 stdlib C API)
- docs: note Python 3.8+ for base paddleocr/doc2md; py3.9+ for doc-parser/ie/trans/all extras (safetensors>=0.7 dropped py3.8)
- docs: update PaddleOCR-VL manual-install Python range to 3.9-3.13 across all hardware variants (VL pipelines use doc-parser)
- tests: tag tests that require py3.9+ extras/deps with py38_incompatible

* ci: mark /workspace as git safe.directory in GPU runner
setuptools_scm runs git to derive the package version during pip install -e .; inside the GPU CI container /workspace is owned by the host user, which trips git's dubious-ownership check and aborts the paddleocr install.

---------
(cherry picked from commit 09e8700)

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
…ib .npy Made-with: Cursor
No description provided.