Optional burned-in text / PHI detection (phi-text-detector) by luckfamousa · Pull Request #1 · UMEssen/dicom-deidentify-rs

luckfamousa · 2026-05-22T08:07:29Z

What

Tag-based de-identification never touches pixel data, so PHI burned into images (name overlays, ultrasound banners, secondary-capture screenshots) survives. This adds an optional detector to flag such images.

New workspace crate phi-text-detector: a PaddleOCR DB text detector (detection only, no text recognition) run via ONNX Runtime (ort). Its modules:

onnx_detector — PP-OCR preprocessing (aspect-preserving resize to multiples of 32, ImageNet normalization, NCHW) + ONNX session (behind a Mutex so the detector is Send + Sync with a &self detect()).
db_postprocess — pure-Rust probability-map → bitmap → connected components → scored/expanded/rescaled boxes (heavily unit-tested on synthetic maps).
dicom_render — render first frame + extract screening metadata.
image_prefilter — margin crops + a contrast gate.
screening — Safe/Review/Unsafe from detection + metadata (BurnedInAnnotation, modality, secondary-capture SOP class); fail-closed.
phi-screen CLI — screen images / DICOM / directories, JSON or JSONL.
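
As a rough illustration of the PP-OCR preprocessing named under onnx_detector, the sketch below computes a target size whose sides are multiples of 32 and converts interleaved RGB bytes into ImageNet-normalized planar (CHW) floats. It is not the crate's code: the function names, the `max_side` cap, and the round-to-nearest rule are assumptions made for the example, and the actual pixel resampling is left out.

```rust
/// ImageNet per-channel mean and standard deviation, RGB order.
const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Target size for the resize step: scale down so the longer side fits
/// `max_side` (a hypothetical cap), then snap each side to the nearest
/// multiple of 32. Snapping makes the aspect ratio only approximately preserved.
pub fn target_size(width: u32, height: u32, max_side: u32) -> (u32, u32) {
    let longest = width.max(height).max(1) as f32;
    let scale = if longest > max_side as f32 {
        max_side as f32 / longest
    } else {
        1.0
    };
    let snap = |side: u32| -> u32 {
        let scaled = (side as f32 * scale).round() as u32;
        // Nearest multiple of 32, never smaller than 32.
        (((scaled + 16) / 32) * 32).max(32)
    };
    (snap(width), snap(height))
}

/// Convert interleaved RGB8 (HWC) into normalized f32 planes (CHW).
/// With a leading batch dimension of 1 this is the NCHW tensor layout.
pub fn to_nchw(rgb: &[u8], width: usize, height: usize) -> Vec<f32> {
    assert_eq!(rgb.len(), width * height * 3, "expected tightly packed RGB8");
    let plane = width * height;
    let mut out = vec![0.0f32; 3 * plane];
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        for c in 0..3 {
            out[c * plane + i] = (px[c] as f32 / 255.0 - MEAN[c]) / STD[c];
        }
    }
    out
}
```

Under these assumptions a 1000x500 input with a cap of 960 maps to 960x480, and a 10x10 input is padded up to the 32x32 minimum.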

Optional integration into the de-id CLI (the requested behavior)

Behind a text-detection feature (off by default, so the core library never pulls in ONNX Runtime):

--detect-text off (default) — no detection
--detect-text warn — flag on stderr, still writes the de-identified file
--detect-text skip — writes no output for flagged images, exits 3
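
The three modes can be read as a small decision table. The sketch below models it in Rust; the type and function names (`DetectText`, `Action`, `decide`) are hypothetical and not taken from the PR, and only the behaviors listed above (warn still writes, skip writes nothing and exits 3) come from the description.

```rust
use std::str::FromStr;

/// Values accepted by `--detect-text` (names of the variants are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectText {
    Off,
    Warn,
    Skip,
}

impl FromStr for DetectText {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "off" => Ok(Self::Off),
            "warn" => Ok(Self::Warn),
            "skip" => Ok(Self::Skip),
            other => Err(format!("invalid --detect-text value: {other}")),
        }
    }
}

/// What the CLI should do with one file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Write the de-identified file as usual.
    Write,
    /// Write the de-identified file and report the flag on stderr.
    WriteAndWarn,
    /// Write nothing and exit with the given status code.
    SkipWithExit(i32),
}

/// Map the mode and the detector's verdict to an action.
/// In `Off` mode detection never runs, so `flagged` has no effect.
pub fn decide(mode: DetectText, flagged: bool) -> Action {
    match (mode, flagged) {
        (DetectText::Warn, true) => Action::WriteAndWarn,
        (DetectText::Skip, true) => Action::SkipWithExit(3),
        _ => Action::Write,
    }
}
```

Keeping the decision in a pure function like this makes the policy testable without running the detector.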

Model

The model is not committed. scripts/fetch-model.sh downloads ppocr_det.onnx and verifies its SHA-256 (source and hash are recorded in models/ppocr_det.metadata.json). CI fetches it.

Tests

DB postprocessing, screening rules, prefilter — unit tests.
Real ONNX detection on committed fixtures (positive_text.png → text; negative_no_text.png → none).
De-id CLI policy (skip exits 3 + no output; warn writes + warns; clean image passes) via BurnedInAnnotation.
CI runs the workspace + the text-detection integration; verified locally against rustc/clippy 1.95.
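
To make the DB postprocessing chain concrete (probability map, bitmap, connected components, scored and rescaled boxes), here is a simplified sketch of the kind of logic that can be exercised on a synthetic map. It is not the code in db_postprocess: all names and thresholds are illustrative, boxes are axis-aligned, the score is the mean probability over component pixels, and expansion is a fixed pixel padding rather than the polygon-based expansion used in DB.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TextBox {
    pub x0: usize,
    pub y0: usize,
    pub x1: usize, // inclusive
    pub y1: usize, // inclusive
    pub score: f32,
}

/// Threshold the map at `bin_thresh`, flood-fill 4-connected components,
/// and keep components whose mean probability reaches `box_thresh`.
pub fn find_boxes(
    prob: &[f32],
    width: usize,
    height: usize,
    bin_thresh: f32,
    box_thresh: f32,
) -> Vec<TextBox> {
    assert_eq!(prob.len(), width * height);
    let mut visited = vec![false; prob.len()];
    let mut boxes = Vec::new();
    for start in 0..prob.len() {
        if visited[start] || prob[start] <= bin_thresh {
            continue;
        }
        visited[start] = true;
        let mut stack = vec![start];
        let (sx, sy) = (start % width, start / width);
        let (mut x0, mut y0, mut x1, mut y1) = (sx, sy, sx, sy);
        let (mut sum, mut count) = (0.0f32, 0usize);
        while let Some(idx) = stack.pop() {
            let (x, y) = (idx % width, idx / width);
            x0 = x0.min(x);
            y0 = y0.min(y);
            x1 = x1.max(x);
            y1 = y1.max(y);
            sum += prob[idx];
            count += 1;
            // Collect in-bounds 4-connected neighbours.
            let mut neighbours = Vec::with_capacity(4);
            if x > 0 { neighbours.push(idx - 1); }
            if x + 1 < width { neighbours.push(idx + 1); }
            if y > 0 { neighbours.push(idx - width); }
            if y + 1 < height { neighbours.push(idx + width); }
            for n in neighbours {
                if !visited[n] && prob[n] > bin_thresh {
                    visited[n] = true;
                    stack.push(n);
                }
            }
        }
        let score = sum / count as f32;
        if score >= box_thresh {
            boxes.push(TextBox { x0, y0, x1, y1, score });
        }
    }
    boxes
}

/// Pad a box by `pad` map pixels, scale it to the original image, and clamp
/// it to the image bounds. Requires `orig_w > 0` and `orig_h > 0`.
pub fn expand_and_rescale(
    b: &TextBox,
    pad: usize,
    scale_x: f32,
    scale_y: f32,
    orig_w: usize,
    orig_h: usize,
) -> TextBox {
    let scale = |v: usize, s: f32, max: usize| ((v as f32 * s) as usize).min(max - 1);
    TextBox {
        x0: scale(b.x0.saturating_sub(pad), scale_x, orig_w),
        y0: scale(b.y0.saturating_sub(pad), scale_y, orig_h),
        x1: scale(b.x1 + pad, scale_x, orig_w),
        y1: scale(b.y1 + pad, scale_y, orig_h),
        score: b.score,
    }
}
```

On a synthetic 8x4 map with one 3x2 blob at probability 0.9 and one stray pixel at 0.35, thresholds of 0.3 and 0.6 keep only the blob: the stray pixel survives binarization but is rejected by its score.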

Notes

This first increment renders the first frame and screens the full image (margin-first crops are available in image_prefilter but not yet wired as the default scan strategy).
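
For context on what a margin-first strategy could look like, the sketch below derives four border bands from an image size and applies a simple contrast gate (standard deviation of grayscale values) to a crop. This is an assumption-laden illustration, not the image_prefilter API: `Rect`, `margin_crops`, `has_contrast`, the band fraction, and the use of standard deviation as the contrast measure are all invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Top, bottom, left, and right bands, each `fraction` of the matching side.
/// Burned-in overlays often sit near the image borders, which is the
/// motivation for scanning margins first.
pub fn margin_crops(width: u32, height: u32, fraction: f32) -> [Rect; 4] {
    assert!(width > 0 && height > 0, "image must be non-empty");
    let f = fraction.clamp(0.0, 1.0);
    let mh = ((height as f32 * f).round() as u32).clamp(1, height);
    let mw = ((width as f32 * f).round() as u32).clamp(1, width);
    [
        Rect { x: 0, y: 0, w: width, h: mh },           // top
        Rect { x: 0, y: height - mh, w: width, h: mh }, // bottom
        Rect { x: 0, y: 0, w: mw, h: height },          // left
        Rect { x: width - mw, y: 0, w: mw, h: height }, // right
    ]
}

/// Contrast gate: true when the grayscale standard deviation inside `rect`
/// reaches `min_std`. A flat region cannot contain readable text, so a
/// caller could skip running the detector on it.
pub fn has_contrast(gray: &[u8], width: u32, rect: Rect, min_std: f32) -> bool {
    let (mut sum, mut sum_sq, mut n) = (0.0f64, 0.0f64, 0u64);
    for y in rect.y..rect.y + rect.h {
        for x in rect.x..rect.x + rect.w {
            let v = gray[(y * width + x) as usize] as f64;
            sum += v;
            sum_sq += v * v;
            n += 1;
        }
    }
    if n == 0 {
        return false;
    }
    let mean = sum / n as f64;
    let variance = (sum_sq / n as f64 - mean * mean).max(0.0);
    variance.sqrt() >= min_std as f64
}
```

A caller would iterate over the four bands, drop those that fail the gate, and pass only the remaining crops to the detector.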

🤖 Generated with Claude Code

Tag-based de-identification leaves pixel data untouched, so PHI burned into images survives. Add a new workspace crate, phi-text-detector, that flags such images using a PaddleOCR DB text detector (detection only) run via ONNX Runtime:

- onnx_detector: PP-OCR preprocessing (resize to mult-of-32, ImageNet normalize, NCHW) + ONNX Runtime session (session behind a Mutex so the detector is Send+Sync with a &self detect()).
- db_postprocess: pure-Rust DB map -> bitmap -> connected components -> scored, expanded, rescaled boxes (heavily unit-tested).
- dicom_render: render first frame + extract screening metadata.
- image_prefilter: margin crops + a contrast gate.
- screening: Safe/Review/Unsafe decision from detection + metadata (BurnedInAnnotation, modality, secondary capture); fail-closed.
- phi-screen CLI: screen images/DICOM/dirs, JSON / JSONL output.

Integrate optionally into the de-id CLI behind a `text-detection` feature: --detect-text off|warn|skip (warn flags but still writes; skip suppresses output and exits 3). Default off, so the core library never pulls in ONNX Runtime.

The model is fetched + SHA-256-verified by scripts/fetch-model.sh (not committed); CI fetches it and runs the workspace + feature tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

luckfamousa mentioned this pull request May 22, 2026

Extract burned-in text, classify PHI, and redact pixels #2

Open

luckfamousa wants to merge 1 commit into main from feature/burned-in-text-detection
