|
| 1 | +# TODO |
| 2 | + |
| 3 | +## ~~Add `Image.FaceDetection`~~ — shipped in v0.3.0 |
| 4 | + |
| 5 | +`Image.FaceDetection` ships with the `detect/2`, `boxes/2`, |
| 6 | +`crop_largest/2`, and `draw_boxes/3` API, backed by YuNet |
| 7 | +2023-March (MIT, ~340 KB). The `crop_largest/2` helper is the |
| 8 | +wire-in point for the face-aware crop bias still pending in |
| 9 | +`image_plug`'s interpreter — `gravity: :face`, ImageKit `z-`, |
| 10 | +Cloudflare `face-zoom`, and Cloudinary `e_pixelate_faces` can |
| 11 | +all drive their face-aware behaviour through it once the |
| 12 | +`image_plug` interpreter passes detection results back into |
| 13 | +the resize / pixelate ops. |
| 14 | + |
| 15 | +The original recommendation notes (kept for context): |
| 16 | + |
| 17 | +The default model should follow the same conventions as |
| 18 | +`Image.Background`, `Image.Detection`, and |
| 19 | +`Image.Segmentation`: |
| 20 | + |
| 21 | +* ONNX export, loaded via Ortex, conditional on |
| 22 | + `ImageVision.ortex_configured?()`. |
| 23 | +* Hosted on HuggingFace under a stable namespace. |
| 24 | +* Permissive licence (MIT / Apache 2.0 — *not* AGPL). |
| 25 | +* Sensible-defaults API: `detect/2` returns |
| 26 | + `[%{box: %{x:, y:, w:, h:}, score:, landmarks: [{x, y}, …]}]`. |
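
To make the result shape concrete, here is a minimal sketch in plain Elixir that filters detections of that shape by score and ranks them by box area. The module name `FaceShapeSketch`, the `min_score` default, and the sample data are hypothetical and not part of `Image.FaceDetection`; the shipped ranking may differ.

```elixir
defmodule FaceShapeSketch do
  # Hypothetical helper, not part of Image.FaceDetection.
  # Operates on the result shape described above:
  #   %{box: %{x:, y:, w:, h:}, score:, landmarks: [{x, y}, ...]}

  # Keep detections at or above `min_score`, largest box area first.
  def rank(detections, min_score \\ 0.5) do
    detections
    |> Enum.filter(fn %{score: score} -> score >= min_score end)
    |> Enum.sort_by(fn %{box: %{w: w, h: h}} -> w * h end, :desc)
  end

  # Largest remaining face, or nil when nothing passes the threshold.
  def largest(detections, min_score \\ 0.5) do
    detections |> rank(min_score) |> List.first()
  end
end

# Made-up sample data in the documented shape (landmarks left empty).
detections = [
  %{box: %{x: 10, y: 10, w: 40, h: 40}, score: 0.92, landmarks: []},
  %{box: %{x: 200, y: 50, w: 120, h: 130}, score: 0.88, landmarks: []},
  %{box: %{x: 5, y: 300, w: 300, h: 300}, score: 0.20, landmarks: []}
]

IO.inspect(FaceShapeSketch.largest(detections).box)
```

The low-score third entry is dropped before ranking, so a large but unlikely box cannot win the "largest face" selection.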
| 27 | + |
| 28 | +### Recommended primary model: YuNet (OpenCV) |
| 29 | + |
| 30 | +* HuggingFace: [`opencv/face_detection_yunet`](https://huggingface.co/opencv/face_detection_yunet) |
| 31 | + (or any community ONNX export of the upstream |
| 32 | + [opencv_zoo YuNet](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet)). |
| 33 | +* Size: ~340 KB, **among the smallest production-quality face |
| 34 | + detectors available**. Comparable in size to BlazeFace |
| 35 | + (~250 KB) but with better accuracy. |
| 36 | +* Licence: MIT. |
| 37 | +* Output: bounding boxes + 5 facial landmarks (left eye, |
| 38 | + right eye, nose tip, left and right mouth corners) + confidence. |
| 39 | +* Input: 320×320 (configurable). Real-time on CPU. |
| 40 | +* Maintained by the OpenCV team — long-term stability is high. |
| 41 | + |
| 42 | +This is the best "sensible default" for a defaults-first |
| 43 | +library. The 340 KB size means it's reasonable to ship in a |
| 44 | +Docker image without bloating the layer. |
| 45 | + |
| 46 | +### Recommended alternative for higher accuracy: SCRFD |
| 47 | + |
| 48 | +* HuggingFace: [`onnx-community/scrfd_2.5g_bnkps`](https://huggingface.co/onnx-community/scrfd_2.5g_bnkps) |
| 49 | + (or the larger `scrfd_10g_bnkps` for production servers). |
| 50 | +* Size: 3 MB (2.5g variant) to ~17 MB (10g). |
| 51 | +* Licence: MIT (InsightFace). |
| 52 | +* Output: same shape as YuNet — boxes + 5 landmarks + score. |
| 53 | +* Slightly better mAP on WIDER FACE than YuNet, especially |
| 54 | + on small / occluded faces. |
| 55 | +* Same on-disk model format / preprocessing path; an |
| 56 | + `:image_vision` user can swap via `:repo` and `:model_file` |
| 57 | + options without code changes. |
| 58 | + |
| 59 | +### Recommended ultra-light alternative: BlazeFace (MediaPipe) |
| 60 | + |
| 61 | +* HuggingFace: [`onnx-community/blazeface`](https://huggingface.co/Xenova/blazeface), |
| 62 | + or community exports of |
| 63 | + [MediaPipe BlazeFace](https://github.com/google/mediapipe). |
| 64 | +* Size: ~250 KB. |
| 65 | +* Licence: Apache 2.0. |
| 66 | +* Output: boxes + 6 landmarks. |
| 67 | +* Good for selfies / portrait photography (front-facing |
| 68 | + variant) but worse on group photos or occluded faces than |
| 69 | + YuNet / SCRFD. |
| 70 | + |
| 71 | +### Sketch of `Image.FaceDetection` API |
| 72 | + |
| 73 | +```elixir |
| 74 | +{:ok, image} = Image.open("./crowd.jpg") |
| 75 | + |
| 76 | +[%{box: box, score: score, landmarks: marks} | _] = |
| 77 | + Image.FaceDetection.detect(image) |
| 78 | + |
| 79 | +# Convenience: just the bounding boxes, ranked. |
| 80 | +boxes = Image.FaceDetection.boxes(image) |
| 81 | + |
| 82 | +# Crop to the largest detected face with N% padding. |
| 83 | +{:ok, portrait} = Image.FaceDetection.crop_largest(image, padding: 0.2) |
| 84 | +``` |
| 85 | + |
| 86 | +The `crop_largest/2` helper is the wire-in point for |
| 87 | +`gravity: :face`, ImageKit `z-<n>`, and Cloudflare |
| 88 | +`face-zoom` over in `image_plug`. |
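
As an illustration of what the `padding:` option might mean, the sketch below grows a detected box by a fraction of its own width and height on each side and clamps the result to the image bounds. This is one plausible reading of `padding: 0.2`, not a description of the shipped `crop_largest/2`; the module `PaddedCropSketch` and the function `pad_box/4` are hypothetical.

```elixir
defmodule PaddedCropSketch do
  # Hypothetical helper: reads `padding` as "grow the box by this
  # fraction of its width/height on every side". The real
  # crop_largest/2 may interpret the option differently.

  # Returns the padded box, clamped so it stays inside the image.
  def pad_box(%{x: x, y: y, w: w, h: h}, padding, image_width, image_height) do
    pad_x = round(w * padding)
    pad_y = round(h * padding)

    left = max(x - pad_x, 0)
    top = max(y - pad_y, 0)
    right = min(x + w + pad_x, image_width)
    bottom = min(y + h + pad_y, image_height)

    %{x: left, y: top, w: right - left, h: bottom - top}
  end
end

# A 200x100 face box near the right edge of a 320x240 image.
box = %{x: 100, y: 50, w: 200, h: 100}
padded = PaddedCropSketch.pad_box(box, 0.2, 320, 240)
IO.inspect(padded)
```

Clamping matters for faces near an edge: without it the padded rectangle would extend past the image and a subsequent crop would fail or need its own bounds check.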
| 89 | + |
| 90 | +## Other ideas (lower priority) |
| 91 | + |
| 92 | +* **Pose estimation** (MediaPipe Pose / RT-Pose) — for |
| 93 | + pose-aware cropping. Niche, but useful for sports / fashion |
| 94 | + imagery. |
| 95 | + |
| 96 | +* **OCR-aware detection** — wrap `image_ocr` to expose |
| 97 | + `Image.FaceDetection`-style "where are the text regions" |
| 98 | + results. Sibling concern; could land in `image_ocr` itself. |
| 99 | + |
| 100 | +* **Aesthetic-quality scoring** — model that rates an image's |
| 101 | + composition / sharpness / exposure. Useful for picking the |
| 102 | + best frame from a video or the best variant from a batch. |
| 103 | + Several open-source variants exist (NIMA, MUSIQ); none yet |
| 104 | + packaged as a clean ONNX export with a permissive licence. |