Commit 832a8bd

Add Image.FaceDetection (YuNet) — detect/boxes/crop_largest/draw_boxes
1 parent 24904ba commit 832a8bd

10 files changed

Lines changed: 837 additions & 1 deletion

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
# Changelog

## ImageVision v0.3.0

### Added

* **`Image.FaceDetection`**: fast face detection with bounding boxes, confidence scores, and the five canonical facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner). The default model is [YuNet 2023-March](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet), hosted at [`opencv/face_detection_yunet`](https://huggingface.co/opencv/face_detection_yunet): MIT licensed, ~340 KB on disk, real-time on CPU. Functions: `detect/2`, `boxes/2`, `crop_largest/2`, `draw_boxes/3`. The `crop_largest/2` helper is the wire-in point for the face-aware crop bias used by the sibling `image_plug` (`gravity: :face`, ImageKit `z-`, Cloudflare `face-zoom`).

## ImageVision v0.2.0

### Added

README.md

Lines changed: 4 additions & 0 deletions
@@ -38,6 +38,10 @@ iex> Image.Classification.embed(puppy)
# Background removal: class-agnostic foreground cutout
iex> {:ok, cutout} = Image.Background.remove(puppy)

# Face detection: bounding boxes + 5 facial landmarks
iex> [%{box: _, score: _, landmarks: _} | _] = Image.FaceDetection.detect(portrait)
iex> {:ok, cropped_to_face} = Image.FaceDetection.crop_largest(portrait, padding: 0.2)

# Image captioning: natural-language description
iex> Image.Captioning.caption(puppy)
"a small brown and white puppy sitting on a wooden floor"

TODO.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# TODO

## ~~Add `Image.FaceDetection`~~ (shipped in v0.3.0)

`Image.FaceDetection` ships with the `detect/2`, `boxes/2`,
`crop_largest/2`, and `draw_boxes/3` API, backed by YuNet
2023-March (MIT, ~340 KB). The `crop_largest/2` helper is the
wire-in point for the face-aware crop bias still pending in
`image_plug`'s interpreter: `gravity: :face`, ImageKit `z-`,
Cloudflare `face-zoom`, and Cloudinary `e_pixelate_faces` can
all drive their face-aware behaviour through it once the
`image_plug` interpreter passes detection results back into
the resize / pixelate ops.

The original recommendation notes (kept for context):

The default model should follow the same conventions as
`Image.Background`, `Image.Detection`, and
`Image.Segmentation`:

* ONNX export, loaded via Ortex, conditional on
  `ImageVision.ortex_configured?()`.
* Hosted on HuggingFace under a stable namespace.
* Permissive licence (MIT / Apache 2.0, *not* AGPL).
* Sensible-defaults API: `detect/2` returns
  `[%{box: %{x:, y:, w:, h:}, score:, landmarks: [{x, y}, …]}]`.

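To make the return shape above concrete, here is a minimal sketch that works on plain data in that shape. `FaceShapeSketch`, its functions, the `0.5` score cut-off, and the sample detections are all hypothetical and only for illustration; none of them are part of `Image.FaceDetection`.

```elixir
defmodule FaceShapeSketch do
  # Hypothetical helper module, not part of the library.

  # Area of a box in the %{x:, y:, w:, h:} shape.
  def area(%{w: w, h: h}), do: w * h

  # Keep detections scoring at least `min_score`, largest box first.
  def rank(detections, min_score \\ 0.5) do
    detections
    |> Enum.filter(fn %{score: score} -> score >= min_score end)
    |> Enum.sort_by(fn %{box: box} -> area(box) end, :desc)
  end
end

# Made-up sample data in the documented shape (five {x, y} landmarks each).
detections = [
  %{box: %{x: 10, y: 20, w: 40, h: 50}, score: 0.91,
    landmarks: [{22, 38}, {38, 38}, {30, 47}, {24, 58}, {36, 58}]},
  %{box: %{x: 200, y: 30, w: 80, h: 90}, score: 0.88,
    landmarks: [{225, 62}, {255, 62}, {240, 80}, {228, 100}, {252, 100}]},
  %{box: %{x: 5, y: 5, w: 300, h: 300}, score: 0.12,
    landmarks: [{90, 110}, {210, 110}, {150, 170}, {100, 230}, {200, 230}]}
]

ranked = FaceShapeSketch.rank(detections)
IO.inspect(Enum.map(ranked, & &1.box), label: "ranked boxes")
```

The low-confidence third entry is dropped before ranking, so a large spurious box cannot win over real faces.
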
### Recommended primary model: YuNet (OpenCV)

* HuggingFace: [`opencv/face_detection_yunet`](https://huggingface.co/opencv/face_detection_yunet)
  (or any community ONNX export of the upstream
  [opencv_zoo YuNet](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet)).
* Size: ~340 KB, **among the smallest production-quality face
  detectors available**. Comparable to BlazeFace but with
  better accuracy.
* Licence: MIT.
* Output: bounding boxes + 5 facial landmarks (right eye,
  left eye, nose tip, right and left mouth corners) + confidence.
* Input: 320×320 (configurable). Real-time on CPU.
* Maintained by the OpenCV team, so long-term stability is high.

This is the best "sensible default" for a defaults-first
library. The 340 KB size means it's reasonable to ship in a
Docker image without bloating the layer.

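Because the model sees a fixed-size input, box coordinates have to be mapped back to the source image at some point. The sketch below shows that arithmetic under a stated assumption: the image is stretched to the square input rather than letterboxed. `RescaleSketch` and `to_source/3` are hypothetical names, and the library's actual preprocessing may differ.

```elixir
defmodule RescaleSketch do
  # Hypothetical helper, not part of the library.
  # Assumed square model input, taken from the "320×320" note above.
  @model_size 320

  # Map a box from model-input pixels back to source-image pixels,
  # assuming a plain stretch resize (no letterboxing, no padding).
  def to_source(%{x: x, y: y, w: w, h: h}, {src_w, src_h}, model_size \\ @model_size) do
    sx = src_w / model_size
    sy = src_h / model_size
    %{x: round(x * sx), y: round(y * sy), w: round(w * sx), h: round(h * sy)}
  end
end

# A box found in the 320×320 input, mapped onto a 1280×960 source image.
source_box = RescaleSketch.to_source(%{x: 80, y: 40, w: 160, h: 160}, {1280, 960})
IO.inspect(source_box)
# => %{h: 480, w: 640, x: 320, y: 120}
```
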
### Recommended alternative for higher accuracy: SCRFD

* HuggingFace: [`onnx-community/scrfd_2.5g_bnkps`](https://huggingface.co/onnx-community/scrfd_2.5g_bnkps)
  (or the larger `scrfd_10g_bnkps` for production servers).
* Size: 3 MB (2.5g variant) to ~17 MB (10g).
* Licence: MIT (InsightFace).
* Output: same shape as YuNet (boxes + 5 landmarks + score).
* Slightly better mAP on WIDER FACE than YuNet, especially
  on small / occluded faces.
* Same on-disk model format / preprocessing path; an
  `:image_vision` user can swap via `:repo` and `:model_file`
  options without code changes.

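The swap described in the last bullet amounts to user options overriding defaults. A minimal sketch of that pattern follows, using only the `:repo` and `:model_file` option names from the notes. `ModelOptionsSketch` is hypothetical and `"model.onnx"` is a placeholder file name, not the real default.

```elixir
defmodule ModelOptionsSketch do
  # Hypothetical module for illustration only.
  # The repo name comes from the notes above; "model.onnx" is a
  # placeholder, not the file name the library actually uses.
  @defaults [repo: "opencv/face_detection_yunet", model_file: "model.onnx"]

  # Merge caller options over the defaults, ignoring unrelated keys.
  def resolve(opts \\ []) do
    Keyword.merge(@defaults, Keyword.take(opts, [:repo, :model_file]))
  end
end

# Swapping to SCRFD only changes the options, not the calling code.
resolved = ModelOptionsSketch.resolve(repo: "onnx-community/scrfd_2.5g_bnkps", padding: 0.2)
IO.inspect(Keyword.fetch!(resolved, :repo), label: "repo")
IO.inspect(Keyword.fetch!(resolved, :model_file), label: "model_file")
```
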
### Recommended ultra-light alternative: BlazeFace (MediaPipe)

* HuggingFace: [`onnx-community/blazeface`](https://huggingface.co/Xenova/blazeface) /
  community exports of the
  [MediaPipe BlazeFace](https://github.com/google/mediapipe).
* Size: ~250 KB.
* Licence: Apache 2.0.
* Output: boxes + 6 landmarks.
* Good for selfies / portrait photography (front-facing
  variant) but worse on group photos or occluded faces than
  YuNet / SCRFD.

### Sketch of `Image.FaceDetection` API

```elixir
{:ok, image} = Image.open("./crowd.jpg")

[%{box: box, score: score, landmarks: marks} | _] =
  Image.FaceDetection.detect(image)

# Convenience: just the bounding boxes, ranked.
boxes = Image.FaceDetection.boxes(image)

# Crop to the largest detected face with N% padding.
{:ok, portrait} = Image.FaceDetection.crop_largest(image, padding: 0.2)
```

The `crop_largest/2` helper is the wire-in point for
`gravity: :face`, ImageKit `z-<n>`, and Cloudflare
`face-zoom` over in `image_plug`.

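One way a padded face crop could be computed is sketched below. It assumes that `padding` is a fraction of the box's own width and height added on every side, and that the result is clamped to the image bounds. Both are assumptions about `crop_largest/2`, not confirmed behaviour, and `CropSketch` is a hypothetical module.

```elixir
defmodule CropSketch do
  # Hypothetical helper, not the library's implementation.

  # Grow the box by `padding` (assumed to be a fraction of its width and
  # height) on every side, then clamp the result to the image bounds.
  # Returns {left, top, width, height}.
  def padded_rect(%{x: x, y: y, w: w, h: h}, {img_w, img_h}, padding) do
    pad_x = round(w * padding)
    pad_y = round(h * padding)

    left = max(x - pad_x, 0)
    top = max(y - pad_y, 0)
    right = min(x + w + pad_x, img_w)
    bottom = min(y + h + pad_y, img_h)

    {left, top, right - left, bottom - top}
  end
end

# Made-up boxes on a 320×240 image; pick the largest by area, then pad by 20%.
boxes = [%{x: 10, y: 10, w: 30, h: 30}, %{x: 100, y: 50, w: 200, h: 100}]
largest = Enum.max_by(boxes, fn %{w: w, h: h} -> w * h end)

rect = CropSketch.padded_rect(largest, {320, 240}, 0.2)
IO.inspect(rect)
# => {60, 30, 260, 140}
```

The right edge would land at 340 with padding, so it is clamped to the image width of 320. That is why the crop is 260 pixels wide rather than 280.
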
## Other ideas (lower priority)

* **Pose estimation** (MediaPipe Pose / RT-Pose): for
  pose-aware cropping. Niche, but useful for sports / fashion
  imagery.

* **OCR-aware detection**: wrap `image_ocr` to expose
  `Image.FaceDetection`-style "where are the text regions"
  results. Sibling concern; could land in `image_ocr` itself.

* **Aesthetic-quality scoring**: a model that rates an image's
  composition / sharpness / exposure. Useful for picking the
  best frame from a video or the best variant from a batch.
  Several open-source variants exist (NIMA, MUSIQ); none yet
  packaged as a clean ONNX export with a permissive licence.
