Commit 7f12d1a

committed
Add guide, fix version
1 parent 6d82bec commit 7f12d1a

4 files changed

Lines changed: 119 additions & 167 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 5 deletions
@@ -1,15 +1,11 @@
11
# Changelog
22

3-
## ImageVision v0.3.0
3+
## [0.2.0] 2026-05-02
44

55
### Added
66

77
* **`Image.FaceDetection`** — fast face detection with bounding boxes, confidence scores, and the five canonical facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner). Default model is [YuNet 2023-March](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet) hosted at [`opencv/face_detection_yunet`](https://huggingface.co/opencv/face_detection_yunet) — MIT licensed, ~340 KB on disk, real-time on CPU. Functions: `detect/2`, `boxes/2`, `crop_largest/2`, `draw_boxes/3`. The `crop_largest/2` helper is the wire-in point for face-aware crop bias used by sibling `image_plug` (`gravity: :face`, ImageKit `z-`, Cloudflare `face-zoom`).
88

9-
## ImageVision v0.2.0
10-
11-
### Added
12-
139
* **`Image.Background`** — class-agnostic foreground/background separation. `remove/2` returns the input image with the background made transparent (alpha mask applied); `mask/2` returns the foreground mask alone for custom compositing. Default model is [BiRefNet lite](https://huggingface.co/onnx-community/BiRefNet_lite-ONNX) (MIT, ~210 MB), powered by Ortex.
1410

1511
* **`Image.Captioning`** — natural-language description of an image. `caption/2` returns a string like `"a man riding a horse with a bird of prey"`. Default model is [BLIP base](https://huggingface.co/Salesforce/blip-image-captioning-base) (BSD-3-Clause, ~990 MB), powered by Bumblebee. Heavy enough that it is not autostarted by default; configure `autostart: true` or add the child spec to your supervisor.

Dockerfile.ortex-precompiled

Lines changed: 0 additions & 160 deletions
This file was deleted.

guides/face_detection.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# Face Detection
2+
3+
`Image.FaceDetection` answers "where are the faces in this image?". It returns a list of bounding boxes, confidence scores, and five facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner) per detected face.
4+
5+
## Basic detection
6+
7+
```elixir
8+
iex> image = Image.open!("group.jpg")
9+
iex> faces = Image.FaceDetection.detect(image)
10+
iex> hd(faces)
11+
%{
12+
box: {412, 88, 96, 124},
13+
score: 0.94,
14+
landmarks: [{438.2, 130.1}, {478.7, 129.6}, {458.0, 152.3}, {442.1, 178.5}, {475.0, 178.2}]
15+
}
16+
```
17+
18+
Each detection is a map with:
19+
- `:box`: `{x, y, width, height}` in pixel coordinates of the original image
20+
- `:score`: confidence score in `[0.0, 1.0]`
21+
- `:landmarks`: a list of five `{x, y}` tuples in this order: right eye, left eye, nose tip, right mouth corner, left mouth corner
22+
23+
Results are sorted by descending confidence.
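
Because each detection is a plain map, ordinary pattern matching is enough to work with it. The following is a minimal sketch, assuming the map shape shown above; `FaceGeometry` and its functions are hypothetical helpers for illustration and are not part of `Image.FaceDetection`.

```elixir
defmodule FaceGeometry do
  # Hypothetical helper module, not part of Image.FaceDetection.

  # Converts an {x, y, width, height} box into {left, top, right, bottom}.
  def corners({x, y, w, h}), do: {x, y, x + w, y + h}

  # Midpoint between the eyes. Relies on the documented landmark order,
  # where the right eye and left eye are the first two entries.
  def eye_midpoint(%{landmarks: [{rx, ry}, {lx, ly} | _rest]}) do
    {(rx + lx) / 2, (ry + ly) / 2}
  end
end

face = %{
  box: {412, 88, 96, 124},
  score: 0.94,
  landmarks: [{438.2, 130.1}, {478.7, 129.6}, {458.0, 152.3}, {442.1, 178.5}, {475.0, 178.2}]
}

FaceGeometry.corners(face.box)
# => {412, 88, 508, 212}

FaceGeometry.eye_midpoint(face)
```

The eye midpoint is a common starting point for alignment or for centring a crop on the face.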
24+
25+
## Filtering by confidence
26+
27+
The default minimum score is `0.6`. Raise it for stricter detections:
28+
29+
```elixir
30+
iex> Image.FaceDetection.detect(image, min_score: 0.8)
31+
```
32+
33+
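`:nms_iou` (default `0.3`) controls how aggressively overlapping boxes are collapsed by non-maximum suppression. It is an intersection-over-union threshold: boxes that overlap a higher-scoring box by more than this value are dropped, so lower values keep fewer overlapping faces.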
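
To make the threshold concrete, here is a sketch of intersection-over-union and a greedy suppression pass in plain Elixir. It illustrates the general technique only; `IouSketch` is a hypothetical module, and the library's internal suppression code may differ in its details.

```elixir
defmodule IouSketch do
  # Illustrative only; not the library's implementation.

  # Intersection over union of two {x, y, width, height} boxes.
  def iou({ax, ay, aw, ah}, {bx, by, bw, bh}) do
    # Overlap extents, clamped at zero when the boxes are disjoint.
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    if union == 0, do: 0.0, else: inter / union
  end

  # Greedy pass over detections already sorted by descending score:
  # a detection is kept only if it overlaps no kept detection by more
  # than `threshold`.
  def suppress(faces, threshold) do
    faces
    |> Enum.reduce([], fn face, kept ->
      if Enum.any?(kept, &(iou(&1.box, face.box) > threshold)) do
        kept
      else
        [face | kept]
      end
    end)
    |> Enum.reverse()
  end
end

faces = [
  %{box: {0, 0, 10, 10}, score: 0.9},
  %{box: {5, 0, 10, 10}, score: 0.8}
]

# The two boxes share half their width, so their IoU is 50 / 150.
IouSketch.suppress(faces, 0.3) |> length()
# => 1
IouSketch.suppress(faces, 0.5) |> length()
# => 2
```

With a threshold of `0.3` the second box counts as a duplicate of the first. Raising the threshold to `0.5` keeps both.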
34+
35+
## Boxes only
36+
37+
When landmarks aren't needed, `boxes/2` skips them:
38+
39+
```elixir
40+
iex> Image.FaceDetection.boxes(image)
41+
[{412, 88, 96, 124}, {612, 102, 84, 110}]
42+
```
43+
44+
## Drawing detections
45+
46+
`draw_boxes/3` overlays bounding boxes, the score as a percentage label, and the five landmark dots:
47+
48+
```elixir
49+
iex> faces = Image.FaceDetection.detect(image)
50+
iex> annotated = Image.FaceDetection.draw_boxes(faces, image)
51+
iex> Image.write!(annotated, "annotated.jpg")
52+
```
53+
54+
Pipeline form:
55+
56+
```elixir
57+
iex> image
58+
...> |> Image.FaceDetection.detect()
59+
...> |> Image.FaceDetection.draw_boxes(image)
60+
...> |> Image.write!("annotated.jpg")
61+
```
62+
63+
Drawing options include `:color`, `:stroke_width`, `:landmark_radius`, `:font_size`, and `:show_landmarks?` (set to `false` to skip the dots).
64+
65+
## Face-aware crop
66+
67+
`crop_largest/2` is a convenience for the common "crop to the most prominent face" case. It is also the wire-in point for the face-aware crop bias used by the sibling library `image_plug` (`gravity: :face`, ImageKit `z-`, Cloudflare `face-zoom`):
68+
69+
```elixir
70+
iex> {:ok, portrait} = Image.FaceDetection.crop_largest(image, padding: 0.2)
71+
```
72+
73+
The largest face is chosen by bounding-box area. `:padding` is a fraction of each face dimension — `0.0` is a tight crop, `0.5` adds 50% on each side, `1.0` doubles the box. The expanded crop is clipped to the image bounds.
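
The selection and clipping steps can be sketched on plain box tuples. This is an illustration under stated assumptions, not the library's code: `CropSketch` is a hypothetical module, and the padding step is deliberately left out.

```elixir
defmodule CropSketch do
  # Hypothetical helpers; boxes are {x, y, width, height} tuples.

  # Picks the box with the greatest area. Raises on an empty list,
  # so callers need to handle the "no faces" case first.
  def largest(boxes), do: Enum.max_by(boxes, fn {_x, _y, w, h} -> w * h end)

  # Clips a box so it lies within an image of the given size.
  def clip({x, y, w, h}, image_w, image_h) do
    left = max(x, 0)
    top = max(y, 0)
    right = min(x + w, image_w)
    bottom = min(y + h, image_h)
    {left, top, right - left, bottom - top}
  end
end

CropSketch.largest([{412, 88, 96, 124}, {612, 102, 84, 110}])
# => {412, 88, 96, 124}

CropSketch.clip({-10, -5, 50, 40}, 30, 100)
# => {0, 0, 30, 35}
```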
74+
75+
When no face meets the score threshold, `crop_largest/2` returns `{:error, :no_face_detected}`.
76+
77+
## Default model
78+
79+
[YuNet](https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet) (`opencv/face_detection_yunet`) — the OpenCV team's production face detector. Roughly **340 KB on disk**, MIT licensed, real-time on CPU. The 2023-March export produces decoded boxes, keypoints, and scores directly.
80+
81+
Model weights are downloaded on first call and cached. Configure the cache directory with:
82+
83+
```elixir
84+
config :image_vision, :cache_dir, "/path/to/cache"
85+
```
86+
87+
## Using a different model
88+
89+
`detect/2` accepts `:repo` and `:model_file` to swap in a different YuNet ONNX export:
90+
91+
```elixir
92+
iex> Image.FaceDetection.detect(image,
93+
...> repo: "opencv/face_detection_yunet",
94+
...> model_file: "face_detection_yunet_2023mar.onnx"
95+
...> )
96+
```
97+
98+
### Caveat: post-processor is YuNet 2023-March specific
99+
100+
The output decoder assumes YuNet's 2023-March 12-tensor convention (`cls_*`, `obj_*`, `bbox_*`, `kps_*` at strides 8/16/32, fixed 640×640 input). `SCRFD`, `BlazeFace`, and other face-detector exports produce different output shapes and need a different post-processor — they will not work as a drop-in replacement.
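
The arithmetic behind that layout can be sketched as follows. The grid sizes follow directly from the 640×640 input and the three strides. The tensor names are an assumption built from the `cls_*`, `obj_*`, `bbox_*`, `kps_*` prefixes with the stride as suffix, and `YunetLayout` is a hypothetical module, so check both against the actual ONNX file.

```elixir
defmodule YunetLayout do
  # Illustrative arithmetic only; not part of the library.
  @input_size 640
  @strides [8, 16, 32]
  @kinds ["cls", "obj", "bbox", "kps"]

  # One square grid of cells per stride: {stride, cells_per_side}.
  def grids, do: for(stride <- @strides, do: {stride, div(@input_size, stride)})

  # Assumed naming: four output kinds at each of the three strides.
  def tensor_names do
    for kind <- @kinds, stride <- @strides, do: "#{kind}_#{stride}"
  end
end

YunetLayout.grids()
# => [{8, 80}, {16, 40}, {32, 20}]

length(YunetLayout.tensor_names())
# => 12
```

A model whose outputs do not fit this four-kinds-by-three-strides layout cannot be decoded by the same post-processor.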
101+
102+
## Dependencies
103+
104+
Face detection requires `:ortex`. Add it to the `deps` list in `mix.exs`:
105+
106+
```elixir
107+
{:ortex, "~> 0.1"}
108+
```

mix.exs

Lines changed: 10 additions & 2 deletions
@@ -1,7 +1,7 @@
11
defmodule ImageVision.MixProject do
22
use Mix.Project
33

4-
@version "0.3.0"
4+
@version "0.2.0"
55
@app_name "image_vision"
66

77
def project do
@@ -118,11 +118,18 @@ defmodule ImageVision.MixProject do
118118
logo: "logo.jpg",
119119
extra_section: "Guides",
120120
extras: extras(),
121-
formatters: ["html"],
121+
groups_for_extras: groups_for_extras(),
122+
formatters: ["html", "markdown"],
122123
skip_undefined_reference_warnings_on: ["changelog", "CHANGELOG.md"]
123124
]
124125
end
125126

127+
defp groups_for_extras do
128+
[
129+
Guides: ~r"guides/.*"
130+
]
131+
end
132+
126133
defp extras do
127134
Enum.filter(
128135
[
@@ -133,6 +140,7 @@ defmodule ImageVision.MixProject do
133140
"guides/classification.md",
134141
"guides/segmentation.md",
135142
"guides/detection.md",
143+
"guides/face_detection.md",
136144
"guides/background.md",
137145
"guides/captioning.md",
138146
"guides/zero_shot.md"
