**README.md** (+65 −2)
```elixir
def deps do
  [
    # ...
    # Required for Image.Classification and Image.Classification.embed/2
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"}, # or {:torchx, "~> 0.10"} for Torch backend

    # Required for Image.Detection and Image.Segmentation
    {:ortex, "~> 0.1"}
  ]
end
```
All ML deps are optional — omit any you do not use. The library compiles cleanly without them.

## Prerequisites
For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, **no native toolchain is required**. The libraries used here ship precompiled native binaries for those platforms and `mix deps.get` is all you need.
If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, glibc mismatches — you'll need:

* **A Rust toolchain** for `:ortex` (the ONNX runtime wrapper used by detection and segmentation) and for `:tokenizers` (pulled in transitively by `:bumblebee`). Install via [rustup](https://rustup.rs):

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

* **A C compiler** for `:vix` (the libvips wrapper used by `:image`). On Linux install `build-essential` (Debian/Ubuntu) or `gcc` (Fedora/RHEL); on macOS install Xcode Command Line Tools (`xcode-select --install`).

* **`libvips`** if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: `brew install vips`. On Linux: your distro's `libvips-dev` / `vips-devel` package. Most users don't need this.

If you see Cargo or `cc` errors during `mix deps.compile`, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.

### Disk space and first-call latency

Model weights are downloaded on first call and cached on disk; across all three default models the combined total is on the order of several hundred megabytes. The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.
To pre-download all default models before first use (recommended for production deployments and CI):

```bash
mix image_vision.download_models
```
Pass `--classify`, `--detect`, or `--segment` to limit scope.

### Livebook Desktop
Livebook Desktop launches as a GUI application and **does not inherit your shell's `PATH`**. Tools installed via `rustup`, `mise`, `asdf`, or Homebrew aren't visible to it by default — even if `cargo` works fine in your terminal.
If you hit "cargo: command not found" or similar during `Mix.install` inside Livebook Desktop, create `~/.livebookdesktop.sh` and add the relevant directories to `PATH`. A reasonable starting point:

```bash
# ~/.livebookdesktop.sh

# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"

# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"

# mise — uncomment if you use it
# eval "$(mise activate bash)"

# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"
```
Restart Livebook Desktop after creating this file. See the [Livebook Desktop documentation](https://github.com/livebook-dev/livebook/blob/main/README.md#livebook-desktop) for details.

## Default models
All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.
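
For example, to swap the classifier for the larger ConvNeXt checkpoint mentioned below — a hedged sketch, since the `:classifier` config key is assumed here by analogy with the `:embedder` key that follows:

```elixir
# config/runtime.exs — the :classifier key is an assumption, inferred
# from the :embedder configuration shown below
config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"}
```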
The embedder is configured the same way, independently:

```elixir
config :image_vision, :embedder,
  model: {:hf, "facebook/dinov2-large"},
  featurizer: {:hf, "facebook/dinov2-large"}
```
Any image classification or image embedding model that Bumblebee can load works as a drop-in replacement — anything from the [HuggingFace image-classification](https://huggingface.co/models?pipeline_tag=image-classification) or [feature-extraction](https://huggingface.co/models?pipeline_tag=feature-extraction) catalogue with a corresponding featurizer. Larger models trade speed and memory for accuracy. The label set will be whatever the chosen model was trained on — for example, `convnext-large-224-22k-1k` returns the same 1000 ImageNet labels as the default, while a model fine-tuned on a different dataset returns that dataset's labels.
To manage the serving yourself (e.g. in an umbrella app):
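
A minimal sketch, assuming the library exposes a Bumblebee-style serving constructor — `Image.Classification.serving/0` below is an assumed name, so check the module docs for the exact function:

```elixir
# In your application's start/2. Image.Classification.serving/0 and the
# process name are illustrative, not confirmed API.
children = [
  {Nx.Serving,
   serving: Image.Classification.serving(),
   name: ImageVision.Classifier}
]

Supervisor.start_link(children, strategy: :one_for_one)
```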
Configuration changes take effect at application start. After editing `config/runtime.exs`, restart the application; the new model is downloaded on first call and cached.
To pre-download a configured model so first-call latency is eliminated:

```bash
mix image_vision.download_models --classify
```

## Dependencies
Classification requires `:bumblebee`, `:nx`, and an Nx backend such as `:exla`. Add to `mix.exs`:
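
```elixir
# Same version pins as the top-level deps example above
{:bumblebee, "~> 0.6"},
{:nx, "~> 0.10"},
{:exla, "~> 0.10"} # or {:torchx, "~> 0.10"} for the Torch backend
```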

The default `onnx/model.onnx` (~175 MB FP32) gives the best accuracy. Use the quantized variant (~45 MB INT8) for a much smaller download at some accuracy cost, as sketched below. `detect/2` accepts `:repo` and `:filename` to swap in any RT-DETR-family ONNX model from HuggingFace:

```elixir
# Smaller R18 backbone (~80 MB) — faster, slightly less accurate.
# (Call completed from the option docs; the repo name is illustrative —
# any RT-DETR-family ONNX export works.)
iex> Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")
```

(The download task fetches the configured default; if you've changed the repo for a single call site, the cache will populate on first use.)
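
The quantized swap would look like this — a sketch, since `onnx/model_quantized.onnx` follows the usual onnx-community export layout and should be verified against the repo:

```elixir
# INT8 quantized variant of the default model; the filename is an
# assumption based on the common onnx-community layout
iex> Image.Detection.detect(image, filename: "onnx/model_quantized.onnx")
```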

### Caveat: COCO 80 labels are hardcoded
`detect/2` maps class indices to label strings using a baked-in COCO 80 list ([detection.ex](https://github.com/elixir-image/image_vision/blob/main/lib/detection.ex)). RT-DETR models trained on a different label set (e.g. Open Images, custom domains) will produce indices the wrapper can't translate — labels will be wrong even though boxes and scores are correct. For non-COCO models, use the underlying `Ortex.run/2` directly with the model's own `id2label`.
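
A rough sketch of that escape hatch, with preprocessing and postprocessing left to you — the label map entries, model path, and output order here are all illustrative:

```elixir
# Run a non-COCO RT-DETR export directly with Ortex
model = Ortex.load("path/to/model.onnx")

# The model's own label map, as published in its config.json
# (example entries — substitute the real mapping)
id2label = %{0 => "helmet", 1 => "safety-vest"}

# Dummy input at RT-DETR's usual 640x640; real use requires the
# export's actual preprocessing (resize, normalise, NCHW layout)
input = Nx.broadcast(0.0, {1, 3, 640, 640})

# Output order (boxes vs. logits) depends on the export
{boxes, logits} = Ortex.run(model, {input})

# Take the argmax over logits per query and translate each class
# index through id2label instead of the baked-in COCO 80 list
```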

## Default model
RT-DETR (`onnx-community/rtdetr_r50vd`) is a real-time transformer-based detector that outperforms YOLOv8 on COCO while being **Apache 2.0 licensed** (YOLOv8/11 are AGPL). It is NMS-free — no Non-Maximum Suppression post-processing is needed.
Both `segment/2` and `segment_panoptic/2` accept options to swap models. They are passed per call rather than via app config — neither function uses a long-running serving, so there is no autostart cost to overriding on a single call.

### Promptable (SAM 2)

```elixir
# Use a larger SAM 2 variant for better quality on small or thin objects
iex> Image.Segmentation.segment(image,
...>   prompt: {:point, 320, 240},
...>   repo: "SharpAI/sam2-hiera-small-onnx")
```
`segment/2` accepts:

- `:repo` — any HuggingFace repo containing a SAM 2 ONNX export with separate encoder and decoder files
- `:encoder_file` — encoder filename within the repo (default `"encoder.onnx"`)
- `:decoder_file` — decoder filename within the repo (default `"decoder.onnx"`)
The protocol matches `SharpAI/sam2-hiera-tiny-onnx` (separate encoder/decoder, the standard SAM 2 ONNX export shape). Repos that bundle both into a single file or use a different I/O layout will not work without changes to the wrapper.
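
For a repo that uses non-default filenames, the call would look like this — repo and filenames here are hypothetical:

```elixir
# Hypothetical repo with renamed encoder/decoder exports
iex> Image.Segmentation.segment(image,
...>   prompt: {:point, 320, 240},
...>   repo: "some-org/sam2-hiera-base-onnx",
...>   encoder_file: "sam2_encoder.onnx",
...>   decoder_file: "sam2_decoder.onnx")
```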

### Class-labeled (DETR-panoptic)

```elixir
# Quantized variant — much smaller, some accuracy cost.
# (Call completed from the option docs below; verify the exact
# filename against the repo.)
iex> Image.Segmentation.segment_panoptic(image,
...>   model_file: "onnx/model_quantized.onnx")
```

`segment_panoptic/2` accepts:

- `:repo` — any HuggingFace repo with a DETR-panoptic ONNX export and a `config.json` providing `id2label`
- `:model_file` — ONNX filename within the repo (default `"onnx/model.onnx"`)

Labels are read from the repo's `config.json`. Where that config has placeholder `LABEL_n` entries, the wrapper falls back to the canonical [COCO panoptic taxonomy](https://github.com/cocodataset/panopticapi/blob/master/panoptic_coco_categories.json), so common stuff classes (`sky-other-merged`, `mountain-merged`, `grass-merged`, …) resolve correctly even on repos with incomplete configs.
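
Conceptually, the resolution order is "repo label if real, else COCO panoptic" — in the spirit of this sketch, which is not the library's actual code:

```elixir
# Sketch of the label-resolution fallback described above.
# `coco_panoptic` maps class index => canonical category name.
resolve_label = fn id2label, coco_panoptic, idx ->
  case Map.get(id2label, idx) do
    nil -> Map.get(coco_panoptic, idx, "unknown")
    "LABEL_" <> _ -> Map.get(coco_panoptic, idx, "unknown")
    label -> label
  end
end
```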

### Pre-downloading
To populate the cache before first use:

```bash
mix image_vision.download_models --segment
```
This fetches the configured defaults. For non-default repos, the cache populates on first call to `segment/2` or `segment_panoptic/2`.

**livebooks/image_vision.livemd** (+14 −3)

# Image Vision

> #### Livebook Desktop users {: .info}
>
> Livebook Desktop launches as a GUI app and does **not** inherit your terminal's `PATH`. If `Mix.install` below fails with "cargo: command not found" or similar, your Rust toolchain isn't visible to Livebook Desktop. Fix by creating `~/.livebookdesktop.sh` with at minimum:
>
> ```sh
> export PATH="$HOME/.cargo/bin:$PATH"
> # If you use mise/asdf, also activate them here.
> ```
>
> Restart Livebook Desktop after editing the file. See the project [README](https://github.com/elixir-image/image_vision#prerequisites) for details on which toolchains are needed.

## Object Detection

Upload an image and the model returns bounding boxes with labels for every recognised object. The default model is `onnx-community/rtdetr_r50vd` (RT-DETR, Apache 2.0, ~175 MB), which detects 80 COCO object classes.

```elixir
detection_input = Kino.Input.image("Image for object detection")
```

### Panoptic segmentation

The default model is `Xenova/detr-resnet-50-panoptic` (Apache 2.0, ~175 MB). It labels every pixel across 133 COCO panoptic categories — both things (cat, car) and stuff (sky, grass).

```elixir
panoptic_input = Kino.Input.image("Image for panoptic segmentation")
```

### Promptable segmentation (SAM 2)

The default model is `SharpAI/sam2-hiera-tiny-onnx` (Apache 2.0, ~150 MB total — encoder + decoder). Click a point inside the object you want to isolate and SAM 2 returns a precise pixel mask for it.

```elixir
sam_input = Kino.Input.image("Image for SAM 2 segmentation")
```