
Commit 70f64c1

Improve install docs
1 parent 2380905 commit 70f64c1

5 files changed

Lines changed: 169 additions & 7 deletions


README.md

Lines changed: 65 additions & 2 deletions
```diff
@@ -47,8 +47,8 @@ def deps do
 
       # Required for Image.Classification and Image.Classification.embed/2
       {:bumblebee, "~> 0.6"},
-      {:nx, "~> 0.11"},
-      {:exla, "~> 0.9"},   # or {:torchx, "~> 0.9"} for Torch backend
+      {:nx, "~> 0.10"},
+      {:exla, "~> 0.10"},  # or {:torchx, "~> 0.10"} for Torch backend
 
       # Required for Image.Detection and Image.Segmentation
       {:ortex, "~> 0.1"}
```
All ML deps are optional — omit any you do not use. The library compiles cleanly without them.
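For example, a project that only needs detection and segmentation can skip the Bumblebee/Nx group entirely. A sketch (only the `:ortex` line comes from the snippet above; the rest is placeholder):

```elixir
defp deps do
  [
    # ...the library itself and any non-ML deps...
    {:ortex, "~> 0.1"}  # detection and segmentation only
  ]
end
```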
## Prerequisites

For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, **no native toolchain is required**. The libraries used here ship precompiled native binaries for those platforms, and `mix deps.get` is all you need.

If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, glibc mismatches — you'll need:

* **A Rust toolchain** for `:ortex` (the ONNX runtime wrapper used by detection and segmentation) and for `:tokenizers` (pulled in transitively by `:bumblebee`). Install via [rustup](https://rustup.rs):

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

* **A C compiler** for `:vix` (the libvips wrapper used by `:image`). On Linux, install `build-essential` (Debian/Ubuntu) or `gcc` (Fedora/RHEL); on macOS, install the Xcode Command Line Tools (`xcode-select --install`).

* **`libvips`** if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: `brew install vips`. On Linux: your distro's `libvips-dev` / `vips-devel` package. Most users don't need this.

If you see Cargo or `cc` errors during `mix deps.compile`, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.
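Before installing anything, you can check what is already visible to your shell. A hedged sketch (output varies per machine, and `pkg-config` may itself be absent, which is fine):

```shell
# Check optional build tools; none are needed on platforms with
# precompiled binaries.
command -v cargo >/dev/null 2>&1 && echo "cargo: $(cargo --version)" || echo "cargo: missing"
command -v cc >/dev/null 2>&1 && echo "cc: found" || echo "cc: missing"
pkg-config --exists vips 2>/dev/null && echo "libvips: $(pkg-config --modversion vips)" || echo "libvips: not found (usually fine)"
```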
### Disk space and first-call latency

Model weights are downloaded on first call and cached on disk. Across the default models the download sizes are approximately:

| Task | Default model | Size |
|---|---|---|
| Classification | `facebook/convnext-tiny-224` | ~110 MB |
| Detection | `onnx-community/rtdetr_r50vd` | ~175 MB |
| Segmentation (SAM 2) | `SharpAI/sam2-hiera-tiny-onnx` | ~150 MB |
| Segmentation (panoptic) | `Xenova/detr-resnet-50-panoptic` | ~175 MB |

The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.

To pre-download all default models before first use (recommended for production deployments and CI):

```bash
mix image_vision.download_models
```

Pass `--classify`, `--detect`, or `--segment` to limit scope.
### Livebook Desktop

Livebook Desktop launches as a GUI application and **does not inherit your shell's `PATH`**. Tools installed via `rustup`, `mise`, `asdf`, or Homebrew aren't visible to it by default — even if `cargo` works fine in your terminal.

If you hit "cargo: command not found" or similar during `Mix.install` inside Livebook Desktop, create `~/.livebookdesktop.sh` and add the relevant directories to `PATH`. A reasonable starting point:

```bash
# ~/.livebookdesktop.sh

# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"

# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"

# mise — uncomment if you use it
# eval "$(mise activate bash)"

# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"
```

Restart Livebook Desktop after creating this file. See the [Livebook Desktop documentation](https://github.com/livebook-dev/livebook/blob/main/README.md#livebook-desktop) for details.
## Default models

All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.

guides/classification.md

Lines changed: 18 additions & 0 deletions
```elixir
config :image_vision, :classifier,
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"}
```

The embedder is configured the same way, independently:

```elixir
config :image_vision, :embedder,
  model: {:hf, "facebook/dinov2-large"},
  featurizer: {:hf, "facebook/dinov2-large"}
```

Any image classification or image embedding model that Bumblebee can load works as a drop-in replacement — anything from the [HuggingFace image-classification](https://huggingface.co/models?pipeline_tag=image-classification) or [feature-extraction](https://huggingface.co/models?pipeline_tag=feature-extraction) catalogue with a corresponding featurizer. Larger models trade speed and memory for accuracy. The label set will be whatever the chosen model was trained on — for example, `convnext-large-224-22k-1k` returns the same 1000 ImageNet labels as the default, while a model fine-tuned on a different dataset returns that dataset's labels.
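Hypothetically, pointing the classifier at a fine-tuned model from that catalogue (the repo name below is a placeholder) looks like:

```elixir
config :image_vision, :classifier,
  model: {:hf, "your-org/convnext-finetuned-plants"},
  featurizer: {:hf, "your-org/convnext-finetuned-plants"}
```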
To manage the serving yourself (e.g. in an umbrella app):

```elixir
config :image_vision, :classifier, autostart: false

children = [Image.Classification.classifier()]
```
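Put together, a host application might look like this. A sketch: `MyApp` and the supervision strategy are placeholders; only `Image.Classification.classifier/0` comes from the library:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Started here because autostart: false is set in config
      Image.Classification.classifier()
      # ...your other children...
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```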
Configuration changes take effect at application start. After editing `config/runtime.exs`, restart the application; the new model is downloaded on first call and cached.

To pre-download a configured model so first-call latency is eliminated:

```bash
mix image_vision.download_models --classify
```

## Dependencies
Classification requires `:bumblebee`, `:nx`, and an Nx backend such as `:exla`. Add to `mix.exs`:

guides/detection.md

Lines changed: 24 additions & 2 deletions
```elixir
iex> Image.Detection.classes()
["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", ...]
```

## Using a different model

`detect/2` accepts `:repo` and `:filename` to swap in any RT-DETR-family ONNX model from HuggingFace:

```elixir
# Smaller R18 backbone (~80 MB) — faster, slightly less accurate
iex> Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")

# Quantized variant of the default (~45 MB INT8) — much smaller download, some accuracy cost
iex> Image.Detection.detect(image, filename: "onnx/model_quantized.onnx")
```
For one-off use, pass options per call. To make a non-default model the project default, you can wrap the call:

```elixir
defp detect(image), do: Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")
```

To pre-download a model into the cache:

```bash
mix image_vision.download_models --detect
```

(The download task fetches the configured default; if you've changed the repo for a single call site, the cache will populate on first use.)
### Caveat: COCO 80 labels are hardcoded

`detect/2` maps class indices to label strings using a baked-in COCO 80 list ([detection.ex](https://github.com/elixir-image/image_vision/blob/main/lib/detection.ex)). RT-DETR models trained on a different label set (e.g. Open Images, custom domains) will produce indices the wrapper can't translate — labels will be wrong even though boxes and scores are correct. For non-COCO models, use the underlying `Ortex.run/2` directly with the model's own `id2label`.
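A rough, hypothetical sketch of that escape hatch (the preprocessing, output destructuring, and `id2label` contents below are assumptions to be checked against the actual model card):

```elixir
# Hypothetical sketch only: real RT-DETR exports differ in input size,
# normalisation, and output tensor layout; consult the model card.
model = Ortex.load("priv/models/my_rtdetr.onnx")

# `input` is a {1, 3, H, W} image tensor preprocessed per the model card.
{boxes, logits} = Ortex.run(model, input)

# Translate class indices with the model's own id2label (from its
# config.json), instead of the wrapper's baked-in COCO 80 list.
id2label = %{0 => "my-class-a", 1 => "my-class-b"}

class_ids = logits |> Nx.argmax(axis: -1) |> Nx.to_flat_list()
labels = Enum.map(class_ids, &Map.get(id2label, &1, "unknown"))
```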
## Default model
RT-DETR (`onnx-community/rtdetr_r50vd`) is a real-time transformer-based detector that outperforms YOLOv8 on COCO while being **Apache 2.0 licensed** (YOLOv8/11 are AGPL). It is NMS-free — no Non-Maximum Suppression post-processing is needed.

guides/segmentation.md

Lines changed: 48 additions & 0 deletions
Adjust transparency with `:alpha` (default `0.5`):

```elixir
iex> overlay = Image.Segmentation.compose_overlay(street, segments, alpha: 0.3)
```
## Using a different model

Both `segment/2` and `segment_panoptic/2` accept options to swap models. They are passed per call rather than via app config — neither function uses a long-running serving, so there is no autostart cost to overriding on a single call.

### Promptable (SAM 2)

```elixir
# Use a larger SAM 2 variant for better quality on small or thin objects
iex> Image.Segmentation.segment(image,
...>   prompt: {:point, 320, 240},
...>   repo: "SharpAI/sam2-hiera-small-onnx")
```

`segment/2` accepts:

- `:repo` — any HuggingFace repo containing a SAM 2 ONNX export with separate encoder and decoder files
- `:encoder_file` — encoder filename within the repo (default `"encoder.onnx"`)
- `:decoder_file` — decoder filename within the repo (default `"decoder.onnx"`)

The expected protocol matches `SharpAI/sam2-hiera-tiny-onnx` (separate encoder/decoder, the standard SAM 2 ONNX export shape). Repos that bundle both into a single file or use a different I/O layout will not work without changes to the wrapper.
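For a repo that uses non-default filenames, the file options combine with `:repo`. A sketch (the repo name and filenames are placeholders):

```elixir
iex> Image.Segmentation.segment(image,
...>   prompt: {:point, 320, 240},
...>   repo: "your-org/sam2-export",
...>   encoder_file: "sam2_encoder.onnx",
...>   decoder_file: "sam2_decoder.onnx")
```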
### Class-labeled (DETR-panoptic)

```elixir
# Quantized variant — much smaller, some accuracy cost
iex> Image.Segmentation.segment_panoptic(image, model_file: "onnx/model_quantized.onnx")

# A different ONNX-exported DETR-panoptic repo
iex> Image.Segmentation.segment_panoptic(image, repo: "your-org/detr-panoptic-onnx")
```

`segment_panoptic/2` accepts:

- `:repo` — any HuggingFace repo with a DETR-panoptic ONNX export and a `config.json` providing `id2label`
- `:model_file` — ONNX filename within the repo (default `"onnx/model.onnx"`)

Labels are read from the repo's `config.json`. Where that config has placeholder `LABEL_n` entries, the wrapper falls back to the canonical [COCO panoptic taxonomy](https://github.com/cocodataset/panopticapi/blob/master/panoptic_coco_categories.json), so common stuff classes (`sky-other-merged`, `mountain-merged`, `grass-merged`, …) resolve correctly even on repos with incomplete configs.
### Pre-downloading

To populate the cache before first use:

```bash
mix image_vision.download_models --segment
```

This fetches the configured defaults. For non-default repos, the cache populates on first call to `segment/2` or `segment_panoptic/2`.

## Dependencies
Segmentation requires `:ortex`. Add to `mix.exs`:

livebooks/image_vision.livemd

Lines changed: 14 additions & 3 deletions
# Image Vision

> #### Livebook Desktop users {: .info}
>
> Livebook Desktop launches as a GUI app and does **not** inherit your terminal's `PATH`. If `Mix.install` below fails with "cargo: command not found" or similar, your Rust toolchain isn't visible to Livebook Desktop. Fix by creating `~/.livebookdesktop.sh` with at minimum:
>
> ```sh
> export PATH="$HOME/.cargo/bin:$PATH"
> # If you use mise/asdf, also activate them here.
> ```
>
> Restart Livebook Desktop after editing the file. See the project [README](https://github.com/elixir-image/image_vision#prerequisites) for details on which toolchains are needed.
## Object Detection

Upload an image and the model returns bounding boxes with labels for every recognised object. The default model is `onnx-community/rtdetr_r50vd` (RT-DETR, Apache 2.0, ~175 MB), which detects 80 COCO object classes.

```elixir
detection_input = Kino.Input.image("Image for object detection")
```
### Panoptic segmentation

The default model is `Xenova/detr-resnet-50-panoptic` (Apache 2.0, ~175 MB). It labels every pixel across 133 COCO panoptic categories — both things (cat, car) and stuff (sky, grass).

```elixir
panoptic_input = Kino.Input.image("Image for panoptic segmentation")
```
### Promptable segmentation (SAM 2)

The default model is `SharpAI/sam2-hiera-tiny-onnx` (Apache 2.0, ~150 MB total — encoder + decoder). Click a point inside the object you want to isolate and SAM 2 returns a precise pixel mask for it.

```elixir
sam_input = Kino.Input.image("Image for SAM 2 segmentation")
```
