Draft
161 commits
d3fa498
wip
grzegorz-roboflow Mar 11, 2026
a76369c
wip
grzegorz-roboflow Mar 11, 2026
f0dfe81
wip
grzegorz-roboflow Mar 12, 2026
74f2e44
wip
grzegorz-roboflow Mar 12, 2026
93b6bb6
wip
grzegorz-roboflow Mar 13, 2026
d92ef7f
Expose max batch size
grzegorz-roboflow Mar 17, 2026
5ec1f74
wip
grzegorz-roboflow Mar 18, 2026
392d1fe
wip
grzegorz-roboflow Mar 31, 2026
b35b91e
wip
grzegorz-roboflow Mar 31, 2026
d668c3a
wip
grzegorz-roboflow Apr 1, 2026
8ba2b13
wip
grzegorz-roboflow Apr 1, 2026
65a2633
wip
grzegorz-roboflow Apr 3, 2026
d52fd87
wip
grzegorz-roboflow Apr 3, 2026
79a4e68
enable pytest for inference and inference_models to be discovered by …
grzegorz-roboflow Apr 3, 2026
550dbb2
wip
grzegorz-roboflow Apr 3, 2026
57777ce
expose batch and allow to instantiate model class without loading model
grzegorz-roboflow Apr 3, 2026
2ac3ddf
wip / enable loading the same model many times
grzegorz-roboflow Apr 4, 2026
c5352ec
wip
grzegorz-roboflow Apr 10, 2026
f4e3286
wip
grzegorz-roboflow Apr 13, 2026
9f14af9
wip
grzegorz-roboflow Apr 16, 2026
9654624
model manager process
grzegorz-roboflow Apr 16, 2026
2f31dcd
launcher
grzegorz-roboflow Apr 16, 2026
3e5d68b
nvjpeg
grzegorz-roboflow Apr 16, 2026
e59545b
handle gone workers
grzegorz-roboflow Apr 16, 2026
708de9c
server
grzegorz-roboflow Apr 16, 2026
629f016
debug
grzegorz-roboflow Apr 16, 2026
56ce352
api key flows through request params
grzegorz-roboflow Apr 16, 2026
fe33851
defend against empty body / non-image
grzegorz-roboflow Apr 16, 2026
4cabd00
multi-instances, for now controlled by caller
grzegorz-roboflow Apr 16, 2026
1a55bb7
model manager process startup
grzegorz-roboflow Apr 16, 2026
c711f0a
fix
grzegorz-roboflow Apr 17, 2026
361dce1
mm e2e
grzegorz-roboflow Apr 17, 2026
0c9f260
tests
grzegorz-roboflow Apr 17, 2026
7ccfcb7
tests
grzegorz-roboflow Apr 17, 2026
3630be0
e2e fixes
grzegorz-roboflow Apr 17, 2026
3d32458
wip
grzegorz-roboflow Apr 17, 2026
3459fc0
move to cpu before pickling
grzegorz-roboflow Apr 17, 2026
4334f11
mps
grzegorz-roboflow Apr 17, 2026
8f9d371
pass nvjpeg
grzegorz-roboflow Apr 17, 2026
091eeab
bastch max size
grzegorz-roboflow Apr 17, 2026
c0bd0c8
dependencies
grzegorz-roboflow Apr 20, 2026
1080b4c
fixes / running on macos; add heic support
grzegorz-roboflow Apr 20, 2026
a1969f3
dependencies
grzegorz-roboflow Apr 20, 2026
620783e
make shm_pool_name mandatory, remove SHMPool.create fallback and _own…
grzegorz-roboflow Apr 21, 2026
b5fea3b
unify INPUT+RESULT into single DATA area & Update MMP, launcher, serv…
grzegorz-roboflow Apr 21, 2026
0e1bfb9
lazy-create shared SHMPool + pass to backends + submit logic + shutdo…
grzegorz-roboflow Apr 21, 2026
843a20b
tests
grzegorz-roboflow Apr 21, 2026
6f889d2
fixes to direct backend
grzegorz-roboflow Apr 21, 2026
d4b9246
Move model manager to inference_model_manager
grzegorz-roboflow Apr 21, 2026
0e1e06b
Revert load_weights
grzegorz-roboflow Apr 21, 2026
5667c8f
remove model manager tests from inference_models
grzegorz-roboflow Apr 21, 2026
a903eb5
move inference model manager tests
grzegorz-roboflow Apr 21, 2026
65f2ee1
imports
grzegorz-roboflow Apr 21, 2026
b6a7033
graceful drain
grzegorz-roboflow Apr 21, 2026
f6f443b
Health / readiness per model
grzegorz-roboflow Apr 21, 2026
eccde42
Explicit load / unload (HTTP scaffold)
grzegorz-roboflow Apr 21, 2026
1dd025c
list_models, health
grzegorz-roboflow Apr 21, 2026
61eddb0
preload
grzegorz-roboflow Apr 21, 2026
f2a9ef3
add serializer
grzegorz-roboflow Apr 22, 2026
d5feba9
dependnecies
grzegorz-roboflow Apr 22, 2026
cc227f2
Add inference_server; move mock app.py from inference_model_manager t…
grzegorz-roboflow Apr 22, 2026
92e2f1c
auth
grzegorz-roboflow Apr 22, 2026
698c25a
Fix exceptions
grzegorz-roboflow Apr 22, 2026
6b08e85
isort/formatting
grzegorz-roboflow Apr 22, 2026
6199742
fix eviction — unload(drain=True) instead of no-op sleep for subprocess
grzegorz-roboflow Apr 22, 2026
e1e22b1
Add ManagedModel to inference_models to expose interface used by infe…
grzegorz-roboflow Apr 22, 2026
f271315
Add ManagedModel to inference_models to expose interface used by infe…
grzegorz-roboflow Apr 22, 2026
360e2e0
add resolve_task, get_supported_tasks to ModelManager
grzegorz-roboflow Apr 22, 2026
8f0fb3b
infer -> process, add task to subproc backend
grzegorz-roboflow Apr 22, 2026
6bb83e2
task & interface discovery in inference_server
grzegorz-roboflow Apr 22, 2026
a798676
hot/cold eviction, idle timeout, request rate tracking, cache hit/miss
grzegorz-roboflow Apr 22, 2026
6066413
reactive admission loop — OOM evict-retry, proper load failure handling
grzegorz-roboflow Apr 22, 2026
510b3d6
code review
grzegorz-roboflow Apr 22, 2026
10c996d
code review
grzegorz-roboflow Apr 22, 2026
1c255b3
code review
grzegorz-roboflow Apr 22, 2026
d55c3e0
code review
grzegorz-roboflow Apr 22, 2026
6d76d65
code review
grzegorz-roboflow Apr 23, 2026
032229b
propagate errors from subprocess backend to caller
grzegorz-roboflow Apr 23, 2026
a289df0
code review
grzegorz-roboflow Apr 23, 2026
f3d7620
code review
grzegorz-roboflow Apr 23, 2026
cf99279
unrelated fixes
grzegorz-roboflow Apr 23, 2026
e6be8fd
code review
grzegorz-roboflow Apr 23, 2026
f5da973
stubs
grzegorz-roboflow Apr 27, 2026
bf0f4d4
instrument tests
grzegorz-roboflow Apr 27, 2026
ed27b78
model registry
grzegorz-roboflow Apr 27, 2026
8eb9547
add contributor guide
grzegorz-roboflow Apr 27, 2026
04e65f0
fix
grzegorz-roboflow Apr 27, 2026
a1e997e
remove max_batch_size from inference_models
grzegorz-roboflow Apr 27, 2026
bfa0fbe
batch size logic
grzegorz-roboflow Apr 27, 2026
c221c9f
Remove AutoModel resolve_class
grzegorz-roboflow Apr 27, 2026
fb002eb
formatting
grzegorz-roboflow Apr 27, 2026
eb569a8
adjust README so it's factual
grzegorz-roboflow Apr 27, 2026
abfa080
Wire v2 stubs: /v2/server/ready, /v2/server/info, /v2/server/metrics
grzegorz-roboflow Apr 28, 2026
b1b16bf
/models/load -> /v2/models/load; /models/unload -> /v2/models/unload
grzegorz-roboflow Apr 28, 2026
fa8e776
split app.py
grzegorz-roboflow Apr 28, 2026
589599e
formatting
grzegorz-roboflow Apr 28, 2026
7e8bab2
Add CLAUDE.local.md to .gitignore
grzegorz-roboflow Apr 28, 2026
5a97c7c
adjust README to include installation; adjust pyproject.toml to enabl…
grzegorz-roboflow Apr 28, 2026
71dfec4
wire /v2/models/interface
grzegorz-roboflow Apr 28, 2026
a529429
add lifecycle + dispatch tests
grzegorz-roboflow Apr 28, 2026
56a64b7
v2/models/infer step 1: response envelope + typed compact serialization
grzegorz-roboflow Apr 28, 2026
7dbf5f4
v2 structured error responses: error_response() helper, all v2 endpoi…
grzegorz-roboflow Apr 28, 2026
451bddd
v2/models/infer steps 2-4: structured errors, multipart form, JSON+ba…
grzegorz-roboflow Apr 28, 2026
ef4bd32
v2/models/infer step 5: URL-based image input via query param
grzegorz-roboflow Apr 28, 2026
9fc37d2
v2/models/infer step 6: batch inference — concurrent slot alloc, mult…
grzegorz-roboflow Apr 28, 2026
e40c73c
v2/models/infer step 7: rich format default, compact via ?style=compa…
grzegorz-roboflow Apr 28, 2026
31c76f2
formatting
grzegorz-roboflow Apr 28, 2026
1a77a94
Bump spool to 32MB as default, make it configurable via INFERENCE_MUL…
grzegorz-roboflow Apr 28, 2026
269cd27
cascade inference-models extras
grzegorz-roboflow Apr 28, 2026
a638e73
fix
grzegorz-roboflow Apr 28, 2026
8fac24c
uv torch index routing for model_manager + server
grzegorz-roboflow Apr 28, 2026
1090597
remove uv sources from model_manager+server, index routing lives in i…
grzegorz-roboflow Apr 28, 2026
282180e
capture empty body
grzegorz-roboflow Apr 28, 2026
7e1710a
fix: ASGI auth middleware (don't consume body stream), drop unused De…
grzegorz-roboflow Apr 28, 2026
e5d8911
fix: lazy env init in state.py (fork race), ASGI auth middleware (no …
grzegorz-roboflow Apr 28, 2026
acc2f9a
fix: MMP creates ModelManager internally, single shared SHM pool, T_L…
grzegorz-roboflow Apr 28, 2026
f2f1619
remove debug logs
grzegorz-roboflow Apr 28, 2026
efff296
formatting
grzegorz-roboflow Apr 29, 2026
09e48f8
fix: add missing torch-cu126/cu130 source routing in inference_models…
grzegorz-roboflow Apr 29, 2026
3b64984
inference_model_manager / inference_server uv workspaces
grzegorz-roboflow Apr 29, 2026
304841a
Add python-multipart
grzegorz-roboflow Apr 29, 2026
7e742bd
wire state/device into MMP stats, remove sleep/wake
grzegorz-roboflow Apr 29, 2026
3127a84
Merge branch 'main' into feat/new-model-manager
grzegorz-roboflow Apr 29, 2026
5777652
Merge branch 'main' into feat/new-model-manager
grzegorz-roboflow Apr 30, 2026
c3e6b8a
feat: auto-restart subprocess backend on worker crash
grzegorz-roboflow Apr 30, 2026
785ea7d
Fix batch size detection for TRT/ONNX models
grzegorz-roboflow Apr 30, 2026
ed16a18
Fix mmp_pending leak
grzegorz-roboflow Apr 30, 2026
b919e93
Add model type to stats (trt/onnx/torch)
grzegorz-roboflow Apr 30, 2026
9833400
fix tests
grzegorz-roboflow Apr 30, 2026
f16f7af
formatting
grzegorz-roboflow Apr 30, 2026
aad2cfb
feat: worker stats via heartbeat piggyback
grzegorz-roboflow Apr 30, 2026
50e182b
fix: GPU extras auto-pull cuda deps (pynvml, pycuda)
grzegorz-roboflow Apr 30, 2026
2738031
fix: queue_depth from MMP pending, drop redundant models from stats
grzegorz-roboflow Apr 30, 2026
0f9828c
feat: rich task params in registry (type, required, default)
grzegorz-roboflow Apr 30, 2026
13f4d24
fix: handle client disconnect (free slot, suppress traceback)
grzegorz-roboflow Apr 30, 2026
0ae93dc
Add temporary debugs
grzegorz-roboflow May 1, 2026
951505e
Merge branch 'main' into feat/new-model-manager
grzegorz-roboflow May 1, 2026
67658b4
collect all envs in common config.py
grzegorz-roboflow May 12, 2026
46481ff
formatting
grzegorz-roboflow May 12, 2026
5f3654a
fix(mmp): drop bogus manager= kwarg in standalone main()
grzegorz-roboflow May 13, 2026
a81f1a6
fix(registry): use exact-class skip check in _register_from_config
grzegorz-roboflow May 13, 2026
f882ec0
fix(model-manager): unify process_async with process() for subprocess…
grzegorz-roboflow May 13, 2026
4f46662
fix(mmp): persist api_key/device on ModelState for worker auto-reload
grzegorz-roboflow May 13, 2026
7a5e993
fix(mmp): enforce T_ENSURE_LOADED deadlines independently of load com…
grzegorz-roboflow May 13, 2026
953fae2
refactor(backends): drop public infer_sync/submit from Backend API
grzegorz-roboflow May 13, 2026
3055b55
fix(passthrough): use torch tensors so worker .cpu() is a no-op
grzegorz-roboflow May 13, 2026
7ce3e08
fix(subproc): cache slot capacity before mv.release() in oversized-re…
grzegorz-roboflow May 13, 2026
1d4edea
fix(model-manager): align submit() direct path with process()
grzegorz-roboflow May 13, 2026
9340f0d
Merge branch 'main' into feat/new-model-manager
grzegorz-roboflow May 14, 2026
b17fc32
Merge branch 'main' into feat/new-model-manager
grzegorz-roboflow May 14, 2026
ff22447
Remove debugs
grzegorz-roboflow May 14, 2026
57ea024
add ModelManagerProxy Protocol
grzegorz-roboflow May 19, 2026
2e4b49f
feat(inference_server): add MMPClient (ModelManagerProxy over ZMQ+SHM)
grzegorz-roboflow May 19, 2026
32d030e
feat(inference_server): add MMWrapper (ModelManagerProxy in-process)
grzegorz-roboflow May 19, 2026
90f3f2d
refactor(inference_server): v2_server router uses ModelManagerProxy
grzegorz-roboflow May 19, 2026
2de5509
refactor(inference_server): v2_models router uses ModelManagerProxy
grzegorz-roboflow May 19, 2026
b09ca90
refactor(inference_server): /infer router uses ModelManagerProxy
grzegorz-roboflow May 19, 2026
380b997
refactor(inference_server): delete state.py
grzegorz-roboflow May 19, 2026
2444562
refactor(inference_server): make launcher testable, drop env coupling
grzegorz-roboflow May 19, 2026
4a7b557
fix(inference_server): server.main injects INFERENCE_DEPLOYMENT_MODE=mmp
grzegorz-roboflow May 19, 2026
7762ed1
refactor(inference_server): extract input parsers into framework/
grzegorz-roboflow May 19, 2026
8 changes: 7 additions & 1 deletion .gitignore
@@ -214,4 +214,10 @@ inference_models/tests/e2e_platform_tests/assets/
inference_testing

# rerun.io recordings
*.rrd
*.rrd

# Test-downloaded model artifacts
inference_model_manager/tests/integration_tests/assets/

# Private CLAUDE context
CLAUDE.local.md
206 changes: 206 additions & 0 deletions inference_model_manager/README.md
@@ -0,0 +1,206 @@
# inference-model-manager

Manages model lifecycle (load, unload, evict) and dispatches inference requests. Sits between your code and `inference-models` — you don't call models directly.

## Install

Requires Python 3.10–3.12. From this directory:

```bash
python -m venv .venv
source .venv/bin/activate
pip install uv

# For development: install inference-models editable first
# uv pip install -e "../inference_models"

# CPU (torch + ONNX)
uv pip install -e ".[torch-cpu,onnx-cpu]"

# CUDA 12.4
uv pip install -e ".[torch-cu124,onnx-cu12]"

# CUDA 12.6 + TRT (Jetson JP6)
uv pip install -e ".[torch-jp6-cu126,onnx-jp6-cu126]"
```

Extras cascade to `inference-models`.

## Backends

Two backends:

- **Direct** — model runs in your process. Simple, good for scripts and notebooks.
- **Subprocess** — model runs in a separate process with shared memory transport. Process isolation, GPU fault containment, zero-copy I/O.

## Direct backend

Model loads and runs in the same process. Fastest for single-model use.

```python
import urllib.request
import imagecodecs
from inference_model_manager.model_manager import ModelManager

mm = ModelManager()
mm.load("yolov8n-640", api_key="YOUR_KEY", backend="direct")

# imagecodecs returns RGB. Pass input_color_format="rgb" so model
# pre-processing knows not to flip channels (default assumes BGR).
image_bytes = urllib.request.urlopen("https://media.roboflow.com/dog.jpeg").read()
image = imagecodecs.imread(image_bytes)
result = mm.process("yolov8n-640", images=image, input_color_format="rgb")
print(result)
# {"type": "roboflow-object-detection-compact-v1", "class_names": [...], "xyxy": ..., ...}

mm.shutdown()
```

## Subprocess backend

The model loads in a child process. The parent and the worker communicate through a shared-memory (SHM) pool: images are written to SHM slots and results are read back from them, so image data is not serialized on the hot path.

```python
import urllib.request
import imagecodecs
from inference_model_manager.model_manager import ModelManager

mm = ModelManager()
mm.load("yolov8n-640", api_key="YOUR_KEY", backend="subprocess")

# Same API — backend difference is transparent.
image_bytes = urllib.request.urlopen("https://media.roboflow.com/dog.jpeg").read()
image = imagecodecs.imread(image_bytes)
result = mm.process("yolov8n-640", images=image, input_color_format="rgb")
print(result)
# Same typed dict output as direct backend.

mm.shutdown() # kills worker process, frees SHM
```

Use subprocess when you need:
- Process isolation (model crash doesn't kill your app)
- Multiple models on one GPU without GIL contention
- Worker-side batching (accumulate requests, decode + infer in one GPU call)
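
As a rough illustration of the shared-memory idea described above, the sketch below writes an image into a single shared block and reads it back through a second attachment, using only the standard library's `multiprocessing.shared_memory` and NumPy. It is not the package's `SHMPool` implementation: the single "slot", the tiny stand-in image, and all variable names are assumptions made for the example, and both ends run in one process for brevity.

```python
from multiprocessing import shared_memory

import numpy as np

# Tiny HxWxC stand-in for a decoded image (hypothetical data).
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# One hypothetical "slot": a shared block large enough for one image.
slot = shared_memory.SharedMemory(create=True, size=image.nbytes)

# Producer side: copy the pixels into the slot once; nothing is pickled.
writer_view = np.ndarray(image.shape, dtype=image.dtype, buffer=slot.buf)
writer_view[:] = image

# Consumer side: attach by name (a worker process would do the same)
# and wrap the buffer in an array without copying.
attached = shared_memory.SharedMemory(name=slot.name)
reader_view = np.ndarray(image.shape, dtype=image.dtype, buffer=attached.buf)
received = reader_view.copy()  # copied only so the buffer can be released

# Array views must be dropped before close(), which otherwise raises BufferError.
del reader_view
attached.close()
del writer_view
slot.close()
slot.unlink()  # the creator frees the block
```

Only the block name and the array shape and dtype would need to cross the process boundary in such a scheme; the pixel data itself stays in the shared block.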

## Color format

Models default to BGR input (OpenCV convention). Pass `input_color_format` if your source is different:

| Source | Format | Pass to `process()` |
|--------|--------|---------------------|
| `imagecodecs.imread()` | RGB | `input_color_format="rgb"` |
| `cv2.imread()` | BGR | nothing (default) |
| `PIL.Image` → `np.array()` | RGB | `input_color_format="rgb"` |
| torch tensor (CHW) | RGB | nothing (tensor default is RGB) |
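
If you prefer to keep the default and convert the array yourself, reversing the channel axis of an HxWx3 NumPy array turns RGB into BGR (and back). The helper below is a minimal sketch; `rgb_to_bgr` is a hypothetical name and not part of this package.

```python
import numpy as np


def rgb_to_bgr(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an HxWx3 array (RGB <-> BGR)."""
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an HxWx3 array")
    # ascontiguousarray materialises the reversed view into its own buffer.
    return np.ascontiguousarray(image[..., ::-1])


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255  # pure red in RGB order
bgr = rgb_to_bgr(rgb)
print(bgr[0, 0].tolist())  # [0, 0, 255]
```

Passing `input_color_format="rgb"` avoids the extra copy, so the manual conversion is mainly useful when the same array also feeds OpenCV code.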

## Registering a model in the registry

Models in `inference-models` work standalone — no changes needed there. To make a model available through model manager (task dispatch, validation, typed serialization), add an entry to `registry_defaults.py`.

### Case 1: Model inherits from a registered base class

If your model inherits from `ObjectDetectionModel`, `ClassificationModel`, `InstanceSegmentationModel`, etc. — **nothing to do**. The registry matches by class name via MRO. Your model inherits the base class entry automatically.

```python
# inference_models/models/my_detector/my_detector.py
class MyDetector(ObjectDetectionModel):
    def infer(self, images, **kwargs):
        ...
```

This works out of the box with `mm.process("my-detector", images=img)`.

### Case 2: New base class or model with unique tasks

Add entries to `_TASK_CONFIGS` in `registry_defaults.py`. Each entry is a tuple:

```
(task_name, method_name, is_default, params_dict, validator_name, serializer_name, response_type)
```

Example — a model with two tasks:

```python
# In registry_defaults.py _TASK_CONFIGS dict:
"MyCustomModel": [
    ("generate", "generate_output", True,
     {
         "images": {"type": "image", "required": True},
         "prompt": {"type": "str", "required": True},
         "temperature": {"type": "float", "required": False, "default": 0.7},
     },
     "validate_images_and_prompt", "serialize_text",
     "roboflow-text-v1"),
    ("embed", "embed_images", False,
     {"images": {"type": "image", "required": True}},
     "validate_images_required", "serialize_embeddings",
     "roboflow-embeddings-compact-v1"),
],
```

Reusable param fragments (`_P_IMAGES`, `_P_IMAGES_PROMPT`, `_K_OD`, etc.) are defined at the top of `registry_defaults.py`. Use `_p()` to merge them:

```python
"MyDetector": [
    ("infer", "infer", True, _p(_P_IMAGES, _K_OD),
     "validate_images_required", "serialize_detections_compact",
     "roboflow-object-detection-compact-v1"),
],
```

Fields:
- **task_name** — what users pass as `task=` param (e.g. `mm.process("model", task="embed")`)
- **method_name** — actual method on the model class to call (can differ from task_name)
- **is_default** — exactly one task per model entry must be `True`; that task runs when `task=None`
- **params_dict** — `{name: {type, required, default?}}` — exposed in stats/interface for API discovery
- **validator_name** — function from `validators.py` (e.g. `"validate_images_required"`)
- **serializer_name** — function from `serializers_typed.py` (e.g. `"serialize_text"`)
- **response_type** — type string for JSON response envelope

If your model inherits from a registered base class but has different params (e.g. different defaults), add a concrete class entry — MRO picks it up first.
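
The field rules above can be checked mechanically before an entry is added. The function below is a sketch of such a check, assuming the 7-tuple layout shown earlier; `check_task_entries` and `TaskEntry` are hypothetical names, and the registry itself may validate entries differently or not at all.

```python
from typing import Any

# (task_name, method_name, is_default, params_dict,
#  validator_name, serializer_name, response_type)
TaskEntry = tuple[str, str, bool, dict[str, Any], str, str, str]


def check_task_entries(class_name: str, entries: list[TaskEntry]) -> str:
    """Return the default task name, or raise if the entries are inconsistent."""
    names = [entry[0] for entry in entries]
    if len(set(names)) != len(names):
        raise ValueError(f"{class_name}: duplicate task names in {names}")
    defaults = [entry[0] for entry in entries if entry[2]]
    if len(defaults) != 1:
        raise ValueError(
            f"{class_name}: expected exactly one default task, got {defaults}"
        )
    for name, _method, _is_default, params, *_rest in entries:
        for param, spec in params.items():
            if "type" not in spec or "required" not in spec:
                raise ValueError(
                    f"{class_name}.{name}: param {param!r} needs 'type' and 'required'"
                )
    return defaults[0]


entries: list[TaskEntry] = [
    ("generate", "generate_output", True,
     {"images": {"type": "image", "required": True}},
     "validate_images_and_prompt", "serialize_text", "roboflow-text-v1"),
    ("embed", "embed_images", False,
     {"images": {"type": "image", "required": True}},
     "validate_images_required", "serialize_embeddings",
     "roboflow-embeddings-compact-v1"),
]
default_task = check_task_entries("MyCustomModel", entries)
print(default_task)  # generate
```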

### Case 3: Custom validator or serializer

Add to `validators.py` or `serializers_typed.py`:

```python
# validators.py
def validate_my_custom_input(kwargs: dict) -> dict:
    if "images" not in kwargs:
        raise ValueError("'images' required")
    if "language" not in kwargs:
        raise ValueError("'language' required for this model")
    return kwargs
```

```python
# serializers_typed.py
def serialize_my_custom_output(output, model) -> dict:
    return {
        "type": "my-custom-output-v1",
        "result": output.result,
        "metadata": output.metadata,
    }
```

Then reference by name in `_TASK_CONFIGS`:

```python
"MyCustomModel": [
    ("infer", "infer", True,
     {
         "images": {"type": "image", "required": True},
         "language": {"type": "str", "required": True},
     },
     "validate_my_custom_input", "serialize_my_custom_output",
     "my-custom-output-v1"),
],
```
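
To show how a name-based entry can drive a call, here is a self-contained sketch of a validate, call, serialize pipeline. Everything in it is a stand-in: `VALIDATORS`, `SERIALIZERS`, `dispatch`, `EchoModel`, and the `demo-text-v1` type are hypothetical, and the real model manager may resolve names and invoke models differently.

```python
def validate_images_required(kwargs: dict) -> dict:
    if "images" not in kwargs:
        raise ValueError("'images' required")
    return kwargs


def serialize_demo_text(output, model) -> dict:
    return {"type": "demo-text-v1", "text": output}


# Hypothetical lookup tables standing in for validators.py / serializers_typed.py.
VALIDATORS = {"validate_images_required": validate_images_required}
SERIALIZERS = {"serialize_demo_text": serialize_demo_text}


class EchoModel:
    """Dummy model; the method name differs from the task name on purpose."""

    def describe_images(self, images, **kwargs):
        return f"{len(images)} image(s)"


TASKS = {
    "describe": ("describe", "describe_images", True,
                 {"images": {"type": "image", "required": True}},
                 "validate_images_required", "serialize_demo_text",
                 "demo-text-v1"),
}


def dispatch(model, tasks: dict, task=None, **kwargs) -> dict:
    # Fall back to the entry flagged is_default when no task is given.
    if task is None:
        entry = next(e for e in tasks.values() if e[2])
    else:
        entry = tasks[task]
    _, method_name, _, _, validator_name, serializer_name, _ = entry
    checked = VALIDATORS[validator_name](kwargs)
    output = getattr(model, method_name)(**checked)
    return SERIALIZERS[serializer_name](output, model)


result = dispatch(EchoModel(), TASKS, images=["a", "b"])
print(result)  # {'type': 'demo-text-v1', 'text': '2 image(s)'}
```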

### How it works

Registration is lazy. Nothing is imported until `ModelManager.load()` is called. At that point:

1. Backend loads the model (`AutoModel.from_pretrained` for direct, worker subprocess for subprocess)
2. For direct backend: `lazy_register(type(model))` walks the class MRO
3. For subprocess backend: worker sends MRO class names in READY pipe, `lazy_register_by_names(mro_names)` matches by string
4. For each ancestor, checks if `cls.__name__` has an entry in `_TASK_CONFIGS`
5. If found, registers the tasks (imports only validators/serializers — pure Python, no heavy deps)
6. Subsequent `process()` calls use the registered entry for dispatch + serialization
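
The name-based MRO matching in steps 2 to 4 can be sketched as follows. This is an illustration only: `find_registry_entry` and the toy `TASK_CONFIGS` contents are assumptions, and it returns the first match, whereas the real `lazy_register` may handle several matching ancestors differently.

```python
from typing import Optional


class ObjectDetectionModel:
    """Stand-in for a registered base class."""


class MyDetector(ObjectDetectionModel):
    """Stand-in for a concrete model with no registry entry of its own."""


# Toy registry keyed by class name, as in registry_defaults.py.
TASK_CONFIGS = {"ObjectDetectionModel": ["infer"]}


def find_registry_entry(mro_names: list[str], configs: dict) -> Optional[str]:
    """Return the first class name in MRO order that has a registry entry."""
    for name in mro_names:
        if name in configs:
            return name
    return None


# Direct backend: names come from the loaded model's class.
# Subprocess backend: the worker would send this list of strings instead.
mro_names = [cls.__name__ for cls in type(MyDetector()).__mro__]
print(mro_names)  # ['MyDetector', 'ObjectDetectionModel', 'object']
print(find_registry_entry(mro_names, TASK_CONFIGS))  # ObjectDetectionModel

# A concrete-class entry is found first because it precedes its bases in the MRO.
overridden = {**TASK_CONFIGS, "MyDetector": ["infer"]}
print(find_registry_entry(mro_names, overridden))  # MyDetector
```

Matching on strings rather than class objects is what lets the parent process register tasks for a subprocess-backed model without importing the model class.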
25 changes: 25 additions & 0 deletions inference_model_manager/inference_model_manager/__init__.py
@@ -0,0 +1,25 @@
"""
Public API of the ``inference-model-manager`` package.

The model manager orchestrates model lifecycle (load/unload/evict),
SHM-based zero-copy transport, subprocess worker management, and batched
GPU inference.

Usage::

    from inference_model_manager import ModelManager

    mm = ModelManager()
    mm.load("yolov8n-640", api_key=key, backend="subprocess")
    result = mm.process("yolov8n-640", images=image, confidence=0.7)
    mm.shutdown()
"""

import importlib.metadata as _meta

try:
    __version__ = _meta.version(__package__ or __name__)
except _meta.PackageNotFoundError:
    __version__ = "development"

from inference_model_manager.model_manager import ModelManager
Empty file.