---
name: add-inference-model
description: Add a new core (pre-trained, non-user-fine-tuned) model to the inference repos. Covers the inference_models class + registry entry + tests, weight-zip preparation, the registration script against the Roboflow model registry, and the optional surfaces (workflow block, legacy-endpoint adapter). Trigger when the user asks to "add a model", "port a new model from HuggingFace", "wrap a transformers model", "expose X as a workflow block / model", or similar. This is a living skill; iterate on it each time a new model ships.
---

# Adding a new core model

This skill is the end-to-end playbook for shipping a new **pre-trained / core** model. Don't invoke it for:

- user-fine-tuned models (different path: workspace/dataset/version ids)
- new backends for an **existing** architecture (just extend the registry)
- bug fixes inside an existing model (ordinary code change)

## Where the code goes (read this first)

There are two top-level Python packages in this repo:

- `inference_models/inference_models/`: **the canonical place for new model implementations.** All new core models go here.
- `inference/`: the older package. Its `inference/models/<family>/` subfolders (e.g. `inference/models/sam3/`, `inference/models/yolov8/`) hold legacy model implementations that predate `inference_models`. **Treat those as deprecated; never add a new one.** The path forward is always `inference_models/` first, then cross-reference via the surfaces below.

A new model has up to **four surfaces**. Surfaces 1 and 2 are always required; wire up 3 and 4 only if you need them:

| # | Surface | Required? | Location |
| --- | --- | --- | --- |
| 1 | Model class + registry entry | **Always** | `inference_models/inference_models/models/<family>/` + `models_registry.py` |
| 2 | Weight zips + registry registration script | **Always** (so the model is actually loadable) | GCS test-assets bucket + PR to `roboflow/model-registry-sdk` |
| 3 | Workflow block | Only if it should appear in workflows | `inference/core/workflows/core_steps/models/foundation/<family>/v1.py` |
| 4 | Inference-models adapter | Only if it should serve on a plain `/infer` HTTP endpoint | Add a subclass in `inference/core/models/inference_models_adapters.py`, wire into `inference/models/utils.py` |

Quick guidance: plain HTTP `/infer` endpoint ⇒ add surface 4. Workflow visibility ⇒ add surface 3. Streaming video / stateful trackers typically only need surfaces 1-3 (state can't cross an `/infer` request boundary).

## Before scaffolding: survey existing models

`inference_models` already has many models. Before writing a single file, **read 2-3 same-backend, same-task siblings**. They carry patterns you should match (file layout, class naming, the `from_pretrained` contract, how they handle device/dtype/quantization, how they shape the registry entry).

```bash
ls inference_models/inference_models/models/
# registry keys contain BackendType.<BACKEND>, so this lists roughly one line per registered entry
grep -n "BackendType\." inference_models/inference_models/models/auto_loaders/models_registry.py
```

Pick the 1-2 closest to the new model by (backend, task) and read:
- the model class file
- its registry entry
- its unit test under `inference_models/tests/unit_tests/models/test_<family>.py`
- its integration test + fixture if present

Only start scaffolding after you know which existing model yours most resembles. If nothing close exists (new backend, new task), flag that to the user, because the skill's templates may not cover the gap.

## Discovery phase: ask the user

Before touching files, get concrete answers:

1. **Architecture name** (registry key string): lower-case, hyphens OK, no slashes. This is the string matched in `models_registry.py`.
2. **Task type**: choose one of the concrete task constants defined in `inference_models/inference_models/models/auto_loaders/models_registry.py` (e.g. `OBJECT_DETECTION_TASK`), and verify the exact string the service accepts against the model-registry API/schema docs or the model-registry SDK.
3. **Backend**: `HF` / `TORCH` / `ONNX` / `TRT` (or `TORCH_SCRIPT`, `MEDIAPIPE`, etc.; check `BackendType`). This determines which siblings you survey.
4. **Upstream weight source**: HF repo id, internal `.pt`, local files. **If HF and gated, stop.** The user needs to accept the terms and supply an `HF_TOKEN` before any download.
5. **Variants**: one id like `clip`, or a family like `foo/{tiny,small,large}` with a default. Variants share one registry entry; variant resolution lives in the weights provider.
6. **Which surfaces?** Ask explicitly: workflow block? Plain `/infer` adapter? If both, wire both.
7. **Any existing legacy implementation under `inference/models/<family>/`?** If yes, note that the new `inference_models` implementation is the replacement. Don't delete the legacy code in the same PR unless the user asks, but avoid depending on it.
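
The naming rules in items 1 and 5 can be checked mechanically before any file exists. This is a sketch with hypothetical helper names (nothing here comes from `inference_models`). It enforces the two stated rules (lower-case, no slashes) plus two assumptions of ours: whitespace is not allowed, and a variant is a single path segment.

```python
from typing import Optional


def validate_architecture(arch: str) -> str:
    """Check an architecture (registry key) string against the naming rules above."""
    if not arch:
        raise ValueError("architecture name must not be empty")
    if arch != arch.lower():
        raise ValueError(f"{arch!r}: architecture name must be lower-case")
    if "/" in arch:
        raise ValueError(f"{arch!r}: no slashes; the variant belongs in the model id, not the key")
    if any(ch.isspace() for ch in arch):  # assumption: whitespace is also disallowed
        raise ValueError(f"{arch!r}: architecture name must not contain whitespace")
    return arch


def build_model_id(arch: str, variant: Optional[str] = None) -> str:
    """Compose '<arch>' or '<arch>/<variant>' (hypothetical helper)."""
    validate_architecture(arch)
    if variant is None:
        return arch
    if not variant or "/" in variant:  # assumption: a variant is one non-empty segment
        raise ValueError(f"{variant!r}: variant must be a single non-empty segment")
    return f"{arch}/{variant}"
```

For example, `build_model_id("foo", "small")` gives `"foo/small"`. That is the variant-qualified shape used later for the workflow-block default and the staging checks.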

Write the plan back to the user and get an OK before moving on.

## Step-by-step

### 1. Model class (surface 1a)

Create `inference_models/inference_models/models/<family>/__init__.py` and `<family>/<family>_<backend>.py`. The **only** hard contract is a classmethod:

```python
@classmethod
def from_pretrained(cls, model_name_or_path: str, **kwargs) -> "YourModel": ...
```

`model_name_or_path` points at a directory containing every file registered for the model package.
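
For orientation, here is a minimal sketch of a class that honors this contract. `FooModel`, the `config.json` file name and the `device` keyword are illustrative assumptions. Step 6 calls `from_pretrained(unzipped_dir, device="cpu")`, so accepting `device` seems prudent. A real implementation should extend the matching base class and load weights the way its siblings do.

```python
import json
from pathlib import Path
from typing import Any


class FooModel:  # hypothetical; real models usually extend a task base class
    def __init__(self, config: dict, device: str) -> None:
        self.config = config
        self.device = device

    @classmethod
    def from_pretrained(cls, model_name_or_path: str, device: str = "cpu", **kwargs: Any) -> "FooModel":
        # The loader hands over a directory holding every registered package file.
        package_dir = Path(model_name_or_path)
        config_path = package_dir / "config.json"  # assumed file name for this sketch
        if not config_path.is_file():
            # Fail loudly: a missing root-level file often means a wrapped (nested) zip.
            raise FileNotFoundError(f"expected {config_path.name} at the root of {package_dir}")
        config = json.loads(config_path.read_text())
        # A real model would build its network / runtime session here and place it on `device`.
        return cls(config=config, device=device)
```

If a required file is missing, the class fails with a pointed error. That makes a wrongly packaged zip obvious at load time rather than later.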

If your model fits a standard category, extend the corresponding base class (see `inference_models/docs/contributors/adding-model.md` for the catalog: `ObjectDetectionModel`, `InstanceSegmentationModel`, `ClassificationModel`, `KeyPointsDetectionModel`, `SemanticSegmentationModel`, etc.). If it doesn't, a standalone class is fine. Base classes exist for consistency, not as a hard requirement.

For **shared plumbing across several HF models** (sessioned video trackers, etc.), check `inference_models/inference_models/models/common/` for reusable bases before writing your own.

Read `inference_models/docs/contributors/adding-model.md` and `inference_models/docs/contributors/writing-tests.md`. They cover the `from_pretrained` contract in more depth than this skill.

### 2. Registry entry (surface 1b)

Edit `inference_models/inference_models/models/auto_loaders/models_registry.py`. Add:

```python
("<architecture>", <TASK_CONSTANT>, BackendType.<BACKEND>): LazyClass(
    module_name="inference_models.models.<family>.<family>_<backend>",
    class_name="<YourClass>",
),
```

The key is **only** `(architecture, task, backend)`, **not** the variant. All variants of the same family share one entry. The variant suffix in the model id (e.g. `foo/tiny`) is resolved server-side by the weights provider. Don't add `"<arch>-tiny"` as a separate architecture.

Use `RegistryEntry` (instead of `LazyClass` directly) if the model has optional features like fused NMS. See the existing `yolov8` entries for the pattern.
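
Before relying on the auto-loader, it is cheap to confirm that the new entry's `module_name` / `class_name` pair actually imports (the first item of the verification checklist). The sketch below uses only `importlib`. It does not call `LazyClass`; it assumes, without checking, that a lazy entry resolves by importing the module and reading the attribute. `resolve_lazy_class` is a hypothetical helper name.

```python
import importlib
import inspect


def resolve_lazy_class(module_name: str, class_name: str, require_from_pretrained: bool = True) -> type:
    """Import `module_name`, return its `class_name`, and check the loading contract."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as error:  # also covers ModuleNotFoundError
        raise ImportError(f"registry module {module_name!r} failed to import: {error}") from error
    cls = getattr(module, class_name, None)
    if cls is None or not inspect.isclass(cls):
        raise AttributeError(f"{module_name!r} has no class named {class_name!r}")
    # The hard contract from step 1: the class must expose a callable `from_pretrained`.
    if require_from_pretrained and not callable(getattr(cls, "from_pretrained", None)):
        raise TypeError(f"{class_name} does not define from_pretrained")
    return cls
```

Call it with the same strings you put in the entry, e.g. `resolve_lazy_class("inference_models.models.<family>.<family>_<backend>", "<YourClass>")` with the placeholders filled in. Run it from an environment where `inference_models` is installed.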

### 3. Unit tests (inference_models side)

Create `inference_models/tests/unit_tests/models/test_<family>.py`. Mock the backend library (transformers / onnxruntime / torch) so the test runs without weights. Copy the structure from a nearby test that targets the same backend.

**Run from the `inference_models/` directory** so pytest uses `inference_models/pytest.ini` (or pass `-c inference_models/pytest.ini` from the repo root):

```bash
cd inference_models
python -m pytest tests/unit_tests/models/test_<family>.py -W ignore
```

Running from the repo root without `-c inference_models/pytest.ini` silently mis-collects.
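
One way to mock the backend library without installing it or downloading weights is to swap a fake module into `sys.modules` for the duration of the test. The sketch below uses only `unittest.mock`. `load_foo_backend` is hypothetical and stands in for your model's loading code. Sibling tests may patch at a different level; if they do, follow them.

```python
import sys
import types
from unittest import mock


def load_foo_backend(package_dir: str):
    """Hypothetical code under test: imports transformers lazily, then loads weights."""
    import transformers  # looked up in sys.modules first, so the fake below is picked up

    return transformers.AutoModel.from_pretrained(package_dir)


def test_load_foo_backend_without_weights() -> None:
    fake_transformers = types.ModuleType("transformers")
    fake_transformers.AutoModel = mock.MagicMock()
    fake_transformers.AutoModel.from_pretrained.return_value = "fake-model"
    # patch.dict restores sys.modules on exit, so later tests see the real library again.
    with mock.patch.dict(sys.modules, {"transformers": fake_transformers}):
        model = load_foo_backend("/nonexistent/package")
    assert model == "fake-model"
    fake_transformers.AutoModel.from_pretrained.assert_called_once_with("/nonexistent/package")
```

This only works when the import runs while the patch is active, as with the lazy import above. If the model module does `from transformers import AutoModel` at import time, patch the name where it is used instead, e.g. `mock.patch("<module>.AutoModel")`.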

### 4. Integration test + fixture

Add a `..._PACKAGE_URL` constant and a `pytest.fixture(scope="module")` in `inference_models/tests/integration_tests/models/conftest.py` (follow the existing patterns near other HF / torch packages). Add `inference_models/tests/integration_tests/models/test_<family>_predictions.py` marked `@pytest.mark.slow`. These tests download the step 6 uploads, so they can only pass after that step.

### 5. Workflow block (surface 3, optional)

Skip this section unless surface 3 is needed. Create `inference/core/workflows/core_steps/models/foundation/<family>/v1.py` + `__init__.py`. Before writing, read 1-2 existing blocks that match your pattern (stateless per-image vs. stateful per-video-session).

Block manifest fields to get right:

- `model_id` default: use the variant-qualified id, e.g. `"foo/small"` (not bare `"foo"`)
- `examples`: list every shipping variant
- `get_supported_model_variants()`: list every variant and **put the default first** (the air-gapped cache scanner in `inference/core/cache/air_gapped.py` uses the first entry as the display name)

If the block holds per-video or otherwise per-request state, raise `NotImplementedError` in `__init__` when `step_execution_mode is StepExecutionMode.REMOTE`, because remote sharding breaks stateful blocks. Fail at workflow-compile time, not on the first frame.
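
A minimal sketch of that guard follows. `StepExecutionMode` is redefined locally so the snippet runs on its own. In the real block, import the enum from `inference` and keep whatever other constructor arguments the sibling blocks receive. `FooTrackerBlockV1` and its `_sessions` dict are hypothetical.

```python
from enum import Enum
from typing import Any, Dict


class StepExecutionMode(Enum):  # local stand-in for the real enum in `inference`
    LOCAL = "local"
    REMOTE = "remote"


class FooTrackerBlockV1:  # hypothetical stateful block
    def __init__(self, step_execution_mode: StepExecutionMode) -> None:
        # Reject remote execution in the constructor, before any frame is processed.
        if step_execution_mode is StepExecutionMode.REMOTE:
            raise NotImplementedError(
                "FooTrackerBlockV1 keeps per-video state and cannot run with "
                "remote step execution; use local execution instead."
            )
        # Per-video session state, keyed by a video identifier.
        self._sessions: Dict[str, Any] = {}
```

Because the check lives in `__init__`, the error surfaces when the step is constructed rather than on the first frame.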

Register the block with the block loader. To find the registration site, grep for an existing block's name in `inference/core/workflows/core_steps/loader.py` or a similar file.

Add unit tests at `tests/workflows/unit_tests/core_steps/models/foundation/test_<family>.py`. Mock the inner `AutoModel.from_pretrained` and the model's inference call so the test isolates the block's branching/decision logic. Run from the repo root:

```bash
python -m pytest tests/workflows/unit_tests/core_steps/models/foundation/test_<family>*.py -W ignore
```

### 6. Weight zips

For each variant, produce a **flat** zip: files at the zip root, **no wrapping directory**. The test fixture `download_model_package` unzips the archive and calls `YourClass.from_pretrained(unzipped_dir)`; nested layouts break silently.

Typical fetch + zip from HF:

```python
import zipfile
from pathlib import Path
from huggingface_hub import snapshot_download
out_dir = Path("<arch>-<variant>")  # one download directory per variant
snapshot_download(repo_id="…", local_dir=out_dir, allow_patterns=[...])
with zipfile.ZipFile(f"{out_dir.name}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for f in sorted(p for p in out_dir.iterdir() if p.is_file()):  # top-level files only; subdirs (e.g. a .cache/ dir) skipped
        zf.write(f, arcname=f.name)  # bare file name as arcname, so no wrapping dir
```

Verify each zip with `unzip -l <zip> | head -10`. The `Name` column (the last one) should show bare filenames, not `wrapper/config.json`.
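
If `unzip` isn't handy, or you want the check inside a script, the stdlib-only sketch below applies the same rule programmatically. `check_flat_zip` is a hypothetical helper. It flags the failure mode described above, where every entry sits under one wrapper directory. It does not validate the file contents.

```python
import zipfile
from typing import List


def check_flat_zip(zip_path: str) -> List[str]:
    """Return the zip's file entries; raise ValueError if they all sit under one wrapper dir."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]  # ignore directory entries
    if not names:
        raise ValueError(f"{zip_path} contains no files")
    top_levels = {n.split("/", 1)[0] for n in names}
    # One shared first path component, and every entry nested below it, means a wrapping dir.
    if len(top_levels) == 1 and all("/" in n for n in names):
        raise ValueError(
            f"{zip_path}: everything is under {next(iter(top_levels))}/; re-zip with files at the root"
        )
    return names
```

Run it over each variant zip before the smoke test below. It only inspects the layout, not whether the files load.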

**Smoke-test the zip before uploading** by extracting it to a temp dir and loading it:

```python
import shutil, tempfile
with tempfile.TemporaryDirectory() as unzipped_dir:
    shutil.unpack_archive("<arch>-<variant>.zip", unzipped_dir)  # same layout the fixture sees
    YourClass.from_pretrained(unzipped_dir, device="cpu")
```

Upload to `gs://roboflow-tests-assets/rf-platform-models/<arch>-<variant>.zip`. Confirm each URL returns 200:

```bash
curl -sI https://storage.googleapis.com/roboflow-tests-assets/rf-platform-models/<arch>-<variant>.zip | head -1
```

### 7. Registration script (surface 2)

Clone `roboflow/model-registry-sdk` if you haven't already. Add a script at `scripts/core_models/register_<family>_models.py`. Browse `scripts/core_models/` for existing same-backend templates; copy the nearest one and swap the constants.

Shape of the registration flow (see the SDK's `registration_helpers.execute_model_package_registration` and the bare methods on `TheGOATModelsServiceClient`):

```python
client = TheGOATModelsServiceClient(
    api_host=API_HOSTS[env],  # staging=api.roboflow.one, prod=api.roboflow.com
    service_secret=os.environ["MODELS_SERVICE_INTERNAL_SECRET"],
)
client.register_pre_trained_model(model_id=f"{arch}/{variant}", model_architecture=arch,
                                  model_variant=variant, model_access=..., task_type=...)
reg = client.register_model_package(file_handles=[...], package_manifest=...)
for spec in reg.file_upload_specs:
    upload_from_local_file(source_file=local, target_uri=spec.gcs_uri)
client.confirm_model_package_artefacts(..., seal_model_package=True)
```

Note: the base SDK's client **does not** have a `.init(current_environment=...)` classmethod. That helper only exists in the `exp-registry-migration` repo's vendored copy, and it reads from GCP Secret Manager. The script instead constructs the client directly from an env var plus a hardcoded host per `--env`.
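
Step 9 runs the script with `--env` and `--variants` flags. The sketch below is a hypothetical CLI for the top of the script; if the nearest template handles arguments differently, copy that instead. `ALL_VARIANTS` is a placeholder list with the default first. `API_HOSTS` mirrors the staging/prod hosts from the comment above. Whether `TheGOATModelsServiceClient` expects a bare host or a full URL needs to be confirmed against the template.

```python
import argparse
import os
from typing import List, Optional

# Hosts copied from the client-construction comment above; bare host vs. full URL
# is an assumption to check against the template script.
API_HOSTS = {"staging": "api.roboflow.one", "production": "api.roboflow.com"}
ALL_VARIANTS = ["small", "tiny", "large"]  # hypothetical names; default variant first


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Register <family> core models")
    parser.add_argument("--env", choices=sorted(API_HOSTS), required=True)
    parser.add_argument(
        "--variants",
        nargs="+",
        choices=ALL_VARIANTS,
        default=list(ALL_VARIANTS),
        help="subset to register, e.g. a single variant as a smoke test",
    )
    return parser.parse_args(argv)


def load_service_secret() -> str:
    secret = os.environ.get("MODELS_SERVICE_INTERNAL_SECRET")
    if not secret:
        # Abort before any registry call rather than sending an empty secret.
        raise SystemExit("MODELS_SERVICE_INTERNAL_SECRET is not set (see step 9)")
    return secret
```

With this in place, the step 9 invocations `--env staging --variants <one>` and `--env staging` map onto `args.env` / `args.variants`. `API_HOSTS[args.env]` then feeds the `api_host=` argument.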

Open a PR against `roboflow/model-registry-sdk`. Do not run the script against production yet.

### 8. Inference-models adapter (surface 4, optional)

Skip unless the user wants plain `/infer` endpoint support. Add a `Model` subclass for your task to `inference/core/models/inference_models_adapters.py`. The file has per-task parents (object detection, instance segmentation, classification, keypoints, semantic segmentation, etc.), so read the existing adapters first. In the adapter's `__init__`, follow the existing constructors in that file. They call `AutoModel.from_pretrained(model_id_or_path=..., ...)`, pass through the extra flags they need (for example `allow_untrusted_packages`, `allow_direct_local_storage_loading`, backend selection), and store the result. The predict / infer methods then delegate to that stored model.

Register the adapter by model architecture in `inference/models/utils.py` so `/infer?model_id=<arch>/<variant>` resolves to it. Follow the pattern the other entries in that file use.

Most new models on this path will NOT need surface 4, because workflow blocks (surface 3) cover the majority use case. Add surface 4 only if there's a concrete requirement.

### 9. Run registration against staging

The user sets `MODELS_SERVICE_INTERNAL_SECRET` once per shell:

```bash
export MODELS_SERVICE_INTERNAL_SECRET=$(gcloud secrets versions access latest \
  --secret=MODELS_SERVICE_INTERNAL_SECRET --project=878913763597)
# 878913763597 = staging project; confirm before running
```

Run a single-variant smoke test first, then the full set:

```bash
python scripts/core_models/register_<family>_models.py --env staging --variants <one>
python scripts/core_models/register_<family>_models.py --env staging
```

Verify via the staging API (needs a staging Roboflow API key):

```bash
curl -s "https://api.roboflow.one/models/v1/external/weights?modelId=<arch>/<variant>" \
  -H "Authorization: Bearer <staging-api-key>" | python3 -m json.tool
```

Expect `status: ok` and `modelPackages[0].packageFiles` listing every file with `md5Hash` set.
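
A small script can turn that visual check into a pass/fail step. The sketch below only checks the fields named above (`status`, `modelPackages[0].packageFiles`, `md5Hash`) and assumes `status` is the literal string `"ok"`. It does not check the sealed state, because this skill doesn't name that field. `check_weights.py` is a hypothetical file name.

```python
import json
import sys
from typing import Any, Dict


def check_weights_response(payload: Dict[str, Any]) -> int:
    """Return the number of package files; raise ValueError if the response looks incomplete."""
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    packages = payload.get("modelPackages") or []
    if not packages:
        raise ValueError("response has no modelPackages")
    files = packages[0].get("packageFiles") or []
    if not files:
        raise ValueError("modelPackages[0].packageFiles is empty")
    missing = [entry for entry in files if not entry.get("md5Hash")]
    if missing:
        raise ValueError(f"{len(missing)} of {len(files)} package files have no md5Hash")
    return len(files)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Expects the path of a saved API response as the only argument.
    with open(sys.argv[1]) as handle:
        print(f"{check_weights_response(json.load(handle))} package files OK")
```

Usage: save the response with `curl -s "https://api.roboflow.one/models/v1/external/weights?modelId=<arch>/<variant>" -H "Authorization: Bearer <staging-api-key>" -o weights.json`, then run `python3 check_weights.py weights.json`.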

### 10. End-to-end verify

Run `AutoModel.from_pretrained("<arch>/<variant>", api_key=<staging-key>)` against staging (set `ROBOFLOW_ENVIRONMENT=staging` or `ROBOFLOW_API_HOST=https://api.roboflow.one`) and exercise the model with a real input. If surface 3 was built, also run it through `debugrun.py` or on a short MP4 via `InferencePipeline`. If surface 4 was built, hit `/infer?model_id=<arch>/<variant>`.

**When running `debugrun.py` or the inference server from the repo root**, make sure the repo-root `inference_models/` directory does not shadow the editable-installed `inference_models` package. On Python 3.11+, set `PYTHONSAFEPATH=1` (or use `python -P`) so Python does not prepend the script's directory (or, for `python -m` / `python -c`, the current directory) to `sys.path`. **Do not rely on `python -P` on Python 3.10**, where neither the flag nor the variable exists. On 3.10, prefer running from an installed environment instead of invoking a repo-root script directly. Note that `python -m ...` also prepends the current directory, so launch it from outside the repo root, or adjust `PYTHONPATH` / the working directory so the repo root is not on `sys.path`.
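
To see which `inference_models` a given interpreter and launch setup will actually import, the stdlib-only sketch below asks `importlib.util.find_spec` where the package resolves. A bare directory without `__init__.py` shows up as a namespace package with no `origin`. This is what the repo-root `inference_models/` folder would look like, assuming it has no `__init__.py`. An editable install should resolve to a real `__init__.py`. `describe_import` is a hypothetical helper. `sys.flags.safe_path` only exists on Python 3.11+, which is why the code reads it with `getattr`.

```python
import importlib.util
import sys


def describe_import(name: str) -> str:
    """Report where `name` would be imported from in this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable"
    # Namespace packages (directories without __init__.py) have no concrete origin.
    if spec.origin in (None, "namespace"):
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package from {locations} (possibly a shadowing directory)"
    return f"{name}: {spec.origin}"


if __name__ == "__main__":
    print("safe_path:", getattr(sys.flags, "safe_path", "unsupported (Python < 3.11)"))
    print(describe_import("inference_models"))
```

`sys.path[0]` depends on how Python is started. Run the check the same way you launch `debugrun.py`: same interpreter, same flags, and with the script in the same directory.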

## Gotchas (real, collected as hit)

Add to this list as new surprises surface.

- **HF gating**: some `facebook/*` repos (e.g. `facebook/sam3`) return 401 on every file without an `HF_TOKEN`. Accept the terms on the model page and generate a token before any download.
- **Zip layout**: files at the zip root, no wrapping directory. The fixture unzips and calls `from_pretrained(that_dir)`, so nested layouts break silently.
- **Nested-list shape for HF video processors**: some processor methods expect inputs at a very specific nesting depth (e.g. `input_boxes` at 3 levels `[image [boxes [coords]]]`, not 4). Unit tests that mock the processor won't catch wrong nesting. Always include at least one integration or e2e test that exercises the real `from_pretrained` + predict path against real weights, even if only for the tiny variant.
- **State-requiring `.track()` / similar must raise on missing state**, not silently create an empty session. Empty-state-then-silent-success bugs are hard to detect.
- **Numpy array truthiness**: `dict.get(a) or dict.get(b)` raises `ValueError` (ambiguous truth value) when the first value is a multi-element numpy array. Use explicit `"a" in d` / `"b" in d` checks, or a small `_first_present` helper.
- **SDK client auth**: `TheGOATModelsServiceClient.init(current_environment=...)` doesn't exist on the base SDK, only on `exp-registry-migration`'s vendored client. Our scripts construct the client directly from the env var plus a hardcoded host per `--env`.
- **Transformers import-time side effects**: some transformers model classes (e.g. SAM3 video) `import torchvision` at module import. A missing torchvision then surfaces as the misleading `ModuleNotFoundError: Could not import module 'Sam3VideoModel'`. Not a prod issue, but it confuses local setup.
- **Stateful workflow blocks + remote execution**: if your block keeps per-video or per-request state, raise `NotImplementedError` in `__init__` when the execution mode is `REMOTE`. Failing at compile time beats failing on the first frame.
- **`get_supported_model_variants` order**: the first entry is the display name for the air-gapped cache scanner. Put your default variant first.
- **`PYTHONSAFEPATH=1`** (Python 3.11+) when running scripts from the repo root; see step 10.

## Verification checklist

Before declaring done:

- [ ] Architecture registered in `models_registry.py`; import + class resolve without error
- [ ] Every variant zip uploads and `curl -sI` returns 200
- [ ] `inference_models` unit tests pass (from the `inference_models/` directory)
- [ ] If surface 3: workflow-block unit tests pass (from the repo root)
- [ ] Registration script merged, or at least open as a PR against `roboflow/model-registry-sdk`
- [ ] `register_*_models.py --env staging` completes without errors (run the per-variant smoke test first)
- [ ] Staging metadata API returns the model with every file + MD5 + sealed
- [ ] `AutoModel.from_pretrained("<arch>/<default>")` loads + runs against staging
- [ ] If surface 3: block runs end-to-end on a real input (image, or MP4 via `InferencePipeline`)
- [ ] If surface 4: `/infer?model_id=...` returns a valid prediction
- [ ] `make style` clean
- [ ] At least one non-mock integration test exercises the real call path
- [ ] PR descriptions list remaining TODOs (other variants, production registration, deferred surfaces)

## Production registration

Only after staging is fully verified and the user explicitly approves:

```bash
export MODELS_SERVICE_INTERNAL_SECRET=$(gcloud secrets versions access latest \
  --secret=MODELS_SERVICE_INTERNAL_SECRET --project=481589474394)  # prod
python scripts/core_models/register_<family>_models.py --env production
```

## Iterating on this skill

Each new model either confirms an assumption here (leave it alone) or surfaces a gap (add a gotcha / template note). Non-HF backends (ONNX, TRT, TORCH) are underrepresented in today's templates. The next model through a non-HF path should add a step-1 note for its backend.