Commit 127dd3c

hansent and Copilot authored
Add add-inference-model Claude skill (#2247)
* Add add-inference-model skill

  First draft of .claude/skills/add-inference-model/SKILL.md, capturing the end-to-end playbook for shipping a new pre-trained core model through inference_models + the model-registry-sdk. Covers the four surfaces a new model can touch (inference_models class + registry entry, weight zips + registration script, workflow block, legacy /infer adapter), notes that inference/models/<family>/ is deprecated, and instructs the skill to survey same-(backend, task) siblings in inference_models before scaffolding rather than copying any single model's idioms. Gotchas section captures the handful of real surprises collected while shipping the first model through this pattern; this file is explicitly a living document, and future additions should extend it rather than rewrite.

* Update .claude/skills/add-inference-model/SKILL.md

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .claude/skills/add-inference-model/SKILL.md

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .claude/skills/add-inference-model/SKILL.md

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .claude/skills/add-inference-model/SKILL.md

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix test section: reference pytest.ini instead of non-existent conftest.py

  Agent-Logs-Url: https://github.com/roboflow/inference/sessions/100cdfb2-c19d-4a6b-b6c0-cc3f296db78b
  Co-authored-by: hansent <137617+hansent@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hansent <137617+hansent@users.noreply.github.com>
1 parent 45ef287 commit 127dd3c

1 file changed

Lines changed: 268 additions & 0 deletions

  • .claude/skills/add-inference-model
@@ -0,0 +1,268 @@
1+
---
2+
name: add-inference-model
3+
description: Add a new core (pre-trained, non-user-fine-tuned) model to the inference repos. Covers the inference_models class + registry entry + tests, weight-zip preparation, the registration script against the Roboflow model registry, and the optional surfaces (workflow block, legacy-endpoint adapter). Trigger when the user asks to "add a model", "port a new model from HuggingFace", "wrap a transformers model", "expose X as a workflow block / model", or similar. This is a living skill — iterate on it each time a new model ships.
4+
---
5+
6+
# Adding a new core model
7+
8+
This skill is the end-to-end playbook for shipping a new **pre-trained / core** model. Don't invoke it for:
9+
10+
- user-fine-tuned models (different path — workspace/dataset/version ids)
11+
- new backends for an **existing** architecture (just extend the registry)
12+
- bug fixes inside an existing model (ordinary code change)
13+
14+
## Where the code goes (read this first)
15+
16+
There are two top-level Python packages in this repo:
17+
18+
- `inference_models/inference_models/`: **the canonical place for new model implementations.** All new core models go here.
19+
- `inference/` — the older package. Its `inference/models/<family>/` subfolders (e.g. `inference/models/sam3/`, `inference/models/yolov8/`) hold legacy model implementations that predate `inference_models`. **Treat those as deprecated — never add a new one.** The path forward is always `inference_models/` first, then cross-reference via the surfaces below.
20+
21+
A new model has up to **four surfaces**, and you only wire up the ones you need:
22+
23+
| # | Surface | Required? | Location |
24+
| --- | --- | --- | --- |
25+
| 1 | Model class + registry entry | **Always** | `inference_models/inference_models/models/<family>/` + `models_registry.py` |
26+
| 2 | Weight zips + registry registration script | **Always** (so the model is actually loadable) | GCS test-assets bucket + PR to `roboflow/model-registry-sdk` |
27+
| 3 | Workflow block | Only if it should appear in workflows | `inference/core/workflows/core_steps/models/foundation/<name>/v1.py` |
28+
| 4 | Inference-models adapter | Only if it should serve on a plain `/infer` HTTP endpoint | Add a subclass in `inference/core/models/inference_models_adapters.py`, wire into `inference/models/utils.py` |
29+
30+
Quick guidance: plain HTTP `/infer` endpoint ⇒ add surface 4. Workflow visibility ⇒ add surface 3. Streaming video / stateful trackers typically need only surfaces 1-3 (state can't cross an `/infer` request boundary).
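
The guidance above reduces to a small decision rule. The sketch below is illustrative only: the function name and its two flags are hypothetical, and only the surface numbers come from the table.

```python
def surfaces_needed(workflow_block: bool, infer_endpoint: bool) -> list[int]:
    """Return the surface numbers to wire up for a new core model."""
    # Surfaces 1 (class + registry entry) and 2 (weights + registration) are always required.
    surfaces = [1, 2]
    if workflow_block:
        surfaces.append(3)  # workflow visibility
    if infer_endpoint:
        surfaces.append(4)  # plain HTTP /infer endpoint
    return surfaces


print(surfaces_needed(workflow_block=True, infer_endpoint=False))  # [1, 2, 3]
```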
31+
32+
## Before scaffolding — survey existing models
33+
34+
`inference_models` has many models already. Before writing a single file, **read 2-3 same-backend, same-task siblings**. They carry patterns you should match (file layout, class naming, `from_pretrained` contract, how they handle device/dtype/quantization, how they shape the registry entry).
35+
36+
```
37+
ls inference_models/inference_models/models/
38+
cat inference_models/inference_models/models/auto_loaders/models_registry.py  # see every registered arch
39+
```
40+
41+
Pick the 2-3 closest to the new model by (backend, task) and read:
42+
- the model class file
43+
- its registry entry
44+
- its unit test under `inference_models/tests/unit_tests/models/test_<family>.py`
45+
- its integration test + fixture if present
46+
47+
Only start scaffolding after you know which existing model yours most resembles. If nothing close exists (new backend, new task), flag that to the user — the skill's templates may not cover the gap.
48+
49+
## Discovery phase — ask the user
50+
51+
Before touching files, get concrete answers:
52+
53+
1. **Architecture name** (registry key string) — lower-case, hyphens OK, no slashes. This is the string matched in `models_registry.py`.
54+
2. **Task type** — choose one of the concrete task constants defined in `inference_models/inference_models/models/auto_loaders/models_registry.py` (for example `OBJECT_DETECTION_TASK`, etc.), and verify the exact service-side accepted string against the model-registry API/schema docs or the model-registry SDK.
55+
3. **Backend**: `HF` / `TORCH` / `ONNX` / `TRT` (or `TORCH_SCRIPT`, `MEDIAPIPE`, etc.; check `BackendType`). Determines which sibling you survey.
56+
4. **Upstream weight source** — HF repo id, internal `.pt`, local files. **If HF and gated, stop** — the user needs to accept terms and supply an `HF_TOKEN` before any download.
57+
5. **Variants** — one id like `clip`, or a family like `foo/{tiny,small,large}` with a default. Variants share one registry entry; variant resolution lives in the weights provider.
58+
6. **Which surfaces?** Ask explicitly: workflow block? plain `/infer` adapter? If both, both get wired.
59+
7. **Any existing legacy implementation under `inference/models/<family>/`?** If yes, note that the new `inference_models` implementation is the replacement — don't delete the legacy in the same PR unless the user asks, but do avoid depending on it.
60+
61+
Write the plan back to the user and get an OK before moving on.
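
A quick sanity check on the architecture name from question 1 can catch mistakes before any file is touched. This is a minimal sketch: the function name is hypothetical, the lower-case and no-slash rules come from the list above, and the whitespace rule is an added assumption.

```python
def architecture_name_problems(name: str) -> list[str]:
    """Return a list of rule violations for a proposed registry key (empty if none)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if name != name.lower():
        problems.append("name contains upper-case characters")
    if "/" in name:
        problems.append("name contains a slash (variants are not part of the key)")
    # Assumption, not stated in the rules above: whitespace is also rejected.
    if any(ch.isspace() for ch in name):
        problems.append("name contains whitespace")
    return problems


print(architecture_name_problems("foo-bar"))  # []
```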
62+
63+
## Step-by-step
64+
65+
### 1. Model class (surface 1a)
66+
67+
Create `inference_models/inference_models/models/<family>/__init__.py` and `<family>/<family>_<backend>.py`. The **only** hard contract is a classmethod:
68+
69+
```python
70+
@classmethod
71+
def from_pretrained(cls, model_name_or_path: str, **kwargs) -> "YourModel": ...
72+
```
73+
74+
`model_name_or_path` points at a directory containing every file registered for the model package.
75+
76+
If your model fits a standard category, extend the corresponding base class (see `inference_models/docs/contributors/adding-model.md` for the catalog: `ObjectDetectionModel`, `InstanceSegmentationModel`, `ClassificationModel`, `KeyPointsDetectionModel`, `SemanticSegmentationModel`, etc.). If it doesn't, a standalone class is fine — base classes exist for consistency, not as a hard requirement.
77+
78+
For **shared plumbing across several HF models** (sessioned video trackers, etc.), check `inference_models/inference_models/models/common/` for reusable bases before writing your own.
79+
80+
Read `inference_models/docs/contributors/adding-model.md` and `inference_models/docs/contributors/writing-tests.md` — they cover the `from_pretrained` contract in more depth than this skill.
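
To make the contract concrete, here is a minimal sketch of a standalone class that satisfies it. Everything beyond the `from_pretrained` signature is an assumption for illustration: the `config.json` file name, the `device` keyword, and the constructor shape are hypothetical, and a real model would load backend weights rather than just a config file.

```python
import json
from pathlib import Path
from typing import Any


class YourModel:
    """Hypothetical standalone model; not one of the repo's base classes."""

    def __init__(self, config: dict[str, Any], device: str) -> None:
        self.config = config
        self.device = device

    @classmethod
    def from_pretrained(cls, model_name_or_path: str, **kwargs: Any) -> "YourModel":
        # The path is a directory holding every file registered for the package.
        package_dir = Path(model_name_or_path)
        if not package_dir.is_dir():
            raise FileNotFoundError(f"Model package directory not found: {package_dir}")
        config_path = package_dir / "config.json"  # assumed file name
        if not config_path.is_file():
            raise FileNotFoundError(f"Missing file in model package: {config_path}")
        config = json.loads(config_path.read_text())
        return cls(config=config, device=kwargs.get("device", "cpu"))
```

Failing loudly on a missing directory or file keeps a badly laid-out package from surfacing later as an unrelated error.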
81+
82+
### 2. Registry entry (surface 1b)
83+
84+
Edit `inference_models/inference_models/models/auto_loaders/models_registry.py`. Add:
85+
86+
```python
87+
("<architecture>", <TASK_CONSTANT>, BackendType.<BACKEND>): LazyClass(
88+
module_name="inference_models.models.<family>.<family>_<backend>",
89+
class_name="<YourClass>",
90+
),
91+
```
92+
93+
The key is **only** `(architecture, task, backend)`, **not** the variant. All variants of the same family share one entry. The variant suffix in the model id (e.g. `foo/tiny`) is resolved server-side by the weights provider. Don't add `"<arch>-tiny"` as a separate architecture.
94+
95+
Use `RegistryEntry` (instead of `LazyClass` directly) if the model has optional features like fused NMS — see existing `yolov8` entries for the pattern.
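
The key shape can be illustrated with a toy registry. This sketch uses stand-ins throughout: `LazyClassStandIn` is not the repo's `LazyClass`, the task and backend strings are placeholders rather than the real constants, and the `RESOLVED_BY_WEIGHTS_PROVIDER` mapping only imitates what the server-side weights provider is described as doing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyClassStandIn:
    """Stand-in for the registry's lazy class reference."""
    module_name: str
    class_name: str


# One entry per (architecture, task, backend); no variant in the key.
REGISTRY = {
    ("foo", "placeholder-task", "placeholder-backend"): LazyClassStandIn(
        module_name="inference_models.models.foo.foo_hf",
        class_name="FooHF",
    ),
}

# Hypothetical: what the weights provider might resolve each model id to.
RESOLVED_BY_WEIGHTS_PROVIDER = {
    "foo/tiny": ("foo", "placeholder-task", "placeholder-backend"),
    "foo/small": ("foo", "placeholder-task", "placeholder-backend"),
}


def lookup(model_id: str) -> LazyClassStandIn:
    """Find the registry entry for a model id after variant resolution."""
    return REGISTRY[RESOLVED_BY_WEIGHTS_PROVIDER[model_id]]


# Both variants land on the same registry entry.
print(lookup("foo/tiny") is lookup("foo/small"))  # True
```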
96+
97+
### 3. Unit tests — inference_models side
98+
99+
Create `inference_models/tests/unit_tests/models/test_<family>.py`. Mock the backend library (transformers / onnxruntime / torch) so the test runs without weights. Copy the structure from a nearby test that targets the same backend.
100+
101+
**Run from `inference_models/` cwd** so pytest uses `inference_models/pytest.ini` (or pass `-c inference_models/pytest.ini` from the repo root):
102+
103+
```bash
104+
cd inference_models
105+
python -m pytest tests/unit_tests/models/test_<family>.py -W ignore
106+
```
107+
108+
Running from the repo root without `-c inference_models/pytest.ini` silently mis-collects.
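
The mocking approach from this step might look like the sketch below. `FooModel` and its `_load_backend` hook are hypothetical stand-ins; a real test would patch whichever backend call the model class makes (for example a transformers or onnxruntime loader), following a sibling test.

```python
from unittest.mock import MagicMock, patch


class FooModel:
    """Hypothetical model whose backend load would normally need real weights."""

    def __init__(self, backend) -> None:
        self._backend = backend

    @staticmethod
    def _load_backend(model_dir: str):
        raise RuntimeError("real weights are not available in unit tests")

    @classmethod
    def from_pretrained(cls, model_name_or_path: str, **kwargs) -> "FooModel":
        return cls(backend=cls._load_backend(model_name_or_path))

    def infer(self, image):
        return self._backend(image)


def test_infer_delegates_to_backend() -> None:
    fake_backend = MagicMock(return_value={"boxes": []})
    # Replace the heavy loader so the test never touches disk or weights.
    with patch.object(FooModel, "_load_backend", return_value=fake_backend) as loader:
        model = FooModel.from_pretrained("/nonexistent/package")
    loader.assert_called_once_with("/nonexistent/package")
    assert model.infer("image") == {"boxes": []}
    fake_backend.assert_called_once_with("image")


test_infer_delegates_to_backend()
```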
109+
110+
### 4. Integration test + fixture
111+
112+
Add a `..._PACKAGE_URL` constant and a `pytest.fixture(scope="module")` in `inference_models/tests/integration_tests/models/conftest.py` (follow existing patterns near other HF / torch packages). Add `inference_models/tests/integration_tests/models/test_<family>_predictions.py` marked `@pytest.mark.slow`. These run after step 6 uploads.
113+
114+
### 5. Workflow block (surface 3, optional)
115+
116+
Skip this section unless surface 3 is needed. Create `inference/core/workflows/core_steps/models/foundation/<family>/v1.py` + `__init__.py`. Read 1-2 existing blocks that match your pattern (stateless per-image vs. stateful per-video-session) before writing.
117+
118+
Block manifest fields to get right:
119+
120+
- `model_id` default — use the variant-qualified id, e.g. `"foo/small"` (not bare `"foo"`)
121+
- `examples` — list every shipping variant
122+
- `get_supported_model_variants()` — list every variant; **put the default first** (used as display name by the air-gapped cache scanner in `inference/core/cache/air_gapped.py`)
123+
124+
If the block holds per-video or otherwise per-request state, raise `NotImplementedError` in `__init__` when `step_execution_mode is StepExecutionMode.REMOTE` — remote sharding breaks stateful blocks. Fail at workflow-compile time, not at first-frame.
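
A minimal sketch of that guard follows. The `StepExecutionMode` enum here is a local stand-in for the repo's enum of the same name, and `StatefulTrackerBlock` is a hypothetical block with a simplified constructor.

```python
from enum import Enum


class StepExecutionMode(Enum):
    """Stand-in for the repo's enum; member values here are assumptions."""
    LOCAL = "local"
    REMOTE = "remote"


class StatefulTrackerBlock:
    """Hypothetical block that keeps per-video session state."""

    def __init__(self, step_execution_mode: StepExecutionMode) -> None:
        # Reject remote execution up front, before any frame is processed.
        if step_execution_mode is StepExecutionMode.REMOTE:
            raise NotImplementedError(
                "Stateful blocks cannot run with remote step execution."
            )
        self._sessions: dict[str, object] = {}
```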
125+
126+
Register the block with the block loader (grep for an existing block's name in `inference/core/workflows/core_steps/loader.py` or similar to find the registration site).
127+
128+
Add unit tests at `tests/workflows/unit_tests/core_steps/models/foundation/test_<family>.py` — mock the inner `AutoModel.from_pretrained` and the model's inference call so the test isolates the block's branching/decision logic. Run from repo root:
129+
130+
```bash
131+
python -m pytest tests/workflows/unit_tests/core_steps/models/foundation/test_<family>*.py -W ignore
132+
```
133+
134+
### 6. Weight zips
135+
136+
For each variant, produce a **flat** zip — files at zip root, **no wrapping directory**. The test fixture `download_model_package` unzips the archive and calls `YourClass.from_pretrained(unzipped_dir)`; nested layouts break silently.
137+
138+
Typical fetch + zip from HF:
139+
140+
```python
141+
from huggingface_hub import snapshot_download
142+
snapshot_download(repo_id="", local_dir=out_dir, allow_patterns=[...])
143+
# then: zip every file at the root, no wrapping dir
144+
```
145+
146+
Verify each zip with `unzip -l <zip> | head -10`: the `Name` column (the last one) should list bare filenames, not `wrapper/config.json`.
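
The flat layout and its check can also be scripted with the standard library. This is a sketch with hypothetical helper names; it assumes the package consists only of files directly under one directory (subdirectories are skipped) and that the zip is written outside that directory.

```python
import zipfile
from pathlib import Path


def zip_flat(source_dir: str, zip_path: str) -> None:
    """Write every regular file directly under source_dir at the zip root."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(Path(source_dir).iterdir()):
            if file_path.is_file():
                # arcname drops the directory part, so no wrapping dir is stored.
                archive.write(file_path, arcname=file_path.name)


def nested_entries(zip_path: str) -> list[str]:
    """Return archive members that sit inside a directory (should be empty)."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if "/" in name]
```

An empty list from `nested_entries` corresponds to the bare-filename listing described above.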
147+
148+
**Smoke-test the zip before uploading** by extracting to a temp dir and loading:
149+
150+
```python
151+
YourClass.from_pretrained(unzipped_dir, device="cpu")
152+
```
153+
154+
Upload to `gs://roboflow-tests-assets/rf-platform-models/<arch>-<variant>.zip`. Confirm each URL returns 200:
155+
156+
```bash
157+
curl -sI https://storage.googleapis.com/roboflow-tests-assets/rf-platform-models/<arch>-<variant>.zip | head -1
158+
```
159+
160+
### 7. Registration script (surface 2)
161+
162+
Clone `roboflow/model-registry-sdk` if you haven't. Add a script at `scripts/core_models/register_<family>_models.py`. Browse `scripts/core_models/` for existing same-backend templates; copy the nearest one and swap constants.
163+
164+
Shape of the registration flow (see the SDK's `registration_helpers.execute_model_package_registration` and the bare methods on `TheGOATModelsServiceClient`):
165+
166+
```python
167+
client = TheGOATModelsServiceClient(
168+
api_host=API_HOSTS[env], # staging=api.roboflow.one, prod=api.roboflow.com
169+
service_secret=os.environ["MODELS_SERVICE_INTERNAL_SECRET"],
170+
)
171+
client.register_pre_trained_model(model_id=f"{arch}/{variant}", model_architecture=arch,
172+
model_variant=variant, model_access=..., task_type=...)
173+
reg = client.register_model_package(file_handles=[...], package_manifest=...)
174+
for spec in reg.file_upload_specs:
175+
upload_from_local_file(source_file=local, target_uri=spec.gcs_uri)
176+
client.confirm_model_package_artefacts(..., seal_model_package=True)
177+
```
178+
179+
Note: the base SDK's client **does not** have a `.init(current_environment=...)` classmethod. That helper exists only on the `exp-registry-migration` repo's vendored copy (where it reads from GCP Secret Manager). The script constructs the client directly from an env var plus a hardcoded host per `--env`.
180+
181+
Open a PR against `roboflow/model-registry-sdk`. Do not run against production yet.
182+
183+
### 8. Inference-models adapter (surface 4, optional)
184+
185+
Skip unless the user wants plain `/infer` endpoint support. Add a subclass of `Model` to `inference/core/models/inference_models_adapters.py` matching your task (there are per-task parents: object detection, instance segmentation, classification, keypoints, semantic segmentation, etc. — read the existing adapters in that file). In the adapter `__init__`, follow the existing adapter constructors in that file: they call `AutoModel.from_pretrained(model_id_or_path=..., ...)` and pass through the additional flags they need (for example `allow_untrusted_packages`, `allow_direct_local_storage_loading`, backend selection, etc.), then store the result; predict / infer methods delegate.
186+
187+
Register the adapter by model architecture in `inference/models/utils.py` so `/infer?model_id=<arch>/<variant>` resolves to it. Follow the pattern other entries in that file use.
188+
189+
Most new models on this path will NOT need surface 4 — workflow blocks (surface 3) cover the majority use case. Add 4 only if there's a concrete requirement.
190+
191+
### 9. Run registration against staging
192+
193+
The user sets `MODELS_SERVICE_INTERNAL_SECRET` once per shell:
194+
195+
```bash
196+
export MODELS_SERVICE_INTERNAL_SECRET=$(gcloud secrets versions access latest \
197+
--secret=MODELS_SERVICE_INTERNAL_SECRET --project=878913763597)
198+
# 878913763597 = staging project; confirm before running
199+
```
200+
201+
Run a single-variant smoke test first, then the full set:
202+
203+
```bash
204+
python scripts/core_models/register_<family>_models.py --env staging --variants <one>
205+
python scripts/core_models/register_<family>_models.py --env staging
206+
```
207+
208+
Verify via the staging API (needs a staging Roboflow API key):
209+
210+
```bash
211+
curl -s "https://api.roboflow.one/models/v1/external/weights?modelId=<arch>/<variant>" \
212+
-H "Authorization: Bearer <staging-api-key>" | python3 -m json.tool
213+
```
214+
215+
Expect `status: ok` and `modelPackages[0].packageFiles` listing every file with `md5Hash` set.
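
That expectation can be checked programmatically once the response is parsed. The sketch below is hedged: the field names come from the line above, but the assumption that `packageFiles` is a list of objects each carrying `md5Hash` is a guess at the response shape, and the function name is hypothetical.

```python
def weights_response_problems(payload: dict) -> list[str]:
    """Return human-readable problems found in a parsed weights response."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"unexpected status: {payload.get('status')!r}")
    packages = payload.get("modelPackages") or []
    if not packages:
        problems.append("no modelPackages returned")
        return problems
    # Assumed shape: packageFiles is a list of dicts with an md5Hash field.
    files = packages[0].get("packageFiles") or []
    if not files:
        problems.append("first package lists no packageFiles")
    for index, entry in enumerate(files):
        if not entry.get("md5Hash"):
            problems.append(f"packageFiles[{index}] has no md5Hash")
    return problems
```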
216+
217+
### 10. End-to-end verify
218+
219+
Run `AutoModel.from_pretrained("<arch>/<variant>", api_key=<staging-key>)` against staging (set `ROBOFLOW_ENVIRONMENT=staging` or `ROBOFLOW_API_HOST=https://api.roboflow.one`) and exercise the model with a real input. If surface 3 was built, also run it through `debugrun.py` / a short MP4 via `InferencePipeline`. If surface 4 was built, hit `/infer?model_id=<arch>/<variant>`.
220+
221+
**When running `debugrun.py` or the inference server from the repo root**, avoid letting the repo-root `inference_models/` directory shadow the editable-installed `inference_models` package. On Python 3.11 and later, set `PYTHONSAFEPATH=1` (or run `python -P`) so Python does not prepend the script directory to `sys.path`. Both were added in Python 3.11, so **do not rely on them on Python 3.10**. On Python 3.10, prefer running from an installed environment via `python -m ...` instead of invoking a repo-root script directly, or adjust `PYTHONPATH` / the working directory so the repo-root namespace package is not on `sys.path`.
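
One way to confirm which copy of a package Python would import is to inspect its import spec. The helpers below are a hypothetical sketch built on `importlib.util.find_spec`; they treat the package as shadowed when it resolves to `<repo_root>/<package_name>` itself, which is the repo-root directory described above.

```python
import importlib.util
from pathlib import Path


def package_locations(package_name: str) -> list[Path]:
    """Return the directories (or file) a top-level package would be imported from."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return []
    if spec.submodule_search_locations:
        return [Path(p).resolve() for p in spec.submodule_search_locations]
    return [Path(spec.origin).resolve()] if spec.origin else []


def is_shadowed_by(package_name: str, repo_root: str) -> bool:
    """True if the package resolves to the repo-root directory of the same name."""
    shadow = (Path(repo_root) / package_name).resolve()
    return shadow in package_locations(package_name)
```

Calling `is_shadowed_by("inference_models", ".")` from the repo root before launching a script gives an early warning instead of a confusing import error later.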
222+
223+
## Gotchas (real, collected as hit)
224+
225+
Add to this list as new surprises surface.
226+
227+
- **HF gating**: some `facebook/*` repos (e.g. `facebook/sam3`) return 401 on every file without an `HF_TOKEN`. Accept terms on the model page + generate a token before any download.
228+
- **Zip layout**: files at the zip root, no wrapping directory. The fixture unzips and calls `from_pretrained(that_dir)` — nested layouts break silently.
229+
- **Nested-list shape for HF video processors**: some processor methods expect inputs at a very specific nesting depth (e.g. `input_boxes` at 3 levels `[image [boxes [coords]]]`, not 4). Unit tests that mock the processor won't catch wrong nesting — always include one integration or e2e test that exercises the real `from_pretrained` + predict path against real weights, even if tiny-variant.
230+
- **State-requiring `.track()` / similar must raise on missing state**, not silently create an empty session. Empty-state-then-silent-success bugs are hard to detect.
231+
- **Numpy array truthiness**: `dict.get(a) or dict.get(b)` raises `ValueError` (ambiguous truth value) when the first value is a multi-element numpy array. Use explicit `"a" in d` / `"b" in d` checks, or a small `_first_present` helper.
232+
- **SDK client auth**: `TheGOATModelsServiceClient.init(current_environment=...)` doesn't exist on the base SDK — only on `exp-registry-migration`'s vendored client. Our scripts construct the client directly from the env var + hardcoded host per `--env`.
233+
- **Transformers import-time side effects**: some transformers model classes (e.g. SAM3 video) do `import torchvision` at module import. Missing torchvision surfaces as `ModuleNotFoundError: Could not import module 'Sam3VideoModel'` — misleading. Not a prod issue, but confuses local setup.
234+
- **Stateful workflow blocks + remote execution**: if your block keeps per-video or per-request state, raise `NotImplementedError` in `__init__` when the execution mode is `REMOTE`. Failing at compile time beats failing on first frame.
235+
- **`get_supported_model_variants` order**: the first entry is the display name for the air-gapped cache scanner. Put your default variant first.
236+
- **`PYTHONSAFEPATH=1`** (Python 3.11+) when running scripts from the repo root; see step 10.
237+
238+
## Verification checklist
239+
240+
Before declaring done:
241+
242+
- [ ] Architecture registered in `models_registry.py`; import + class resolve without error
243+
- [ ] Every variant zip uploads and `curl -sI` returns 200
244+
- [ ] `inference_models` unit tests pass (from `inference_models/` cwd)
245+
- [ ] If surface 3: workflow-block unit tests pass (from repo root)
246+
- [ ] Registration script merged or at least open as a PR against `roboflow/model-registry-sdk`
247+
- [ ] `register_*_models.py --env staging` completes without errors (run per-variant smoke test first)
248+
- [ ] Staging metadata API returns the model with every file + MD5 + sealed
249+
- [ ] `AutoModel.from_pretrained("<arch>/<default>")` loads + runs against staging
250+
- [ ] If surface 3: block runs end-to-end on a real input (image or MP4 via `InferencePipeline`)
251+
- [ ] If surface 4: `/infer?model_id=...` returns a valid prediction
252+
- [ ] `make style` clean
253+
- [ ] At least one non-mock integration test exercises the real call path
254+
- [ ] PR descriptions list remaining TODOs (other variants, production registration, additional surfaces deferred)
255+
256+
## Production registration
257+
258+
Only after staging is fully verified and the user explicitly approves:
259+
260+
```bash
261+
export MODELS_SERVICE_INTERNAL_SECRET=$(gcloud secrets versions access latest \
262+
--secret=MODELS_SERVICE_INTERNAL_SECRET --project=481589474394) # prod
263+
python scripts/core_models/register_<family>_models.py --env production
264+
```
265+
266+
## Iterating on this skill
267+
268+
Each new model added either confirms an assumption here (leave alone) or surfaces a gap (add a gotcha / template note). Non-HF backends (ONNX, TRT, TORCH) are underrepresented in today's templates — the next model through a non-HF path should add a step-1 note for its backend.
