Commit 0085452
Add SAM2 streaming video tracker (inference_models + workflow block) (#2245)
* Add SAM3ForStream and streaming-video tests in inference_models
Introduces a SAM3 streaming tracker that mirrors the existing
SAM2ForStream interface (prompt / track returning (masks, object_ids,
state_dict)) so both can be used interchangeably by upstream code.
- inference_models/models/sam3_rt/sam3_pytorch.py: SAM3ForStream
backed by HuggingFace transformers' Sam3VideoModel /
Sam3VideoProcessor. The native sam3 package's video predictor
requires a full video resource upfront; the transformers port
exposes init_video_session + per-frame model(frame=...), which is
the shape we need for InferencePipeline-style streaming.
- Accepts bbox and/or text prompts; state_dict is opaque (wraps the
HF Sam3VideoInferenceSession) and must be kept in memory by the
caller — it's not serializable across processes.
- Register (segment-anything-3-rt, INSTANCE_SEGMENTATION_TASK,
BackendType.HF) in models_registry alongside the existing
SAM2-RT entry.
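A minimal sketch of the prompt / track contract described above, using a toy tracker with no real model behind it. The class name, the (x1, y1, x2, y2) box layout, the list-of-lists mask type and the RuntimeError are assumptions for illustration, not the actual SAM2ForStream / SAM3ForStream signatures.

```python
from typing import Any, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
Mask = List[List[bool]]                  # stand-in for a binary H x W mask


class ToyStreamTracker:
    """Hypothetical tracker exposing the prompt/track contract; no real model."""

    def prompt(
        self, frame: Sequence[Sequence[Any]], bboxes: Sequence[Box]
    ) -> Tuple[List[Mask], List[int], dict]:
        # A prompt starts a fresh session: one object id per box.
        state = {"boxes": list(bboxes), "frames_seen": 1}
        return self._masks(frame, state["boxes"]), list(range(len(bboxes))), state

    def track(
        self, frame: Sequence[Sequence[Any]], state_dict: Optional[dict]
    ) -> Tuple[List[Mask], List[int], dict]:
        # Tracking needs the state produced by an earlier prompt() call.
        if state_dict is None:
            raise RuntimeError("track() called before prompt()")
        state_dict["frames_seen"] += 1
        boxes = state_dict["boxes"]
        return self._masks(frame, boxes), list(range(len(boxes))), state_dict

    @staticmethod
    def _masks(frame: Sequence[Sequence[Any]], boxes: Sequence[Box]) -> List[Mask]:
        # Toy "segmentation": every pixel inside the box belongs to the object.
        height, width = len(frame), len(frame[0])
        return [
            [[x1 <= x < x2 and y1 <= y < y2 for x in range(width)]
             for y in range(height)]
            for (x1, y1, x2, y2) in boxes
        ]
```

The caller holds on to the returned state and hands it back on every track call, which is why the state has to stay in the caller's memory.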
Tests:
- inference_models/tests/unit_tests/models/test_sam3_rt.py — 24
unit tests covering helpers (_normalise_bboxes,
_unpack_processed_outputs, etc.) plus class behaviour using
MagicMock model/processor. No weights required.
- inference_models/tests/integration_tests/models/test_sam2_rt_predictions.py
— new integration suite for the existing SAM2ForStream
(prompt -> track on synthetic frames, centroid-moves assertion,
track-without-prompt raises, torch.Tensor input).
- inference_models/tests/integration_tests/models/test_sam3_rt_predictions.py
— analogous suite for SAM3ForStream.
- conftest.py: sam2_rt_package and sam3_rt_package fixtures
download zips from rf-platform-models; docstrings list the
expected file contents for upload.
https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
* Add SAM2/SAM3 streaming video tracker workflow blocks
Two new LOCAL-only workflow blocks that drive the inference_models
streaming trackers (SAM2ForStream / SAM3ForStream) from workflows
powered by InferencePipeline. Both blocks multiplex a single model
instance across many videos by keying state_dicts on
video_metadata.video_identifier, reset sessions when frame_number
rolls back, and support three prompt modes: first_frame,
every_n_frames, every_frame.
- inference/core/workflows/core_steps/models/foundation/
_streaming_video_common.py: shared helpers (state bookkeeping,
prompt-vs-track decision logic, sv.Detections assembly with
SAM-assigned tracker_ids).
- segment_anything2_video/v1.py: SAM2VideoTrackerBlockV1
(type: roboflow_core/segment_anything_2_video@v1,
default model_id: segment-anything-2-rt).
- segment_anything3_video/v1.py: SAM3VideoTrackerBlockV1
(type: roboflow_core/sam3_video@v1,
default model_id: segment-anything-3-rt). Additionally accepts
text prompts via class_names; boxes win when both are supplied.
- Both raise NotImplementedError on REMOTE step execution — per-video
session state cannot survive a remote boundary.
- Models are loaded via inference_models.AutoModel.from_pretrained
so backend negotiation / package download / caching flow through
the standard inference_models pipeline.
- Registered in core_steps/loader.py.
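The per-video bookkeeping and the prompt-vs-track decision can be sketched roughly as below. This is not the code in _streaming_video_common.py: StreamState, StreamRegistry, should_prompt and record are hypothetical names, and treating "rolls back" as a strictly smaller frame_number is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

PROMPT_MODES = ("first_frame", "every_n_frames", "every_frame")


@dataclass
class StreamState:
    state_dict: Any = None                   # opaque tracker state, None until prompted
    last_frame_number: Optional[int] = None
    frames_since_prompt: int = 0             # frames processed, counting the prompt frame


class StreamRegistry:
    """Keeps one StreamState per video_identifier so one model serves many videos."""

    def __init__(self) -> None:
        self._streams: Dict[str, StreamState] = {}

    def get(self, video_identifier: str, frame_number: int) -> StreamState:
        stream = self._streams.get(video_identifier)
        rolled_back = (
            stream is not None
            and stream.last_frame_number is not None
            and frame_number < stream.last_frame_number
        )
        if stream is None or rolled_back:
            # New video, or the stream restarted: drop any old session.
            stream = StreamState()
            self._streams[video_identifier] = stream
        stream.last_frame_number = frame_number
        return stream


def should_prompt(stream: StreamState, mode: str, n: int = 1) -> bool:
    if mode not in PROMPT_MODES:
        raise ValueError(f"unknown prompt mode: {mode!r}")
    if stream.state_dict is None or mode == "every_frame":
        return True  # nothing to track from yet, or re-prompt on every frame
    if mode == "every_n_frames":
        return stream.frames_since_prompt >= n
    return False     # first_frame: keep tracking once a session exists


def record(stream: StreamState, prompted: bool, new_state: Any) -> None:
    # Store the state returned by the tracker and update the prompt counter.
    stream.state_dict = new_state
    stream.frames_since_prompt = 1 if prompted else stream.frames_since_prompt + 1
```

With n=3 this prompts on frame 0, tracks frames 1 and 2, and prompts again on frame 3.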
Tests (30 total, all passing, no weights required):
- test_segment_anything2_video.py — 10 tests covering manifest,
REMOTE rejection, first_frame/every_n_frames/every_frame modes,
state threading across track calls, multi-stream isolation,
stream-restart detection.
- test_segment_anything3_video.py — 9 tests with similar coverage
plus text-vs-box prompt routing.
- test_streaming_video_common.py — 11 tests for the shared helpers.
* Rename SAM video trackers to sam2video/sam3video; add SAM2Video (HF)
Refactors the streaming trackers into a shared HuggingFace
transformers base and adds a SAM2Video counterpart to the existing
SAM3Video. The older sam2_rt (SAM2ForStream using Meta's sam2 camera
predictor) is kept untouched — per the feedback it hasn't been
exercised much in practice.
Model classes
-------------
- inference_models/models/common/hf_streaming_video.py:
HFStreamingVideoBase containing all the HF streaming boilerplate —
session init, prompt/track methods, mask/obj_id extraction, opaque
state_dict contract.
- inference_models/models/sam2_video/sam2_video_hf.py: SAM2Video
(lazy-imports transformers.Sam2VideoModel / Sam2VideoProcessor;
rejects text prompts).
- inference_models/models/sam3_video/sam3_video_hf.py: SAM3Video,
moved from the previous sam3_rt path; now a thin ~25-line subclass
after the shared base absorbed the helpers (lazy-imports
transformers.Sam3VideoModel / Sam3VideoProcessor; accepts both
text and box prompts).
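One way the shared-base / thin-subclass split with lazy imports could look, as a rough sketch. Only the transformers class names come from the notes above; StreamingVideoBaseSketch, its attribute names and the ValueError on text prompts are assumptions, not the real HFStreamingVideoBase.

```python
import importlib
from typing import Optional, Sequence, Tuple


class StreamingVideoBaseSketch:
    """Hypothetical stand-in for a shared streaming base class."""

    backend_module: str = ""
    backend_classes: Tuple[str, str] = ("", "")
    supports_text_prompts: bool = False

    @classmethod
    def load_backend_classes(cls) -> tuple:
        # Deferred import: the heavy dependency is only needed when a model
        # is actually loaded, not when this module is imported.
        module = importlib.import_module(cls.backend_module)
        return tuple(getattr(module, name) for name in cls.backend_classes)

    def validate_prompt(
        self,
        bboxes: Optional[Sequence[Sequence[float]]] = None,
        text: Optional[str] = None,
    ) -> None:
        if text is not None and not self.supports_text_prompts:
            raise ValueError(f"{type(self).__name__} does not accept text prompts")
        if bboxes is None and text is None:
            raise ValueError("a prompt needs boxes or text")


class Sam2VideoSketch(StreamingVideoBaseSketch):
    backend_module = "transformers"
    backend_classes = ("Sam2VideoModel", "Sam2VideoProcessor")
    supports_text_prompts = False  # box prompts only


class Sam3VideoSketch(StreamingVideoBaseSketch):
    backend_module = "transformers"
    backend_classes = ("Sam3VideoModel", "Sam3VideoProcessor")
    supports_text_prompts = True   # text and box prompts
```

Defining the subclasses does not import transformers; that only happens when load_backend_classes is called.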
Registry
--------
- sam2video: (INSTANCE_SEGMENTATION_TASK, BackendType.HF) -> SAM2Video
- sam3video: (INSTANCE_SEGMENTATION_TASK, BackendType.HF) -> SAM3Video
- segment-anything-2-rt stays registered against SAM2ForStream.
- segment-anything-3-rt entry dropped (never released).
Workflow blocks now default to these ids:
- roboflow_core/segment_anything_2_video@v1 -> "sam2video"
- roboflow_core/sam3_video@v1 -> "sam3video"
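The registry entries listed above amount to a lookup keyed on (model id, task, backend). A minimal sketch under that assumption follows; the enum and task values, the placeholder classes and resolve_model_class are illustrative and do not reproduce the real models_registry.

```python
from enum import Enum
from typing import Dict, Tuple


class BackendType(Enum):
    HF = "hf"  # placeholder value


INSTANCE_SEGMENTATION_TASK = "instance-segmentation"  # placeholder value


class SAM2Video:  # placeholder for the real model class
    pass


class SAM3Video:  # placeholder for the real model class
    pass


RegistryKey = Tuple[str, str, BackendType]

MODELS_REGISTRY: Dict[RegistryKey, type] = {
    ("sam2video", INSTANCE_SEGMENTATION_TASK, BackendType.HF): SAM2Video,
    ("sam3video", INSTANCE_SEGMENTATION_TASK, BackendType.HF): SAM3Video,
}


def resolve_model_class(model_id: str, task: str, backend: BackendType) -> type:
    # A dropped or unknown id fails loudly rather than falling back silently.
    try:
        return MODELS_REGISTRY[(model_id, task, backend)]
    except KeyError:
        raise LookupError(
            f"no model registered for {(model_id, task, backend.name)}"
        ) from None
```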
Tests
-----
- Unit tests: added test_sam2_video.py (4 SAM2-specific), renamed
test_sam3_rt.py -> test_sam3_video.py and updated imports (24 tests
covering helpers on the shared base plus SAM3 class behaviour).
- Integration tests: renamed SAM3 file, added SAM2 counterpart. New
fixtures sam2_video_package / sam3_video_package (expected zips at
rf-platform-models/sam2video.zip and rf-platform-models/sam3video.zip).
- Workflow block tests updated to use sam2video / sam3video ids.
- All 58 non-integration tests pass locally.
* Drop SAM2-RT tests/fixture so this PR doesn't touch the legacy model
Removes the integration tests and fixture I'd added for the existing
SAM2ForStream (sam2_rt) — keeping them would require uploading a
segment-anything-2-rt.zip to the test assets bucket, but the goal for
this PR is to leave that untested path alone.
- Deleted tests/integration_tests/models/test_sam2_rt_predictions.py
- Removed SAM2_RT_PACKAGE_URL + sam2_rt_package fixture from conftest.py
- Fixed two docstring references that still said SAM2ForStream when they
now point at the new SAM2Video.
The SAM2ForStream registry entry itself stays — it's the legacy model
that existed before this branch and we're not touching it.
* Add handoff doc for local testing + follow-up
Captures the state of this branch while session context is fresh —
what's where, how to test, what needs uploading, known gotchas,
and a sketch of the follow-up "add a model" Claude skill. Doc is
temporary and should be deleted before the PR merges.
* SAM2 video: variant model ids + input_boxes nesting fix
- Default workflow block to sam2video/small, advertise all four
Hiera backbones (tiny / small / base-plus / large) via examples
and get_supported_model_variants.
- Fix Sam2VideoProcessor.add_inputs_to_inference_session call: the
processor expects input_boxes with 3 nesting levels ([image,
boxes, coords]); we were passing 4, which raised ValueError on the
first real-weights prompt. Unit tests missed it because they mock
the processor; end-to-end verification against the uploaded
sam2video-small.zip surfaced it.
- Point the inference_models integration fixture URL at
sam2video-small.zip (the variant that matches the new default).
- Update sam2video workflow-block unit tests to pass the new
sam2video/small default through the mocks.
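The nesting mismatch is easy to check without weights. Below is a small sketch, assuming boxes arrive as flat (x1, y1, x2, y2) sequences for a single image; nesting_depth and to_input_boxes are hypothetical helpers and no transformers call is made.

```python
from typing import Any, List, Sequence


def nesting_depth(value: Any) -> int:
    """Count list/tuple levels by following the first element downwards."""
    depth = 0
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def to_input_boxes(bboxes: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Wrap one image's boxes as [image, boxes, coords], i.e. three levels."""
    return [[[float(c) for c in box] for box in bboxes]]


boxes = [(10, 20, 110, 220), (5, 5, 50, 50)]
good = to_input_boxes(boxes)  # [[[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]]]
bad = [good]                  # one wrapper too many, the shape that raised ValueError
```

A guard such as `assert nesting_depth(input_boxes) == 3` before the processor call would catch this shape error even when the processor itself is mocked.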
* Strip SAM3 video tracker from this PR
Descope the SAM3 streaming-video work to a follow-up so this PR can
ship SAM2 video alone. SAM3's HF port requires the gated facebook/sam3
checkpoints, which aren't available yet.
Removed:
- inference_models SAM3 model class, registry entry, unit + integration
tests, and test fixture
- sam3_video workflow block + loader registration + unit tests
- SAM_VIDEO_HANDOFF.md (served its purpose during the branch work)
Left in place:
- HFStreamingVideoBase in inference_models/models/common — reusable,
SAM2 uses it today and a future SAM3 port can inherit unchanged
- _streaming_video_common workflow helpers — still used by the SAM2
video block
- SAM2 video class, registry entry, workflow block, and all SAM2 tests
Comments that referenced SAM3Video / test_sam3_video.py have been
generalised or trimmed so nothing dangles.
1 parent 127dd3c commit 0085452
13 files changed
Lines changed: 1746 additions & 0 deletions
File tree
- inference_models
- inference_models/models
- auto_loaders
- common
- sam2_video
- tests
- integration_tests/models
- unit_tests/models
- inference/core/workflows/core_steps
- models/foundation
- segment_anything2_video
- tests/workflows/unit_tests/core_steps/models/foundation
Lines changed: 232 additions & 0 deletions
Whitespace-only changes.