
Draft - MLX Experimentation #474

Open
Nash0x7E2 wants to merge 23 commits into main from feature/mlx-hf

Conversation


Nash0x7E2 (Member) commented Apr 7, 2026

Summary by CodeRabbit

  • New Features

    • Added a runnable Resale-Advisor multimodal voice example.
    • Added local Apple Silicon support for MLX text and vision‑language backends.
  • Refactor

    • Centralized local text and VLM inference into shared base modules and migrated existing backends to use them.
    • Introduced new MLX backend implementations and lazy exports for MLX components.
  • Tests

    • Added macOS/Apple‑Silicon pytest markers to skip incompatible tests.
    • Relaxed a detection test for broader label compatibility.
  • Chores

    • Broadened Hugging Face/transformers dependency ranges and expanded MLX-related optional extras.


coderabbitai Bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds shared local inference bases (LocalTextLLM, LocalVLM), MLX-backed plugins (MlxLLM, MlxVLM), refactors Transformers plugins to use the new bases, updates dependency constraints and MLX test skip markers, tweaks several plugin implementations/tests, and adds a resale‑advisor multimodal example.

Changes

  • Test Infrastructure — conftest.py: Added requires_mlx and requires_mlx_vlm pytest skip markers that skip tests unless on macOS (darwin) with arm64 and the respective MLX modules discoverable.
  • Examples — examples/12_resale_advisor_example/pyproject.toml, examples/12_resale_advisor_example/resale_advisor_example.py: New example project and runnable script wiring a multimodal "Resale Advisor" agent (GetStream connectivity, Hugging Face MLX-VLM, Deepgram STT/TTS) with create_agent() and join_call().
  • Local Inference Core — plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py, plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py: New shared abstractions/utilities: LocalTextLLM and LocalVLM centralize warmup, message/frame building, streaming vs non-streaming generation, tool-call extraction, and the local tool-call followup loop.
  • MLX Plugins — plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py, plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py: New MLX-backed implementations (MlxLLM, MlxVLM) implementing sync model loading, template application, streaming/non-streaming generation, and integration with the Local* bases.
  • Transformers Refactor — plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py, plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py: Refactored to inherit from LocalTextLLM/LocalVLM; device resolution added, generation helpers simplified, and warmup/video-forwarding/tool plumbing moved to the shared bases.
  • Plugin Exports & Warnings — plugins/huggingface/vision_agents/plugins/huggingface/__init__.py: Added lazy __getattr__ exports for MlxLLM/MlxVLM with targeted warnings/classification for MLX-related ImportError cases.
  • Dependency & Config Updates — plugins/huggingface/pyproject.toml, plugins/moondream/pyproject.toml, plugins/smart_turn/pyproject.toml, pyproject.toml, examples/.../pyproject.toml: Updated dependency bounds (huggingface_hub, transformers), added mlx/mlx-vlm optional groups with Darwin/arm64 markers, and added local editable sources for the new example.
  • Tests & Detection — plugins/huggingface/tests/test_transformers_vlm.py, plugins/roboflow/tests/test_roboflow.py, plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py: Adjusted tests to pass an explicit processor arg, relaxed label assertions, and normalized id2label to a dict[int, str] in detection code.
  • Roboflow Processor — plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py: Switched annotation/class mapping to indexed enumeration and rebuilt filtered Detections by subsetting arrays and reorganizing per-detection data/metadata.
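The shared-base split described in the Local Inference Core row can be pictured with a minimal sketch. Class and method names follow the summary above; the bodies, signatures, and the FakeBackendLLM subclass are illustrative assumptions, not the actual implementation:

```python
from abc import ABC, abstractmethod
from typing import Any


class LocalTextLLM(ABC):
    """Sketch of the shared local-inference base; real signatures may differ."""

    def __init__(self, model_id: str, max_new_tokens: int = 256):
        self.model_id = model_id
        self._max_new_tokens = max_new_tokens
        self._resources: Any = None  # backend model/tokenizer pair, set by warmup

    def warmup(self) -> None:
        # Shared warmup lives in the base: load once, backends only
        # supply the blocking _load_model_sync() hook.
        if self._resources is None:
            self._resources = self._load_model_sync()

    @abstractmethod
    def _load_model_sync(self) -> Any:
        """Backend-specific blocking load (Transformers, MLX, ...)."""


class FakeBackendLLM(LocalTextLLM):
    def _load_model_sync(self) -> Any:
        return ("model", "tokenizer")  # stand-in for a real load(self.model_id)


llm = FakeBackendLLM("example/model")
llm.warmup()
print(llm._resources)  # ('model', 'tokenizer')
```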

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant LocalLLM as LocalTextLLM/LocalVLM
    participant Video as VideoForwarder
    participant ToolRunner
    participant Events
    Client->>LocalLLM: request(messages, frames?, stream?)
    alt Video attached
        Video->>LocalLLM: supply buffered frames
    end
    LocalLLM->>LocalLLM: build messages / apply template
    LocalLLM->>LocalLLM: generate (stream / non-stream) -> raw_text
    LocalLLM->>Events: emit start / chunk / completed
    alt tool calls detected
        LocalLLM->>ToolRunner: execute tool calls
        ToolRunner-->>LocalLLM: tool results
        LocalLLM->>LocalLLM: regenerate follow-up (repeat up to max rounds)
        LocalLLM->>Events: emit followup events
    end
    LocalLLM-->>Client: final LLMResponseEvent
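Read as code, the diagram corresponds roughly to this control loop. This is a self-contained sketch with a dummy backend; the method names mirror the diagram labels, not the real API:

```python
import asyncio


class DummyLLM:
    """Stand-in backend; the real backends are the MLX/Transformers plugins."""

    def __init__(self):
        self.calls = 0

    def build_messages(self, messages):
        # "build messages / apply template" step from the diagram
        return "\n".join(messages)

    async def generate(self, prepared):
        self.calls += 1
        # First round emits a tool call, second round a plain answer.
        return '{"name": "lookup"}' if self.calls == 1 else "final answer"

    def extract_tool_calls(self, text):
        return ["lookup"] if text.startswith("{") else []

    async def run_tools(self, tool_calls):
        # "ToolRunner executes tool calls" step
        return [f"tool:{name}=result" for name in tool_calls]


async def create_response(llm, messages, max_rounds=3):
    raw_text = ""
    for _ in range(max_rounds):  # "repeat up to max rounds"
        prepared = llm.build_messages(messages)
        raw_text = await llm.generate(prepared)
        tool_calls = llm.extract_tool_calls(raw_text)
        if not tool_calls:
            break  # final LLMResponseEvent
        messages = messages + await llm.run_tools(tool_calls)
    return raw_text


print(asyncio.run(create_response(DummyLLM(), ["hi"])))  # prints: final answer
```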

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • aliev

Poem

The room is a small bright animal of glass,
the code strips its tongue and learns to hesitate,
it names the wrong thing softly, then corrects,
an apple warms the air with mute intent,
the agent answers, careful as a cut.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 32.47%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The title is vague and generic, using non-descriptive terms like 'Draft' and 'Experimentation' that don't convey meaningful information about the substantive changes. Resolution: revise the title to clearly specify the main change, such as 'Add MLX and MLX-VLM support for local inference' or 'Implement local Apple Silicon inference backends'.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Nash0x7E2 marked this pull request as ready for review April 21, 2026 21:06
uv sync --all-extras in CI was activating the plugin's MLX extras on Linux, pulling in mlx/mlx-lm wheels that then failed to import at runtime because libmlx.so is not in the Linux wheel.

ImportError.name is None when a transitive dependency fails on a missing shared library (e.g. libmlx.so on Linux), so the previous e.name set-membership check let the exception escape. Match on the message as a fallback.

coderabbitai Bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (5)
conftest.py (1)

31-42: Optional: tighten platform gate to Apple Silicon.

sys.platform == "darwin" also matches Intel Macs, where mlx/mlx_lm won't actually run even if somehow importable. Since the root pyproject.toml already constrains installation via platform_machine == 'arm64', the marker can mirror that for consistency:

Suggested refinement
+import platform
 ...
 requires_mlx = pytest.mark.skipif(
-    sys.platform != "darwin" or importlib.util.find_spec("mlx_lm") is None,
+    sys.platform != "darwin"
+    or platform.machine() != "arm64"
+    or importlib.util.find_spec("mlx_lm") is None,
     reason="MLX tests require Apple Silicon with mlx-lm installed",
 )
 ...
 requires_mlx_vlm = pytest.mark.skipif(
-    sys.platform != "darwin" or importlib.util.find_spec("mlx_vlm") is None,
+    sys.platform != "darwin"
+    or platform.machine() != "arm64"
+    or importlib.util.find_spec("mlx_vlm") is None,
     reason="MLX-VLM tests require Apple Silicon with mlx-vlm installed",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conftest.py` around lines 31 - 42, The current skip markers requires_mlx and
requires_mlx_vlm gate only on sys.platform == "darwin" which includes Intel
Macs; tighten them to also require Apple Silicon by adding a check that
platform.machine() == "arm64" (e.g., require sys.platform == "darwin" and
platform.machine() == "arm64" and the existing importlib.util.find_spec(...)
check) so the markers only run on Apple Silicon where mlx/mlx_vlm are supported.
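For context, this is roughly what the refined marker looks like in use. A sketch only: the real definitions live in the root conftest.py, and the reason string is taken from the diff above:

```python
import importlib.util
import platform
import sys

import pytest

# Mirrors the refinement suggested above: macOS + arm64 + mlx_lm importable.
requires_mlx = pytest.mark.skipif(
    sys.platform != "darwin"
    or platform.machine() != "arm64"
    or importlib.util.find_spec("mlx_lm") is None,
    reason="MLX tests require Apple Silicon with mlx-lm installed",
)


@requires_mlx
def test_mlx_llm_smoke():
    # Only reached on Apple Silicon with mlx-lm installed.
    import mlx_lm  # noqa: F401
```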
plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py (1)

56-87: _load_model_sync tuple unpack — minor readability nit.

result = load(self.model_id); MlxModelResources(model=result[0], tokenizer=result[1]) reads as though load() might return more than two items; it doesn't. A direct unpack documents the contract better.

♻️ Tiny tidy-up
-    def _load_model_sync(self) -> MlxModelResources:
-        result = load(self.model_id)
-        return MlxModelResources(model=result[0], tokenizer=result[1])
+    def _load_model_sync(self) -> MlxModelResources:
+        model, tokenizer = load(self.model_id)
+        return MlxModelResources(model=model, tokenizer=tokenizer)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py` around
lines 56 - 87, The _load_model_sync implementation uses indexing into result
instead of unpacking; change load(self.model_id) to return two values captured
via direct tuple unpacking like model, tokenizer = load(self.model_id) and then
construct MlxModelResources(model=model, tokenizer=tokenizer) to make the
contract explicit and improve readability (modify the _load_model_sync function
and keep use of MlxModelResources).
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)

146-160: do_sample is configurable on TransformersVLM but hardcoded to True here.

The sibling TransformersVLM in this same PR added a do_sample: bool = True constructor flag (see transformers_vlm.py lines 86, 100, 164-166). Keeping TransformersLLM hardcoded means users can't run greedy decoding on the text model, and users who configure both will get asymmetric behavior for the same parameter. Consider surfacing the same knob and mirroring the temperature if self._do_sample else 1.0 dance in both _generate_streaming and _generate_non_streaming.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`
around lines 146 - 160, Add a configurable do_sample flag to TransformersLLM by
adding a do_sample: bool = True parameter to the __init__ signature, store it as
self._do_sample, and update both _generate_streaming and _generate_non_streaming
to use temperature = self._temperature if self._do_sample else 1.0 (or the
existing temperature field name) so greedy decoding is possible and behavior
matches TransformersVLM; ensure any references to sampling behavior use
self._do_sample consistently.
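The symmetric do_sample handling the comment asks for could look like this. A sketch with assumed attribute names; only the `temperature if do_sample else 1.0` interplay is taken from the review text:

```python
class TransformersLLMSketch:
    """Illustrates the do_sample/temperature interplay described above."""

    def __init__(self, temperature: float = 0.7, do_sample: bool = True):
        self._temperature = temperature
        self._do_sample = do_sample

    def _generation_kwargs(self) -> dict:
        # Greedy decoding ignores temperature; mirror TransformersVLM's
        # "temperature if self._do_sample else 1.0" behavior so both
        # backends treat the same parameters the same way.
        return {
            "do_sample": self._do_sample,
            "temperature": self._temperature if self._do_sample else 1.0,
        }


print(TransformersLLMSketch(do_sample=False)._generation_kwargs())
```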
plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py (1)

88-95: Prefer keyword args when forwarding to LocalVLM.__init__.

Six positional arguments to a base constructor defined in a sibling file is a time bomb: any reorder of LocalVLM.__init__ will silently rebind parameters here (e.g., swapping fps and max_frames would still type-check). Keyword passthrough is cheap insurance.

♻️ Proposed change
-        super().__init__(
-            model,
-            fps,
-            frame_buffer_seconds,
-            max_frames,
-            max_new_tokens,
-            max_tool_rounds,
-        )
+        super().__init__(
+            model=model,
+            fps=fps,
+            frame_buffer_seconds=frame_buffer_seconds,
+            max_frames=max_frames,
+            max_new_tokens=max_new_tokens,
+            max_tool_rounds=max_tool_rounds,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 88 - 95, Replace the positional argument forwarding in the subclass
constructor with explicit keyword arguments to match LocalVLM.__init__; update
the call to super().__init__ in transformers_vlm (the subclass shown) to pass
model=model, fps=fps, frame_buffer_seconds=frame_buffer_seconds,
max_frames=max_frames, max_new_tokens=max_new_tokens, and
max_tool_rounds=max_tool_rounds so parameter bindings remain correct even if
LocalVLM.__init__ reorders parameters.
examples/12_resale_advisor_example/pyproject.toml (1)

6-14: Prune redundant / unused dependencies.

  • mlx-vlm is already pulled in by the [mlx-vlm] extra on vision-agents-plugins-huggingface; listing it again at the top level is duplication.
  • The transformers extra drags in transformers, torch, accelerate, supervision, and opencv-python (see plugins/huggingface/pyproject.toml lines 18-39), but this example only uses MlxVLM. That's a heavy install for nothing.
  • torchvision is declared directly but not imported by resale_advisor_example.py, nor required by the MLX path.
♻️ Proposed trim
 dependencies = [
   "python-dotenv>=1.0",
-  "vision-agents-plugins-huggingface[mlx-vlm,transformers]",
+  "vision-agents-plugins-huggingface[mlx-vlm]",
   "vision-agents-plugins-getstream",
   "vision-agents-plugins-deepgram",
   "vision-agents",
-  "mlx-vlm",
-  "torchvision",
 ]

If torchvision is genuinely required by some transitive code path in mlx-vlm, please leave a comment explaining why; otherwise drop it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/12_resale_advisor_example/pyproject.toml` around lines 6 - 14,
Remove redundant and heavy deps: delete the top-level "mlx-vlm" entry and the
"torchvision" entry from the dependencies array, and narrow the HuggingFace
plugin extra to only the MlxVLM extra by changing
"vision-agents-plugins-huggingface[mlx-vlm,transformers]" to
"vision-agents-plugins-huggingface[mlx-vlm]"; if you believe torchvision is
required transitively by mlx-vlm, add an inline comment in pyproject.toml
explaining that justification instead of listing it as a direct dependency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/12_resale_advisor_example/resale_advisor_example.py`:
- Around line 1-21: The module is misnamed—resale_advisor_example—but contains a
generic vision-and-voice assistant; either rename the example (e.g., change the
directory/file name to 12_mlx_vision_assistant_example and update top-level
docstring/title and any README references) or convert the code to a true resale
advisor by replacing the generic docstring/prompt and adding resale-specific
logic (create a helper like generate_resale_advice or a tool function named
get_resale_estimate and update the assistant_prompt/agent configuration to
request item condition, age, comparable prices, and estimated resale value).
Ensure you update all references to the filename and example title so tests/docs
remain consistent.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`:
- Around line 249-277: The tool-followup path currently re-enters
create_response without preserving the original generation options (stream,
max_new_tokens/ max_tokens, temperature), causing follow-up requests to use
default/incorrect settings; update the code so when detecting tool_calls in
create_response (the block that calls extract_tool_calls_from_text and then
_handle_tool_calls) you forward the resolved generation options (stream,
max_new_tokens or max_tokens, temperature) into the follow-up call (either via
kwargs or a dedicated param) and ensure _handle_tool_calls and any subsequent
create_response invocation accept and propagate those options; reference
create_response, _generate_streaming, _generate_non_streaming, is_tool_followup,
_handle_tool_calls and tool_calls when locating where to add and propagate these
parameters.
- Around line 64-74: The Hermes parsing loop (hermes_pattern,
_extract_json_objects) appends tool_calls with empty/default fields when decoded
JSON lacks required keys; update the loop to validate each obj before
enqueueing: ensure "name" is present and non-empty (same validation used by the
generic JSON path), ensure "arguments" is a dict (or parse/convert it when
valid), and only append entries that pass these checks; generate a UUID for "id"
as a fallback but do not enqueue if "name" or "arguments" are invalid—skip
and/or log the malformed object instead of pushing it into tool_calls.
- Around line 204-221: The simple_response method mutates conversation history
(calls _conversation.send_message) before verifying the model is loaded, which
can leave a stale user turn if _resources is missing; add the same loaded-ready
guard used in LocalVLM.simple_response at the top of simple_response (check the
loader flag or _resources existence, e.g., if not self._resources or not
self._is_loaded) and return an appropriate LLMResponseEvent or raise before
calling _conversation.send_message so create_response() doesn't discover missing
resources after you've already appended the user message.
- Around line 244-268: Wrap the template application and generation calls in
create_response (around _apply_template, _generate_streaming, and
_generate_non_streaming) with targeted exception handling so failures do not
escape: catch the specific exceptions those methods can raise (e.g.,
TemplateError/GenerationError or whatever concrete types they throw), call
self.events.send(LLMResponseEvent(original=None, text="", exception=exc)) to
emit the error event consistent with LocalVLM, and use logger.exception(...) to
log the traceback; do not use a bare “except Exception” and ensure the
LLMRequestStartedEvent is still emitted before attempting generation.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py`:
- Around line 243-282: Tool-followup generations drop the caller's resolved
options; ensure the resolved generation options (max_tokens derived from
max_new_tokens or self._max_new_tokens and temperature) are preserved and passed
into follow-up generation and tool-handling code paths. Specifically, propagate
max_tokens and temperature when returning for tool follow-ups (when
is_tool_followup is true) and when calling self._handle_tool_calls so the
follow-up calls use the same options as the original call; update references
around max_tokens, temperature, is_tool_followup, _handle_tool_calls, and
_generate_with_frames (and the similar block at lines 308-318) to forward these
resolved options into subsequent generation/tool-call handling.
- Around line 82-142: The VideoForwarder lifecycle leaks when replacing or
detaching: add an ownership flag (e.g., self._owns_forwarder: bool) and ensure
any existing self._video_forwarder is fully detached and stopped before
assigning a new one; in watch_video_track(), before setting
self._video_forwarder (whether shared_forwarder or new VideoForwarder)
remove_frame_handler(self._frame_buffer.append) and if the existing forwarder
was owned call await stop() and clear it, then set self._owns_forwarder = False
for shared_forwarder or True for the newly created forwarder; update
stop_watching_video_track() to remove the handler and, if self._owns_forwarder
is True, await stop() and then clear self._video_forwarder and reset
self._owns_forwarder.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py`:
- Around line 188-225: The non-streaming method _generate_non_streaming lacks
error handling around the asyncio.to_thread(generate, ...) call so exceptions
propagate instead of being logged/emitted; wrap the to_thread call in try/except
matching _generate_streaming: catch RuntimeError and ValueError (or Exception if
you prefer parity), call logger.exception(...) with context, emit
events.LLMErrorEvent (using self.events.send) with plugin_name and
item_id=response_id, and return an empty LLMResponseEvent(original=None,
text="") on error; keep the existing successful-path behavior (sending
LLMResponseCompletedEvent when emit_events is true) and reuse symbols model,
tokenizer, sampler, response_id, request_start as in the current implementation.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 59-106: Initialize a one-time warning flag
self._tools_unsupported_warned = False in the class __init__, then in
_generate_with_frames check if tools_param is non-empty and the flag is False
(if tools_param and not self._tools_unsupported_warned) and emit a
logger.warning with the message "mlx-vlm's apply_chat_template does not accept
tools; registered functions (%d) will not be surfaced to the model",
len(tools_param) and set self._tools_unsupported_warned = True; follow the exact
warning-pattern used by self._multi_turn_warned and place this check before
calling apply_chat_template so callers know tools are ignored.

In `@plugins/roboflow/tests/test_roboflow.py`:
- Around line 81-82: The assertions in the test only check that objects is
non-empty and that the first object has a "label" key, which no longer verifies
the detector's correct class; update the test to assert that objects[0]["label"]
is one of an expected set of labels for the rfdetr-seg-preview output (e.g.,
{"cat", "kitty", "feline"} or a canonical taxonomy used by the model) instead of
merely checking existence, by replacing the current assertions that reference
objects and objects[0] with a membership assertion against that known label set
so the test fails for incorrect classes like "toaster".

In `@pyproject.toml`:
- Line 123: The dev install currently includes only the
"vision-agents-plugins-huggingface[mlx-vlm]" extra which omits the standalone
[mlx] extra that provides the mlx_lm module, causing the requires_mlx marker in
conftest.py (which checks importlib.util.find_spec("mlx_lm")) to always skip
tests; update the pyproject extras so the dev/test installation also includes
the [mlx] extra (or merge [mlx] and [mlx-vlm] into a single combined extra) so
that mlx_lm is installed and MlxLLM tests gated by requires_mlx will run.
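The Hermes-path validation requested above can be sketched as follows. The helper name and regex are hypothetical; the real parser lives in _local_inference.py:

```python
import json
import re
import uuid

HERMES_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Parse Hermes-style <tool_call> blocks, skipping malformed entries."""
    tool_calls = []
    for match in HERMES_PATTERN.finditer(text):
        try:
            obj = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # log-and-skip in real code
        name = obj.get("name")
        arguments = obj.get("arguments")
        if not name or not isinstance(arguments, dict):
            continue  # reject empty names and non-dict arguments
        tool_calls.append({
            "id": obj.get("id") or str(uuid.uuid4()),  # UUID only as id fallback
            "name": name,
            "arguments": arguments,
        })
    return tool_calls


good = '<tool_call>{"name": "get_price", "arguments": {"item": "bike"}}</tool_call>'
bad = '<tool_call>{"arguments": {}}</tool_call>'
print(len(extract_tool_calls(good + bad)))  # 1
```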

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50f042cf-458d-4f82-8f2a-fee86d976c48

📥 Commits

Reviewing files that changed from the base of the PR and between 9b276c6 and 95e0785.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • conftest.py
  • examples/12_resale_advisor_example/pyproject.toml
  • examples/12_resale_advisor_example/resale_advisor_example.py
  • plugins/huggingface/pyproject.toml
  • plugins/huggingface/tests/test_transformers_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
  • plugins/moondream/pyproject.toml
  • plugins/roboflow/tests/test_roboflow.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
  • plugins/smart_turn/pyproject.toml
  • pyproject.toml

Comment thread examples/12_resale_advisor_example/resale_advisor_example.py
Comment on lines +244 to +268
        template_result = self._apply_template(messages, tools_param)
        if template_result is None:
            return LLMResponseEvent(original=None, text="")
        prepared_input, tools_applied = template_result

        max_tokens = max_new_tokens or self._max_new_tokens
        is_tool_followup = kwargs.pop("_tool_followup", False)
        suppress_events = tools_applied

        self.events.send(
            LLMRequestStartedEvent(
                plugin_name=self._plugin_name,
                model=self.model_id,
                streaming=stream and not suppress_events,
            )
        )

        if stream and not suppress_events:
            result = await self._generate_streaming(
                prepared_input, max_tokens, temperature
            )
        else:
            result = await self._generate_non_streaming(
                prepared_input, max_tokens, temperature, not suppress_events
            )

⚠️ Potential issue | 🟠 Major

Return an error response when local generation fails.

Backend template/generation failures currently propagate out of create_response(), skipping the error event and the LLMResponseEvent(exception=...) contract used by LocalVLM.

Wrap template and generation calls with specific exception handling
-        template_result = self._apply_template(messages, tools_param)
-        if template_result is None:
-            return LLMResponseEvent(original=None, text="")
-        prepared_input, tools_applied = template_result
-
-        max_tokens = max_new_tokens or self._max_new_tokens
-        is_tool_followup = kwargs.pop("_tool_followup", False)
-        suppress_events = tools_applied
-
-        self.events.send(
-            LLMRequestStartedEvent(
-                plugin_name=self._plugin_name,
-                model=self.model_id,
-                streaming=stream and not suppress_events,
+        try:
+            template_result = self._apply_template(messages, tools_param)
+            if template_result is None:
+                return LLMResponseEvent(original=None, text="")
+            prepared_input, tools_applied = template_result
+
+            max_tokens = max_new_tokens or self._max_new_tokens
+            is_tool_followup = kwargs.pop("_tool_followup", False)
+            suppress_events = tools_applied
+
+            self.events.send(
+                LLMRequestStartedEvent(
+                    plugin_name=self._plugin_name,
+                    model=self.model_id,
+                    streaming=stream and not suppress_events,
+                )
             )
-        )
-
-        if stream and not suppress_events:
-            result = await self._generate_streaming(
-                prepared_input, max_tokens, temperature
-            )
-        else:
-            result = await self._generate_non_streaming(
-                prepared_input, max_tokens, temperature, not suppress_events
+
+            if stream and not suppress_events:
+                result = await self._generate_streaming(
+                    prepared_input, max_tokens, temperature
+                )
+            else:
+                result = await self._generate_non_streaming(
+                    prepared_input, max_tokens, temperature, not suppress_events
+                )
+        except (RuntimeError, ValueError) as e:
+            logger.exception("LLM generation failed")
+            self.events.send(
+                events.LLMErrorEvent(
+                    plugin_name=self._plugin_name,
+                    error_message=str(e),
+                    event_data=e,
+                )
             )
+            return LLMResponseEvent(original=None, text="", exception=e)

As per coding guidelines, “Never write except Exception as e. Catch specific exceptions instead” and “Prefer logger.exception() when logging an error with a traceback instead of logger.error("Error: {exc}").”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`
around lines 244 - 268, Wrap the template application and generation calls in
create_response (around _apply_template, _generate_streaming, and
_generate_non_streaming) with targeted exception handling so failures do not
escape: catch the specific exceptions those methods can raise (e.g.,
TemplateError/GenerationError or whatever concrete types they throw), call
self.events.send(LLMResponseEvent(original=None, text="", exception=exc)) to
emit the error event consistent with LocalVLM, and use logger.exception(...) to
log the traceback; do not use a bare “except Exception” and ensure the
LLMRequestStartedEvent is still emitted before attempting generation.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py (1)

40-40: Move the warnings import to module level.

The new MLX handlers add local imports on Lines 40 and 56. Import warnings once at the top and reuse it in all handlers.

♻️ Proposed import cleanup
+import warnings
+
 from .events import DetectionCompletedEvent
 from .huggingface_llm import HuggingFaceLLM as LLM
 from .huggingface_vlm import HuggingFaceVLM as VLM
@@
 except ImportError as e:
-    import warnings
-
     optional = {"torch", "transformers", "av", "aiortc", "jinja2", "supervision", "cv2"}
@@
 except ImportError as e:
-    import warnings
-
     if _is_mlx_import_error(e):
@@
 except ImportError as e:
-    import warnings
-
     if _is_mlx_import_error(e) or e.name in {"av", "aiortc"}:

As per coding guidelines, Do not use local imports; import at the top of the module.

Also applies to: 56-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py` at line
40, Move the local "import warnings" statements into a single module-level
import at the top of
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py and remove the
local imports inside the MLX handler functions; update each handler that
currently does a local "import warnings" (the MLX handlers) to use the top-level
warnings module instead so all handlers reuse the single import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Around line 27-32: The _is_mlx_import_error function is too permissive: change
its fallback so it only treats exceptions with exc.name None (shared-library
failures) as MLX import errors instead of checking "mlx" in the entire exception
string; update the logic in _is_mlx_import_error to return True when exc.name is
one of {"mlx","mlx_lm","mlx_vlm","mlx.core"} or when exc.name is None and "mlx"
appears in the exception message, otherwise return False; also move any local
imports of warnings up to the module top so warnings is imported once at
top-level instead of inside functions (remove local imports at lines noted in
the review).

---

Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Line 40: Move the local "import warnings" statements into a single
module-level import at the top of
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py and remove the
local imports inside the MLX handler functions; update each handler that
currently does a local "import warnings" (the MLX handlers) to use the top-level
warnings module instead so all handlers reuse the single import.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b39d63fc-b47f-4d30-8369-a0f2e3f80779

📥 Commits

Reviewing files that changed from the base of the PR and between abf231b and 12c321b.

📒 Files selected for processing (1)
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)

277-286: ⚠️ Potential issue | 🟠 Major

Catch ValueError in the streaming worker thread to match the parent class exception handling.

The parent class LocalTextLLM.create_response() catches both RuntimeError and ValueError; MlxLLM._generate_streaming() does the same. If TransformersLLM.run_generation() only catches RuntimeError, a ValueError from model.generate() will leave generation_error unset, causing the stream to close with an empty successful response instead of propagating the error.

Fix
-            except RuntimeError as e:
+            except (RuntimeError, ValueError) as e:
                 generation_error = e
                 logger.exception("Generation failed")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`
around lines 277 - 286, The streaming worker run_generation currently only
catches RuntimeError so ValueError from model.generate will not set
generation_error; update run_generation (inside TransformersLLM.run_generation)
to catch both RuntimeError and ValueError, assign the caught exception to
generation_error, and call logger.exception with context before finally calling
loop.call_soon_threadsafe(async_queue.put_nowait, None) to close the stream;
this mirrors LocalTextLLM.create_response() and MlxLLM._generate_streaming()
behavior.
♻️ Duplicate comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py (1)

264-294: ⚠️ Potential issue | 🟠 Major

Guard template application with the same error contract as generation.

_apply_template() still runs before the try, so template failures bypass the logged LLMResponseEvent(exception=...) path and can escape directly to callers. Move template application into the guarded block and catch the concrete template/runtime failures there.

Proposed fix
+import jinja2
+
 ...
-        template_result = self._apply_template(messages, tools_param)
-        if template_result is None:
-            return LLMResponseEvent(original=None, text="")
-        prepared_input, tools_applied = template_result
-
-        max_tokens = max_new_tokens or self._max_new_tokens
-        is_tool_followup = kwargs.pop("_tool_followup", False)
-        suppress_events = tools_applied
-
-        self.events.send(
-            LLMRequestStartedEvent(
-                plugin_name=self._plugin_name,
-                model=self.model_id,
-                streaming=stream and not suppress_events,
-            )
-        )
-
         try:
+            template_result = self._apply_template(messages, tools_param)
+            if template_result is None:
+                return LLMResponseEvent(original=None, text="")
+            prepared_input, tools_applied = template_result
+
+            max_tokens = max_new_tokens or self._max_new_tokens
+            is_tool_followup = kwargs.pop("_tool_followup", False)
+            suppress_events = tools_applied
+
+            self.events.send(
+                LLMRequestStartedEvent(
+                    plugin_name=self._plugin_name,
+                    model=self.model_id,
+                    streaming=stream and not suppress_events,
+                )
+            )
+
             if stream and not suppress_events:
                 result = await self._generate_streaming(
                     prepared_input, max_tokens, temperature
                 )
             else:
                 result = await self._generate_non_streaming(
                     prepared_input, max_tokens, temperature, not suppress_events
                 )
-        except (RuntimeError, ValueError) as exc:
+        except (jinja2.TemplateError, RuntimeError, TypeError, ValueError) as exc:
             logger.exception("Local text generation failed")
             error_event = LLMResponseEvent(original=None, text="", exception=exc)
             self.events.send(error_event)
             return error_event

Verify _apply_template() is inside the protected region after the change:

#!/bin/bash
python - <<'PY'
from pathlib import Path

path = next(Path(".").rglob("_local_inference.py"))
text = path.read_text()
start = text.index("        template_result = self._apply_template")
end = text.index("        if suppress_events", start)
print(text[start:end])
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`
around lines 264 - 294, Move the call to self._apply_template(...) into the try
block that wraps generation so template failures are handled the same as
generation errors: call template_result = self._apply_template(messages,
tools_param) inside the existing try, keep the existing None check and return of
LLMResponseEvent(original=None, text="") if template_result is None, then
proceed to unpack prepared_input, tools_applied and the existing generation
branches (_generate_streaming/_generate_non_streaming); also ensure any
exceptions from _apply_template are caught by the except (RuntimeError,
ValueError) handler, log via logger.exception("Local text generation failed")
and send/return an LLMResponseEvent(original=None, text="", exception=exc) via
self.events.send to preserve the same error contract.
🧹 Nitpick comments (2)
plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py (2)

50-53: **kwargs: Any hides the public constructor surface.

LocalVLM.__init__ has a well-defined signature (model, fps, frame_buffer_seconds, max_frames, max_new_tokens, max_tool_rounds). Forwarding via **kwargs: Any drops IDE autocompletion, type checking at call sites, and the docstring's argument contract (which is itself lying about defaults). Positional MlxVLM("mlx-community/...") also stops working.

♻️ Proposed refactor — mirror the base signature
-    def __init__(self, **kwargs: Any):
-        super().__init__(**kwargs)
+    def __init__(
+        self,
+        model: str,
+        fps: int = 1,
+        frame_buffer_seconds: int = 10,
+        max_frames: int = 4,
+        max_new_tokens: int = 512,
+        max_tool_rounds: int = 3,
+    ):
+        super().__init__(
+            model=model,
+            fps=fps,
+            frame_buffer_seconds=frame_buffer_seconds,
+            max_frames=max_frames,
+            max_new_tokens=max_new_tokens,
+            max_tool_rounds=max_tool_rounds,
+        )
         self._multi_turn_warned = False
         self._tools_unsupported_warned = False

As per coding guidelines: "Avoid using Any type" and "Use type annotations everywhere."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around
lines 50 - 53, The MlxVLM constructor currently uses **kwargs: Any which hides
the public constructor surface; change MlxVLM.__init__ to mirror the LocalVLM
signature (accept explicit parameters: model: str, fps: int,
frame_buffer_seconds: float, max_frames: int, max_new_tokens: int,
max_tool_rounds: int — using the same defaults as LocalVLM), update type
annotations accordingly, call super().__init__(model=model, fps=fps,
frame_buffer_seconds=frame_buffer_seconds, max_frames=max_frames,
max_new_tokens=max_new_tokens, max_tool_rounds=max_tool_rounds), preserve the
_multi_turn_warned and _tools_unsupported_warned initializations, and update the
docstring to reflect the explicit parameters and defaults so IDEs and type
checkers get accurate signatures.

73-76: Frame downsampling drops the final frame.

With step = len(frames) / self._max_frames and i in range(self._max_frames), indices span [0, len(frames) - step) — the most recent frame is never sampled. For a video agent where recency usually matters most, this is a mild but real bias.

♻️ Proposed fix — even spread including endpoints
-        sampled = frames
-        if len(frames) > self._max_frames:
-            step = len(frames) / self._max_frames
-            sampled = [frames[int(i * step)] for i in range(self._max_frames)]
+        sampled = frames
+        if len(frames) > self._max_frames:
+            last = len(frames) - 1
+            sampled = [
+                frames[round(i * last / (self._max_frames - 1))]
+                for i in range(self._max_frames)
+            ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around
lines 73 - 76, The current downsampling logic using step = len(frames) /
self._max_frames and sampled = [frames[int(i * step)] for i in
range(self._max_frames)] never includes the final frame; change the sampling to
evenly include endpoints when downsampling: if len(frames) > self._max_frames
compute indices that span 0..len(frames)-1 inclusive (e.g. step =
(len(frames)-1) / (self._max_frames-1) when self._max_frames > 1 or use linspace
semantics) and then select frames at those rounded indices; keep the early
return/assignment to sampled when len(frames) <= self._max_frames and ensure
this logic is applied where sampled, frames, and self._max_frames are used.
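The endpoint-inclusive sampling described above can be checked with a small standalone helper. This is an illustrative sketch, not the plugin's API; the single-frame tiebreak toward the most recent frame is an assumption consistent with the review's recency argument:

```python
def sample_evenly(frames: list, max_frames: int) -> list:
    """Pick max_frames items spread over frames, always including the
    first and last frame when downsampling (illustrative helper)."""
    if len(frames) <= max_frames:
        return frames
    if max_frames == 1:
        # Assumption: prefer the most recent frame when only one is kept.
        return [frames[-1]]
    last = len(frames) - 1
    # Linspace-style indices over 0..last inclusive.
    return [frames[round(i * last / (max_frames - 1))] for i in range(max_frames)]
```

Unlike the `int(i * step)` scheme, this always samples the final (most recent) frame.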
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 105-114: In the generate() call inside mlx_vlm.py (the invocation
that builds `result = generate(...)`) replace the deprecated kwarg
temp=temperature with the standardized temperature=temperature so the mlx-vlm
library actually receives the temperature setting; keep returning result.text
as-is.

---

Outside diff comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 277-286: The streaming worker run_generation currently only
catches RuntimeError so ValueError from model.generate will not set
generation_error; update run_generation (inside TransformersLLM.run_generation)
to catch both RuntimeError and ValueError, assign the caught exception to
generation_error, and call logger.exception with context before finally calling
loop.call_soon_threadsafe(async_queue.put_nowait, None) to close the stream;
this mirrors LocalTextLLM.create_response() and MlxLLM._generate_streaming()
behavior.

---

Duplicate comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`:
- Around line 264-294: Move the call to self._apply_template(...) into the try
block that wraps generation so template failures are handled the same as
generation errors: call template_result = self._apply_template(messages,
tools_param) inside the existing try, keep the existing None check and return of
LLMResponseEvent(original=None, text="") if template_result is None, then
proceed to unpack prepared_input, tools_applied and the existing generation
branches (_generate_streaming/_generate_non_streaming); also ensure any
exceptions from _apply_template are caught by the except (RuntimeError,
ValueError) handler, log via logger.exception("Local text generation failed")
and send/return an LLMResponseEvent(original=None, text="", exception=exc) via
self.events.send to preserve the same error contract.

---

Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 50-53: The MlxVLM constructor currently uses **kwargs: Any which
hides the public constructor surface; change MlxVLM.__init__ to mirror the
LocalVLM signature (accept explicit parameters: model: str, fps: int,
frame_buffer_seconds: float, max_frames: int, max_new_tokens: int,
max_tool_rounds: int — using the same defaults as LocalVLM), update type
annotations accordingly, call super().__init__(model=model, fps=fps,
frame_buffer_seconds=frame_buffer_seconds, max_frames=max_frames,
max_new_tokens=max_new_tokens, max_tool_rounds=max_tool_rounds), preserve the
_multi_turn_warned and _tools_unsupported_warned initializations, and update the
docstring to reflect the explicit parameters and defaults so IDEs and type
checkers get accurate signatures.
- Around line 73-76: The current downsampling logic using step = len(frames) /
self._max_frames and sampled = [frames[int(i * step)] for i in
range(self._max_frames)] never includes the final frame; change the sampling to
evenly include endpoints when downsampling: if len(frames) > self._max_frames
compute indices that span 0..len(frames)-1 inclusive (e.g. step =
(len(frames)-1) / (self._max_frames-1) when self._max_frames > 1 or use linspace
semantics) and then select frames at those rounded indices; keep the early
return/assignment to sampled when len(frames) <= self._max_frames and ensure
this logic is applied where sampled, frames, and self._max_frames are used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a3c92ede-8dcc-4ed0-8bca-3843192983fd

📥 Commits

Reviewing files that changed from the base of the PR and between 12c321b and b0726e6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • conftest.py
  • examples/12_resale_advisor_example/pyproject.toml
  • examples/12_resale_advisor_example/resale_advisor_example.py
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py
  • plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
  • plugins/roboflow/tests/test_roboflow.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (1)
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • pyproject.toml
  • plugins/roboflow/tests/test_roboflow.py
  • conftest.py
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • examples/12_resale_advisor_example/resale_advisor_example.py
  • examples/12_resale_advisor_example/pyproject.toml
  • plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py

Comment on lines +105 to +114
result = generate(
model,
processor,
prompt=prompt,
image=images if images else None,
max_tokens=max_tokens,
temp=temperature,
)

return result.text


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect mlx-vlm pin and verify generate() signature + return type for that version.
rg -nP --iglob '*.toml' --iglob '*.txt' '\bmlx[-_]vlm\b'
python - <<'PY'
import inspect, mlx_vlm
from mlx_vlm import generate
print("version:", getattr(mlx_vlm, "__version__", "unknown"))
print("generate sig:", inspect.signature(generate))
src = inspect.getsource(generate)
print("returns GenerationResult?", "GenerationResult" in src or ".text" in src)
print("accepts temp kwarg?", "temp" in inspect.signature(generate).parameters)
print("accepts temperature kwarg?", "temperature" in inspect.signature(generate).parameters)
PY

Repository: GetStream/Vision-Agents

Length of output: 604


🌐 Web query:

mlx-vlm generate function signature temp vs temperature kwarg return type GenerationResult

💡 Result:

The generate function in mlx-vlm has the following signature:

def generate(
    model: nn.Module,
    processor: PreTrainedTokenizer,
    prompt: str,
    image: Union[str, List[str]] = None,
    audio: Union[str, List[str]] = None,
    verbose: bool = False,
    **kwargs,
) -> GenerationResult:

It accepts temperature (float, default 0.0) as a **kwargs parameter for sampling control. Older code or examples may use temp, but this was standardized to temperature (Issue #203, closed March 2025). Both may work due to **kwargs forwarding, but temperature is the current standard. The function returns a GenerationResult object containing fields like text, prompt_tokens, generation_tokens, prompt_tps, generation_tps, peak_memory, etc.


Fix mlx-vlm generate() kwarg: use temperature= instead of temp=.

The current code passes temp=temperature on line 110, but the mlx-vlm API standardized on temperature= as of March 2025 (Issue #203). While a temp= kwarg may be silently absorbed by **kwargs forwarding in some versions, the parameter will likely be ignored, causing the temperature setting to have no effect.

The pinned version (mlx-vlm>=0.4.0) supports the standard temperature= kwarg. Update line 110 to:

temperature=temperature,

The .text attribute access on line 114 is correct—generate() returns a GenerationResult object with a .text field.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around
lines 105 - 114, In the generate() call inside mlx_vlm.py (the invocation that
builds `result = generate(...)`) replace the deprecated kwarg temp=temperature
with the standardized temperature=temperature so the mlx-vlm library actually
receives the temperature setting; keep returning result.text as-is.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py (1)

38-61: Non-MLX ImportErrors from mlx_llm/mlx_vlm re-raise silently — consider surfacing them.

When _is_mlx_import_error(e) is False (and, for MlxVLM, e.name isn't av/aiortc), the bare raise propagates the original ImportError with no context. That's technically fine, but if a user hits e.g. a broken mlx_lm install that raises ImportError: cannot import name 'stream_generate' from 'mlx_lm', they'll get the raw traceback with no hint it came from huggingface.MlxLLM lazy loading. Optional: attach context via raise ... from e with a short message, or at least log at debug for diagnostics.

Also nit: __getattr__ is missing a return annotation. Per the guideline "Use type annotations everywhere", a -> type would be consistent with the rest of the codebase (the return is a class object, so type works without resorting to Any).

♻️ Optional refinement
-def __getattr__(name: str):
+def __getattr__(name: str) -> type:
     if name == "MlxLLM":
         try:
             return import_module(".mlx_llm", __name__).MlxLLM
         except ImportError as e:
             if _is_mlx_import_error(e):
                 warnings.warn(
                     "MLX is not available on this platform. "
                     "Install the [mlx] extra on Apple Silicon to enable MLX plugins.",
                     stacklevel=2,
                 )
             raise

As per coding guidelines: "Use type annotations everywhere."
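The raise ... from e pattern suggested here keeps the original traceback attached as __cause__, so users still see the underlying failure alongside the lazy-loader context. A minimal sketch, with the inner failure simulated and the message wording hypothetical:

```python
def load_mlx_llm() -> type:
    """Illustrative lazy loader: wrap a non-MLX ImportError with context
    before re-raising, preserving the original as __cause__."""
    try:
        # Simulated broken install; a real loader would do import_module(...).
        raise ImportError(
            "cannot import name 'stream_generate' from 'mlx_lm'", name="mlx_lm"
        )
    except ImportError as e:
        raise ImportError(
            "MlxLLM could not be loaded via the huggingface lazy importer"
        ) from e
```

The chained exception prints both messages ("The above exception was the direct cause of ..."), so the raw mlx_lm error no longer appears without a hint of where it was triggered.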

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py` around
lines 38 - 61, The lazy loader __getattr__ swallows context for non-MLX
ImportErrors in MlxLLM and MlxVLM and lacks a return annotation; update
__getattr__ (the function handling MlxLLM and MlxVLM imports) to (1) add a
return type annotation -> type, and (2) when catching ImportError e that is not
identified by _is_mlx_import_error (and for MlxVLM not av/aiortc), re-raise a
new ImportError with a short contextual message (e.g., "failed to import MlxLLM
via huggingface lazy loader") using "raise ImportError(...) from e" or at
minimum log the original exception at debug before re-raising, so users see that
the error originated from the huggingface lazy import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Around line 38-61: The lazy loader __getattr__ swallows context for non-MLX
ImportErrors in MlxLLM and MlxVLM and lacks a return annotation; update
__getattr__ (the function handling MlxLLM and MlxVLM imports) to (1) add a
return type annotation -> type, and (2) when catching ImportError e that is not
identified by _is_mlx_import_error (and for MlxVLM not av/aiortc), re-raise a
new ImportError with a short contextual message (e.g., "failed to import MlxLLM
via huggingface lazy loader") using "raise ImportError(...) from e" or at
minimum log the original exception at debug before re-raising, so users see that
the error originated from the huggingface lazy import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1b110a7-c560-431b-8df9-898450c67944

📥 Commits

Reviewing files that changed from the base of the PR and between b0726e6 and 4cfc1f5.

📒 Files selected for processing (3)
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • plugins/roboflow/tests/test_roboflow.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/roboflow/tests/test_roboflow.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
