Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Adds shared local inference bases (LocalTextLLM, LocalVLM), MLX-backed plugins (MlxLLM, MlxVLM), refactors the Transformers plugins to use the new bases, updates dependency constraints and MLX test skip markers, tweaks several plugin implementations/tests, and adds a resale-advisor multimodal example.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant LocalLLM as LocalTextLLM/LocalVLM
participant Video as VideoForwarder
participant ToolRunner
participant Events
Client->>LocalLLM: request(messages, frames?, stream?)
alt Video attached
Video->>LocalLLM: supply buffered frames
end
LocalLLM->>LocalLLM: build messages / apply template
LocalLLM->>LocalLLM: generate (stream / non-stream) -> raw_text
LocalLLM->>Events: emit start / chunk / completed
alt tool calls detected
LocalLLM->>ToolRunner: execute tool calls
ToolRunner-->>LocalLLM: tool results
LocalLLM->>LocalLLM: regenerate follow-up (repeat up to max rounds)
LocalLLM->>Events: emit followup events
end
LocalLLM-->>Client: final LLMResponseEvent
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
uv sync --all-extras in CI was activating the plugin's MLX extras on Linux, pulling in mlx/mlx-lm wheels that then failed to import at runtime because libmlx.so is not in the Linux wheel.
ImportError.name is None when a transitive dep fails on a missing shared library (e.g. libmlx.so on Linux), so the previous e.name set check let the exception escape. Match on the message as a fallback.
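A minimal sketch of that fallback check, assuming a helper named `_is_mlx_import_error` and an illustrative module set; this is not the plugin's actual code.

```python
def _is_mlx_import_error(exc: ImportError) -> bool:
    """Best-effort check for 'MLX is unavailable on this platform' failures."""
    if exc.name in {"mlx", "mlx_lm", "mlx_vlm", "mlx.core"}:
        return True
    # A missing shared library (e.g. libmlx.so on Linux) surfaces as an
    # ImportError whose .name is None, so fall back to matching the message.
    return exc.name is None and "mlx" in str(exc).lower()
```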
Actionable comments posted: 11
🧹 Nitpick comments (5)
conftest.py (1)
31-42: Optional: tighten platform gate to Apple Silicon.
`sys.platform == "darwin"` also matches Intel Macs, where `mlx`/`mlx_lm` won't actually run even if somehow importable. Since the root `pyproject.toml` already constrains installation via `platform_machine == 'arm64'`, the marker can mirror that for consistency:
Suggested refinement
+import platform
 ...
 requires_mlx = pytest.mark.skipif(
-    sys.platform != "darwin" or importlib.util.find_spec("mlx_lm") is None,
+    sys.platform != "darwin"
+    or platform.machine() != "arm64"
+    or importlib.util.find_spec("mlx_lm") is None,
     reason="MLX tests require Apple Silicon with mlx-lm installed",
 )
 ...
 requires_mlx_vlm = pytest.mark.skipif(
-    sys.platform != "darwin" or importlib.util.find_spec("mlx_vlm") is None,
+    sys.platform != "darwin"
+    or platform.machine() != "arm64"
+    or importlib.util.find_spec("mlx_vlm") is None,
     reason="MLX-VLM tests require Apple Silicon with mlx-vlm installed",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conftest.py` around lines 31 - 42, The current skip markers requires_mlx and requires_mlx_vlm gate only on sys.platform == "darwin" which includes Intel Macs; tighten them to also require Apple Silicon by adding a check that platform.machine() == "arm64" (e.g., require sys.platform == "darwin" and platform.machine() == "arm64" and the existing importlib.util.find_spec(...) check) so the markers only run on Apple Silicon where mlx/mlx_vlm are supported.
plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py (1)
56-87: `_load_model_sync` tuple unpack — minor readability nit.
`result = load(self.model_id)`; `MlxModelResources(model=result[0], tokenizer=result[1])` reads as though `load()` might return more than two items; it doesn't. A direct unpack documents the contract better.
♻️ Tiny tidy-up
-    def _load_model_sync(self) -> MlxModelResources:
-        result = load(self.model_id)
-        return MlxModelResources(model=result[0], tokenizer=result[1])
+    def _load_model_sync(self) -> MlxModelResources:
+        model, tokenizer = load(self.model_id)
+        return MlxModelResources(model=model, tokenizer=tokenizer)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py` around lines 56 - 87, The _load_model_sync implementation uses indexing into result instead of unpacking; change load(self.model_id) to return two values captured via direct tuple unpacking like model, tokenizer = load(self.model_id) and then construct MlxModelResources(model=model, tokenizer=tokenizer) to make the contract explicit and improve readability (modify the _load_model_sync function and keep use of MlxModelResources).
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)
146-160: `do_sample` is configurable on `TransformersVLM` but hardcoded to `True` here.
The sibling `TransformersVLM` in this same PR added a `do_sample: bool = True` constructor flag (see `transformers_vlm.py` lines 86, 100, 164-166). Keeping `TransformersLLM` hardcoded means users can't run greedy decoding on the text model, and users who configure both will get asymmetric behavior for the same parameter. Consider surfacing the same knob and mirroring the `temperature if self._do_sample else 1.0` dance in both `_generate_streaming` and `_generate_non_streaming`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py` around lines 146 - 160, Add a configurable do_sample flag to TransformersLLM by adding a do_sample: bool = True parameter to the __init__ signature, store it as self._do_sample, and update both _generate_streaming and _generate_non_streaming to use temperature = self._temperature if self._do_sample else 1.0 (or the existing temperature field name) so greedy decoding is possible and behavior matches TransformersVLM; ensure any references to sampling behavior use self._do_sample consistently.
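A runnable sketch of the suggested knob under stated assumptions: the config class and method names here are illustrative stand-ins, not the plugin's real attributes.

```python
from dataclasses import dataclass

@dataclass
class SamplingConfig:
    """Illustrative holder for the sampling knobs discussed above."""
    temperature: float = 0.7
    do_sample: bool = True

    def effective_temperature(self) -> float:
        # Greedy decoding ignores temperature; 1.0 keeps the generate() kwargs benign.
        return self.temperature if self.do_sample else 1.0

# Both _generate_streaming and _generate_non_streaming would read this value
# instead of a hardcoded one.
cfg = SamplingConfig(do_sample=False)
assert cfg.effective_temperature() == 1.0
```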
plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py (1)
88-95: Prefer keyword args when forwarding to `LocalVLM.__init__`.
Six positional arguments to a base constructor defined in a sibling file is a time bomb: any reorder of `LocalVLM.__init__` will silently rebind parameters here (e.g., swapping `fps` and `max_frames` would still type-check). Keyword passthrough is cheap insurance.
♻️ Proposed change
-        super().__init__(
-            model,
-            fps,
-            frame_buffer_seconds,
-            max_frames,
-            max_new_tokens,
-            max_tool_rounds,
-        )
+        super().__init__(
+            model=model,
+            fps=fps,
+            frame_buffer_seconds=frame_buffer_seconds,
+            max_frames=max_frames,
+            max_new_tokens=max_new_tokens,
+            max_tool_rounds=max_tool_rounds,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py` around lines 88 - 95, Replace the positional argument forwarding in the subclass constructor with explicit keyword arguments to match LocalVLM.__init__; update the call to super().__init__ in transformers_vlm (the subclass shown) to pass model=model, fps=fps, frame_buffer_seconds=frame_buffer_seconds, max_frames=max_frames, max_new_tokens=max_new_tokens, and max_tool_rounds=max_tool_rounds so parameter bindings remain correct even if LocalVLM.__init__ reorders parameters.
examples/12_resale_advisor_example/pyproject.toml (1)
6-14: Prune redundant / unused dependencies.
- `mlx-vlm` is already pulled in by the `[mlx-vlm]` extra on `vision-agents-plugins-huggingface`; listing it again at the top level is duplication.
- The `transformers` extra drags in `transformers`, `torch`, `accelerate`, `supervision`, and `opencv-python` (see `plugins/huggingface/pyproject.toml` lines 18-39), but this example only uses `MlxVLM`. That's a heavy install for nothing.
- `torchvision` is declared directly but not imported by `resale_advisor_example.py`, nor required by the MLX path.
♻️ Proposed trim
 dependencies = [
     "python-dotenv>=1.0",
-    "vision-agents-plugins-huggingface[mlx-vlm,transformers]",
+    "vision-agents-plugins-huggingface[mlx-vlm]",
     "vision-agents-plugins-getstream",
     "vision-agents-plugins-deepgram",
     "vision-agents",
-    "mlx-vlm",
-    "torchvision",
 ]
If `torchvision` is genuinely required by some transitive code path in `mlx-vlm`, please leave a comment explaining why; otherwise drop it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/12_resale_advisor_example/pyproject.toml` around lines 6 - 14, Remove redundant and heavy deps: delete the top-level "mlx-vlm" entry and the "torchvision" entry from the dependencies array, and narrow the HuggingFace plugin extra to only the MlxVLM extra by changing "vision-agents-plugins-huggingface[mlx-vlm,transformers]" to "vision-agents-plugins-huggingface[mlx-vlm]"; if you believe torchvision is required transitively by mlx-vlm, add an inline comment in pyproject.toml explaining that justification instead of listing it as a direct dependency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/12_resale_advisor_example/resale_advisor_example.py`:
- Around line 1-21: The module is misnamed—resale_advisor_example—but contains a
generic vision-and-voice assistant; either rename the example (e.g., change the
directory/file name to 12_mlx_vision_assistant_example and update top-level
docstring/title and any README references) or convert the code to a true resale
advisor by replacing the generic docstring/prompt and adding resale-specific
logic (create a helper like generate_resale_advice or a tool function named
get_resale_estimate and update the assistant_prompt/agent configuration to
request item condition, age, comparable prices, and estimated resale value).
Ensure you update all references to the filename and example title so tests/docs
remain consistent.
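If the example stays a resale advisor, a tool along these lines could carry the domain logic. The function name comes from the comment above; the price table and numbers are purely illustrative, and how tools get registered with the agent is not shown.

```python
def get_resale_estimate(item: str, condition: str, age_years: int) -> dict:
    """Rough resale estimate for a described item (illustrative numbers only)."""
    base_prices = {"bicycle": 250.0, "laptop": 600.0, "guitar": 300.0}
    base = base_prices.get(item.lower(), 100.0)
    condition_factor = {"new": 0.9, "good": 0.7, "fair": 0.5, "poor": 0.3}.get(condition.lower(), 0.5)
    depreciation = max(0.3, 1.0 - 0.1 * age_years)
    return {
        "item": item,
        "condition": condition,
        "estimated_price_usd": round(base * condition_factor * depreciation, 2),
    }

print(get_resale_estimate("bicycle", "good", 3))
```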
In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`:
- Around line 249-277: The tool-followup path currently re-enters
create_response without preserving the original generation options (stream,
max_new_tokens/ max_tokens, temperature), causing follow-up requests to use
default/incorrect settings; update the code so when detecting tool_calls in
create_response (the block that calls extract_tool_calls_from_text and then
_handle_tool_calls) you forward the resolved generation options (stream,
max_new_tokens or max_tokens, temperature) into the follow-up call (either via
kwargs or a dedicated param) and ensure _handle_tool_calls and any subsequent
create_response invocation accept and propagate those options; reference
create_response, _generate_streaming, _generate_non_streaming, is_tool_followup,
_handle_tool_calls and tool_calls when locating where to add and propagate these
parameters.
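A runnable sketch of the propagation being requested, with a stub standing in for `create_response`; the `_tool_followup` flag and the option names come from the code under review, everything else is illustrative.

```python
import asyncio

async def create_response(*, stream: bool, max_new_tokens: int, temperature: float,
                          _tool_followup: bool = False) -> str:
    # Stand-in for the plugin's create_response; it just echoes what it received.
    return f"followup={_tool_followup} stream={stream} max={max_new_tokens} temp={temperature}"

async def handle_tool_calls(options: dict) -> str:
    # The fix being asked for: re-enter create_response with the caller's original
    # options rather than letting the follow-up run with defaults.
    return await create_response(
        stream=options["stream"],
        max_new_tokens=options["max_new_tokens"],
        temperature=options["temperature"],
        _tool_followup=True,
    )

print(asyncio.run(handle_tool_calls({"stream": False, "max_new_tokens": 256, "temperature": 0.2})))
```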
- Around line 64-74: The Hermes parsing loop (hermes_pattern,
_extract_json_objects) appends tool_calls with empty/default fields when decoded
JSON lacks required keys; update the loop to validate each obj before
enqueueing: ensure "name" is present and non-empty (same validation used by the
generic JSON path), ensure "arguments" is a dict (or parse/convert it when
valid), and only append entries that pass these checks; generate a UUID for "id"
as a fallback but do not enqueue if "name" or "arguments" are invalid—skip
and/or log the malformed object instead of pushing it into tool_calls.
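A runnable sketch of the validation rules, assuming a simple `<tool_call>` regex and plain `json.loads` in place of the module's `_extract_json_objects`; the dict shape of the appended entries is also an assumption.

```python
import json
import logging
import re
import uuid

logger = logging.getLogger(__name__)

hermes_pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
raw_text = (
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'
    '<tool_call>{"arguments": "not-a-dict"}</tool_call>'
)

tool_calls: list[dict] = []
for block in hermes_pattern.findall(raw_text):
    try:
        obj = json.loads(block)
    except json.JSONDecodeError:
        logger.warning("Skipping undecodable tool call block: %r", block)
        continue
    name = obj.get("name")
    arguments = obj.get("arguments")
    # Only enqueue well-formed calls: non-empty name and dict arguments.
    if not name or not isinstance(arguments, dict):
        logger.warning("Skipping malformed tool call: %r", obj)
        continue
    tool_calls.append({"id": obj.get("id") or str(uuid.uuid4()), "name": name, "arguments": arguments})

print(tool_calls)  # only the well-formed call survives
```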
- Around line 204-221: The simple_response method mutates conversation history
(calls _conversation.send_message) before verifying the model is loaded, which
can leave a stale user turn if _resources is missing; add the same loaded-ready
guard used in LocalVLM.simple_response at the top of simple_response (check the
loader flag or _resources existence, e.g., if not self._resources or not
self._is_loaded) and return an appropriate LLMResponseEvent or raise before
calling _conversation.send_message so create_response() doesn't discover missing
resources after you've already appended the user message.
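A minimal runnable illustration of the ordering fix; the class and attribute names are stand-ins for the plugin's own.

```python
class LocalLLMStub:
    """Stand-in showing why the readiness check must come first."""
    def __init__(self) -> None:
        self._resources = None          # set once the model finishes loading
        self.history: list[str] = []    # stands in for self._conversation

    def simple_response(self, text: str) -> str:
        if self._resources is None:
            # Bail out before appending the user turn, so a failed call
            # doesn't leave a stale message in the conversation history.
            return ""
        self.history.append(text)
        return f"echo: {text}"

llm = LocalLLMStub()
assert llm.simple_response("hello") == "" and llm.history == []  # nothing leaked
```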
- Around line 244-268: Wrap the template application and generation calls in
create_response (around _apply_template, _generate_streaming, and
_generate_non_streaming) with targeted exception handling so failures do not
escape: catch the specific exceptions those methods can raise (e.g.,
TemplateError/GenerationError or whatever concrete types they throw), call
self.events.send(LLMResponseEvent(original=None, text="", exception=exc)) to
emit the error event consistent with LocalVLM, and use logger.exception(...) to
log the traceback; do not use a bare “except Exception” and ensure the
LLMRequestStartedEvent is still emitted before attempting generation.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py`:
- Around line 243-282: Tool-followup generations drop the caller's resolved
options; ensure the resolved generation options (max_tokens derived from
max_new_tokens or self._max_new_tokens and temperature) are preserved and passed
into follow-up generation and tool-handling code paths. Specifically, propagate
max_tokens and temperature when returning for tool follow-ups (when
is_tool_followup is true) and when calling self._handle_tool_calls so the
follow-up calls use the same options as the original call; update references
around max_tokens, temperature, is_tool_followup, _handle_tool_calls, and
_generate_with_frames (and the similar block at lines 308-318) to forward these
resolved options into subsequent generation/tool-call handling.
- Around line 82-142: The VideoForwarder lifecycle leaks when replacing or
detaching: add an ownership flag (e.g., self._owns_forwarder: bool) and ensure
any existing self._video_forwarder is fully detached and stopped before
assigning a new one; in watch_video_track(), before setting
self._video_forwarder (whether shared_forwarder or new VideoForwarder)
remove_frame_handler(self._frame_buffer.append) and if the existing forwarder
was owned call await stop() and clear it, then set self._owns_forwarder = False
for shared_forwarder or True for the newly created forwarder; update
stop_watching_video_track() to remove the handler and, if self._owns_forwarder
is True, await stop() and then clear self._video_forwarder and reset
self._owns_forwarder.
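A runnable sketch of the ownership rule with stub classes; VideoForwarder's real API is assumed from the comment (add/remove_frame_handler plus an async `stop()`).

```python
import asyncio

class StubForwarder:
    """Minimal stand-in for VideoForwarder."""
    def __init__(self) -> None:
        self.handlers: list = []
        self.stopped = False
    def add_frame_handler(self, fn) -> None:
        self.handlers.append(fn)
    def remove_frame_handler(self, fn) -> None:
        self.handlers.remove(fn)
    async def stop(self) -> None:
        self.stopped = True

class VlmStub:
    def __init__(self) -> None:
        self._frames: list = []
        self._forwarder: StubForwarder | None = None
        self._owns_forwarder = False

    async def watch_video_track(self, shared_forwarder: StubForwarder | None = None) -> None:
        # Detach (and stop, if owned) any existing forwarder before replacing it.
        await self.stop_watching_video_track()
        self._forwarder = shared_forwarder or StubForwarder()
        self._owns_forwarder = shared_forwarder is None
        self._forwarder.add_frame_handler(self._frames.append)

    async def stop_watching_video_track(self) -> None:
        if self._forwarder is None:
            return
        self._forwarder.remove_frame_handler(self._frames.append)
        if self._owns_forwarder:
            await self._forwarder.stop()
        self._forwarder = None
        self._owns_forwarder = False

async def demo() -> None:
    vlm = VlmStub()
    await vlm.watch_video_track()                 # owned forwarder
    owned = vlm._forwarder
    await vlm.watch_video_track(StubForwarder())  # shared: old one is stopped, new one is not
    assert owned.stopped and not vlm._forwarder.stopped

asyncio.run(demo())
```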
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py`:
- Around line 188-225: The non-streaming method _generate_non_streaming lacks
error handling around the asyncio.to_thread(generate, ...) call so exceptions
propagate instead of being logged/emitted; wrap the to_thread call in try/except
matching _generate_streaming: catch RuntimeError and ValueError (or Exception if
you prefer parity), call logger.exception(...) with context, emit
events.LLMErrorEvent (using self.events.send) with plugin_name and
item_id=response_id, and return an empty LLMResponseEvent(original=None,
text="") on error; keep the existing successful-path behavior (sending
LLMResponseCompletedEvent when emit_events is true) and reuse symbols model,
tokenizer, sampler, response_id, request_start as in the current implementation.
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 59-106: Initialize a one-time warning flag
self._tools_unsupported_warned = False in the class __init__, then in
_generate_with_frames check if tools_param is non-empty and the flag is False
(if tools_param and not self._tools_unsupported_warned) and emit a
logger.warning with the message "mlx-vlm's apply_chat_template does not accept
tools; registered functions (%d) will not be surfaced to the model",
len(tools_param) and set self._tools_unsupported_warned = True; follow the exact
warning-pattern used by self._multi_turn_warned and place this check before
calling apply_chat_template so callers know tools are ignored.
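A runnable sketch of the one-shot warning; the mixin and method names are illustrative, while the flag name and message follow the comment.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

class ToolWarningMixin:
    """Illustrative one-shot warning, following the _multi_turn_warned pattern."""
    def __init__(self) -> None:
        self._tools_unsupported_warned = False

    def warn_if_tools(self, tools_param: list) -> None:
        if tools_param and not self._tools_unsupported_warned:
            logger.warning(
                "mlx-vlm's apply_chat_template does not accept tools; "
                "registered functions (%d) will not be surfaced to the model",
                len(tools_param),
            )
            self._tools_unsupported_warned = True

m = ToolWarningMixin()
m.warn_if_tools(["get_weather"])  # warns once
m.warn_if_tools(["get_weather"])  # silent on repeat calls
```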
In `@plugins/roboflow/tests/test_roboflow.py`:
- Around line 81-82: The assertions in the test only check that objects is
non-empty and that the first object has a "label" key, which no longer verifies
the detector's correct class; update the test to assert that objects[0]["label"]
is one of an expected set of labels for the rfdetr-seg-preview output (e.g.,
{"cat", "kitty", "feline"} or a canonical taxonomy used by the model) instead of
merely checking existence, by replacing the current assertions that reference
objects and objects[0] with a membership assertion against that known label set
so the test fails for incorrect classes like "toaster".
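A runnable sketch of the tightened assertion; the label set and the `objects` payload are illustrative stand-ins for the real detector output.

```python
EXPECTED_CAT_LABELS = {"cat", "kitty", "feline"}

def assert_detected_cat(objects: list[dict]) -> None:
    assert objects, "detector returned no objects"
    label = objects[0]["label"]
    assert label in EXPECTED_CAT_LABELS, f"unexpected label {label!r}"

assert_detected_cat([{"label": "cat", "confidence": 0.92}])
```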
In `@pyproject.toml`:
- Line 123: The dev install currently includes only the
"vision-agents-plugins-huggingface[mlx-vlm]" extra which omits the standalone
[mlx] extra that provides the mlx_lm module, causing the requires_mlx marker in
conftest.py (which checks importlib.util.find_spec("mlx_lm")) to always skip
tests; update the pyproject extras so the dev/test installation also includes
the [mlx] extra (or merge [mlx] and [mlx-vlm] into a single combined extra) so
that mlx_lm is installed and MlxLLM tests gated by requires_mlx will run.
---
Nitpick comments:
In `@conftest.py`:
- Around line 31-42: The current skip markers requires_mlx and requires_mlx_vlm
gate only on sys.platform == "darwin" which includes Intel Macs; tighten them to
also require Apple Silicon by adding a check that platform.machine() == "arm64"
(e.g., require sys.platform == "darwin" and platform.machine() == "arm64" and
the existing importlib.util.find_spec(...) check) so the markers only run on
Apple Silicon where mlx/mlx_vlm are supported.
In `@examples/12_resale_advisor_example/pyproject.toml`:
- Around line 6-14: Remove redundant and heavy deps: delete the top-level
"mlx-vlm" entry and the "torchvision" entry from the dependencies array, and
narrow the HuggingFace plugin extra to only the MlxVLM extra by changing
"vision-agents-plugins-huggingface[mlx-vlm,transformers]" to
"vision-agents-plugins-huggingface[mlx-vlm]"; if you believe torchvision is
required transitively by mlx-vlm, add an inline comment in pyproject.toml
explaining that justification instead of listing it as a direct dependency.
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py`:
- Around line 56-87: The _load_model_sync implementation uses indexing into
result instead of unpacking; change load(self.model_id) to return two values
captured via direct tuple unpacking like model, tokenizer = load(self.model_id)
and then construct MlxModelResources(model=model, tokenizer=tokenizer) to make
the contract explicit and improve readability (modify the _load_model_sync
function and keep use of MlxModelResources).
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 146-160: Add a configurable do_sample flag to TransformersLLM by
adding a do_sample: bool = True parameter to the __init__ signature, store it as
self._do_sample, and update both _generate_streaming and _generate_non_streaming
to use temperature = self._temperature if self._do_sample else 1.0 (or the
existing temperature field name) so greedy decoding is possible and behavior
matches TransformersVLM; ensure any references to sampling behavior use
self._do_sample consistently.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`:
- Around line 88-95: Replace the positional argument forwarding in the subclass
constructor with explicit keyword arguments to match LocalVLM.__init__; update
the call to super().__init__ in transformers_vlm (the subclass shown) to pass
model=model, fps=fps, frame_buffer_seconds=frame_buffer_seconds,
max_frames=max_frames, max_new_tokens=max_new_tokens, and
max_tool_rounds=max_tool_rounds so parameter bindings remain correct even if
LocalVLM.__init__ reorders parameters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50f042cf-458d-4f82-8f2a-fee86d976c48
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (18)
conftest.py
examples/12_resale_advisor_example/pyproject.toml
examples/12_resale_advisor_example/resale_advisor_example.py
plugins/huggingface/pyproject.toml
plugins/huggingface/tests/test_transformers_vlm.py
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py
plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py
plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
plugins/moondream/pyproject.toml
plugins/roboflow/tests/test_roboflow.py
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
plugins/smart_turn/pyproject.toml
pyproject.toml
template_result = self._apply_template(messages, tools_param)
if template_result is None:
    return LLMResponseEvent(original=None, text="")
prepared_input, tools_applied = template_result

max_tokens = max_new_tokens or self._max_new_tokens
is_tool_followup = kwargs.pop("_tool_followup", False)
suppress_events = tools_applied

self.events.send(
    LLMRequestStartedEvent(
        plugin_name=self._plugin_name,
        model=self.model_id,
        streaming=stream and not suppress_events,
    )
)

if stream and not suppress_events:
    result = await self._generate_streaming(
        prepared_input, max_tokens, temperature
    )
else:
    result = await self._generate_non_streaming(
        prepared_input, max_tokens, temperature, not suppress_events
    )
Return an error response when local generation fails.
Backend template/generation failures currently propagate out of create_response(), skipping the error event and the LLMResponseEvent(exception=...) contract used by LocalVLM.
Wrap template and generation calls with specific exception handling
-        template_result = self._apply_template(messages, tools_param)
-        if template_result is None:
-            return LLMResponseEvent(original=None, text="")
-        prepared_input, tools_applied = template_result
-
-        max_tokens = max_new_tokens or self._max_new_tokens
-        is_tool_followup = kwargs.pop("_tool_followup", False)
-        suppress_events = tools_applied
-
-        self.events.send(
-            LLMRequestStartedEvent(
-                plugin_name=self._plugin_name,
-                model=self.model_id,
-                streaming=stream and not suppress_events,
+        try:
+            template_result = self._apply_template(messages, tools_param)
+            if template_result is None:
+                return LLMResponseEvent(original=None, text="")
+            prepared_input, tools_applied = template_result
+
+            max_tokens = max_new_tokens or self._max_new_tokens
+            is_tool_followup = kwargs.pop("_tool_followup", False)
+            suppress_events = tools_applied
+
+            self.events.send(
+                LLMRequestStartedEvent(
+                    plugin_name=self._plugin_name,
+                    model=self.model_id,
+                    streaming=stream and not suppress_events,
+                )
             )
-        )
-
-        if stream and not suppress_events:
-            result = await self._generate_streaming(
-                prepared_input, max_tokens, temperature
-            )
-        else:
-            result = await self._generate_non_streaming(
-                prepared_input, max_tokens, temperature, not suppress_events
+
+            if stream and not suppress_events:
+                result = await self._generate_streaming(
+                    prepared_input, max_tokens, temperature
+                )
+            else:
+                result = await self._generate_non_streaming(
+                    prepared_input, max_tokens, temperature, not suppress_events
+                )
+        except (RuntimeError, ValueError) as e:
+            logger.exception("LLM generation failed")
+            self.events.send(
+                events.LLMErrorEvent(
+                    plugin_name=self._plugin_name,
+                    error_message=str(e),
+                    event_data=e,
+                )
             )
+            return LLMResponseEvent(original=None, text="", exception=e)
As per coding guidelines, “Never write except Exception as e. Catch specific exceptions instead” and “Prefer logger.exception() when logging an error with a traceback instead of logger.error("Error: {exc}").”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`
around lines 244 - 268, Wrap the template application and generation calls in
create_response (around _apply_template, _generate_streaming, and
_generate_non_streaming) with targeted exception handling so failures do not
escape: catch the specific exceptions those methods can raise (e.g.,
TemplateError/GenerationError or whatever concrete types they throw), call
self.events.send(LLMResponseEvent(original=None, text="", exception=exc)) to
emit the error event consistent with LocalVLM, and use logger.exception(...) to
log the traceback; do not use a bare “except Exception” and ensure the
LLMRequestStartedEvent is still emitted before attempting generation.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py (1)
40-40: Move `warnings` to a module-level import.
The new MLX handlers add local imports on Lines 40 and 56. Import `warnings` once at the top and reuse it in all handlers.
♻️ Proposed import cleanup
+import warnings
+
 from .events import DetectionCompletedEvent
 from .huggingface_llm import HuggingFaceLLM as LLM
 from .huggingface_vlm import HuggingFaceVLM as VLM
@@
 except ImportError as e:
-    import warnings
-
     optional = {"torch", "transformers", "av", "aiortc", "jinja2", "supervision", "cv2"}
@@
 except ImportError as e:
-    import warnings
-
     if _is_mlx_import_error(e):
@@
 except ImportError as e:
-    import warnings
-
     if _is_mlx_import_error(e) or e.name in {"av", "aiortc"}:
As per coding guidelines, Do not use local imports; import at the top of the module.
Also applies to: 56-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py` at line 40, Move the local "import warnings" statements into a single module-level import at the top of plugins/huggingface/vision_agents/plugins/huggingface/__init__.py and remove the local imports inside the MLX handler functions; update each handler that currently does a local "import warnings" (the MLX handlers) to use the top-level warnings module instead so all handlers reuse the single import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Around line 27-32: The _is_mlx_import_error function is too permissive: change
its fallback so it only treats exceptions with exc.name None (shared-library
failures) as MLX import errors instead of checking "mlx" in the entire exception
string; update the logic in _is_mlx_import_error to return True when exc.name is
one of {"mlx","mlx_lm","mlx_vlm","mlx.core"} or when exc.name is None and "mlx"
appears in the exception message, otherwise return False; also move any local
imports of warnings up to the module top so warnings is imported once at
top-level instead of inside functions (remove local imports at lines noted in
the review).
---
Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Line 40: Move the local "import warnings" statements into a single
module-level import at the top of
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py and remove the
local imports inside the MLX handler functions; update each handler that
currently does a local "import warnings" (the MLX handlers) to use the top-level
warnings module instead so all handlers reuse the single import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b39d63fc-b47f-4d30-8369-a0f2e3f80779
📒 Files selected for processing (1)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py (1)
277-286: ⚠️ Potential issue | 🟠 Major
Catch `ValueError` in the streaming worker thread to match the parent class exception handling.
The parent class `LocalTextLLM.create_response()` catches both `RuntimeError` and `ValueError`; `MlxLLM._generate_streaming()` does the same. If `TransformersLLM.run_generation()` only catches `RuntimeError`, a `ValueError` from `model.generate()` will leave `generation_error` unset, causing the stream to close with an empty successful response instead of propagating the error.
Fix
-        except RuntimeError as e:
+        except (RuntimeError, ValueError) as e:
             generation_error = e
             logger.exception("Generation failed")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py` around lines 277 - 286, The streaming worker run_generation currently only catches RuntimeError so ValueError from model.generate will not set generation_error; update run_generation (inside TransformersLLM.run_generation) to catch both RuntimeError and ValueError, assign the caught exception to generation_error, and call logger.exception with context before finally calling loop.call_soon_threadsafe(async_queue.put_nowait, None) to close the stream; this mirrors LocalTextLLM.create_response() and MlxLLM._generate_streaming() behavior.
♻️ Duplicate comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py (1)
264-294: ⚠️ Potential issue | 🟠 Major
Guard template application with the same error contract as generation.
`_apply_template()` still runs before the `try`, so template failures bypass the logged `LLMResponseEvent(exception=...)` path and can escape directly to callers. Move template application into the guarded block and catch the concrete template/runtime failures there.
Proposed fix
+import jinja2
+
 ...
-        template_result = self._apply_template(messages, tools_param)
-        if template_result is None:
-            return LLMResponseEvent(original=None, text="")
-        prepared_input, tools_applied = template_result
-
-        max_tokens = max_new_tokens or self._max_new_tokens
-        is_tool_followup = kwargs.pop("_tool_followup", False)
-        suppress_events = tools_applied
-
-        self.events.send(
-            LLMRequestStartedEvent(
-                plugin_name=self._plugin_name,
-                model=self.model_id,
-                streaming=stream and not suppress_events,
-            )
-        )
-
         try:
+            template_result = self._apply_template(messages, tools_param)
+            if template_result is None:
+                return LLMResponseEvent(original=None, text="")
+            prepared_input, tools_applied = template_result
+
+            max_tokens = max_new_tokens or self._max_new_tokens
+            is_tool_followup = kwargs.pop("_tool_followup", False)
+            suppress_events = tools_applied
+
+            self.events.send(
+                LLMRequestStartedEvent(
+                    plugin_name=self._plugin_name,
+                    model=self.model_id,
+                    streaming=stream and not suppress_events,
+                )
+            )
+
             if stream and not suppress_events:
                 result = await self._generate_streaming(
                     prepared_input, max_tokens, temperature
                 )
             else:
                 result = await self._generate_non_streaming(
                     prepared_input, max_tokens, temperature, not suppress_events
                 )
-        except (RuntimeError, ValueError) as exc:
+        except (jinja2.TemplateError, RuntimeError, TypeError, ValueError) as exc:
             logger.exception("Local text generation failed")
             error_event = LLMResponseEvent(original=None, text="", exception=exc)
             self.events.send(error_event)
             return error_event
Verify `_apply_template()` is inside the protected region after the change:
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = next(Path(".").rglob("_local_inference.py"))
text = path.read_text()
start = text.index(" template_result = self._apply_template")
end = text.index(" if suppress_events", start)
print(text[start:end])
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py` around lines 264 - 294, Move the call to self._apply_template(...) into the try block that wraps generation so template failures are handled the same as generation errors: call template_result = self._apply_template(messages, tools_param) inside the existing try, keep the existing None check and return of LLMResponseEvent(original=None, text="") if template_result is None, then proceed to unpack prepared_input, tools_applied and the existing generation branches (_generate_streaming/_generate_non_streaming); also ensure any exceptions from _apply_template are caught by the except (RuntimeError, ValueError) handler, log via logger.exception("Local text generation failed") and send/return an LLMResponseEvent(original=None, text="", exception=exc) via self.events.send to preserve the same error contract.
🧹 Nitpick comments (2)
plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py (2)
50-53: `**kwargs: Any` hides the public constructor surface.
`LocalVLM.__init__` has a well-defined signature (`model`, `fps`, `frame_buffer_seconds`, `max_frames`, `max_new_tokens`, `max_tool_rounds`). Forwarding via `**kwargs: Any` drops IDE autocompletion, type checking at call sites, and the docstring's argument contract (which is itself lying about defaults). Positional `MlxVLM("mlx-community/...")` also stops working.
♻️ Proposed refactor — mirror the base signature
-    def __init__(self, **kwargs: Any):
-        super().__init__(**kwargs)
+    def __init__(
+        self,
+        model: str,
+        fps: int = 1,
+        frame_buffer_seconds: int = 10,
+        max_frames: int = 4,
+        max_new_tokens: int = 512,
+        max_tool_rounds: int = 3,
+    ):
+        super().__init__(
+            model=model,
+            fps=fps,
+            frame_buffer_seconds=frame_buffer_seconds,
+            max_frames=max_frames,
+            max_new_tokens=max_new_tokens,
+            max_tool_rounds=max_tool_rounds,
+        )
         self._multi_turn_warned = False
         self._tools_unsupported_warned = False
As per coding guidelines: "Avoid using `Any` type" and "Use type annotations everywhere."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around lines 50 - 53, The MlxVLM constructor currently uses **kwargs: Any which hides the public constructor surface; change MlxVLM.__init__ to mirror the LocalVLM signature (accept explicit parameters: model: str, fps: int, frame_buffer_seconds: float, max_frames: int, max_new_tokens: int, max_tool_rounds: int — using the same defaults as LocalVLM), update type annotations accordingly, call super().__init__(model=model, fps=fps, frame_buffer_seconds=frame_buffer_seconds, max_frames=max_frames, max_new_tokens=max_new_tokens, max_tool_rounds=max_tool_rounds), preserve the _multi_turn_warned and _tools_unsupported_warned initializations, and update the docstring to reflect the explicit parameters and defaults so IDEs and type checkers get accurate signatures.
73-76: Frame downsampling drops the final frame.
With `step = len(frames) / self._max_frames` and `i in range(self._max_frames)`, indices span `[0, len(frames) - step)` — the most recent frame is never sampled. For a video agent where recency usually matters most, this is a mild but real bias.
♻️ Proposed fix — even spread including endpoints
-        sampled = frames
-        if len(frames) > self._max_frames:
-            step = len(frames) / self._max_frames
-            sampled = [frames[int(i * step)] for i in range(self._max_frames)]
+        sampled = frames
+        if len(frames) > self._max_frames:
+            last = len(frames) - 1
+            sampled = [
+                frames[round(i * last / (self._max_frames - 1))]
+                for i in range(self._max_frames)
+            ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around lines 73 - 76, The current downsampling logic using step = len(frames) / self._max_frames and sampled = [frames[int(i * step)] for i in range(self._max_frames)] never includes the final frame; change the sampling to evenly include endpoints when downsampling: if len(frames) > self._max_frames compute indices that span 0..len(frames)-1 inclusive (e.g. step = (len(frames)-1) / (self._max_frames-1) when self._max_frames > 1 or use linspace semantics) and then select frames at those rounded indices; keep the early return/assignment to sampled when len(frames) <= self._max_frames and ensure this logic is applied where sampled, frames, and self._max_frames are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 105-114: In the generate() call inside mlx_vlm.py (the invocation
that builds `result = generate(...)`) replace the deprecated kwarg
temp=temperature with the standardized temperature=temperature so the mlx-vlm
library actually receives the temperature setting; keep returning result.text
as-is.
---
Outside diff comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`:
- Around line 277-286: The streaming worker run_generation currently only
catches RuntimeError so ValueError from model.generate will not set
generation_error; update run_generation (inside TransformersLLM.run_generation)
to catch both RuntimeError and ValueError, assign the caught exception to
generation_error, and call logger.exception with context before finally calling
loop.call_soon_threadsafe(async_queue.put_nowait, None) to close the stream;
this mirrors LocalTextLLM.create_response() and MlxLLM._generate_streaming()
behavior.
---
Duplicate comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py`:
- Around line 264-294: Move the call to self._apply_template(...) into the try
block that wraps generation so template failures are handled the same as
generation errors: call template_result = self._apply_template(messages,
tools_param) inside the existing try, keep the existing None check and return of
LLMResponseEvent(original=None, text="") if template_result is None, then
proceed to unpack prepared_input, tools_applied and the existing generation
branches (_generate_streaming/_generate_non_streaming); also ensure any
exceptions from _apply_template are caught by the except (RuntimeError,
ValueError) handler, log via logger.exception("Local text generation failed")
and send/return an LLMResponseEvent(original=None, text="", exception=exc) via
self.events.send to preserve the same error contract.
---
Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py`:
- Around line 50-53: The MlxVLM constructor currently uses **kwargs: Any which
hides the public constructor surface; change MlxVLM.__init__ to mirror the
LocalVLM signature (accept explicit parameters: model: str, fps: int,
frame_buffer_seconds: float, max_frames: int, max_new_tokens: int,
max_tool_rounds: int — using the same defaults as LocalVLM), update type
annotations accordingly, call super().__init__(model=model, fps=fps,
frame_buffer_seconds=frame_buffer_seconds, max_frames=max_frames,
max_new_tokens=max_new_tokens, max_tool_rounds=max_tool_rounds), preserve the
_multi_turn_warned and _tools_unsupported_warned initializations, and update the
docstring to reflect the explicit parameters and defaults so IDEs and type
checkers get accurate signatures.
- Around line 73-76: The current downsampling logic using step = len(frames) /
self._max_frames and sampled = [frames[int(i * step)] for i in
range(self._max_frames)] never includes the final frame; change the sampling to
evenly include endpoints when downsampling: if len(frames) > self._max_frames
compute indices that span 0..len(frames)-1 inclusive (e.g. step =
(len(frames)-1) / (self._max_frames-1) when self._max_frames > 1 or use linspace
semantics) and then select frames at those rounded indices; keep the early
return/assignment to sampled when len(frames) <= self._max_frames and ensure
this logic is applied where sampled, frames, and self._max_frames are used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a3c92ede-8dcc-4ed0-8bca-3843192983fd
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
conftest.py
examples/12_resale_advisor_example/pyproject.toml
examples/12_resale_advisor_example/resale_advisor_example.py
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
plugins/huggingface/vision_agents/plugins/huggingface/_local_inference.py
plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py
plugins/huggingface/vision_agents/plugins/huggingface/mlx_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
plugins/roboflow/tests/test_roboflow.py
pyproject.toml
✅ Files skipped from review due to trivial changes (1)
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
🚧 Files skipped from review as they are similar to previous changes (7)
- pyproject.toml
- plugins/roboflow/tests/test_roboflow.py
- conftest.py
- plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
- examples/12_resale_advisor_example/resale_advisor_example.py
- examples/12_resale_advisor_example/pyproject.toml
- plugins/huggingface/vision_agents/plugins/huggingface/_local_vlm.py
result = generate(
    model,
    processor,
    prompt=prompt,
    image=images if images else None,
    max_tokens=max_tokens,
    temp=temperature,
)

return result.text
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect mlx-vlm pin and verify generate() signature + return type for that version.
rg -nP --iglob '*.toml' --iglob '*.txt' '\bmlx[-_]vlm\b'
python - <<'PY'
import inspect, mlx_vlm
from mlx_vlm import generate
print("version:", getattr(mlx_vlm, "__version__", "unknown"))
print("generate sig:", inspect.signature(generate))
src = inspect.getsource(generate)
print("returns GenerationResult?", "GenerationResult" in src or ".text" in src)
print("accepts temp kwarg?", "temp" in inspect.signature(generate).parameters)
print("accepts temperature kwarg?", "temperature" in inspect.signature(generate).parameters)
PY
Repository: GetStream/Vision-Agents
Length of output: 604
🌐 Web query:
mlx-vlm generate function signature temp vs temperature kwarg return type GenerationResult
💡 Result:
The generate function in mlx-vlm has the following signature:
def generate(
    model: nn.Module,
    processor: PreTrainedTokenizer,
    prompt: str,
    image: Union[str, List[str]] = None,
    audio: Union[str, List[str]] = None,
    verbose: bool = False,
    **kwargs,
) -> GenerationResult:
It accepts temperature (float, default 0.0) as a **kwargs parameter for sampling control. Older code or examples may use temp, but this was standardized to temperature (Issue #203, closed March 2025). Both may work due to **kwargs forwarding, but temperature is the current standard. The function returns a GenerationResult object containing fields like text, prompt_tokens, generation_tokens, prompt_tps, generation_tps, peak_memory, etc.
Citations:
- 1: https://github.com/Blaizzy/mlx-vlm/blob/718f69e9/mlx_vlm/generate.py
- 2: https://www.mintlify.com/yocxy2/mlx-vlm/api/generate
- 3: temp or temperature? Blaizzy/mlx-vlm#203
- 4: Improve `mlx-vlm` type hints for `generate()`/`stream_generate()` so Pyright/Pylance can catch API mistakes Blaizzy/mlx-vlm#825
Fix mlx-vlm generate() kwarg: use temperature= instead of temp=.
The current code passes temp=temperature on line 110, but the mlx-vlm API standardized on temperature= as of March 2025 (Issue #203). While older code using temp= may be silently absorbed by **kwargs forwarding in some versions, the parameter will likely be ignored, causing the temperature setting to have no effect.
The pinned version (mlx-vlm>=0.4.0) supports the standard temperature= kwarg. Update line 110 to:
temperature=temperature,
The .text attribute access on line 114 is correct—generate() returns a GenerationResult object with a .text field.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/mlx_vlm.py` around
lines 105 - 114, In the generate() call inside mlx_vlm.py (the invocation that
builds `result = generate(...)`) replace the deprecated kwarg temp=temperature
with the standardized temperature=temperature so the mlx-vlm library actually
receives the temperature setting; keep returning result.text as-is.
🧹 Nitpick comments (1)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py (1)
38-61: Non-MLX `ImportError`s from `mlx_llm`/`mlx_vlm` re-raise silently — consider surfacing them.
When `_is_mlx_import_error(e)` is False (and, for `MlxVLM`, `e.name` isn't `av`/`aiortc`), the bare `raise` propagates the original `ImportError` with no context. That's technically fine, but if a user hits e.g. a broken `mlx_lm` install that raises `ImportError: cannot import name 'stream_generate' from 'mlx_lm'`, they'll get the raw traceback with no hint it came from `huggingface.MlxLLM` lazy loading. Optional: attach context via `raise ... from e` with a short message, or at least log at `debug` for diagnostics.
Also nit: `__getattr__` is missing a return annotation. Per the guideline "Use type annotations everywhere", a `-> type` would be consistent with the rest of the codebase (the return is a class object, so `type` works without resorting to `Any`).
♻️ Optional refinement
-def __getattr__(name: str):
+def __getattr__(name: str) -> type:
     if name == "MlxLLM":
         try:
             return import_module(".mlx_llm", __name__).MlxLLM
         except ImportError as e:
             if _is_mlx_import_error(e):
                 warnings.warn(
                     "MLX is not available on this platform. "
                     "Install the [mlx] extra on Apple Silicon to enable MLX plugins.",
                     stacklevel=2,
                 )
             raise
As per coding guidelines: "Use type annotations everywhere."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py` around lines 38 - 61, The lazy loader __getattr__ swallows context for non-MLX ImportErrors in MlxLLM and MlxVLM and lacks a return annotation; update __getattr__ (the function handling MlxLLM and MlxVLM imports) to (1) add a return type annotation -> type, and (2) when catching ImportError e that is not identified by _is_mlx_import_error (and for MlxVLM not av/aiortc), re-raise a new ImportError with a short contextual message (e.g., "failed to import MlxLLM via huggingface lazy loader") using "raise ImportError(...) from e" or at minimum log the original exception at debug before re-raising, so users see that the error originated from the huggingface lazy import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/__init__.py`:
- Around line 38-61: The lazy loader __getattr__ swallows context for non-MLX
ImportErrors in MlxLLM and MlxVLM and lacks a return annotation; update
__getattr__ (the function handling MlxLLM and MlxVLM imports) to (1) add a
return type annotation -> type, and (2) when catching ImportError e that is not
identified by _is_mlx_import_error (and for MlxVLM not av/aiortc), re-raise a
new ImportError with a short contextual message (e.g., "failed to import MlxLLM
via huggingface lazy loader") using "raise ImportError(...) from e" or at
minimum log the original exception at debug before re-raising, so users see that
the error originated from the huggingface lazy import.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f1b110a7-c560-431b-8df9-898450c67944
📒 Files selected for processing (3)
plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
plugins/roboflow/tests/test_roboflow.py
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/roboflow/tests/test_roboflow.py
- plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
Summary by CodeRabbit
New Features
Refactor
Tests
Chores