Draft - MLX Experimentation #474
Open. Nash0x7E2 wants to merge 23 commits into main from feature/mlx-hf
23 commits
38f8ba5 add resale advisor example (maxkahan)
36a3160 Upgrade transformers library version (Nash0x7E2)
e4d7035 Change example to voice only for local testing (Nash0x7E2)
ec6719e Add resolve_device for mapping DeviceType (Nash0x7E2)
047d7d6 Extractr common logic to _local_inference (Nash0x7E2)
17df094 Add API for MLX models (Nash0x7E2)
6dafa40 Convert example to use Gemma MLX Quant (Nash0x7E2)
cad81d7 Add huggingface to root toml (Nash0x7E2)
61861f1 Fixes for formatting and mypy (Nash0x7E2)
61674a6 Fast-path tool-call extraction when output has no JSON braces (Nash0x7E2)
8e31104 Share _extract_last_user_text helper between local VLMs (Nash0x7E2)
09f4f56 Pass PIL images directly to mlx-vlm generate, drop temp-file PNGs (Nash0x7E2)
57430fe Fix mlx_lm.generate string return; warn on hung generation threads (Nash0x7E2)
c098cfd Gate MLX dev dep on Apple Silicon so Linux CI skips install (Nash0x7E2)
95e0785 Add requires_mlx pytest skip markers for Apple-only tests (Nash0x7E2)
9ad044c Gate huggingface [mlx] and [mlx-vlm] extras on Apple Silicon (Nash0x7E2)
abf231b Harden MLX import-failure detection for missing shared libs (Nash0x7E2)
12c321b Format __init__.py per ruff (Nash0x7E2)
6471a4d fix(huggingface): preserve local inference followup behavior (Nash0x7E2)
9649575 fix(huggingface): align mlx and transformers local settings (Nash0x7E2)
b0726e6 chore(examples): tighten resale example dependencies (Nash0x7E2)
2af000d fix(huggingface): lazy-load mlx plugins (Nash0x7E2)
4cfc1f5 fix(roboflow): handle filtered detections safely (Nash0x7E2)
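For readers unfamiliar with the idea behind commit 61674a6, here is a minimal sketch of a brace-based fast path. It is not the code from this PR: the function name `extract_tool_calls` and the convention that a tool call is a JSON object with a `"name"` key are assumptions made for illustration only.

```python
import json
from typing import Any


def extract_tool_calls(text: str) -> list[dict[str, Any]]:
    """Collect JSON objects that look like tool calls (hypothetical helper)."""
    # Fast path: a JSON object needs both braces, so plain prose skips parsing.
    if "{" not in text or "}" not in text:
        return []

    decoder = json.JSONDecoder()
    calls: list[dict[str, Any]] = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next brace
            continue
        # Assumption: a tool call is an object carrying a "name" key.
        if isinstance(obj, dict) and "name" in obj:
            calls.append(obj)
        pos = end
    return calls
```

Most spoken-style replies contain no braces at all, so the early return avoids running the decoder on the common case.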
@@ -0,0 +1,18 @@
+[project]
+name = "resale-advisor-example"
+version = "0.0.0"
+requires-python = ">=3.10"
+
+dependencies = [
+    "python-dotenv>=1.0",
+    "vision-agents-plugins-huggingface[mlx-vlm]",
+    "vision-agents-plugins-getstream",
+    "vision-agents-plugins-deepgram",
+    "vision-agents",
+]
+
+[tool.uv.sources]
+"vision-agents-plugins-huggingface" = {path = "../../plugins/huggingface", editable=true}
+"vision-agents-plugins-getstream" = {path = "../../plugins/getstream", editable=true}
+"vision-agents-plugins-deepgram" = {path = "../../plugins/deepgram", editable=true}
+"vision-agents" = {path = "../../agents-core", editable=true}
77 changes: 77 additions & 0 deletions
examples/12_resale_advisor_example/resale_advisor_example.py
@@ -0,0 +1,77 @@
+"""
+Resale Advisor with Gemma 4 - Local VLM Agent (MLX)
+
+A real-time resale advisor powered by Gemma 4 E4B running on Apple Silicon via
+MLX. Demonstrates how to build a multimodal AI agent that can see an item on
+camera, discuss its condition, and provide resale-oriented guidance with voice:
+
+- Gemma 4 E4B (8-bit quantized) via mlx-vlm for vision-language inference
+- Deepgram for speech-to-text and text-to-speech
+- GetStream for real-time communication
+
+The user speaks naturally and the agent responds with voice, describing the
+item, asking clarifying questions when needed, and giving a rough resale view.
+
+Requirements:
+- STREAM_API_KEY and STREAM_API_SECRET environment variables
+- DEEPGRAM_API_KEY environment variable
+- Apple Silicon Mac with 16GB+ unified memory
+
+First run will download the MLX model (~8GB).
+"""
+
+import asyncio
+import logging
+
+from dotenv import load_dotenv
+from vision_agents.core import Agent, Runner, User
+from vision_agents.core.agents import AgentLauncher
+from vision_agents.plugins import deepgram, getstream, huggingface
+
+logger = logging.getLogger(__name__)
+
+load_dotenv()
+
+SYSTEM_PROMPT = (
+    "You are a resale advisor running on a local Gemma 4 model. "
+    "You can see the user's camera feed. Identify the item, comment on visible "
+    "condition, ask for age or brand details when needed, and give a cautious "
+    "resale estimate or range when the user asks. Speak naturally, with no "
+    "lists or formatting. Never use emojis or special characters. Keep "
+    "responses under 60 words and be explicit when you are uncertain."
+)
+
+
+async def create_agent(**kwargs) -> Agent:
+    """Create a resale advisor agent with Gemma 4 VLM."""
+    agent = Agent(
+        edge=getstream.Edge(),
+        agent_user=User(name="Resale Advisor", id="agent"),
+        instructions=SYSTEM_PROMPT,
+        llm=huggingface.MlxVLM(
+            model="mlx-community/gemma-4-e4b-it-8bit",
+            max_new_tokens=150,
+        ),
+        tts=deepgram.TTS(),
+        stt=deepgram.STT(),
+    )
+
+    return agent
+
+
+async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
+    """Join the call and run the agent."""
+    call = await agent.create_call(call_type, call_id)
+
+    logger.info("Starting Resale Advisor...")
+
+    async with agent.join(call):
+        await asyncio.sleep(2)
+        await agent.llm.simple_response(
+            text="Greet the user briefly. Tell them you can inspect items on camera and help with resale guidance.",
+        )
+        await agent.finish()
+
+
+if __name__ == "__main__":
+    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
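The module docstring lists three environment variables and an Apple Silicon requirement. A pre-flight check along the following lines could surface a missing prerequisite before the roughly 8GB model download starts. This is a sketch only and is not part of the PR: the helper names `is_apple_silicon`, `missing_env_vars`, and `preflight_problems` are hypothetical, and the memory requirement is not checked.

```python
import os
import platform
import sys
from collections.abc import Mapping

# Variable names taken from the example's docstring.
REQUIRED_ENV_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET", "DEEPGRAM_API_KEY")


def is_apple_silicon() -> bool:
    """Best-effort check for a native arm64 macOS interpreter.

    A Python running under Rosetta reports x86_64 and is treated as unsupported.
    """
    return sys.platform == "darwin" and platform.machine() == "arm64"


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def preflight_problems(env: Mapping[str, str] = os.environ) -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = [f"{name} is not set" for name in missing_env_vars(env)]
    if not is_apple_silicon():
        problems.append("MLX inference requires an Apple Silicon Mac")
    return problems
```

One way to use it would be to call `preflight_problems()` after `load_dotenv()` and exit with the collected messages if the list is non-empty.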