Environment
- Device: Mac Mini M4
- OS: macOS 26.2 (Apple Silicon / arm64)
- Flutter Gemma version: 0.15.0
- Flutter version: 3.41.6 (stable)
- Dart version: 3.11.4
- Model: FastVLM-0.5B.litertlm
Bug Description
When using FastVLM-0.5B on macOS, both text-only and image+text inputs produce garbled output made up of truncated <start_of_*> tokens and runs of "!" instead of readable text.
Steps to Reproduce
- Download FastVLM-0.5B.litertlm from HuggingFace
- Run the example app on macOS (M4)
- Send a text message (e.g. "hi"): the output is garbled
- Send a message with an image attached: the output is also garbled
Actual Output
<start_of_9!!!<start_of_something!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Expected Output
A normal text response describing the input.
Key Comparison
✅ Gemma 4 E2B (vision + text) works correctly on the same machine and setup
❌ FastVLM-0.5B produces garbage tokens for both text-only and image+text
This suggests the issue is FastVLM-specific, not a general macOS vision limitation.
Additional Notes
The model downloads and loads successfully (HTTP 200, 100% progress). The failure is purely at inference/decoding time.