Skip to content

[Bug] FastVLM-0.5B outputs raw <start_of_*> tokens on macOS (M4) — text and vision both broken #268

@Melaga

Description

@Melaga

Environment

  • Device: Mac Mini M4
  • OS: macOS 26.2 (Apple Silicon / arm64)
  • Flutter Gemma version: 0.15.0
  • Flutter version: 3.41.6 (stable)
  • Dart version: 3.11.4
  • Model: FastVLM-0.5B.litertlm

Bug Description

When using FastVLM-0.5B on macOS, both text-only and image+text inputs produce garbled output consisting entirely of raw <start_of_*> tokens instead of readable text.

Steps to Reproduce

  1. Download FastVLM-0.5B.litertlm from HuggingFace
  2. Run the example app on macOS (M4)
  3. Send a text message (e.g. "hi") — broken output
  4. Send a message with an image attached — also broken output

Actual Output

<start_of_9!!!<start_of_something!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Expected Output

A normal text response describing the input.

Key Comparison

Gemma 4 E2B (vision + text) works correctly on the same machine and setup
FastVLM-0.5B produces garbage tokens for both text-only and image+text

This suggests the issue is FastVLM-specific, not a general macOS vision limitation.

Additional Notes

The model downloads and loads successfully (HTTP 200, 100% progress). The failure is purely at inference/decoding time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions