Commit 26a4936
Port Qwen3-VL processor to torch-free numpy implementation

HF's Qwen3-VL image/video processors hard-require torch/torchvision.
Inline the numpy port adapted from mlx-vlm (commit 1bf7742, unreleased)
so mlx-embeddings can run without torch installed, including with real
checkpoints such as mlx-community/Qwen3-VL-2B-Instruct-4bit.

Drops the AutoImageProcessor.from_pretrained(use_fast=False) path, the
_UnsupportedVideoProcessor stub, and the object.__new__(Qwen3VLProcessor)
trick. Processor.from_pretrained now delegates to the local torch-free
Qwen3VLProcessor.from_pretrained, which reads processor_config.json /
preprocessor_config.json / video_preprocessor_config.json directly and
builds numpy Qwen3VLImageProcessor / Qwen3VLVideoProcessor.
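
A minimal sketch of the config-reading step described above, assuming the checkpoint is already available as a local directory. The helper name `load_processor_configs` is hypothetical and is not taken from the commit; the actual `Qwen3VLProcessor.from_pretrained` may resolve files differently (for example from the Hub cache). Only the three file names come from the commit message.

```python
import json
from pathlib import Path

# File names as listed in the commit message.
CONFIG_FILES = (
    "processor_config.json",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
)


def load_processor_configs(model_dir):
    """Hypothetical helper: read each config file if present.

    Returns a dict mapping file name -> parsed JSON dict. A missing file
    maps to an empty dict so callers can fall back to defaults.
    """
    configs = {}
    for name in CONFIG_FILES:
        path = Path(model_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                configs[name] = json.load(f)
        else:
            configs[name] = {}
    return configs
```

Reading the JSON files directly keeps this path free of any torch or torchvision import; the parsed dicts would then be passed to the numpy image and video processors.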
Small fixes on top of the mlx-vlm source:
- Flatten list-of-list image/video batches (HF's apply_chat_template
  nests them that way).
- Treat explicit None in preprocessor_config.json (min_pixels/max_pixels)
  the same as missing; the 2B Instruct checkpoint ships nulls alongside
  valid size.shortest_edge/longest_edge.
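
The two fixes above can be sketched as follows. Both function names (`flatten_batch`, `resolve_pixel_bound`) are hypothetical and do not come from the commit. The fallback from `min_pixels`/`max_pixels` to `size.shortest_edge`/`size.longest_edge` is an assumption based on the description above, and the `default` argument stands in for whatever built-in defaults the real processor uses.

```python
def flatten_batch(items):
    """Flatten one level of nesting: [[a, b], [c]] -> [a, b, c].

    Items that are not lists or tuples (e.g. numpy arrays, PIL images)
    are kept as-is, so an already-flat batch passes through unchanged.
    """
    if items is None:
        return None
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


def resolve_pixel_bound(config, key, size_key, default):
    """Treat an explicit None (JSON null) the same as a missing key.

    Lookup order (assumed): config[key], then config["size"][size_key],
    then the caller-supplied default.
    """
    value = config.get(key)
    if value is None:
        # "size" itself may be missing or null, hence the `or {}`.
        value = (config.get("size") or {}).get(size_key)
    return default if value is None else value
```

For example, a config of `{"min_pixels": None, "size": {"shortest_edge": 65536}}` resolves `min_pixels` through `size.shortest_edge` instead of propagating `None` into later arithmetic.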
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ea6d739
2 files changed
Lines changed: 655 additions & 134 deletions