Auto-detect LFM2.5 VL MLX models by solderzzc · Pull Request #57 · SharpAI/SwiftLM

solderzzc · 2026-04-16T19:52:52Z

No description provided.

Three root causes addressed:

1. Cache key mismatch: speculative jobs used spm-SwiftLM-v2 while build_and_unit_test uses v3, causing a full rebuild on every run. Unified all jobs to v3.
2. Wrong pre-downloaded model: the speculative-decoding job pre-fetched Qwen3.5-9B-4bit, but test-speculative.sh MAIN_MODEL defaults to Qwen3.5-2B-4bit. The 9B model was never used by the test, and the 2B model was not cached, causing a re-download plus OOM on the 7 GB runner. Pre-download now fetches 2B + 0.8B (matching the test defaults).
3. Fragile metallib compile: replaced python setup.py build_ext with the proven pip install mlx approach already used in build_and_unit_test. This eliminates the 5-min compile step and the pybind11/cmake dependency chain.

speculative-decoding-eval retains the 9B model for its heavier memory evaluation (continue-on-error: true already covers OOM).

Fixes: CI failure on PR #57
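The cache key mismatch described above can be illustrated with a minimal Python sketch. The helper name find_cache_key_mismatches is hypothetical, and the full key string "spm-SwiftLM-v3" is an assumption extrapolated from the v2 key named in the commit message; the real workflow file may build its keys differently.

```python
def find_cache_key_mismatches(job_cache_keys):
    """Group CI jobs by cache key; return {} when every job shares one key."""
    groups = {}
    for job, key in job_cache_keys.items():
        groups.setdefault(key, []).append(job)
    # A single group means all jobs restore (and save) the same build cache.
    return groups if len(groups) > 1 else {}


# Before the fix: the speculative jobs point at a different key than the
# build job, so they can never reuse its cache (assumed key strings).
before = {
    "build_and_unit_test": "spm-SwiftLM-v3",
    "speculative-decoding": "spm-SwiftLM-v2",
    "speculative-decoding-eval": "spm-SwiftLM-v2",
}

# After the fix: every job is unified on v3.
after = {job: "spm-SwiftLM-v3" for job in before}

print(find_cache_key_mismatches(before))
print(find_cache_key_mismatches(after))
```

An empty result for the "after" mapping is the state the commit aims for: one key, one shared cache.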

…, 2B model

- Use cmake build.sh approach (not python setup.py) for SharpAI fork metallib
- Cache key v2 → v3 to share build cache with build_and_unit_test job
- Pre-download Qwen3.5-2B-4bit (matches MAIN_MODEL default in test-speculative.sh), not 9B, which was never used and caused OOM on the 7 GB runner (Trace/BPT trap: 5)
- Add model cache step to speculative-decoding-eval (was missing)
- hf installed via pip in same venv, source activated before download

- Reverts auto-detection of vision capabilities from overriding the user's --vision flag in Server.swift
- Re-adds qwen3_5 to ModelArchitectureProbe since it technically supports vision (has image_token_id)
- By disabling the override, speculative decoding tests (which use Qwen3.5-2B-4bit text paths) will correctly start in text-only mode and avoid the reshape crash
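The precedence rule in this commit (the user's --vision flag decides, the probe result does not override it) can be sketched in Python as follows. This is not the Server.swift code; resolve_vision_mode and its parameters are hypothetical names used only to show the decision logic.

```python
def resolve_vision_mode(user_vision_flag, probe_supports_vision):
    """Return (vision_enabled, note). The flag alone decides the mode.

    The probe result is kept for information only, so a model that
    merely carries vision metadata still starts in text-only mode
    unless the user asked for vision.
    """
    if user_vision_flag:
        return True, "vision enabled by flag"
    if probe_supports_vision:
        return False, "model supports vision, but flag not set: text-only"
    return False, "text-only"


# A text-path test against a vision-capable architecture stays text-only.
print(resolve_vision_mode(user_vision_flag=False, probe_supports_vision=True))
# A VLM test passes the flag explicitly.
print(resolve_vision_mode(user_vision_flag=True, probe_supports_vision=True))
```

Under this rule, a test that wants the vision path has to pass the flag itself rather than rely on detection.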

Auto-detection was removed from Server.swift (users use --vision flag). The vision integration test for LFM2.5-VL-450M was relying on that auto-detection by passing 'no' for the vision flag argument. Now passes 'yes' so --vision is always provided for VLM models.

Qwen3.5-9B-4bit (~5.8GB) + Qwen3.5-0.8B (~0.5GB) = 6.3GB base, which leaves no room for KV cache expansion on macos-15 7GB runners. The server crashes with Abort trap: 6 (malloc assertion) after generating only a few tokens.

Use Qwen3.5-2B-4bit (~1.7GB) with NUM_DRAFT_TOKENS=2 instead:

- Main: 2B (1.7GB) + Draft: 0.8B (0.5GB) = 2.2GB, which fits easily
- NUM_DRAFT_TOKENS=2 (vs 4 in the main speculative-decoding job) keeps this job testing a distinct speculation depth configuration

The job comment saying '9B' was aspirational only: 9B cannot reliably run on 7GB CI runners without a dedicated large-runner budget.
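The memory arithmetic above can be checked with a short Python sketch. The model sizes and the 7GB runner figure come from the commit message; the helper name headroom_gb is hypothetical, and the calculation ignores OS and process overhead, so real headroom is smaller than shown.

```python
RUNNER_RAM_GB = 7.0  # macos-15 runner size quoted in the commit message


def headroom_gb(main_gb, draft_gb, runner_gb=RUNNER_RAM_GB):
    """RAM left for KV cache after loading main + draft model weights."""
    return round(runner_gb - (main_gb + draft_gb), 1)


# 9B main + 0.8B draft: 6.3GB of weights on a 7GB runner.
print(headroom_gb(5.8, 0.5))  # 0.7
# 2B main + 0.8B draft: 2.2GB of weights.
print(headroom_gb(1.7, 0.5))  # 4.8
```

With the 9B pairing, well under 1GB remains before any system overhead is counted, which is consistent with the crash after only a few generated tokens.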

solderzzc force-pushed the codex/lfm25-vl-mlx-regression branch from d612286 to 4398222 Compare April 16, 2026 21:27

solderzzc mentioned this pull request Apr 16, 2026

fix(ci): restore PR #57 + correct speculative-decoding CI failures #61

Closed

solderzzc added 4 commits April 16, 2026 17:03

Auto-detect LFM2.5 VL MLX models

076a63a

Fix bash unbound variable error in test-vision.sh

404fbb9

feat/fix: Add VLM LFM 450M to benchmark and bump mlx-swift-lm

867ed31

fix(ci): Replace deprecated huggingface-cli with hf command

ab471d8

solderzzc force-pushed the codex/lfm25-vl-mlx-regression branch from 4398222 to 4a08e7e Compare April 17, 2026 00:04

solderzzc force-pushed the codex/lfm25-vl-mlx-regression branch from 4a08e7e to 0317475 Compare April 17, 2026 00:42

solderzzc force-pushed the codex/lfm25-vl-mlx-regression branch from 639109d to 7407167 Compare April 17, 2026 02:05

solderzzc added 2 commits April 16, 2026 19:42

solderzzc merged commit 70b9398 into main Apr 17, 2026
8 checks passed

solderzzc deleted the codex/lfm25-vl-mlx-regression branch April 17, 2026 04:22

solderzzc mentioned this pull request Apr 17, 2026

fix(swiftbuddy): add MLXVLM and ModelArchitectureProbe to xcodeproj generator #62

Merged

solderzzc merged 8 commits into main from codex/lfm25-vl-mlx-regression
