fix(ci): restore PR #57 + correct speculative-decoding CI failures by solderzzc · Pull Request #61 · SharpAI/SwiftLM

solderzzc · 2026-04-16T23:50:55Z

Summary

This PR re-applies the changes from the failed PR #57 and adds targeted fixes for the three root causes of the CI failure.

What PR #57 contained (cherry-picked cleanly)

Auto-detect LFM2.5 VL MLX models (ModelArchitectureProbe)
Fix bash unbound variable error in test-vision.sh
Add VLM LFM 450M to benchmark
Replace deprecated huggingface-cli → hf command in CI
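The test-vision.sh diff is not shown here. For context, here is a minimal sketch of how an "unbound variable" failure arises under `set -u` and the two usual fixes. The variable name `VISION_MODEL`, its default value, and the helper functions are hypothetical and are not taken from the script.

```bash
#!/usr/bin/env bash
# Sketch only: VISION_MODEL, its default, and these helpers are hypothetical,
# not copied from test-vision.sh.
set -euo pipefail

# Under `set -u`, a bare "$VISION_MODEL" aborts the script with
# "VISION_MODEL: unbound variable" if the caller never set it.
# ${VAR:-default} substitutes a fallback instead of failing.
resolve_model() {
  local model="${VISION_MODEL:-default-vision-model}"
  printf '%s\n' "$model"
}

# Prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# On bash older than 4.4 (for example macOS /bin/bash 3.2), expanding an
# empty array as "${extra[@]}" is also an unbound-variable error under
# `set -u`. The ${extra[@]+"${extra[@]}"} form expands to nothing when the
# array is empty and to the quoted elements otherwise.
forward_extra_args() {
  local -a extra=("$@")
  count_args ${extra[@]+"${extra[@]}"}
}
```

Which of these patterns the actual fix in 87a2fea uses is not stated in this PR.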

Additional fixes for the CI failures

1. Cache key mismatch (v2 → v3)

The speculative jobs used the spm-SwiftLM-v2 cache key while build_and_unit_test uses v3, so the speculative jobs never hit the shared cache and did a full rebuild on every run. All jobs now use v3.
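Version drift like this is easy to miss in review. As a hypothetical safeguard (not part of this PR), a small check could fail CI whenever a workflow file mixes spm-SwiftLM-vN prefixes. The function name and the workflow path passed to it are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard, not part of this PR: fail when one workflow file
# uses more than one spm-SwiftLM-vN cache key version.
set -euo pipefail

check_cache_key_versions() {
  local workflow="$1"
  local versions count
  # Every distinct "spm-SwiftLM-v<digits>" prefix in the file, one per line.
  # `|| true` keeps a file with no matches from aborting under pipefail.
  versions=$(grep -oE 'spm-SwiftLM-v[0-9]+' "$workflow" | sort -u || true)
  # Number of non-empty lines, i.e. the number of distinct versions found.
  count=$(printf '%s' "$versions" | grep -c . || true)
  if [ "$count" -gt 1 ]; then
    echo "cache key version mismatch in $workflow:" >&2
    printf '%s\n' "$versions" | sed 's/^/  /' >&2
    return 1
  fi
}
```

Usage would be `check_cache_key_versions path/to/workflow.yml`, where the path is a placeholder for whichever file defines the jobs.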

2. Wrong pre-downloaded model (9B → 2B)

The speculative-decoding job pre-fetched Qwen3.5-9B-4bit, but test-speculative.sh defaults MAIN_MODEL to Qwen3.5-2B-4bit. The 9B model was never used and the 2B model was not cached, so it was re-downloaded at test time and the job then crashed with an OOM on the 7 GB runner (Trace/BPT trap: 5). The pre-download step now fetches the 2B and 0.8B models, matching the test defaults.
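The same kind of drift can occur between the pre-download list and the test script's defaults. The sketch below is a hypothetical guard, assuming test-speculative.sh declares its default as `MAIN_MODEL="${MAIN_MODEL:-<model>}"`. Model identifiers are written in the short form used in this description; the real script may use full Hugging Face repo ids.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard, not part of this PR. Assumes the test script sets its
# default as: MAIN_MODEL="${MAIN_MODEL:-<model id>}"
set -euo pipefail

# Print the default MAIN_MODEL declared in a test script (first match only).
default_main_model() {
  local script="$1"
  sed -n 's/^MAIN_MODEL="\${MAIN_MODEL:-\([^}]*\)}".*/\1/p' "$script" | head -n 1
}

# Succeed only if the default MAIN_MODEL appears among the pre-downloaded ids
# passed as the remaining arguments.
check_predownload() {
  local script="$1"; shift
  local want m
  want=$(default_main_model "$script")
  if [ -z "$want" ]; then
    echo "no MAIN_MODEL default found in $script" >&2
    return 1
  fi
  for m in "$@"; do
    if [ "$m" = "$want" ]; then
      return 0
    fi
  done
  echo "pre-download list is missing $want" >&2
  return 1
}
```

In CI this would run before the tests, for example `check_predownload path/to/test-speculative.sh <ids being fetched>`; both the path and the id list are placeholders.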

3. Fragile metallib compile → simple pip install

Replaced the 5-minute python setup.py build_ext step with the pip-based install (pip install mlx) already proven in build_and_unit_test. This removes the pybind11/cmake dependency chain and the fragile glob path.

speculative-decoding-eval retains Qwen3.5-9B for its heavier memory evaluation (continue-on-error: true already covers OOM there).

Closes #57

Three root causes addressed:

1. Cache key mismatch: speculative jobs used spm-SwiftLM-v2 while build_and_unit_test uses v3, causing a full rebuild on every run. Unified all jobs to v3.
2. Wrong pre-downloaded model: the speculative-decoding job pre-fetched Qwen3.5-9B-4bit, but test-speculative.sh defaults MAIN_MODEL to Qwen3.5-2B-4bit. The 9B model was never used by the test, and the 2B model was not cached, causing a re-download and an OOM on the 7 GB runner. Pre-download now fetches 2B + 0.8B (matching the test defaults).
3. Fragile metallib compile: replaced python setup.py build_ext with the proven pip install mlx approach already used in build_and_unit_test. This eliminates the 5-minute compile step and the pybind11/cmake dependency chain.

speculative-decoding-eval retains the 9B model for its heavier memory evaluation (continue-on-error: true already covers OOM).

Fixes: CI failure on PR #57

solderzzc · 2026-04-17T00:03:43Z

Closing — wrong approach. Fixing the conflict on PR #57 directly.

solderzzc added 4 commits April 16, 2026 16:48

Auto-detect LFM2.5 VL MLX models

f4c73e1

Fix bash unbound variable error in test-vision.sh

87a2fea

feat/fix: Add VLM LFM 450M to benchmark and bump mlx-swift-lm

096f8fb

fix(ci): Replace deprecated huggingface-cli with hf command

5578cc0

solderzzc force-pushed the fix/pr57-speculative-ci branch from 7b5b14b to 668d474 April 16, 2026 23:55

solderzzc force-pushed the fix/pr57-speculative-ci branch from 668d474 to 186b8f3 April 16, 2026 23:58

solderzzc closed this Apr 17, 2026

solderzzc wants to merge 5 commits into main from fix/pr57-speculative-ci
