fix(ci): restore PR #57 + correct speculative-decoding CI failures #61

Closed
solderzzc wants to merge 5 commits into main from fix/pr57-speculative-ci


@solderzzc
Member

Summary

This PR re-applies the changes from the failed PR #57 and adds targeted fixes for the three root causes of the CI failure.

What PR #57 contained (cherry-picked cleanly)

  • Auto-detect LFM2.5 VL MLX models (ModelArchitectureProbe)
  • Fix bash unbound variable error in test-vision.sh
  • Add VLM LFM 450M to benchmark
  • Replace the deprecated huggingface-cli command with hf in CI
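The unbound-variable class of error mentioned above can be sketched as follows. This is a minimal illustration, not the actual change to test-vision.sh: the variable name `EXTRA_ARGS` is hypothetical, and the assumption is that the script runs under `set -u`, where reading an unset variable aborts the script.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical optional variable; make sure it is unset for the demonstration.
unset EXTRA_ARGS

# Under `set -u`, a bare "$EXTRA_ARGS" here would abort the script with an
# "unbound variable" error. Expanding with an empty default avoids that.
extra="${EXTRA_ARGS:-}"

echo "extra args: '${extra}'"
```

The `${VAR:-}` form keeps `set -u` enabled for the rest of the script while still allowing a variable to be optional.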

Additional fixes for the CI failures

1. Cache key mismatch (v2 → v3)

The speculative jobs used spm-SwiftLM-v2 while build_and_unit_test uses v3. This caused a full rebuild on every run instead of sharing the cache. Unified all jobs to v3.

2. Wrong pre-downloaded model (9B → 2B)

The speculative-decoding job pre-fetched Qwen3.5-9B-4bit, but test-speculative.sh defaults MAIN_MODEL to Qwen3.5-2B-4bit. The 9B model was never used and the 2B model was not cached, leading to a re-download at test time and an OOM crash on the 7 GB runner (Trace/BPT trap: 5). The pre-download step now fetches 2B + 0.8B.
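The mismatch comes down to how a shell default resolves. A minimal sketch, assuming test-speculative.sh uses the standard `${VAR:-default}` expansion (the script itself is not shown in this PR); `check_prefetch` is a hypothetical helper, and the model names are the short names used above rather than full repository IDs:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compares the model a CI step pre-downloaded against
# the model the test would resolve when MAIN_MODEL is not overridden.
check_prefetch() {
  local prefetched="$1"
  # Assumed form of the default in test-speculative.sh.
  local main_model="${MAIN_MODEL:-Qwen3.5-2B-4bit}"
  if [ "$prefetched" = "$main_model" ]; then
    echo "hit"
  else
    echo "miss"
  fi
}

check_prefetch "Qwen3.5-9B-4bit"   # old pre-download step
check_prefetch "Qwen3.5-2B-4bit"   # pre-download step after this fix
```

With MAIN_MODEL unset, the first call reports a miss and the second a hit, which is the difference between downloading the model during the test and finding it already cached.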

3. Fragile metallib compile → simple pip install

Replaced the 5-minute python setup.py build_ext step with the pip-based approach already proven in build_and_unit_test. This removes the pybind11/cmake dependency chain and the fragile glob path.

speculative-decoding-eval retains Qwen3.5-9B for its heavier memory evaluation (continue-on-error: true already covers OOM there).

Closes #57

@solderzzc solderzzc force-pushed the fix/pr57-speculative-ci branch from 7b5b14b to 668d474 Compare April 16, 2026 23:55
Three root causes addressed:

1. Cache key mismatch: speculative jobs used spm-SwiftLM-v2 while
   build_and_unit_test uses v3, causing a full rebuild on every run.
   Unified all jobs to v3.

2. Wrong pre-downloaded model: speculative-decoding job pre-fetched
   Qwen3.5-9B-4bit but test-speculative.sh MAIN_MODEL defaults to
   Qwen3.5-2B-4bit. The 9B model was never used by the test, and the
   2B model was not cached, causing re-download + OOM on 7 GB runner.
   Pre-download now fetches 2B + 0.8B (matching the test defaults).

3. Fragile metallib compile: replaced python setup.py build_ext
   with the proven pip install mlx approach already used in
   build_and_unit_test — eliminates 5-min compile step and
   pybind11/cmake dependency chain.

speculative-decoding-eval retains the 9B model for its heavier
memory evaluation (continue-on-error: true already covers OOM).

Fixes: CI failure on PR #57
@solderzzc solderzzc force-pushed the fix/pr57-speculative-ci branch from 668d474 to 186b8f3 Compare April 16, 2026 23:58
@solderzzc
Member Author

Closing — wrong approach. Fixing the conflict on PR #57 directly.

@solderzzc solderzzc closed this Apr 17, 2026