
feat: bump mlx-swift-lm for DeepSeek-V4 support #79

Merged

solderzzc merged 3 commits into main from feat/deepseek-v4 on Apr 24, 2026
Conversation

@solderzzc
Member

Summary

Bumps the mlx-swift-lm submodule to include DeepSeek-V4 model support.

Changes

  • Points mlx-swift-lm to the feat/deepseek-v4 branch (SharpAI/mlx-swift-lm#33), which adds DeepseekV4.swift and registers the deepseek_v4 model type.

Notes

  • No mlx-lm Python reference exists yet for V4; the implementation is ported from the HF inference code (deepseek-ai/DeepSeek-V4-Pro/inference/model.py).
  • Known simplifications: inverse RoPE approximated, hash routing not implemented, sliding window not enforced.
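For context on the "inverse RoPE approximated" simplification noted above: standard rotary position embedding (RoPE) rotates each (i, i + dim/2) coordinate pair of a query/key vector by a position-dependent angle. A minimal pure-Python sketch of the standard form (illustrative only; the V4 variant and its approximation are not part of this PR):

```python
import math

def rope(x, base=10000.0):
    """Apply standard rotary position embedding to a list of token
    vectors (each of even length). Illustrative sketch, not the
    DeepseekV4.swift implementation."""
    out = []
    dim = len(x[0])
    half = dim // 2
    for pos, vec in enumerate(x):
        rotated = [0.0] * dim
        for i in range(half):
            # Rotation angle grows with position, shrinks with frequency index.
            theta = pos * base ** (-i / half)
            c, s = math.cos(theta), math.sin(theta)
            a, b = vec[i], vec[i + half]
            rotated[i] = a * c - b * s
            rotated[i + half] = a * s + b * c
        out.append(rotated)
    return out
```

Because each pair is rotated, vector norms are preserved and position 0 is left unchanged, which is what makes RoPE a pure relative-position encoding.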
Copilot AI review requested due to automatic review settings April 24, 2026 07:14
Copilot AI (Contributor) left a comment


Pull request overview

This PR bumps the mlx-swift-lm git submodule to a newer revision that adds DeepSeek-V4 model support, including a new model implementation and factory registration (per PR description).

Changes:

  • Update mlx-swift-lm submodule reference to SharpAI/mlx-swift-lm#33 (feat/deepseek-v4).
  • Add DeepSeek-V4 architecture + deepseek_v4 registration inside the submodule (as described in the PR notes).


- README: add DeepSeek-V4-Flash (126GB Q3) benchmark table for M5 Pro 64GB
  SSD+TurboQuant delivers 4.16 tok/s at 40K context (13x vs plain SSD Stream)
- profile_runner.py: track peak GPU InUse via background polling thread (0.5s)
  instead of single post-generation snapshot; rename gpu_in_use → gpu_in_use_peak
  throughout; add separate GPU_InUse peak visualization section
- run_benchmark.sh: add Thump604/DeepSeek-V4-Flash-MLX-Q3-mixed-gs128-affine
  to Test 1 model list (option 11)
- mlx-swift-lm: bump submodule to 8a8da29 (attn_sink dtype fix)
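The profile_runner.py change above (peak tracking via a background polling thread instead of a single post-generation snapshot) can be sketched as a small helper. Names here are hypothetical; the script's actual GPU InUse reader is not shown in this PR:

```python
import threading

class PeakPoller:
    """Sample `reader()` on a background thread every `interval` seconds
    and record the maximum value seen. Sketch of the approach described
    in the commit message; the real profile_runner.py may differ."""

    def __init__(self, reader, interval=0.5):
        self.reader = reader
        self.interval = interval
        self.peak = float("-inf")
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self.reader())
            self._stop.wait(self.interval)  # sleep, but wake early on stop

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # One final sample so a spike right before shutdown is not lost.
        self.peak = max(self.peak, self.reader())
```

Polling every 0.5 s catches transient allocation spikes during decoding that a single end-of-run snapshot would miss, which is presumably why the metric was renamed gpu_in_use → gpu_in_use_peak.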
@solderzzc solderzzc merged commit 9533e45 into main Apr 24, 2026
1 check passed


2 participants