Model Support Request
When we try to load these models with mlx-lm, we get:
ValueError: Model type deepseek_v4 not supported.
ValueError: Model type bailing_hybrid not supported.
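As far as we can tell, mlx-lm chooses an architecture by reading `model_type` from the checkpoint's config.json and importing a module with the same name from its models package, so any type without a matching module fails at load time. The self-contained sketch below mimics that lookup. It is an illustration, not mlx-lm code: the hard-coded `SUPPORTED_MODEL_TYPES` set is a hypothetical stand-in for the real module directory.

```python
import json

# Hypothetical stand-in for the architectures mlx-lm ships modules for.
# In mlx-lm the check is effectively "does a module with this name exist";
# this set is illustrative and deliberately incomplete.
SUPPORTED_MODEL_TYPES = {"llama", "qwen2", "deepseek_v3", "bailing_moe"}


def resolve_model_type(config_json: str) -> str:
    """Return model_type from a config.json payload, or raise if unsupported."""
    config = json.loads(config_json)
    model_type = config["model_type"]
    if model_type not in SUPPORTED_MODEL_TYPES:
        # Same message format as the errors reported above.
        raise ValueError(f"Model type {model_type} not supported.")
    return model_type
```

Calling `resolve_model_type('{"model_type": "deepseek_v4"}')` raises `ValueError: Model type deepseek_v4 not supported.`, which matches the failure we see. Adding a module named after the new `model_type` is what would make the lookup succeed.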
1. DeepSeek-V4-Flash (DeepSeekMoE)
- HF repo: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
- Model type in config.json: deepseek_v4
- Architecture: DeepSeekMoE with DeepSeek Sparse Attention (DSA) and Multi-head Latent Attention (MLA)
- Total params: 284B, Active: ~13B per token
- MLX quantized versions already exist on mlx-community: 6bit, mxfp4, 3bit-DQ, 2bit-DQ
- Already supported in SGLang and vLLM
2. Ling-2.6-flash (Bailing Hybrid)
- HF repo: https://huggingface.co/inclusionAI/Ling-2.6-flash
- Model type in config.json: bailing_hybrid
- Architecture: MoE with MLA + hybrid attention (modeling code: bailing_moe_v2_5)
- Total params: 104B, Active: ~7.4B per token
- MLX quantized version exists: mlx-community/Ling-2.6-flash-mlx-4bit
- Already supported in SGLang
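To give a sense of what running these locally involves, here is a rough, weight-only memory estimate from the parameter counts listed above. Assumptions: all expert weights must be resident (so memory scales with total, not active, parameters), and quantization scales/biases, KV cache, and activations are ignored, so real usage will be higher. The function names are our own.

```python
# Rough weight-only memory estimate for the two MoE checkpoints.
# Assumption: every expert's weights are resident, so memory tracks TOTAL
# params; quantization scales/biases, KV cache and activations are ignored.

MODELS = {
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
    "Ling-2.6-flash": {"total_b": 104, "active_b": 7.4},
}


def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Weights only, in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return total_params_billion * bits_per_weight / 8


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per token (approximate, from the ~ figures)."""
    return active_b / total_b


if __name__ == "__main__":
    for name, p in MODELS.items():
        for bits in (2, 3, 4, 6):
            gb = weight_memory_gb(p["total_b"], bits)
            print(f"{name} @ {bits}-bit: ~{gb:.1f} GB")
        share = active_fraction(p["active_b"], p["total_b"])
        print(f"{name} active share per token: {share:.1%}")
```

For example, 284B parameters at 4 bits come to about 142 GB of weights before any overhead, versus about 52 GB for Ling-2.6-flash at 4 bits.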
Why this matters
Both models were open-sourced in April 2026 and target agent workflows. MLX is the primary inference backend for Apple Silicon Macs, but users currently cannot run these models locally with mlx-lm.
Implementation approach
The model code is available in the HF repos:
- DeepSeek-V4: modeling_deepseek.py
- Ling-2.6-flash: modeling_bailing_moe_v2_5.py
The PyTorch implementations need to be translated to MLX. Both architectures follow patterns that mlx-lm already supports (for example, MLA and DeepSeekMoE in the existing deepseek_v3 port).
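A port would typically pair a config dataclass populated from config.json with an MLX `Model` class translated from the PyTorch modeling file. The stdlib-only sketch below covers only the config half. It follows the `from_dict` pattern we understand mlx-lm model files to use (keep only the keys the dataclass declares), but the field list is a hypothetical subset borrowed from DeepSeek-V3-style configs, not confirmed deepseek_v4 or bailing_hybrid keys.

```python
import inspect
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseModelArgs:
    # Keep only config.json keys that the dataclass declares and drop the rest,
    # so extra keys in a checkpoint's config do not break construction.
    @classmethod
    def from_dict(cls, params: dict):
        fields = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in fields})


@dataclass
class ModelArgs(BaseModelArgs):
    # Hypothetical subset of fields; the real config.json in each HF repo
    # defines the authoritative names and defaults.
    model_type: str
    hidden_size: int
    num_hidden_layers: int
    n_routed_experts: Optional[int] = None     # MoE: number of routed experts
    num_experts_per_tok: Optional[int] = None  # MoE: experts activated per token
    kv_lora_rank: Optional[int] = None         # MLA: compressed KV dimension
```

Unknown keys are ignored rather than rejected, which lets the same dataclass load configs that carry extra metadata; the trade-off is that a misspelled field name fails silently and falls back to its default.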
Request
Please add support for both model types in the mlx-lm model registry.