Add model support for DeepSeek-V4 (deepseek_v4) and Ling-2.6-flash (bailing_hybrid) #1233

@Keenni

Model Support Request

We are trying to run these models with mlx-lm, but loading fails with:

ValueError: Model type deepseek_v4 not supported.
ValueError: Model type bailing_hybrid not supported.
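For anyone who wants to check this without downloading the weights, here is a minimal sketch. It assumes (I have not confirmed this against the current source) that mlx-lm resolves a `model_type` by importing a module of the same name under `mlx_lm.models`. The helper name `is_model_type_supported` is made up for illustration and is not part of mlx-lm.

```python
import importlib.util


def is_model_type_supported(model_type: str, package: str = "mlx_lm.models") -> bool:
    """Return True if `package` contains a module named `model_type`.

    Assumption: mlx-lm maps a config.json `model_type` to a module with the
    same name inside `mlx_lm.models`. This helper is hypothetical.
    """
    try:
        return importlib.util.find_spec(f"{package}.{model_type}") is not None
    except ImportError:
        # The parent package (e.g. mlx-lm) is missing or failed to import.
        return False


if __name__ == "__main__":
    for model_type in ("deepseek_v4", "bailing_hybrid"):
        print(model_type, is_model_type_supported(model_type))
```

If the assumption holds, both lines should print `False` on an mlx-lm install that raises the errors above.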

1. DeepSeek-V4-Flash (DeepSeekMoE)

  • HF repo: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
  • Model type in config.json: deepseek_v4
  • Architecture: DeepSeekMoE with DSA, MLA
  • Total params: 284B, Active: ~13B per token
  • MLX quantized versions exist: mlx-community has 6bit, mxfp4, 3bit-DQ, 2bit-DQ
  • Already supported in SGLang and vLLM

2. Ling-2.6-flash (Bailing Hybrid)

  • HF repo: https://huggingface.co/inclusionAI/Ling-2.6-flash
  • Model type in config.json: bailing_hybrid
  • Architecture: MoE with MLA + hybrid attention, bailing_moe_v2_5
  • Total params: 104B, Active: ~7.4B per token
  • MLX quantized version exists: mlx-community/Ling-2.6-flash-mlx-4bit
  • Already supported in SGLang

Why this matters

Both models were open-sourced in April 2026 and target agent workflows. MLX is the primary inference backend on Apple Silicon Macs, but users currently cannot run either model locally with mlx-lm.

Implementation approach

The model code is available in the HF repos:

  • DeepSeek-V4: modeling_deepseek.py
  • Ling-2.6-flash: modeling_bailing_moe_v2_5.py

The PyTorch implementations need to be translated to MLX. The architectures follow patterns similar to models that mlx-lm already supports.
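As a rough idea of what the first step of a port could look like, here is a sketch of a config-args dataclass that accepts a Hugging Face `config.json` dictionary and ignores keys it does not declare. The class name `SketchModelArgs`, the three fields, and the numeric values are placeholders chosen for illustration. They are not taken from either model's real config, and this is not mlx-lm's own base class.

```python
import inspect
from dataclasses import dataclass


@dataclass
class SketchModelArgs:
    # Hypothetical subset of fields; a real config.json defines many more.
    model_type: str
    hidden_size: int
    num_hidden_layers: int

    @classmethod
    def from_dict(cls, params: dict) -> "SketchModelArgs":
        # Keep only keys that match declared fields, so unknown config
        # entries are dropped instead of raising a TypeError.
        known = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in known})


# Placeholder values for illustration only, not the real model dimensions.
config = {
    "model_type": "bailing_hybrid",
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "unused_key": True,
}
args = SketchModelArgs.from_dict(config)
print(args)
```

Filtering unknown keys keeps the args class small while the port is in progress, since fields can be added one at a time as the MLX layers that need them are written.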

Request

Please add support for both model types in the mlx-lm model registry.
