Add JANG model loader integration #212
Open
samuelfaj wants to merge 27 commits into
Contributor
Author
Validation update:

Contributor
Author
Validation update:

Contributor
Author
Final validation update:

Contributor
Author
Performance/streaming update:
2f48ce6 to 0ee615b
# Conflicts: # vllm_mlx/routes/chat.py
ea128df to 9b0bb10
Owner
Hi @samuelfaj — thanks for the work. Applying our new SOP §0 necessity gate (see docs/development/pr_merge_sop.md) I need a demand signal before merging. Holding for clarification, not closing yet. Reasoning:
To unlock merge, I need one or more of:
For now please rebase on top of latest
Apologies for the friction; the necessity gate is new this week and I'm working through the backlog. Your #204 (Qwen tool-call fix) is being reviewed now since it has clear user value.
Summary
- Detect JANG bundles via `jang_config.json` before the vendored architecture fallback.
- Load JANGTQ models through `jang_tools.load_jangtq.load_jangtq_model` and standard JANG models through `jang_tools.loader.load_jang_model`.
- Add the `rapid-mlx[jang]` dependency extra and regression tests for JANGTQ, JANG v2, and normal DeepSeek V4 fallback behavior.
- Ensure `jang-tools` does not fall through Transformers AutoConfig for the vendored `deepseek_v4` architecture.

Root cause
DeepSeek V4 JANGTQ bundles declare `weight_format: mxtq` and store routed experts as `tq_packed`/`tq_norms` tensors. The existing loader treated them like normal DeepSeek V4 MLX weights, so `mlx_lm.load_model` rejected thousands of unexpected JANGTQ parameters. During live validation, `jang-tools` also hit a DSV4 tokenizer/EOS expansion path that calls Transformers AutoConfig; the wrapper now patches that call for DSV4 JANGTQ to load `tokenizer.json` directly.

Validation
- `uv run --extra dev --extra jang python -m pytest tests/test_jangtq_loader.py tests/test_deepseek_v4_vendored.py -q`
- `uv run --extra dev ruff check pyproject.toml vllm_mlx/utils/tokenizer.py tests/test_jangtq_loader.py`
- `uv run --extra jang python - <<'PY' ... import jang_tools ... PY`
- `DeepSeek-V4-Flash-JANGTQ` detected as `weight_format=mxtq`, `profile=JANGTQ2`.
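
For reviewers, the detection order described in the Summary could be sketched roughly as below. This is an illustration only, not the code in this PR: the helper names `detect_jang_format` and `choose_loader` are hypothetical, and it assumes `jang_config.json` is a JSON object with a top-level `weight_format` key (the real schema may differ). Because the `jang_tools` call signatures are not shown here, the sketch returns the dotted loader name instead of importing and calling it.

```python
import json
from pathlib import Path
from typing import Optional


def detect_jang_format(model_dir: str) -> Optional[str]:
    """Classify a model directory as "jangtq", "jang", or None.

    Assumption: jang_config.json holds a JSON object with a top-level
    "weight_format" key. A malformed file raises instead of silently
    falling back, so a broken bundle is not loaded as plain MLX weights.
    """
    config_path = Path(model_dir) / "jang_config.json"
    if not config_path.is_file():
        return None  # no JANG marker: leave it to the existing loader path
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if str(config.get("weight_format", "")).lower() == "mxtq":
        return "jangtq"
    return "jang"


def choose_loader(model_dir: str) -> str:
    """Return the dotted name of the loader that should handle model_dir."""
    kind = detect_jang_format(model_dir)
    if kind == "jangtq":
        return "jang_tools.load_jangtq.load_jangtq_model"
    if kind == "jang":
        return "jang_tools.loader.load_jang_model"
    # Hypothetical label for the pre-existing vendored architecture path.
    return "vendored-fallback"
```

Checking for the marker file first means a JANGTQ bundle never reaches the normal DeepSeek V4 weight loader, which is where the unexpected-parameter errors described under Root cause came from.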