Commit 47e5f99 ("up")
1 parent: 92c541a

2 files changed: 39 additions & 2 deletions

examples/models/qwen3_5_moe/README.md

Lines changed: 25 additions & 1 deletion
````diff
@@ -30,6 +30,16 @@ Export produces a `model.pte` and `aoti_cuda_blob.ptd` containing the
 compiled CUDA kernels and quantized weights. Int4 quantization is
 recommended — the model is too large to fit in VRAM at bf16.
 
+```bash
+python export.py \
+  --model-id Qwen/Qwen3.5-35B-A3B \
+  --output-dir ./qwen35_moe_exports \
+  --qlinear 4w \
+  --qembedding 8w
+```
+
+Or with a local directory:
+
 ```bash
 python export.py \
   --model-dir ~/models/Qwen3.5-35B-A3B \
@@ -42,7 +52,8 @@ python export.py \
 
 | Flag | Default | Description |
 |------|---------|-------------|
-| `--model-dir` | (required) | HuggingFace model directory with `config.json` + safetensors |
+| `--model-id` | (none) | HuggingFace model ID (e.g. `Qwen/Qwen3.5-35B-A3B`). Downloads automatically. |
+| `--model-dir` | (none) | Local HuggingFace model directory with `config.json` + safetensors |
 | `--output-dir` | `./qwen35_moe_exports` | Output directory |
 | `--max-seq-len` | `4096` | KV cache length |
 | `--qlinear` | (none) | Linear layer quantization: `4w`, `8w`, `8da4w`, `8da8w` |
@@ -144,6 +155,17 @@ with MLX custom ops (`mlx::gather_qmm`, `mlx::gated_delta_rule`, `mlx::rope`).
 
 ### Export (MLX)
 
+```bash
+python export.py \
+  --model-id Qwen/Qwen3.5-35B-A3B \
+  --backend mlx \
+  --qlinear 4w \
+  --qlinear-group-size 64 \
+  --output-dir ./qwen35_moe_mlx
+```
+
+Or with a local directory:
+
 ```bash
 python export.py \
   --model-dir ~/models/Qwen3.5-35B-A3B \
@@ -158,6 +180,8 @@ python export.py \
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--backend mlx` | `cuda` | Use MLX backend for Apple Silicon |
+| `--model-id` | (none) | HuggingFace model ID (downloads automatically) |
+| `--model-dir` | (none) | Local model directory |
 | `--qlinear` | (none) | Linear layer quantization: `4w`, `8w` |
 | `--qlinear-group-size` | `32` | Group size (64 recommended for MLX) |
 | `--qembedding` | (none) | Embedding quantization: `8w` |
````

examples/models/qwen3_5_moe/export.py

Lines changed: 14 additions & 1 deletion
```diff
@@ -4,10 +4,11 @@
 Supports CUDA and MLX backends.
 
 Usage:
+    python export.py --model-id Qwen/Qwen3.5-35B-A3B
     python export.py --model-dir /path/to/Qwen3.5-MoE-A3B
     python export.py --model-dir /path/to/model --qlinear 4w
     python export.py --prequantized /path/to/quantized_bundle/
-    python export.py --model-dir /path/to/model --backend mlx --qlinear 4w
+    python export.py --model-id Qwen/Qwen3.5-35B-A3B --backend mlx --qlinear 4w
 """
 
 import argparse
@@ -673,6 +674,11 @@ def main():
         default=None,
         help="HuggingFace model directory (not needed with --prequantized)",
     )
+    parser.add_argument(
+        "--model-id",
+        default=None,
+        help="HuggingFace model ID (e.g. Qwen/Qwen3.5-35B-A3B); downloaded automatically",
+    )
     parser.add_argument(
         "--output-dir", default="./qwen35_moe_exports", help="Output directory"
     )
@@ -731,6 +737,13 @@ def main():
     )
     args = parser.parse_args()
 
+    if args.model_id:
+        if args.model_dir is not None:
+            raise ValueError("Cannot specify --model-dir when --model-id is provided.")
+        from huggingface_hub import snapshot_download
+
+        args.model_dir = snapshot_download(repo_id=args.model_id)
+
     if not args.prequantized and not args.model_dir and not args.tiny_test:
         parser.error(
             "--model-dir is required unless --prequantized or --tiny-test is provided."
```
