@@ -30,6 +30,16 @@ Export produces a `model.pte` and `aoti_cuda_blob.ptd` containing the
 compiled CUDA kernels and quantized weights. Int4 quantization is
 recommended — the model is too large to fit in VRAM at bf16.
 
+```bash
+python export.py \
+  --model-id Qwen/Qwen3.5-35B-A3B \
+  --output-dir ./qwen35_moe_exports \
+  --qlinear 4w \
+  --qembedding 8w
+```
+
+Or with a local directory:
+
 ```bash
 python export.py \
   --model-dir ~/models/Qwen3.5-35B-A3B \
@@ -42,7 +52,8 @@ python export.py \
 
 | Flag | Default | Description |
 |------|---------|-------------|
-| `--model-dir` | (required) | HuggingFace model directory with `config.json` + safetensors |
+| `--model-id` | (none) | HuggingFace model ID (e.g. `Qwen/Qwen3.5-35B-A3B`). Downloads automatically. |
+| `--model-dir` | (none) | Local HuggingFace model directory with `config.json` + safetensors |
 | `--output-dir` | `./qwen35_moe_exports` | Output directory |
 | `--max-seq-len` | `4096` | KV cache length |
 | `--qlinear` | (none) | Linear layer quantization: `4w`, `8w`, `8da4w`, `8da8w` |
@@ -144,6 +155,17 @@ with MLX custom ops (`mlx::gather_qmm`, `mlx::gated_delta_rule`, `mlx::rope`).
 
 ### Export (MLX)
 
+```bash
+python export.py \
+  --model-id Qwen/Qwen3.5-35B-A3B \
+  --backend mlx \
+  --qlinear 4w \
+  --qlinear-group-size 64 \
+  --output-dir ./qwen35_moe_mlx
+```
+
+Or with a local directory:
+
 ```bash
 python export.py \
   --model-dir ~/models/Qwen3.5-35B-A3B \
@@ -158,6 +180,8 @@ python export.py \
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--backend mlx` | `cuda` | Use MLX backend for Apple Silicon |
+| `--model-id` | (none) | HuggingFace model ID (downloads automatically) |
+| `--model-dir` | (none) | Local model directory |
 | `--qlinear` | (none) | Linear layer quantization: `4w`, `8w` |
 | `--qlinear-group-size` | `32` | Group size (64 recommended for MLX) |
 | `--qembedding` | (none) | Embedding quantization: `8w` |