Command-line tool for downloading models and datasets from HuggingFace Hub and converting model weights for use with @mlx-node/* packages.
- macOS with Apple Silicon (M1 or later)
- Node.js 18+
npm install -g @mlx-node/cli

Or run via your package manager:

npx @mlx-node/cli download model --model Qwen/Qwen3-0.6B

Download model weights and tokenizer files from HuggingFace Hub:

mlx download model --model Qwen/Qwen3-0.6B

Downloads to `.cache/models/<model-slug>` by default. Skips if already downloaded.

| Flag | Short | Default | Description |
|---|---|---|---|
| `--model` | `-m` | `Qwen/Qwen3-0.6B` | HuggingFace model name |
| `--output` | `-o` | `.cache/models/<slug>` | Output directory |
| `--glob` | `-g` | (all supported files) | Filter files by glob pattern (repeatable) |
| `--set-token` | | | Set up HuggingFace authentication |

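The default output directory is derived from the model name. The sketch below shows one plausible derivation, assuming the slug simply replaces `/` with `-`, which matches the `.cache/models/Qwen-Qwen3-0.6B` path used for `Qwen/Qwen3-0.6B` in the workflow example later in this document. `defaultModelDir` is a hypothetical helper for illustration, not part of the CLI, and the real slug rules may differ.

```typescript
// Hypothetical helper: derive the default download directory for a model.
// Assumption: the slug is the repo id with every "/" replaced by "-".
function defaultModelDir(model: string, cacheRoot = ".cache/models"): string {
  const slug = model.split("/").join("-");
  return `${cacheRoot}/${slug}`;
}

// ".cache/models/Qwen-Qwen3-0.6B"
const exampleDir = defaultModelDir("Qwen/Qwen3-0.6B");
```

Passing `--output` bypasses this default entirely.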
For gated or private models, set up your HuggingFace token:
mlx download model --set-token

This validates the token against the HuggingFace API and stores it securely in your OS keychain. The token is automatically used for subsequent downloads.
By default, downloads config files, tokenizer files, and weight files (.safetensors, .json, .pdiparams, .yml). Use --glob to filter specific files:
# Download only bf16 safetensors shards
mlx download model --model Qwen/Qwen3-7B --glob "*.bf16*.safetensors"
# Download specific files
mlx download model --model org/model --glob "*.safetensors" --glob "*.json"

Core config and tokenizer files are always included regardless of glob filters.
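The filtering rule can be sketched as follows: a file is kept if it matches any `--glob` pattern or if it is a core file. This is a minimal illustration, not the CLI's implementation. The matcher only understands `*` and `?`, and the `CORE_FILES` list is an assumption made up for the example.

```typescript
// Convert a simple glob (only "*" and "?" wildcards) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${pattern}$`);
}

// Assumption: illustrative list of "core" files; the CLI's real list may differ.
const CORE_FILES = ["config.json", "tokenizer.json", "tokenizer_config.json"];

function selectFiles(files: string[], globs: string[]): string[] {
  // No globs given: keep everything (the CLI's default file set is not modelled here).
  if (globs.length === 0) return files.slice();
  const matchers = globs.map(globToRegExp);
  return files.filter(
    (file) => CORE_FILES.indexOf(file) !== -1 || matchers.some((re) => re.test(file)),
  );
}

// Keeps config.json, tokenizer.json, and the bf16 shard only.
const picked = selectFiles(
  ["config.json", "tokenizer.json", "model.bf16-00001.safetensors", "model.fp16.safetensors", "README.md"],
  ["*.bf16*.safetensors"],
);
```

Because patterns are combined with OR, repeating `--glob` widens the selection rather than narrowing it.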
Download datasets from HuggingFace Hub with automatic Parquet-to-JSONL conversion:
mlx download dataset

| Flag | Short | Default | Env Override | Description |
|---|---|---|---|---|
| `--dataset` | `-d` | `openai/gsm8k` | `GSM8K_DATASET` | HuggingFace dataset name |
| `--revision` | `-r` | `main` | `GSM8K_REVISION` | Dataset revision/branch |
| `--output` | `-o` | `data/gsm8k` | `GSM8K_OUTPUT_DIR` | Output directory |

Produces train.jsonl and test.jsonl in the output directory. Automatically converts Parquet files to JSONL if the dataset doesn't include JSONL directly.
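For readers unfamiliar with JSONL, the target format is one JSON object per line. The sketch below covers only that final serialization step, assuming the Parquet rows have already been decoded into plain objects (decoding Parquet needs a reader library and is not shown). `toJsonl` and the sample rows are illustrative, not taken from the CLI.

```typescript
// Serialize rows as JSONL: one compact JSON object per line, newline-terminated.
function toJsonl(rows: Array<Record<string, unknown>>): string {
  return rows.map((row) => JSON.stringify(row) + "\n").join("");
}

// Made-up rows for illustration; field names follow GSM8K's question/answer layout.
const sampleRows = [
  { question: "What is 2 + 3?", answer: "5" },
  { question: "What is 4 * 6?", answer: "24" },
];
const jsonl = toJsonl(sampleRows);
```

Each line can then be parsed independently with `JSON.parse`, which is what makes the format convenient for streaming training data.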
Convert model weights between formats with optional quantization:
# Dtype conversion (SafeTensors)
mlx convert --input ./model --output ./model-bf16 --dtype bfloat16
# Quantization
mlx convert --input ./model --output ./model-q4 --quantize --q-bits 4
# Mixed-precision quantization recipe
mlx convert --input ./model --output ./model-mixed --quantize --q-recipe mixed_4_6
# MXFP8 quantization
mlx convert --input ./model --output ./model-fp8 --quantize --q-mode mxfp8
# GGUF to SafeTensors
mlx convert --input ./model.gguf --output ./model-safetensors
# GGUF with vision encoder
mlx convert --input ./model.gguf --output ./model-vlm --mmproj ./mmproj.gguf
# imatrix AWQ pre-scaling with unsloth dynamic quantization
mlx convert --input ./model --output ./model-unsloth --quantize --q-recipe unsloth --imatrix-path ./imatrix.gguf

| Flag | Short | Default | Description |
|---|---|---|---|
| `--input` | `-i` | required | Input model directory or `.gguf` file |
| `--output` | `-o` | required | Output directory |
| `--dtype` | `-d` | `bfloat16` | Target dtype: `float32`, `float16`, `bfloat16` |
| `--model-type` | `-m` | auto-detected | Model type override |
| `--verbose` | `-v` | `false` | Verbose logging |
| `--quantize` | `-q` | `false` | Enable quantization |
| `--q-bits` | | `4` | Quantization bits (4 or 8) |
| `--q-group-size` | | `64` | Quantization group size |
| `--q-mode` | | `affine` | Mode: `affine` or `mxfp8` |
| `--q-recipe` | | | Per-layer mixed-bit recipe |
| `--imatrix-path` | | | imatrix GGUF for AWQ pre-scaling |
| `--mmproj` | | | mmproj GGUF for vision encoder weights |

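To get a feel for how `--q-bits` and `--q-group-size` affect output size, here is a back-of-the-envelope estimate. It assumes uniform affine quantization in which every group of weights stores one 16-bit scale and one 16-bit bias alongside the packed values; that overhead model is an assumption, and mixed-precision recipes, unquantized tensors, and file metadata are ignored.

```typescript
// Rough size estimate for uniformly quantized weights.
// Assumption: affine mode stores one 16-bit scale and one 16-bit bias per group.
function effectiveBitsPerWeight(qBits: number, groupSize: number): number {
  return qBits + (2 * 16) / groupSize;
}

function estimateSizeGB(paramCount: number, qBits: number, groupSize = 64): number {
  const totalBits = paramCount * effectiveBitsPerWeight(qBits, groupSize);
  return totalBits / 8 / 1e9; // bits -> bytes -> GB (decimal)
}

const bitsPerWeight = effectiveBitsPerWeight(4, 64); // 4.5
const sizeGB = estimateSizeGB(35e9, 4); // 19.6875
```

Under these assumptions a 35B-parameter model at 4 bits with group size 64 comes out just under 20 GB. Smaller group sizes raise the per-weight overhead, and recipes that keep some tensors at higher precision will land above the uniform estimate.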
Auto-detected from config.json when not specified:
| Type | Description |
|---|---|
| (default) | Standard SafeTensors dtype conversion |
| `qwen3_5` | Qwen3.5 Dense with FP8 dequant and key remapping |
| `qwen3_5_moe` | Qwen3.5 MoE with expert stacking |
| `paddleocr-vl` | PaddleOCR-VL weight sanitization |
| `pp-lcnet-ori` | PP-LCNet orientation classifier (Paddle to SafeTensors) |
| `uvdoc` | UVDoc dewarping model (Paddle/PyTorch to SafeTensors) |

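The auto-detection step might look roughly like the sketch below. It assumes the type is read from the `model_type` field of `config.json` (the conventional field in HuggingFace configs) and that unrecognized values fall back to the default converter. `resolveModelType` is a hypothetical name, and the CLI's actual detection logic may inspect other fields.

```typescript
// Model types the converter lists; anything else falls back to the default path.
const KNOWN_MODEL_TYPES = ["qwen3_5", "qwen3_5_moe", "paddleocr-vl", "pp-lcnet-ori", "uvdoc"];

// Hypothetical helper. Assumption: detection reads `model_type` from config.json.
function resolveModelType(configJson: string, override?: string): string {
  if (override) return override; // an explicit --model-type wins
  const config = JSON.parse(configJson) as { model_type?: unknown };
  const detected = config.model_type;
  if (typeof detected === "string" && KNOWN_MODEL_TYPES.indexOf(detected) !== -1) {
    return detected;
  }
  return "default"; // standard SafeTensors dtype conversion
}

const detectedType = resolveModelType('{"model_type": "qwen3_5_moe"}'); // "qwen3_5_moe"
```

Use `--model-type` when a checkpoint's config is missing or reports a type that should be handled by a different converter.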
| Recipe | Description |
|---|---|
| `mixed_2_6` | 2-bit base, 6-bit sensitive layers |
| `mixed_3_4` | 3-bit base, 4-bit sensitive layers |
| `mixed_3_6` | 3-bit base, 6-bit sensitive layers |
| `mixed_4_6` | 4-bit base, 6-bit sensitive layers |
| `qwen3_5` | Optimized for Qwen3.5 hybrid architecture |
| `unsloth` | Unsloth Dynamic 2.0 (requires `--imatrix-path`) |

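When scripting conversions, it can help to check option combinations before launching a long job. The sketch below encodes only constraints stated in this document: required `--input` and `--output`, the listed dtypes, modes, and recipes, and the `unsloth` recipe's need for `--imatrix-path`. The `ConvertOptions` shape and `checkConvertOptions` are hypothetical; the CLI performs its own validation, which may differ.

```typescript
// Hypothetical mirror of the convert flags, for pre-flight checks in scripts.
interface ConvertOptions {
  input?: string;
  output?: string;
  dtype?: string;
  qMode?: string;
  qRecipe?: string;
  imatrixPath?: string;
}

const DTYPES = ["float32", "float16", "bfloat16"];
const Q_MODES = ["affine", "mxfp8"];
const Q_RECIPES = ["mixed_2_6", "mixed_3_4", "mixed_3_6", "mixed_4_6", "qwen3_5", "unsloth"];

// Returns human-readable problems; an empty list means the options look consistent.
function checkConvertOptions(opts: ConvertOptions): string[] {
  const problems: string[] = [];
  if (!opts.input) problems.push("--input is required");
  if (!opts.output) problems.push("--output is required");
  if (opts.dtype !== undefined && DTYPES.indexOf(opts.dtype) === -1) {
    problems.push(`unknown --dtype: ${opts.dtype}`);
  }
  if (opts.qMode !== undefined && Q_MODES.indexOf(opts.qMode) === -1) {
    problems.push(`unknown --q-mode: ${opts.qMode}`);
  }
  if (opts.qRecipe !== undefined && Q_RECIPES.indexOf(opts.qRecipe) === -1) {
    problems.push(`unknown --q-recipe: ${opts.qRecipe}`);
  }
  if (opts.qRecipe === "unsloth" && !opts.imatrixPath) {
    problems.push("--q-recipe unsloth requires --imatrix-path");
  }
  return problems;
}

// ["--q-recipe unsloth requires --imatrix-path"]
const missingImatrix = checkConvertOptions({ input: "./model", output: "./out", qRecipe: "unsloth" });
```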
MLX affine equivalent of Unsloth Dynamic 2.0 (UD) GGUF quantization. Designed for Qwen3.5's hybrid GatedDeltaNet + full attention architecture. Requires imatrix for AWQ pre-scaling of attention/SSM weights.
# UD-Q3_K_XL equivalent (~17 GB for 35B-A3B)
mlx convert -i ./model -o ./model-q3 -q --q-bits 3 --q-recipe unsloth --imatrix-path ./imatrix.gguf
# UD-Q4_K_XL equivalent (~20 GB for 35B-A3B)
mlx convert -i ./model -o ./model-q4 -q --q-bits 4 --q-recipe unsloth --imatrix-path ./imatrix.gguf

Per-tensor bit assignments (N = `--q-bits`):

| Weight | Bits | Rationale |
|---|---|---|
| `gate_proj`, `up_proj` | N | Bulk of MoE expert params, safe at low bits |
| `down_proj` | N+1 | Slightly more sensitive than other FFN weights |
| `embed_tokens` | N+2 | Very low KLD sensitivity (~0.15) |
| `self_attn.q/k/v_proj` | N+2 | AWQ-correctable via `input_layernorm` |
| `linear_attn.in_proj_qkv/z` | N+2 | AWQ-correctable via `input_layernorm` |
| `lm_head` | N+3 | Safest tensor (KLD ~0.05) |
| Router gates | 8 | Standard for MoE routing accuracy |
| `self_attn.o_proj` | bf16 | No preceding norm, so not AWQ-correctable |
| `linear_attn.out_proj` | bf16 | Worst tensor (KLD ~6.0), not AWQ-correctable |
| GDN params (`A_log`, etc.) | bf16 | Recurrent state params, errors compound over time |

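The per-tensor assignments can be read as a lookup keyed on the tensor name. The sketch below is one way to express them. The substring patterns, in particular `mlp.gate.` for router gates and `A_log` as the only GDN parameter checked, are assumptions about checkpoint naming rather than the CLI's actual matching rules, and the fallback to N for unlisted tensors is also assumed.

```typescript
type Precision = number | "bf16";

// Hypothetical lookup mirroring the per-tensor bit assignments for a base width n.
function bitsForTensor(tensorName: string, n: number): Precision {
  const has = (part: string): boolean => tensorName.indexOf(part) !== -1;

  // Kept in bf16: not AWQ-correctable, or recurrent-state parameters.
  if (has("self_attn.o_proj") || has("linear_attn.out_proj") || has("A_log")) return "bf16";
  // Router gates (assumed to be named "mlp.gate."); the trailing dot avoids matching gate_proj.
  if (has("mlp.gate.")) return 8;
  if (has("lm_head")) return n + 3;
  if (has("embed_tokens")) return n + 2;
  if (/self_attn\.[qkv]_proj/.test(tensorName)) return n + 2;
  if (has("linear_attn.in_proj_qkv") || has("linear_attn.in_proj_z")) return n + 2;
  if (has("down_proj")) return n + 1;
  if (has("gate_proj") || has("up_proj")) return n;
  // Assumption: tensors not covered by the listed assignments use the base bit width.
  return n;
}

const lmHeadBits = bitsForTensor("lm_head.weight", 4); // 7
const outProj = bitsForTensor("model.layers.0.linear_attn.out_proj.weight", 4); // "bf16"
```

With `--q-bits 3`, the same lookup shifts every relative entry down by one bit while the fixed 8-bit and bf16 entries stay unchanged.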
# 1. Set up authentication (one-time)
mlx download model --set-token
# 2. Download a model
mlx download model --model Qwen/Qwen3-0.6B
# 3. Download training data
mlx download dataset --dataset openai/gsm8k
# 4. Quantize the model
mlx convert \
--input .cache/models/Qwen-Qwen3-0.6B \
--output .cache/models/Qwen3-0.6B-q4 \
--quantize --q-bits 4
# 5. Use in your application
# import { loadModel } from '@mlx-node/lm';
# const model = await loadModel('.cache/models/Qwen3-0.6B-q4');

# Download a GGUF model
mlx download model --model user/model-gguf --glob "*.gguf"
# Convert to SafeTensors
mlx convert --input .cache/models/model-gguf/model.gguf --output ./model-converted