Hi @rguiscard, let me clarify both points:

1. Where models are stored

OptiLLM uses HuggingFace's standard cache (~/.cache/huggingface/hub/). When you pass a model ID like mlx-community/Qwen3-8B-4bit, it is downloaded once into that cache and reused on subsequent runs, so nothing is re-downloaded. You can override the location with the HF_HOME environment variable.
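For example, a minimal sketch (the HF_HOME behavior is standard HuggingFace convention; the launch command assumes you are running from a source checkout, and the cache path is just an example):

```bash
# Relocate the HuggingFace cache: models then land under $HF_HOME/hub
# instead of ~/.cache/huggingface/hub.
export HF_HOME=/Volumes/external/hf-cache

# Any subsequent run downloads into, and reuses, the new location.
python optillm.py
```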

2. Loading MLX models (including from local disk)

Yes, OptiLLM has native MLX support on Apple Silicon. MLX models are auto-detected and routed through the MLX inference pipeline (via mlx_lm) whenever the model name matches an mlx-community/, mlx-, or -mlx- pattern. See should_use_mlx and MLXInferencePipeline in optillm/inference.py.
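To see the routing in action, here is a hedged request example (assumptions: the proxy is running locally on its default port 8000, and you use the local-inference API key optillm; adjust both to your setup). The mlx-community/ prefix in the model name is what triggers the MLX pipeline:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer optillm" \
  -d '{
        "model": "mlx-community/Qwen3-8B-4bit",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```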

Quick start:

# Instal…
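A hedged sketch of the setup (assumptions: the package is published on PyPI as optillm, mlx-lm is installed separately for the MLX backend, and the optillm console command is available after install; otherwise run python optillm.py from a source checkout):

```bash
# Install optillm plus the MLX backend for Apple Silicon
pip install optillm mlx-lm

# Start the proxy; the first request for an MLX model downloads it
# into the HuggingFace cache and reuses it afterwards
optillm
```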
