Download the MiniCPM-V-4_6 PyTorch model from Hugging Face into a "MiniCPM-V-4_6" folder.
The model must be the standard transformers v5.7.0+ checkpoint (no trust_remote_code); the architecture in config.json is MiniCPMV4_6ForConditionalGeneration with a qwen3_5_text text model and a SigLIP-based vision tower plus a window-attention vit_merger.
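A quick way to confirm that a downloaded checkpoint matches the description above is to look for the architecture name in config.json. The following is a minimal sketch: the helper name check_arch is hypothetical, and it uses a plain grep rather than a JSON parser, so it only tests that the string appears somewhere in the file.

```sh
#!/bin/sh
# check_arch DIR: print "ok" if DIR/config.json mentions the expected architecture.
# Hypothetical helper for illustration; not part of llama.cpp or transformers.
check_arch() {
    dir="$1"
    if [ ! -f "$dir/config.json" ]; then
        echo "missing: $dir/config.json" >&2
        return 2
    fi
    # String match only: this does not parse the JSON structure.
    if grep -q '"MiniCPMV4_6ForConditionalGeneration"' "$dir/config.json"; then
        echo "ok"
    else
        echo "unexpected architecture in $dir/config.json" >&2
        return 1
    fi
}
```

For example, `check_arch MiniCPM-V-4_6` prints `ok` when the folder holds the expected checkpoint and returns a non-zero status otherwise.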
If your setup differs from the steps below, refer to the official llama.cpp build documentation.
Clone llama.cpp:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
Build llama.cpp using CMake:
cmake -B build
cmake --build build --config Release
Unlike older MiniCPM-V variants, MiniCPM-V 4.6 is converted directly through convert_hf_to_gguf.py. The same script is invoked twice on the original Hugging Face directory: once to produce the language-model GGUF and once with --mmproj to produce the multimodal projector GGUF.
# language model
python ./convert_hf_to_gguf.py ../MiniCPM-V-4_6 --outfile ../MiniCPM-V-4_6/ggml-model-f16.gguf
# multimodal projector (vision tower + window-attention vit_merger + DownsampleMLP merger)
python ./convert_hf_to_gguf.py ../MiniCPM-V-4_6 --mmproj --outfile ../MiniCPM-V-4_6/mmproj-model-f16.gguf
# optional: quantize to Q4_K_M
./build/bin/llama-quantize ../MiniCPM-V-4_6/ggml-model-f16.gguf ../MiniCPM-V-4_6/ggml-model-Q4_K_M.gguf Q4_K_M
Inference on Linux or Mac
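Before running inference, it can help to confirm that both converted files exist and start with the 4-byte GGUF magic ("GGUF"). The sketch below assumes the output paths used in the conversion commands in this guide. The helper name is_gguf is hypothetical, and it inspects only the file header, not the tensors or metadata.

```sh
#!/bin/sh
# is_gguf FILE: succeed if FILE exists and begins with the magic bytes "GGUF".
# Hypothetical helper for illustration; relies on `head -c`, which is
# available on common Linux and macOS systems.
is_gguf() {
    [ -f "$1" ] || return 2
    magic=$(head -c 4 "$1")
    [ "$magic" = "GGUF" ]
}

# Paths follow the conversion commands in this guide.
for f in ../MiniCPM-V-4_6/ggml-model-f16.gguf ../MiniCPM-V-4_6/mmproj-model-f16.gguf; do
    if is_gguf "$f"; then
        echo "$f: looks like GGUF"
    else
        echo "$f: missing or not a GGUF file" >&2
    fi
done
```

A failed check here usually means a conversion step did not finish or wrote to a different --outfile path.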
# run in single-turn mode
./build/bin/llama-mtmd-cli -m ../MiniCPM-V-4_6/ggml-model-f16.gguf --mmproj ../MiniCPM-V-4_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
# run in conversation mode
./build/bin/llama-mtmd-cli -m ../MiniCPM-V-4_6/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-4_6/mmproj-model-f16.gguf