[Model] Support MiniCPM-V 4.6 #22529

Open
tc-mb wants to merge 3 commits into ggml-org:master from tc-mb:Support-MiniCPM-V-4.6

tc-mb (Contributor) commented Apr 29, 2026

This PR adds support for MiniCPM-V 4.6.

  1. v4.6 reworks the vision tower — the resampler is replaced by a new merger structure, which improves ViT efficiency.
  2. The GGUF conversion is also folded into the standard convert_hf_to_gguf.py flow.

tc-mb added 2 commits April 29, 2026 23:32
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
tc-mb requested review from a team and CISC as code owners April 29, 2026 18:50
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
github-actions bot added the documentation, examples, and python labels Apr 29, 2026
Comment thread convert_hf_to_gguf.py
Comment on lines +1532 to +1534
if chkhsh == "1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f":
    # ref: MiniCPM-V 4.6 (Qwen3.5 Flash based)
    res = "qwen35"
Member

Don't add it directly like this; it needs to be added to convert_hf_to_gguf_update.py, and running that script will then add it here.

Since it's a duplicate you need to add it to pre_computed_hashes.

Contributor Author

Got it, done.
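
As a rough illustration of the point raised in this thread, here is a minimal sketch of how a pre-computed hash entry could resolve a tokenizer checksum to a pre-tokenizer name. The `name`, `repo`, and `chkhsh` keys and the `pre_computed_hashes` list are modeled on the names mentioned above, but their exact shape here is an assumption, `lookup_pre_tokenizer` and `token_hash` are hypothetical helpers, and the repo value is a placeholder rather than a real URL.

```python
from hashlib import sha256

# Assumed shape of a pre-computed entry; the repo value is a placeholder.
pre_computed_hashes = [
    {
        "name": "qwen35",
        "repo": "<model repo URL>",
        "chkhsh": "1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f",
    },
]


def token_hash(token_ids):
    """Hash the string form of a token-id list (hypothetical helper)."""
    return sha256(str(token_ids).encode()).hexdigest()


def lookup_pre_tokenizer(chkhsh, entries):
    """Return the pre-tokenizer name registered for chkhsh, or None."""
    for entry in entries:
        if entry["chkhsh"] == chkhsh:
            return entry["name"]
    return None
```

Registering the duplicate hash in one table, instead of hand-editing the generated branch in convert_hf_to_gguf.py, keeps the generated code reproducible from a single source.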

Comment thread convert_hf_to_gguf.py
Comment on lines +5498 to +5501
# GGUF tensor names mirror the C++ definitions in tools/mtmd/clip-impl.h:
# TN_INSERT_MERGER_* -> v.insert_merger.*
# TN_MERGER_* -> merger.*
_VIT_MERGER_MAP = {
Member

Not cool, use tensor_mapping.py like everyone else.
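
To make the request concrete, here is a minimal sketch of the central-table idea: one shared mapping from GGUF-side tensor names to the checkpoint names they accept, with a generic lookup that carries the `.weight` / `.bias` suffix across. This is not the actual tensor_mapping.py API; `MERGER_MAPPINGS`, `build_reverse_map`, `map_tensor_name`, and every tensor name below other than the `v.insert_merger.` and `merger.` prefixes are hypothetical.

```python
# Hypothetical table: GGUF-side base name -> accepted checkpoint-side names.
MERGER_MAPPINGS = {
    "v.insert_merger.attn_out": ("vpm.insert_merger.o_proj",),
    "merger.mlp_up": ("merger.mlp.up_proj", "merger.up_proj"),
}


def build_reverse_map(mappings):
    """Invert the table so checkpoint names can be looked up directly."""
    reverse = {}
    for gguf_name, sources in mappings.items():
        for src in sources:
            reverse[src] = gguf_name
    return reverse


def map_tensor_name(name, reverse, suffixes=(".weight", ".bias")):
    """Map a checkpoint tensor name to its GGUF name, keeping the suffix."""
    if name in reverse:
        return reverse[name]
    for suffix in suffixes:
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in reverse:
                return reverse[base] + suffix
    return None


REVERSE = build_reverse_map(MERGER_MAPPINGS)
```

With a shared table like this, a model-specific converter class does not need its own private dictionary such as `_VIT_MERGER_MAP`.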

ngxson (Contributor) left a comment

please change the naming to minicpmv2_6 everywhere for consistency

Comment on lines +227 to +233
ggml_tensor * kq = ggml_mul_mat(ctx0, k, q);
kq = ggml_soft_max_ext(ctx0, kq, nullptr, kq_scale, 0.0f);
ggml_tensor * kqv = ggml_mul_mat(ctx0, v, kq);
cur = ggml_permute(ctx0, kqv, 0, 2, 1, 3);
cur = ggml_cont_2d(ctx0, cur, n_embd, n_patches);

cur = ggml_mul_mat(ctx0, model.insert_merger_attn_o_w, cur);
Contributor

use build_attn to allow flash attention support

ggml_cgraph * build() override;
};

struct clip_graph_minicpmv_merger : clip_graph {
Contributor

tbh the naming has been quite messy, better to just name the model by its version instead:

Suggested change
- struct clip_graph_minicpmv_merger : clip_graph {
+ struct clip_graph_minicpmv2_6 : clip_graph {

Contributor Author

Okay, I accept the version number naming convention.
However, this version is V4.6. Could we use clip_graph_minicpmv4_6 instead?

Comment thread tools/mtmd/clip-model.h
Comment on lines +425 to +431
// MiniCPM-V 4.5 / 4.6 final merger (DownsampleMLP)
ggml_tensor * merger_pre_norm_w = nullptr;
ggml_tensor * merger_pre_norm_b = nullptr;
ggml_tensor * merger_mlp_up_w = nullptr;
ggml_tensor * merger_mlp_up_b = nullptr;
ggml_tensor * merger_mlp_down_w = nullptr;
ggml_tensor * merger_mlp_down_b = nullptr;
Contributor

use the existing mm_ffn_* tensors instead

Comment thread tools/mtmd/mtmd.cpp
tok_row_end_trail = false; // no trailing end-of-row token
ov_img_first = true;
- } else if (minicpmv_version == 3 || minicpmv_version == 4 || minicpmv_version == 5 || minicpmv_version == 6 || minicpmv_version == 100045) {
+ } else if (minicpmv_version == 3 || minicpmv_version == 4 || minicpmv_version == 5 || minicpmv_version == 6 || minicpmv_version == 100045 || minicpmv_version == 46) {
Contributor

the level of consistency is quite questionable here

better to not use minicpmv_version at all and add a new projector type for each version instead

Contributor Author

ok
