
fix(vision): propagate mtmd media marker from backend via ModelMetadata #9412

Merged: mudler merged 1 commit into master from issue-fix-vision-e2e on Apr 18, 2026

@mudler (owner) commented on Apr 18, 2026

Upstream llama.cpp (PR #21962) switched the server-side mtmd media
marker to a random per-server string and removed the legacy
"<__media__>" backward-compat replacement in mtmd_tokenizer. The
Go layer still emitted the hardcoded "<__media__>", so on the
non-tokenizer-template path the prompt arrived with a marker mtmd
did not recognize and tokenization failed with "number of bitmaps
(1) does not match number of markers (0)".
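To make the failure concrete, here is a minimal Go sketch of the kind of consistency check that produces this error: every media bitmap must be matched by one occurrence of the active marker in the prompt. The function checkMarkers and the example marker string are hypothetical illustrations, not mtmd's actual code or its real random-marker format. A prompt carrying the legacy "<__media__>" sentinel contains zero occurrences of a different per-server marker, so one image yields a 1-vs-0 mismatch.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyMarker is the fixed sentinel the Go layer used to emit.
const legacyMarker = "<__media__>"

// checkMarkers is a hypothetical simplification of mtmd's check: the number
// of marker occurrences in the prompt must equal the number of bitmaps.
func checkMarkers(prompt, activeMarker string, nBitmaps int) error {
	nMarkers := strings.Count(prompt, activeMarker)
	if nMarkers != nBitmaps {
		return fmt.Errorf("number of bitmaps (%d) does not match number of markers (%d)", nBitmaps, nMarkers)
	}
	return nil
}

func main() {
	// Assumed stand-in for a random per-server marker; the real format may differ.
	active := "<__media_a1b2c3__>"
	prompt := "USER: " + legacyMarker + "\nDescribe the image."
	// The legacy sentinel is not the active marker, so zero markers are counted.
	fmt.Println(checkMarkers(prompt, active, 1))
}
```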

Report the active media marker via ModelMetadataResponse.media_marker
and substitute the sentinel "<__media__>" with it right before the
gRPC call, after the backend has been loaded and probed. Also skip
the Go-side multimodal templating entirely when UseTokenizerTemplate
is true, because llama.cpp's oaicompat_chat_params_parse already injects its
own marker and StringContent is unused in that path. Backends that do
not expose the field keep the legacy "<__media__>" behavior.
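The fix described above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not LocalAI's implementation: the metadata struct stands in for the media_marker field of ModelMetadataResponse, and buildPrompt and resolveMarker are hypothetical helpers. The sketch keeps emitting the "<__media__>" sentinel during templating and swaps it for the backend-reported marker as the last step before the (omitted) gRPC call. It falls back to the sentinel when the field is empty and skips Go-side templating entirely when UseTokenizerTemplate is true.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyMarker is the sentinel emitted by Go-side multimodal templating.
const legacyMarker = "<__media__>"

// metadata is a hypothetical stand-in for ModelMetadataResponse;
// only the media_marker field is modeled.
type metadata struct {
	MediaMarker string
}

// resolveMarker prefers the backend-reported marker and falls back to the
// legacy sentinel for backends that do not expose the field.
func resolveMarker(md *metadata) string {
	if md == nil || md.MediaMarker == "" {
		return legacyMarker
	}
	return md.MediaMarker
}

// buildPrompt (hypothetical) skips Go-side templating when the tokenizer
// template is in use, since the backend then injects its own marker.
// Otherwise it prefixes one sentinel per image, then substitutes the active
// marker for the sentinel just before the request would be sent.
func buildPrompt(text string, nImages int, useTokenizerTemplate bool, md *metadata) string {
	if useTokenizerTemplate {
		return text
	}
	templated := strings.Repeat(legacyMarker, nImages) + text
	return strings.ReplaceAll(templated, legacyMarker, resolveMarker(md))
}

func main() {
	// Backend reports a marker (example value is an assumption).
	fmt.Println(buildPrompt("Describe the image.", 1, false, &metadata{MediaMarker: "<__media_xyz__>"}))
	// Backend without the field keeps the legacy sentinel.
	fmt.Println(buildPrompt("Describe the image.", 1, false, nil))
}
```

Deferring the substitution to the last step keeps the templating code marker-agnostic: only the final string handed to the backend depends on what that backend reported after it was loaded and probed.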
@mudler added the bug (Something isn't working) label on Apr 18, 2026
@mudler merged commit 7809c5f into master on Apr 18, 2026
50 of 60 checks passed
@mudler deleted the issue-fix-vision-e2e branch on April 18, 2026 at 18:30
