Commit 61874ea

msluszniak and claude committed
fix(llm): auto-shape multimodal mediaPath messages in chat template
LLMController.generate() collected imagePaths from messages with a mediaPath but did not transform their content into the array form ([{type:'image'}, {type:'text', text}]) that the chat template needs to emit the image placeholder. Calling generate() directly with a vision-capable model thus threw "More images paths provided than '<image>' placeholders in prompt" from native. sendMessage() worked because it built its own historyForTemplate that did the transformation.

Move the transformation into applyChatTemplate so both call sites get correct behavior, and remove the now-redundant historyForTemplate block from sendMessage.

The public Message.content type is unchanged; external callers always pass plain strings, and the controller handles the array form internally.

Refs #1086 (items 1 and 2; with item 1 fixed, item 2's type mismatch no longer surfaces because external callers never need to construct the array form themselves).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent da22171 commit 61874ea

1 file changed

Lines changed: 23 additions & 14 deletions

packages/react-native-executorch/src/controllers/LLMController.ts

@@ -354,18 +354,6 @@ export class LLMController {
     const updatedHistory = [...this._messageHistory, newMessage];
     this.messageHistoryCallback(updatedHistory);
 
-    const historyForTemplate = updatedHistory.map((m) =>
-      m.mediaPath
-        ? {
-            ...m,
-            content: [
-              { type: 'image' },
-              { type: 'text', text: m.content },
-            ] as any,
-          }
-        : m
-    );
-
     const visualTokenCount = this.nativeModule.getVisualTokenCount();
     const countTokensCallback = (messages: Message[]) => {
       const rendered = this.applyChatTemplate(
@@ -383,7 +371,7 @@ export class LLMController {
     const messageHistoryWithPrompt =
       this.chatConfig.contextStrategy.buildContext(
         this.chatConfig.systemPrompt,
-        historyForTemplate,
+        updatedHistory,
         maxContextLength,
         countTokensCallback
       );
@@ -448,11 +436,32 @@ export class LLMController {
     );
 
     const result = template.render({
-      messages,
+      messages: messagesForChatTemplate(messages),
       tools,
       ...templateFlags,
       ...specialTokens,
     });
     return result;
   }
 }
+
+/**
+ * Multimodal chat templates expect message content for image-bearing turns
+ * to be an array of content parts with an `image` part as a placeholder.
+ * Callers of `LLMController.generate` and `LLMController.sendMessage` pass
+ * messages with a plain string `content` plus an optional `mediaPath`; this
+ * helper rewrites them into the structured form that the template engine
+ * understands.
+ * @param messages - Messages to prepare for the chat template engine.
+ * @returns Messages with image-bearing turns rewritten to structured content.
+ */
+function messagesForChatTemplate(messages: Message[]): any[] {
+  return messages.map((m) =>
+    m.mediaPath && typeof m.content === 'string'
+      ? {
+          ...m,
+          content: [{ type: 'image' }, { type: 'text', text: m.content }],
+        }
+      : m
+  );
+}
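
For review purposes, the reshaping done by `messagesForChatTemplate` can be exercised in isolation. The following is a minimal standalone sketch, not code from this commit. The `Message` interface is a simplified stand-in for the library's type. `ContentPart`, `TemplateMessage`, and `countImageParts` are hypothetical names introduced only for illustration, and the file path is made up. The sketch also swaps the commit's `any[]` return type for a narrower one so the result can be inspected safely.

```typescript
// Simplified stand-ins (assumptions); the library's real Message type has more fields.
type ContentPart = { type: 'image' } | { type: 'text'; text: string };

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  mediaPath?: string;
}

// Hypothetical internal shape: content is either a plain string or content parts.
type TemplateMessage = Omit<Message, 'content'> & {
  content: string | ContentPart[];
};

// Same transformation as the helper in the diff, with a narrower return type.
function messagesForChatTemplate(messages: Message[]): TemplateMessage[] {
  return messages.map((m): TemplateMessage =>
    m.mediaPath && typeof m.content === 'string'
      ? {
          ...m,
          content: [{ type: 'image' }, { type: 'text', text: m.content }],
        }
      : m
  );
}

// Hypothetical check: counts the image parts a template could turn into placeholders.
function countImageParts(messages: TemplateMessage[]): number {
  return messages.reduce(
    (n, m) =>
      n +
      (Array.isArray(m.content)
        ? m.content.filter((p) => p.type === 'image').length
        : 0),
    0
  );
}

const history: Message[] = [
  { role: 'user', content: 'What is in this picture?', mediaPath: '/tmp/cat.jpg' },
  { role: 'assistant', content: 'A cat on a sofa.' },
];

const shaped = messagesForChatTemplate(history);
const imagePathCount = history.filter((m) => m.mediaPath).length;

// One image part per mediaPath, so the two counts agree.
console.log(countImageParts(shaped) === imagePathCount);
```

The design point the sketch illustrates is that callers keep passing plain strings plus `mediaPath`, and the array form only exists at the template boundary. Text-only turns pass through by reference, so the helper is cheap to call from both `generate` and `sendMessage`.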
