
Commit b271aa6

msluszniak and claude authored
fix(llm): auto-shape multimodal mediaPath messages in chat template (#1089)
## Description

`LLMController.generate()` collected `imagePaths` from messages with a `mediaPath` set, but never transformed their `content` into the `[{type:'image'}, {type:'text', text}]` form that the chat template needs to emit the `<image>` placeholder. Calling `generate()` directly with a vision-capable model (e.g. LFM2-VL) thus threw `"More images paths provided than '<image>' placeholders in prompt"` from native, even though `sendMessage()` worked because it built its own `historyForTemplate` that did the transformation.

This PR moves the transformation into `applyChatTemplate` so both call sites (`generate` and `sendMessage`) get the correct behavior, and removes the now-redundant `historyForTemplate` block from `sendMessage`.

The public `Message.content` type stays `string`: external callers always pass plain strings, and the controller handles the structured array form internally. The helper is idempotent: messages whose `content` is already an array (e.g. callers who pre-shaped it as a workaround) are passed through unchanged.

### Introduces a breaking change?

- [ ] Yes
- [x] No

Public types are unchanged. `sendMessage` produces an identical rendered chat-template string (the transformation just happens one step later in the pipeline; token count and rendered output are byte-identical). `generate` only changes behavior in cases that previously threw, so this is a pure bug fix.

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

The original bug was reproduced on a vision-capable model (LFM2-VL-1.6B-quantized) on Android while building a downstream consumer app. Re-verification of the fix on a real device is recommended before merge; see Testing instructions below.

~I have not personally re-run the failing scenario after the fix.~

### Testing instructions

To reproduce the original bug (without this PR):

```ts
import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = await LLMModule.fromModelName(LFM2_VL_1_6B_QUANTIZED);
await llm.generate([
  { role: 'user', content: 'Describe this image.', mediaPath: 'file:///path/to/image.jpg' },
]);
// Throws: "More images paths provided than '<image>' placeholders in prompt"
```

With this PR applied, the same call should succeed and return the model's description.

Regression check: a vision-capable `sendMessage(text, { imagePath })` flow should continue producing identical output.

### Screenshots

N/A (controller change, no UI).

### Related issues

Addresses items 1 and 2 of #1086. With item 1 fixed, item 2's `Message.content` type mismatch no longer surfaces in practice because external callers never need to construct the array form themselves (the `as unknown as string` workaround that motivated #2 becomes unnecessary).

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

The `messagesForChatTemplate` helper lives at module scope rather than as a static class method because it doesn't depend on controller state. Internal `any[]` return is a deliberate concession to the dynamic shape the chat-template engine accepts; the public `Message[]` input/output contract stays well-typed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
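
The regression and idempotency claims in the description can be checked in isolation with a minimal sketch. It assumes a simplified local `Message` type (the library's public type keeps `content: string`; the array variant is widened here only to model the pre-shaping workaround), and `legacyHistoryForTemplate` is a hypothetical name for the inline mapping this commit removes from `sendMessage`.

```typescript
// Simplified stand-in for the library's Message type (assumption for this sketch).
type ContentPart = { type: 'image' } | { type: 'text'; text: string };

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string | ContentPart[];
  mediaPath?: string;
}

// The mapping removed from sendMessage; the function name is hypothetical.
// It has no typeof guard, so it would re-wrap content that is already an array.
function legacyHistoryForTemplate(messages: Message[]): unknown[] {
  return messages.map((m) =>
    m.mediaPath
      ? { ...m, content: [{ type: 'image' }, { type: 'text', text: m.content }] }
      : m
  );
}

// Same logic as the helper added by this commit, typed loosely for the sketch.
function messagesForChatTemplate(messages: Message[]): unknown[] {
  return messages.map((m) =>
    m.mediaPath && typeof m.content === 'string'
      ? { ...m, content: [{ type: 'image' }, { type: 'text', text: m.content }] }
      : m
  );
}

const history: Message[] = [
  { role: 'user', content: 'Describe this image.', mediaPath: 'file:///path/to/image.jpg' },
  { role: 'assistant', content: 'A cat on a sofa.' },
];

// Plain-string histories: old and new mappings serialize identically.
const sameForPlainStrings =
  JSON.stringify(legacyHistoryForTemplate(history)) ===
  JSON.stringify(messagesForChatTemplate(history));

// Pre-shaped content: the new helper returns the very same object.
const preShaped: Message[] = [
  {
    role: 'user',
    content: [{ type: 'image' }, { type: 'text', text: 'Describe this image.' }],
    mediaPath: 'file:///path/to/image.jpg',
  },
];
const passedThrough = messagesForChatTemplate(preShaped)[0] === preShaped[0];

console.log({ sameForPlainStrings, passedThrough });
```

This only compares the shaped message arrays; it does not exercise the real template engine or the native module, so on-device verification is still needed.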
1 parent e8d4305 commit b271aa6

1 file changed

Lines changed: 23 additions & 14 deletions

packages/react-native-executorch/src/controllers/LLMController.ts

@@ -354,18 +354,6 @@ export class LLMController {
     const updatedHistory = [...this._messageHistory, newMessage];
     this.messageHistoryCallback(updatedHistory);
 
-    const historyForTemplate = updatedHistory.map((m) =>
-      m.mediaPath
-        ? {
-            ...m,
-            content: [
-              { type: 'image' },
-              { type: 'text', text: m.content },
-            ] as any,
-          }
-        : m
-    );
-
     const visualTokenCount = this.nativeModule.getVisualTokenCount();
     const countTokensCallback = (messages: Message[]) => {
       const rendered = this.applyChatTemplate(
@@ -383,7 +371,7 @@ export class LLMController {
     const messageHistoryWithPrompt =
       this.chatConfig.contextStrategy.buildContext(
         this.chatConfig.systemPrompt,
-        historyForTemplate,
+        updatedHistory,
         maxContextLength,
         countTokensCallback
       );
@@ -448,7 +436,7 @@ export class LLMController {
     );
 
     const result = template.render({
-      messages,
+      messages: messagesForChatTemplate(messages),
       tools,
       ...templateFlags,
       ...specialTokens,
@@ -468,3 +456,24 @@ export class LLMController {
 function normalizeImagePath(path: string): string {
   return path.startsWith('file://') ? path : `file://${path}`;
 }
+
+/**
+ * Multimodal chat templates expect message content for image-bearing turns
+ * to be an array of content parts with an `image` part as a placeholder.
+ * Callers of `LLMController.generate` and `LLMController.sendMessage` pass
+ * messages with a plain string `content` plus an optional `mediaPath`; this
+ * helper rewrites them into the structured form that the template engine
+ * understands.
+ * @param messages - Messages to prepare for the chat template engine.
+ * @returns Messages with image-bearing turns rewritten to structured content.
+ */
+function messagesForChatTemplate(messages: Message[]): any[] {
+  return messages.map((m) =>
+    m.mediaPath && typeof m.content === 'string'
+      ? {
+          ...m,
+          content: [{ type: 'image' }, { type: 'text', text: m.content }],
+        }
+      : m
+  );
+}
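
Why the structured form matters can be illustrated with a toy renderer. This is only a sketch under stated assumptions: `renderToy`, `countPlaceholders`, `shapeForTemplate`, and the `<|role|>` tag format are hypothetical stand-ins, not the model's real chat template or the library's API. The only behavior borrowed from the source is that an `image` content part becomes an `<image>` placeholder and that the native side expects one placeholder per image path.

```typescript
type ContentPart = { type: 'image' } | { type: 'text'; text: string };

interface TemplateMessage {
  role: string;
  content: string | ContentPart[];
  mediaPath?: string;
}

// Hypothetical toy renderer: string content is emitted verbatim, while
// structured content emits "<image>" for each image part.
function renderToy(messages: TemplateMessage[]): string {
  return messages
    .map((m) => {
      const body =
        typeof m.content === 'string'
          ? m.content
          : m.content.map((p) => (p.type === 'image' ? '<image>' : p.text)).join('');
      return `<|${m.role}|>${body}`;
    })
    .join('\n');
}

// Counts "<image>" placeholders in a rendered prompt.
function countPlaceholders(prompt: string): number {
  return prompt.split('<image>').length - 1;
}

// Same shaping rule as the helper in the diff, with explicit types.
function shapeForTemplate(messages: TemplateMessage[]): TemplateMessage[] {
  return messages.map((m): TemplateMessage => {
    if (m.mediaPath && typeof m.content === 'string') {
      const parts: ContentPart[] = [{ type: 'image' }, { type: 'text', text: m.content }];
      return { ...m, content: parts };
    }
    return m;
  });
}

const turn: TemplateMessage[] = [
  { role: 'user', content: 'Describe this image.', mediaPath: 'file:///path/to/image.jpg' },
];

const imagePathCount = turn.filter((m) => m.mediaPath).length;
const rawPlaceholders = countPlaceholders(renderToy(turn));
const shapedPlaceholders = countPlaceholders(renderToy(shapeForTemplate(turn)));

// Unshaped: more image paths than placeholders (the situation the native error reports).
// Shaped: the counts match.
console.log({ imagePathCount, rawPlaceholders, shapedPlaceholders });
```

In this toy setup the unshaped message renders without any placeholder while one image path is collected, which mirrors the mismatch that `generate()` hit before the fix.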
