feat: implementation of multimodal runner #892
Merged
NorbertKlockiewicz merged 58 commits into main on Mar 11, 2026
3d22a30 to 0ac42b4
This was linked to issues on Mar 6, 2026
Closed
chmjkb requested changes on Mar 6, 2026
benITo47 requested changes on Mar 6, 2026
benITo47 reviewed on Mar 6, 2026
benITo47 reviewed on Mar 6, 2026
msluszniak requested changes on Mar 9, 2026
Author: API reference will be generated after the PR is approved.
09954c9 to 8e8a304
Member: On Hugging Face, you can add a note stating which version of ExecuTorch the model was exported with.
chmjkb requested changes on Mar 10, 2026
Comment on lines +288 to +291
  public async generate(
    messages: Message[],
-   tools?: LLMTool[]
+   tools?: LLMTool[],
+   imagePaths?: string[]
Collaborator
I'm not sure I understand this: why are we passing imagePaths if the Message type already includes a mediaPath member? It seems like the user needs to pass the same thing twice.
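One way to avoid the duplication raised in this comment, sketched under assumptions, is to derive the image list from the messages themselves instead of taking a separate imagePaths argument. Only the mediaPath member comes from the comment; the role and content fields and the collectImagePaths helper are hypothetical and are not the PR's code.

```typescript
// Hypothetical Message shape: only `mediaPath` is confirmed by the review
// comment; `role` and `content` are assumed for illustration.
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  mediaPath?: string;
}

// Hypothetical helper: collects image paths in message order and skips
// messages that carry no media.
function collectImagePaths(messages: Message[]): string[] {
  return messages
    .map((m) => m.mediaPath)
    .filter((p): p is string => typeof p === 'string' && p.length > 0);
}
```

With a helper like this, generate(messages, tools) could compute the paths internally, so callers would attach each image once, on the message it belongs to.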
271100f to 0eee8b6
msluszniak approved these changes on Mar 10, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… EOS IDs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kenCount JSI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… runner classes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad image shape from model metadata Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mage_token from config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
63f27b8 to 4c8b0ff
chmjkb approved these changes on Mar 11, 2026
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>
benITo47 approved these changes on Mar 11, 2026
benITo47 (Contributor) left a comment:
Great changes overall! Thanks!
Description
Adds vision/multimodal support to useLLM: load a VLM by passing capabilities: ['vision'], then use sendMessage(text, { imagePath }) to send messages with images. Under the hood this introduces a pluggable encoder architecture (IEncoder / VisionEncoder), a dedicated MultimodalRunner, and a refactored BaseLLMRunner with cleaner ownership and shared state. It also exposes a getVisualTokenCount() JSI method for accurate token counting with images. The text-only path is unchanged.
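To make the shape of the pluggable encoder design easier to picture, here is a minimal TypeScript sketch. The class and interface names are taken from the description, but every field, method signature, and the word-count tokenizer stand-in are assumptions made for illustration; the actual runner classes in the PR may look quite different.

```typescript
// All signatures below are illustrative assumptions, not the PR's actual API.

// A pluggable encoder reports how many context tokens one media input uses.
interface IEncoder {
  readonly modality: string;
  tokenCount(path: string): number;
}

// Assumed: a fixed number of visual tokens per image.
class VisionEncoder implements IEncoder {
  readonly modality = 'vision';
  private readonly tokensPerImage: number;

  constructor(tokensPerImage: number) {
    this.tokensPerImage = tokensPerImage;
  }

  tokenCount(_path: string): number {
    return this.tokensPerImage;
  }
}

// Shared state and helpers live in the base runner.
abstract class BaseLLMRunner {
  // Crude stand-in for a tokenizer: one token per whitespace-separated word.
  protected countTextTokens(text: string): number {
    const trimmed = text.trim();
    return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
  }

  abstract countPromptTokens(text: string, imagePaths?: string[]): number;
}

class MultimodalRunner extends BaseLLMRunner {
  private readonly encoders = new Map<string, IEncoder>();

  // Encoders are registered by modality, so new ones can be plugged in.
  register(encoder: IEncoder): void {
    this.encoders.set(encoder.modality, encoder);
  }

  // Sums the visual tokens contributed by each image.
  getVisualTokenCount(imagePaths: string[]): number {
    const vision = this.encoders.get('vision');
    if (!vision) {
      throw new Error('no vision encoder registered');
    }
    return imagePaths.reduce((sum, p) => sum + vision.tokenCount(p), 0);
  }

  // Text-only prompts never touch the encoder map.
  countPromptTokens(text: string, imagePaths: string[] = []): number {
    const visual =
      imagePaths.length > 0 ? this.getVisualTokenCount(imagePaths) : 0;
    return this.countTextTokens(text) + visual;
  }
}
```

In this sketch the runner only consults an encoder when images are present, which mirrors the description's claim that the text-only path is unaffected.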
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Run the llm example app and select the multimodal llm screen. Select an image and prompt the model.
Screenshots
Related issues
Checklist
Additional notes