Commit 17ce70b

docs: write an instruction for using llm with vision capabilities
1 parent b936b86 commit 17ce70b

File tree

3 files changed (+131, -9 lines)

.cspell-wordlist.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,3 +1,4 @@
+multimodal
 swmansion
 executorch
 execu
```

docs/docs/03-hooks/01-natural-language-processing/useLLM.md

Lines changed: 81 additions & 9 deletions
@@ -481,14 +481,86 @@ The response should include JSON:
Depending on the selected model and the user's device, generation speed can exceed 60 tokens per second. The [`tokenCallback`](../../06-api-reference/classes/LLMModule.md#tokencallback) from [`LLMModule`](../../06-api-reference/classes/LLMModule.md), which is used under the hood, is invoked on every generated token; if it triggers re-renders, this can significantly degrade the app's performance. To alleviate this, we've implemented token batching. To configure it, call the [`configure`](../../06-api-reference/interfaces/LLMType.md#configure) method and pass a [`generationConfig`](../../06-api-reference/interfaces/LLMConfig.md#generationconfig); see [Configuring the Model](../../03-hooks/01-natural-language-processing/useLLM.md#configuring-the-model) for the available options. `countInterval` and `timeInterval` set the batch size and the maximum time between consecutive batches, respectively: a batch is emitted when either `timeInterval` elapses since the last batch or `countInterval` tokens have been generated. This keeps the UI smooth even if the model lags during generation. The defaults are 10 tokens and an 80 ms time interval (~12 batches per second).
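As a minimal illustrative sketch (assuming `countInterval` and `timeInterval` are the `generationConfig` field names, matching the defaults described above; check the linked reference for the exact shape):

```tsx
import { useLLM, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });

// Flush a batch of tokens when either 10 tokens have accumulated
// or 80 ms have passed since the last batch (the defaults,
// roughly 12 batches per second).
llm.configure({
  generationConfig: {
    countInterval: 10,
    timeInterval: 80,
  },
});
```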
## Vision-Language Models (VLM)

Some models support multimodal input — text and images together. To use them, pass a `capabilities` array when loading the model.

### Loading a VLM

```tsx
import { useLLM, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });
```

The `capabilities` field is already set on the model constant. You can also construct the model object explicitly:

```tsx
const llm = useLLM({
  model: {
    modelSource: '...',
    tokenizerSource: '...',
    tokenizerConfigSource: '...',
    capabilities: ['vision'],
  },
});
```

Passing `capabilities` unlocks the typed `media` argument on `sendMessage`.

### Sending a message with an image

```tsx
const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });

const send = () => {
  llm.sendMessage('What is in this image?', {
    imagePath: '/path/to/image.jpg',
  });
};

return (
  <View>
    <Button onPress={send} title="Send!" />
    <Text>{llm.response}</Text>
  </View>
);
```
The `imagePath` should be a local file path on the device.
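Image pickers and camera libraries often return `file://` URIs rather than bare paths. The helper below is a hypothetical sketch, not part of react-native-executorch: if the native side expects a plain path, the scheme can be stripped before the value is passed as `imagePath`.

```tsx
// Hypothetical helper (an assumption, not a library API): strip a
// `file://` scheme so only the local filesystem path remains.
const toLocalPath = (uri: string): string =>
  uri.startsWith('file://') ? uri.slice('file://'.length) : uri;
```

Whether the scheme actually needs stripping depends on the underlying native module; verify the behavior on your target platform.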
### Functional generation with images

You can also use `generate` directly, passing `imagePaths` as the third argument:

```tsx
const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });

const handleGenerate = async () => {
  const chat: Message[] = [
    {
      role: 'user',
      content: [
        { type: 'image' },
        { type: 'text', text: 'Describe this image.' },
      ],
    },
  ];

  const response = await llm.generate(chat, undefined, ['/path/to/image.jpg']);
  console.log(response);
};
```
## Available models

| Model Family                                                                                                 |      Sizes       | Quantized | Capabilities |
| ------------------------------------------------------------------------------------------------------------ | :--------------: | :-------: | :----------: |
| [Hammer 2.1](https://huggingface.co/software-mansion/react-native-executorch-hammer-2.1)                     |  0.5B, 1.5B, 3B  |    ✅     |      -       |
| [Qwen 2.5](https://huggingface.co/software-mansion/react-native-executorch-qwen-2.5)                         |  0.5B, 1.5B, 3B  |    ✅     |      -       |
| [Qwen 3](https://huggingface.co/software-mansion/react-native-executorch-qwen-3)                             |  0.6B, 1.7B, 4B  |    ✅     |      -       |
| [Phi 4 Mini](https://huggingface.co/software-mansion/react-native-executorch-phi-4-mini)                     |        4B        |    ✅     |      -       |
| [SmolLM 2](https://huggingface.co/software-mansion/react-native-executorch-smolLm-2)                         | 135M, 360M, 1.7B |    ✅     |      -       |
| [LLaMA 3.2](https://huggingface.co/software-mansion/react-native-executorch-llama-3.2)                       |      1B, 3B      |    ✅     |      -       |
| [LFM2.5-1.2B-Instruct](https://huggingface.co/software-mansion/react-native-executorch-lfm2.5-1.2B-instruct) |       1.2B       |    ✅     |      -       |
| [LFM2.5-VL-1.6B](https://huggingface.co/nklockiewicz/lfm2-vl-et)                                             |       1.6B       |    ✅     |    vision    |

docs/docs/04-typescript-api/01-natural-language-processing/LLMModule.md

Lines changed: 49 additions & 0 deletions
@@ -114,6 +114,55 @@ To configure model (i.e. change system prompt, load initial conversation history
- [`topp`](../../06-api-reference/interfaces/GenerationConfig.md#topp) - Samples only from the smallest set of tokens whose cumulative probability exceeds `topp`.
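For instance, a sketch of nucleus sampling configuration (assuming `llm` is a loaded `LLMModule` instance and that, as described above, these fields are passed under `generationConfig` via `configure`):

```typescript
llm.configure({
  generationConfig: {
    // Restrict sampling to the smallest token set whose cumulative
    // probability exceeds 0.9.
    topp: 0.9,
  },
});
```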
116116

117+
## Vision-Language Models (VLM)

Some models support multimodal input — text and images together. To use them, pass `capabilities` in the model object when calling [`load`](../../06-api-reference/classes/LLMModule.md#load):

```typescript
import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';

const llm = new LLMModule({
  tokenCallback: (token) => console.log(token),
});

await llm.load(LFM2_VL_1_6B_QUANTIZED);
```

The `capabilities` field is already set on the model constant. You can also construct the model object explicitly:

```typescript
await llm.load({
  modelSource: '...',
  tokenizerSource: '...',
  tokenizerConfigSource: '...',
  capabilities: ['vision'],
});
```

Once loaded, pass `imagePath` to [`sendMessage`](../../06-api-reference/classes/LLMModule.md#sendmessage):

```typescript
const response = await llm.sendMessage('What is in this image?', {
  imagePath: '/path/to/image.jpg',
});
```

Or use [`generate`](../../06-api-reference/classes/LLMModule.md#generate) with `imagePaths` directly:

```typescript
const chat: Message[] = [
  {
    role: 'user',
    content: [
      { type: 'image' },
      { type: 'text', text: 'Describe this image.' },
    ],
  },
];

const response = await llm.generate(chat, undefined, ['/path/to/image.jpg']);
```
## Deleting the model from memory
To delete the model from memory, you can use the [`delete`](../../06-api-reference/classes/LLMModule.md#delete) method.
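A minimal lifecycle sketch (reusing the constructor and `load` call shown earlier; whether `delete` returns a promise is not specified here):

```typescript
const llm = new LLMModule({ tokenCallback: (token) => console.log(token) });
await llm.load(LFM2_VL_1_6B_QUANTIZED);

// ... run inference ...

// Free the model's memory once it is no longer needed,
// e.g. when the screen that uses it unmounts.
llm.delete();
```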
