Merged
Changes from 57 commits
Commits
58 commits
d6849bb
feat: initial implementation of multimodal runner with lfm vlm
NorbertKlockiewicz Feb 19, 2026
d9c3ef8
feat: unified LLM runner for text-only and multimodal PTEs
NorbertKlockiewicz Mar 2, 2026
b67405a
feat: add conversational VLM demo with multimodal/text-only support a…
NorbertKlockiewicz Mar 2, 2026
00fdcb0
fix: default UnifiedRunner temperature to 0.8 and topp to 0.9
NorbertKlockiewicz Mar 2, 2026
1e1b87c
feat: add NativeMessage struct and JSI conversion for message history
NorbertKlockiewicz Mar 2, 2026
a35e442
feat: declare generateMultimodal on LLM and register JSI binding
NorbertKlockiewicz Mar 2, 2026
e65af36
fix: remove redundant unordered_map and vector includes from LLM.h
NorbertKlockiewicz Mar 2, 2026
67aa7e7
feat: implement generateMultimodal with per-turn chat template and im…
NorbertKlockiewicz Mar 2, 2026
7f38df7
feat: add mediaPath to Message, remove sendMessageWithImage from LLMType
NorbertKlockiewicz Mar 2, 2026
f030b4a
feat: replace sendMessageWithImage with sendMessage(msg, mediaPath?) …
NorbertKlockiewicz Mar 2, 2026
9a04cfd
fix: use updatedHistory for multimodal routing, remove redundant rese…
NorbertKlockiewicz Mar 2, 2026
cf907de
fix: skip system messages in generateMultimodal, clear imageUri after…
NorbertKlockiewicz Mar 2, 2026
40fc187
feat: show image thumbnail in user message bubble when mediaPath is set
NorbertKlockiewicz Mar 2, 2026
6c2bdbb
fix: use resizeMode contain so full image is always visible in messag…
NorbertKlockiewicz Mar 2, 2026
ac8b4dc
refactor: derive isMultimodal from load param, unify load branches, r…
NorbertKlockiewicz Mar 2, 2026
e348e0e
refactor: remove isMultimodal flag, inline generateMultimodal into se…
NorbertKlockiewicz Mar 2, 2026
bf70644
fix: make tokenizerConfigSource required throughout load pipeline
NorbertKlockiewicz Mar 2, 2026
afd0dde
fix: prepend system prompt to multimodal history before generateMulti…
NorbertKlockiewicz Mar 2, 2026
4382750
refactor: unify generate — Jinja renders prompt+<image> tokens in JS,…
NorbertKlockiewicz Mar 2, 2026
13ddbc3
fix: collect imagePaths from messageHistoryWithPrompt, not full history
NorbertKlockiewicz Mar 2, 2026
df2119c
fix: typing
NorbertKlockiewicz Mar 2, 2026
fceed10
feat: correctly calculate image tokens
NorbertKlockiewicz Mar 2, 2026
af84703
fix: add missing import
NorbertKlockiewicz Mar 2, 2026
46cc472
fix: fall back to max_seq_len when model doesn't export max_context_len
NorbertKlockiewicz Mar 2, 2026
466dc0d
fix: address code review — error on image/placeholder mismatch, remov…
NorbertKlockiewicz Mar 2, 2026
7547083
feat: dynamic sendMessage type based on flag
NorbertKlockiewicz Mar 2, 2026
d4e1106
fix: model stopping generation in the middle of its answer
NorbertKlockiewicz Mar 3, 2026
45a0c26
feat: add LLMCapability type and parameterize LLMTypeMultimodal
NorbertKlockiewicz Mar 3, 2026
caa6ea4
feat: update sendMessage to accept typed media object
NorbertKlockiewicz Mar 3, 2026
34dc5e4
feat: add LFM2_VL_1_6B and LFM2_VL_1_6B_QUANTIZED model constants
NorbertKlockiewicz Mar 3, 2026
9f43e12
feat: add IEncoder interface and VisionEncoder
NorbertKlockiewicz Mar 3, 2026
c0d32eb
fix: address vision_encoder quality review issues
NorbertKlockiewicz Mar 3, 2026
34c500e
feat: add BaseLLMRunner with shared state and load()
NorbertKlockiewicz Mar 3, 2026
1e92c0f
feat: add TextRunner
NorbertKlockiewicz Mar 3, 2026
88e8443
feat: add MultimodalRunner with plug-in encoder map
NorbertKlockiewicz Mar 3, 2026
bddff5a
feat: wire capabilities through LLM.cpp, delete UnifiedRunner
NorbertKlockiewicz Mar 3, 2026
6a86444
feat: forward capabilities from LLMController to native
NorbertKlockiewicz Mar 3, 2026
60dbd0f
feat: add logging, fix metadata application, fix module ownership and…
NorbertKlockiewicz Mar 5, 2026
f489d45
refactor: replace Image class with ImagePath + VisionEncoder embeddin…
NorbertKlockiewicz Mar 5, 2026
21f5f59
test: add TextRunnerTests and VLMTests suites, register in CMake and …
NorbertKlockiewicz Mar 5, 2026
0790ea9
refactor: unify multimodal/text paths in sendMessage, add getVisualTo…
NorbertKlockiewicz Mar 5, 2026
9cf417b
refactor: replace example namespace with rnexecutorch::llm::runner in…
NorbertKlockiewicz Mar 5, 2026
f6d369d
refactor: collapse BaseLLMRunner constructor, deduplicate eos_ids, re…
NorbertKlockiewicz Mar 5, 2026
84e0b65
refactor: comments etc.
NorbertKlockiewicz Mar 5, 2026
2517431
fix: cap VLM generation tokens, propagate encoder load errors, pass i…
NorbertKlockiewicz Mar 5, 2026
1acc7a0
revert: remove TextRunnerTests and VLMTests suites
NorbertKlockiewicz Mar 5, 2026
caaa456
refactor: unify namespaces
NorbertKlockiewicz Mar 6, 2026
7da0875
fix: address PR review comments for VLM support
NorbertKlockiewicz Mar 6, 2026
56778d7
fix: use & instead of *
NorbertKlockiewicz Mar 9, 2026
47bfeaf
fix: requested changes
NorbertKlockiewicz Mar 9, 2026
1bc22b0
docs: write an instruction for using llm with vision capabilities
NorbertKlockiewicz Mar 9, 2026
c1b785d
chore: point to swmansion org on huggingface
NorbertKlockiewicz Mar 9, 2026
ac1ba44
tests: add tests for new runner
NorbertKlockiewicz Mar 10, 2026
4a82046
feat: add missing changes to LLMModule
NorbertKlockiewicz Mar 10, 2026
20ce02a
feat: requested changes
NorbertKlockiewicz Mar 10, 2026
b0fe6d3
fix: remove audioPath left after rebase
NorbertKlockiewicz Mar 10, 2026
4c8b0ff
fix: comment, throw when no image tag
NorbertKlockiewicz Mar 10, 2026
6e68958
Update packages/react-native-executorch/src/constants/modelUrls.ts
NorbertKlockiewicz Mar 11, 2026
1 change: 1 addition & 0 deletions .cspell-wordlist.txt
@@ -1,3 +1,4 @@
multimodal
swmansion
executorch
execu
8 changes: 8 additions & 0 deletions apps/llm/app/_layout.tsx
@@ -89,6 +89,14 @@ export default function _layout() {
headerTitleStyle: { color: ColorPalette.primary },
}}
/>
<Drawer.Screen
name="multimodal_llm/index"
options={{
drawerLabel: 'Multimodal LLM (VLM)',
title: 'Multimodal LLM',
headerTitleStyle: { color: ColorPalette.primary },
}}
/>
<Drawer.Screen
name="index"
options={{
6 changes: 6 additions & 0 deletions apps/llm/app/index.tsx
@@ -35,6 +35,12 @@ export default function Home() {
>
<Text style={styles.buttonText}>Voice Chat</Text>
</TouchableOpacity>
<TouchableOpacity
style={styles.button}
onPress={() => router.navigate('multimodal_llm/')}
>
<Text style={styles.buttonText}>Multimodal LLM (VLM)</Text>
</TouchableOpacity>
</View>
</View>
);
310 changes: 310 additions & 0 deletions apps/llm/app/multimodal_llm/index.tsx
@@ -0,0 +1,310 @@
import { useContext, useEffect, useRef, useState } from 'react';
import {
Image,
Keyboard,
KeyboardAvoidingView,
Platform,
StyleSheet,
Text,
TextInput,
TouchableOpacity,
TouchableWithoutFeedback,
View,
} from 'react-native';
import { launchImageLibrary } from 'react-native-image-picker';
import { useIsFocused } from '@react-navigation/native';
import { useLLM, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';
import SendIcon from '../../assets/icons/send_icon.svg';
import PauseIcon from '../../assets/icons/pause_icon.svg';
import ColorPalette from '../../colors';
import Messages from '../../components/Messages';
import Spinner from '../../components/Spinner';
import { GeneratingContext } from '../../context';

export default function MultimodalLLMScreenWrapper() {
const isFocused = useIsFocused();
return isFocused ? <MultimodalLLMScreen /> : null;
}

function MultimodalLLMScreen() {
const [imageUri, setImageUri] = useState<string | null>(null);
const [userInput, setUserInput] = useState('');
const [isTextInputFocused, setIsTextInputFocused] = useState(false);
const textInputRef = useRef<TextInput>(null);
const { setGlobalGenerating } = useContext(GeneratingContext);

const vlm = useLLM({
model: LFM2_VL_1_6B_QUANTIZED,
});

useEffect(() => {
setGlobalGenerating(vlm.isGenerating);
}, [vlm.isGenerating, setGlobalGenerating]);

useEffect(() => {
if (vlm.error) console.error('MultimodalLLM error:', vlm.error);
}, [vlm.error]);

const pickImage = async () => {
const result = await launchImageLibrary({ mediaType: 'photo' });
if (result.assets && result.assets.length > 0) {
const uri = result.assets[0]?.uri;
if (uri) setImageUri(uri);
}
};

const sendMessage = async () => {
if (!userInput.trim() || vlm.isGenerating) return;
const text = userInput.trim();
setUserInput('');
textInputRef.current?.clear();
Keyboard.dismiss();
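// Snapshot the selected image before clearing the state, so this send
// still uses the image that was attached when the user tapped send.
const currentImageUri = imageUri;
setImageUri(null);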
try {
await vlm.sendMessage(
text,
currentImageUri ? { imagePath: currentImageUri } : undefined
);
} catch (e) {
console.error('Generation error:', e);
}
};

if (!vlm.isReady) {
return (
<Spinner
visible={!vlm.isReady}
textContent={
vlm.error
? `Error: ${vlm.error.message}`
: `Loading model ${(vlm.downloadProgress * 100).toFixed(0)}%`
}
/>
);
}

return (
<TouchableWithoutFeedback onPress={Keyboard.dismiss}>
<KeyboardAvoidingView
style={styles.container}
collapsable={false}
behavior={Platform.OS === 'ios' ? 'padding' : undefined}
keyboardVerticalOffset={Platform.OS === 'ios' ? 120 : 40}
>
<View style={styles.container}>
{vlm.messageHistory.length ? (
<View style={styles.chatContainer}>
<Messages
chatHistory={vlm.messageHistory}
llmResponse={vlm.response}
isGenerating={vlm.isGenerating}
deleteMessage={vlm.deleteMessage}
/>
</View>
) : (
<View style={styles.helloMessageContainer}>
<Text style={styles.helloText}>Hello! 👋</Text>
<Text style={styles.bottomHelloText}>
Pick an image and ask me anything about it.
</Text>
</View>
)}

{/* Image thumbnail strip */}
{imageUri && (
<TouchableOpacity
style={styles.imageThumbnailContainer}
onPress={pickImage}
>
<Image
source={{ uri: imageUri }}
style={styles.imageThumbnail}
resizeMode="cover"
/>
<Text style={styles.imageThumbnailHint}>Tap to change</Text>
</TouchableOpacity>
)}

<View style={styles.bottomContainer}>
{/* Image picker button */}
<TouchableOpacity
style={styles.imageButton}
onPress={pickImage}
disabled={vlm.isGenerating}
>
<Text style={styles.imageButtonText}>📷</Text>
</TouchableOpacity>

<TextInput
autoCorrect={false}
ref={textInputRef}
onFocus={() => setIsTextInputFocused(true)}
onBlur={() => setIsTextInputFocused(false)}
style={[
styles.textInput,
{
borderColor: isTextInputFocused
? ColorPalette.blueDark
: ColorPalette.blueLight,
},
]}
placeholder={imageUri ? 'Ask about the image…' : 'Your message'}
placeholderTextColor="#C1C6E5"
multiline
onChangeText={setUserInput}
/>

{/* Coerce to boolean so an empty string is never rendered as a child of <View> */}
{!!userInput.trim() && !vlm.isGenerating && (
<TouchableOpacity
style={styles.sendChatTouchable}
onPress={sendMessage}
>
<SendIcon height={24} width={24} padding={4} margin={8} />
</TouchableOpacity>
)}
{vlm.isGenerating && (
<TouchableOpacity
style={styles.sendChatTouchable}
onPress={vlm.interrupt}
>
<PauseIcon height={24} width={24} padding={4} margin={8} />
</TouchableOpacity>
)}
</View>
</View>
</KeyboardAvoidingView>
</TouchableWithoutFeedback>
);
}

const styles = StyleSheet.create({
// Setup phase
setupContainer: {
flex: 1,
padding: 24,
backgroundColor: '#fff',
justifyContent: 'center',
},
setupTitle: {
fontSize: 20,
fontFamily: 'medium',
color: ColorPalette.primary,
marginBottom: 8,
},
setupHint: {
fontSize: 13,
fontFamily: 'regular',
color: ColorPalette.blueDark,
marginBottom: 32,
lineHeight: 18,
},
filePickerRow: {
flexDirection: 'row',
alignItems: 'center',
borderWidth: 1,
borderColor: ColorPalette.blueLight,
borderRadius: 10,
padding: 14,
marginBottom: 12,
backgroundColor: '#fafbff',
},
filePickerInfo: { flex: 1 },
filePickerLabel: {
fontSize: 12,
fontFamily: 'medium',
color: ColorPalette.blueDark,
marginBottom: 2,
},
filePickerValue: { fontSize: 14, fontFamily: 'regular' },
filePickerValueSet: { color: ColorPalette.primary },
filePickerValueEmpty: { color: ColorPalette.blueLight },
filePickerChevron: {
fontSize: 24,
color: ColorPalette.blueLight,
marginLeft: 8,
},
loadButton: {
marginTop: 16,
backgroundColor: ColorPalette.strongPrimary,
borderRadius: 10,
padding: 14,
alignItems: 'center',
},
loadButtonDisabled: { backgroundColor: ColorPalette.blueLight },
loadButtonText: { color: '#fff', fontFamily: 'medium', fontSize: 15 },

// Chat phase
container: { flex: 1 },
chatContainer: { flex: 10, width: '100%' },
helloMessageContainer: {
flex: 10,
width: '100%',
alignItems: 'center',
justifyContent: 'center',
},
helloText: {
fontFamily: 'medium',
fontSize: 30,
color: ColorPalette.primary,
},
bottomHelloText: {
fontFamily: 'regular',
fontSize: 20,
lineHeight: 28,
textAlign: 'center',
color: ColorPalette.primary,
paddingHorizontal: 24,
},
imageThumbnailContainer: {
flexDirection: 'row',
alignItems: 'center',
paddingHorizontal: 16,
paddingVertical: 6,
gap: 8,
},
imageThumbnail: {
width: 48,
height: 48,
borderRadius: 8,
borderWidth: 1,
borderColor: ColorPalette.blueLight,
},
imageThumbnailHint: {
fontSize: 12,
fontFamily: 'regular',
color: ColorPalette.blueDark,
},
bottomContainer: {
height: 100,
width: '100%',
flexDirection: 'row',
justifyContent: 'space-between',
alignItems: 'center',
paddingHorizontal: 16,
},
imageButton: {
width: 40,
height: 40,
justifyContent: 'center',
alignItems: 'center',
marginRight: 4,
},
imageButtonText: { fontSize: 22 },
textInput: {
flex: 1,
borderWidth: 1,
borderRadius: 8,
lineHeight: 19.6,
fontFamily: 'regular',
fontSize: 14,
color: ColorPalette.primary,
padding: 16,
},
sendChatTouchable: {
height: '100%',
width: 48,
justifyContent: 'center',
alignItems: 'flex-end',
},
});
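The send path in the screen above can be read as two small decisions: whether to send at all, and whether to attach an image. Below is a minimal sketch of that logic as pure functions, which makes it easy to check in isolation. The names `prepareText`, `buildMediaArg` and `MediaArg` are hypothetical and are not part of this PR; only the `{ imagePath }` shape and the guard conditions are taken from the diff.

```typescript
// Hypothetical type: mirrors the optional second argument that the
// screen passes to vlm.sendMessage(text, media?).
type MediaArg = { imagePath: string } | undefined;

// Hypothetical helper: returns the trimmed text to send, or null when
// the send should be skipped (blank input or generation in progress).
function prepareText(userInput: string, isGenerating: boolean): string | null {
  const text = userInput.trim();
  if (!text || isGenerating) return null;
  return text;
}

// Hypothetical helper: attaches the image only when one was picked.
function buildMediaArg(imageUri: string | null): MediaArg {
  return imageUri ? { imagePath: imageUri } : undefined;
}
```

With helpers like these, the component body would reduce to a call such as `vlm.sendMessage(text, buildMediaArg(currentImageUri))` once `prepareText` returns a non-null string. Whether the extra indirection is worthwhile for a demo screen is a judgment call.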