
Commit 9f1c89f

feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision (#1099)
## Description

Adds `min_p` and `repetition_penalty` sampling parameters to `GenerationConfig`, plumbs them through the full stack (`Sampler` → `TextDecoderRunner` → `TextTokenGenerator` → `BaseLLMRunner` / `TextRunner` / `MultimodalRunner` → JSI bindings → `LLMController`), introduces a per-model default `generationConfig` that gets applied automatically on load (populated for Qwen3 and LFM2-VL from their upstream recommendations), and replaces the distorting `cv::resize` in `VisionEncoder` with the existing `resizePadded` helper so multimodal inputs keep their aspect ratio.

Also fixes three silent pre-existing bugs surfaced along the way: an xorshift PRNG seeded with `0` that made sampling deterministic, a `Sampler::apply_min_p` renormalization gap, and inline `{}` no-op overrides in `MultimodalRunner` that would desync in future refactors.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [ ] Android

### Testing instructions

**Sampling parameter plumbing**

1. Open `apps/llm`, load any supported model (e.g. `LFM2_VL_450M_QUANTIZED`).
2. Without any manual `configure()` call, send a prompt. The model card defaults are applied automatically — for LFM2-VL you should now see coherent, non-repetitive descriptions (previously the model often produced generic or looping replies at the library's default `temperature=0.8, topp=0.9`).
3. Optionally override via `useLLM(...)`'s `configure({ generationConfig: { temperature: 0.7, minP: 0.1, repetitionPenalty: 1.05 } })` and confirm the generation style changes.

**Letterbox preprocessing**

1. With a multimodal model loaded in `apps/llm` → `multimodal_llm` screen, attach a photo with a non-square aspect ratio (e.g. 3000×2250 from your camera roll).
2. Ask the model to describe it. Before this PR the image was stretched into the PTE's square input shape — the model would sometimes misidentify subjects in wide/tall photos. After, the image is letterboxed so proportions are preserved.

### Screenshots

<!-- none -->

### Related issues

<!-- none -->

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

**Per-model recommended defaults**

Model presets gain an optional `generationConfig` field; `LLMController.load` applies it before flipping `isReady`, so users see sensible sampling out of the box. User `configure()` calls still override per-field. Populated for:

- **Qwen3** family (`temperature=0.6, topp=0.95`, from `generation_config.json`)
- **LFM2-VL** family (`temperature=0.1, minP=0.15, repetitionPenalty=1.05`, from the LiquidAI model card)

Other presets (Llama, SmolLM2, Hammer, Phi-4, Qwen2.5, LFM2 text) keep the library defaults — these model cards don't publish sampling recommendations, so adding arbitrary values would be guessing.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 04852be commit 9f1c89f

29 files changed

Lines changed: 514 additions & 178 deletions
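Before the per-file diffs, a minimal TypeScript sketch of the two sampling steps this commit plumbs through (illustrative only; the real implementation is the C++ `Sampler`, and `applyMinP` / `applyRepetitionPenalty` are not library API):

```typescript
// Sketch of min-p filtering; assumes `probs` is already softmax-normalized.
function applyMinP(probs: number[], minP: number): number[] {
  if (minP <= 0) return probs; // minP = 0 disables the filter
  const maxProb = probs.reduce((a, b) => Math.max(a, b), 0);
  const threshold = minP * maxProb; // keep tokens at or above minP * max_prob
  const kept = probs.map((p) => (p >= threshold ? p : 0));
  const sum = kept.reduce((a, b) => a + b, 0);
  // Renormalize over the surviving tokens; skipping this division is a
  // plausible reading of the "renormalization gap" fixed above.
  return kept.map((p) => p / sum);
}

// Sketch of a CTRL-style repetition penalty applied to raw logits.
function applyRepetitionPenalty(
  logits: number[],
  seenTokenIds: Set<number>,
  penalty: number
): number[] {
  if (penalty === 1) return logits; // penalty = 1 disables it
  return logits.map((logit, tokenId) =>
    seenTokenIds.has(tokenId)
      ? logit > 0
        ? logit / penalty // shrink positive scores of repeated tokens
        : logit * penalty // push negative scores further down
      : logit
  );
}
```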


apps/llm/app/multimodal_llm/index.tsx

Lines changed: 2 additions & 2 deletions
```diff
@@ -14,7 +14,7 @@ import {
 import { launchImageLibrary } from 'react-native-image-picker';
 import { useIsFocused } from '@react-navigation/native';
 import { useSafeAreaInsets } from 'react-native-safe-area-context';
-import { useLLM, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';
+import { useLLM, LFM2_5_VL_1_6B_QUANTIZED } from 'react-native-executorch';
 import SendIcon from '../../assets/icons/send_icon.svg';
 import PauseIcon from '../../assets/icons/pause_icon.svg';
 import ColorPalette from '../../colors';
@@ -50,7 +50,7 @@ function MultimodalLLMScreen() {
   const [error, setError] = useState<string | null>(null);
 
   const vlm = useLLM({
-    model: LFM2_VL_1_6B_QUANTIZED,
+    model: LFM2_5_VL_1_6B_QUANTIZED,
   });
   const tokenCount = vlm.isReady ? vlm.getGeneratedTokenCount() : 0;
   const { stats, onMessageSend } = useLLMStats(
```
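This screen is where the letterbox change is user-visible; the resize itself happens in the C++ `VisionEncoder` via `resizePadded`, which is not part of this excerpt. A TypeScript sketch of the implied geometry, with an assumed square input edge (the real size depends on the exported PTE):

```typescript
// Hedged sketch of letterbox geometry; the actual resizePadded helper may
// differ in rounding and padding color. `dst` is the model's square input
// edge (e.g. 512 is assumed here, not taken from the PR).
function letterbox(srcW: number, srcH: number, dst: number) {
  const scale = Math.min(dst / srcW, dst / srcH); // fit, never crop or stretch
  const newW = Math.round(srcW * scale);
  const newH = Math.round(srcH * scale);
  return {
    newW,
    newH,
    padX: Math.floor((dst - newW) / 2), // padding added left/right
    padY: Math.floor((dst - newH) / 2), // padding added top/bottom
  };
}

// The 3000×2250 test photo at dst = 512 becomes 512×384 with 64 px of
// padding top and bottom, instead of being stretched to 512×512.
```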

docs/docs/03-hooks/01-natural-language-processing/useLLM.md

Lines changed: 16 additions & 6 deletions
````diff
@@ -211,7 +211,15 @@ To configure model (i.e. change system prompt, load initial conversation history
 
 - [`temperature`](../../06-api-reference/interfaces/GenerationConfig.md#temperature) - Scales output logits by the inverse of temperature. Controls the randomness / creativity of text generation.
 
-- [`topp`](../../06-api-reference/interfaces/GenerationConfig.md#topp) - Only samples from the smallest set of tokens whose cumulative probability exceeds topp.
+- [`topP`](../../06-api-reference/interfaces/GenerationConfig.md#topp) - Only samples from the smallest set of tokens whose cumulative probability exceeds topP. Range `[0, 1]`. Values of `0` or `1` disable top-p filtering.
+
+- [`minP`](../../06-api-reference/interfaces/GenerationConfig.md#minp) - Minimum-probability threshold applied after softmax: tokens whose probability is below `minP * max_prob` are excluded from sampling. Range `[0, 1]`. Default `0` disables the filter. Stacks with `topP` when both are set.
+
+- [`repetitionPenalty`](../../06-api-reference/interfaces/GenerationConfig.md#repetitionpenalty) - Multiplicative penalty applied to logits of tokens that already appeared in the prompt or the generated text. Values greater than `1` discourage repetition; default `1` disables the penalty.
+
+:::info[Built-in models ship with sampling defaults]
+Model presets expose an optional [`generationConfig`](../../06-api-reference/interfaces/LLMProps.md) on the `model` prop. Whenever the upstream model card publishes recommended values (currently Qwen3 and LFM2-VL) the preset carries them and `useLLM` applies them automatically before `isReady` flips — you don't need to call `configure` just to get sensible defaults. Any fields you then pass to `configure` still override on a per-field basis.
+:::
 
 ### Model configuration example
 
@@ -282,7 +290,9 @@ useEffect(() => {
       outputTokenBatchSize: 15,
       batchTimeInterval: 100,
       temperature: 0.7,
-      topp: 0.9,
+      topP: 0.9,
+      minP: 0.05,
+      repetitionPenalty: 1.05,
     },
   });
 }, [configure]);
@@ -491,9 +501,9 @@ Some models support multimodal input — text and images together. To use them,
 ### Loading a VLM
 
 ```tsx
-import { useLLM, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';
+import { useLLM, LFM2_5_VL_1_6B_QUANTIZED } from 'react-native-executorch';
 
-const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });
+const llm = useLLM({ model: LFM2_5_VL_1_6B_QUANTIZED });
 ```
 
 The `capabilities` field is already set on the model constant. You can also construct the model object explicitly:
@@ -514,7 +524,7 @@ Passing `capabilities` unlocks the typed `media` argument on `sendMessage`.
 ### Sending a message with an image
 
 ```tsx
-const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });
+const llm = useLLM({ model: LFM2_5_VL_1_6B_QUANTIZED });
 
 const send = () => {
   llm.sendMessage('What is in this image?', {
@@ -537,7 +547,7 @@ The `imagePath` should be a local file path on the device.
 You can also use `generate` directly by setting `mediaPath` on user messages:
 
 ```tsx
-const llm = useLLM({ model: LFM2_VL_1_6B_QUANTIZED });
+const llm = useLLM({ model: LFM2_5_VL_1_6B_QUANTIZED });
 
 const handleGenerate = async () => {
   const chat: Message[] = [
````
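The info box added above describes a per-field override chain. A minimal sketch of that merge, assuming plain object-spread semantics (the actual `LLMController` code is not shown in this excerpt, and the object names are illustrative):

```typescript
// Later spreads win per field: library defaults < preset recommendations
// < explicit configure() values.
const effectiveGenerationConfig = {
  ...{ temperature: 0.8, topP: 0.9 }, // library defaults, per the PR notes
  ...modelPreset.generationConfig,    // e.g. Qwen3 or LFM2-VL recommendations
  ...userConfig.generationConfig,     // configure() overrides win last
};
```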

docs/docs/04-typescript-api/01-natural-language-processing/LLMModule.md

Lines changed: 11 additions & 3 deletions
````diff
@@ -107,17 +107,25 @@ To configure model (i.e. change system prompt, load initial conversation history
 
 - [`temperature`](../../06-api-reference/interfaces/GenerationConfig.md#temperature) - Scales output logits by the inverse of temperature. Controls the randomness / creativity of text generation.
 
-- [`topp`](../../06-api-reference/interfaces/GenerationConfig.md#topp) - Only samples from the smallest set of tokens whose cumulative probability exceeds topp.
+- [`topP`](../../06-api-reference/interfaces/GenerationConfig.md#topp) - Only samples from the smallest set of tokens whose cumulative probability exceeds topP. Range `[0, 1]`. Values of `0` or `1` disable top-p filtering.
+
+- [`minP`](../../06-api-reference/interfaces/GenerationConfig.md#minp) - Minimum-probability threshold applied after softmax: tokens whose probability is below `minP * max_prob` are excluded from sampling. Range `[0, 1]`. Default `0` disables the filter. Stacks with `topP` when both are set.
+
+- [`repetitionPenalty`](../../06-api-reference/interfaces/GenerationConfig.md#repetitionpenalty) - Multiplicative penalty applied to logits of tokens that already appeared in the prompt or the generated text. Values greater than `1` discourage repetition; default `1` disables the penalty.
+
+:::info[Built-in models ship with sampling defaults]
+Model presets expose an optional `generationConfig` that `LLMModule.fromModelName` applies automatically when available — for Qwen3 and LFM2-VL this means the model-card recommended sampling settings are in effect without any explicit `configure` call. Any fields you pass to `configure` still override on a per-field basis.
+:::
 
 ## Vision-Language Models (VLM)
 
 Some models support multimodal input — text and images together. To use them, pass `capabilities` in the model object when calling [`fromModelName`](../../06-api-reference/classes/LLMModule.md#frommodelname):
 
 ```typescript
-import { LLMModule, LFM2_VL_1_6B_QUANTIZED } from 'react-native-executorch';
+import { LLMModule, LFM2_5_VL_1_6B_QUANTIZED } from 'react-native-executorch';
 
 const llm = await LLMModule.fromModelName(
-  LFM2_VL_1_6B_QUANTIZED,
+  LFM2_5_VL_1_6B_QUANTIZED,
   undefined,
   (token) => console.log(token)
 );
````
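To make the interplay concrete, a usage sketch: preset defaults apply during `fromModelName`, and a later `configure` call overrides only the fields it names. The `configure` shape here is assumed to mirror the hook's documented one and may differ on `LLMModule`:

```typescript
import { LLMModule, LFM2_5_VL_1_6B_QUANTIZED } from 'react-native-executorch';

// Loads with the LFM2-VL preset defaults described in this commit:
// temperature=0.1, minP=0.15, repetitionPenalty=1.05.
const llm = await LLMModule.fromModelName(
  LFM2_5_VL_1_6B_QUANTIZED,
  undefined,
  (token) => console.log(token)
);

// Override a single field; the other preset values stay in effect.
llm.configure({ generationConfig: { minP: 0.1 } });
```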

packages/react-native-executorch/common/rnexecutorch/host_objects/ModelHostObject.h

Lines changed: 9 additions & 0 deletions
```diff
@@ -140,6 +140,15 @@ template <typename Model> class ModelHostObject : public JsiHostObject {
                                      synchronousHostFunction<&Model::setTopp>,
                                      "setTopp"));
 
+    addFunctions(JSI_EXPORT_FUNCTION(ModelHostObject<Model>,
+                                     synchronousHostFunction<&Model::setMinP>,
+                                     "setMinP"));
+
+    addFunctions(JSI_EXPORT_FUNCTION(
+        ModelHostObject<Model>,
+        synchronousHostFunction<&Model::setRepetitionPenalty>,
+        "setRepetitionPenalty"));
+
     addFunctions(JSI_EXPORT_FUNCTION(
         ModelHostObject<Model>,
         synchronousHostFunction<&Model::getMaxContextLength>,
```
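For reference, the surface these bindings expose to JS. A sketch only; `nativeModel` stands in for the JSI host object of a loaded model, and the real wiring lives in `LLMController`:

```typescript
// Both are synchronous host functions; per LLM.cpp below they throw if the
// model isn't loaded, if minP is outside [0, 1], or if the penalty is negative.
declare const nativeModel: {
  setMinP(minP: number): void;
  setRepetitionPenalty(repetitionPenalty: number): void;
};

nativeModel.setMinP(0.15);
nativeModel.setRepetitionPenalty(1.05);
```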

packages/react-native-executorch/common/rnexecutorch/models/llm/LLM.cpp

Lines changed: 24 additions & 0 deletions
```diff
@@ -250,6 +250,30 @@ void LLM::setTopp(float topp) {
   runner_->set_topp(topp);
 }
 
+void LLM::setMinP(float minP) {
+  if (!runner_ || !runner_->is_loaded()) {
+    throw RnExecutorchError(RnExecutorchErrorCode::ModuleNotLoaded,
+                            "Can't configure a model that's not loaded");
+  }
+  if (minP < 0.0f || minP > 1.0f) {
+    throw RnExecutorchError(RnExecutorchErrorCode::InvalidConfig,
+                            "Min-p must be between 0.0 and 1.0");
+  }
+  runner_->set_min_p(minP);
+}
+
+void LLM::setRepetitionPenalty(float repetitionPenalty) {
+  if (!runner_ || !runner_->is_loaded()) {
+    throw RnExecutorchError(RnExecutorchErrorCode::ModuleNotLoaded,
+                            "Can't configure a model that's not loaded");
+  }
+  if (repetitionPenalty < 0.0f) {
+    throw RnExecutorchError(RnExecutorchErrorCode::InvalidConfig,
+                            "Repetition penalty must be non-negative");
+  }
+  runner_->set_repetition_penalty(repetitionPenalty);
+}
+
 int32_t LLM::getMaxContextLength() const {
   if (!runner_ || !runner_->is_loaded()) {
     throw RnExecutorchError(
```

packages/react-native-executorch/common/rnexecutorch/models/llm/LLM.h

Lines changed: 2 additions & 0 deletions
```diff
@@ -38,6 +38,8 @@ class LLM : public BaseModel {
   void setCountInterval(size_t countInterval);
   void setTemperature(float temperature);
   void setTopp(float topp);
+  void setMinP(float minP);
+  void setRepetitionPenalty(float repetitionPenalty);
   void setTimeInterval(size_t timeInterval);
   int32_t getMaxContextLength() const;
```

packages/react-native-executorch/common/rnexecutorch/tests/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
```diff
@@ -151,6 +151,12 @@ add_rn_test(RunnerTests unit/RunnerTest.cpp
   integration/stubs/jsi_stubs.cpp
   LIBS tokenizers_deps
 )
+add_rn_test(SamplerTests unit/SamplerTest.cpp
+  SOURCES
+    ${COMMON_DIR}/runner/sampler.cpp
+    ${COMMON_DIR}/runner/arange_util.cpp
+  LIBS
+)
 add_rn_test(LogTests unit/LogTest.cpp)
 add_rn_test(FileUtilsTest unit/FileUtilsTest.cpp)
 add_rn_test(ImageProcessingTest unit/ImageProcessingTest.cpp
```

packages/react-native-executorch/common/rnexecutorch/tests/integration/LLMTest.cpp

Lines changed: 25 additions & 0 deletions
```diff
@@ -110,6 +110,31 @@ TEST_F(LLMTest, SetToppInvalidThrows) {
   EXPECT_THROW(model.setTopp(1.1f), RnExecutorchError);
 }
 
+TEST_F(LLMTest, SetMinP) {
+  LLM model(kValidModelPath, kValidTokenizerPath, {}, mockInvoker_);
+  EXPECT_NO_THROW(model.setMinP(0.0f));
+  EXPECT_NO_THROW(model.setMinP(0.15f));
+  EXPECT_NO_THROW(model.setMinP(1.0f));
+}
+
+TEST_F(LLMTest, SetMinPInvalidThrows) {
+  LLM model(kValidModelPath, kValidTokenizerPath, {}, mockInvoker_);
+  EXPECT_THROW(model.setMinP(-0.1f), RnExecutorchError);
+  EXPECT_THROW(model.setMinP(1.1f), RnExecutorchError);
+}
+
+TEST_F(LLMTest, SetRepetitionPenalty) {
+  LLM model(kValidModelPath, kValidTokenizerPath, {}, mockInvoker_);
+  EXPECT_NO_THROW(model.setRepetitionPenalty(1.0f));
+  EXPECT_NO_THROW(model.setRepetitionPenalty(1.05f));
+  EXPECT_NO_THROW(model.setRepetitionPenalty(2.0f));
+}
+
+TEST_F(LLMTest, SetRepetitionPenaltyInvalidThrows) {
+  LLM model(kValidModelPath, kValidTokenizerPath, {}, mockInvoker_);
+  EXPECT_THROW(model.setRepetitionPenalty(-0.1f), RnExecutorchError);
+}
+
 TEST_F(LLMTest, SetCountInterval) {
   LLM model(kValidModelPath, kValidTokenizerPath, {}, mockInvoker_);
   EXPECT_NO_THROW(model.setCountInterval(5));
```

packages/react-native-executorch/common/rnexecutorch/tests/integration/stubs/StubRunner.h

Lines changed: 0 additions & 5 deletions
```diff
@@ -18,16 +18,11 @@ class StubRunner : public ::executorch::extension::llm::BaseLLMRunner {
     return ::executorch::runtime::Error::Ok;
   }
   void stop_impl() override {}
-  void set_temperature_impl(float t) override { last_temp_ = t; }
-  void set_topp_impl(float) override {}
-  void set_count_interval_impl(size_t) override {}
-  void set_time_interval_impl(size_t) override {}
 
   int32_t resolve_max(int32_t prompt, int32_t seq_len, int32_t ctx_len,
                       int32_t max_new = -1) const {
     return resolve_max_new_tokens(prompt, seq_len, ctx_len, max_new);
   }
 
   bool loaded_ = false;
-  float last_temp_ = -1.f;
 };
```

packages/react-native-executorch/common/rnexecutorch/tests/unit/RunnerTest.cpp

Lines changed: 13 additions & 2 deletions
```diff
@@ -62,11 +62,10 @@ TEST(MultimodalInputTest, EmptyStringIsStillText) {
 // BaseLLMRunner via StubRunner
 // ============================================================================
 
-TEST(BaseLLMRunnerTest, SetTemperatureUpdatesConfigAndCallsImpl) {
+TEST(BaseLLMRunnerTest, SetTemperatureUpdatesConfig) {
   StubRunner runner(nullptr, "dummy");
   runner.set_temperature(0.42f);
   EXPECT_FLOAT_EQ(runner.config_.temperature, 0.42f);
-  EXPECT_FLOAT_EQ(runner.last_temp_, 0.42f);
 }
 
 TEST(BaseLLMRunnerTest, SetToppUpdatesConfig) {
@@ -89,3 +88,15 @@ TEST(BaseLLMRunnerTest, GenerateEmptyStringReturnsError) {
   auto err = runner.generate("", {}, {}, {});
   EXPECT_NE(err, ::executorch::runtime::Error::Ok);
 }
+
+TEST(BaseLLMRunnerTest, SetMinPUpdatesConfig) {
+  StubRunner runner(nullptr, "dummy");
+  runner.set_min_p(0.15f);
+  EXPECT_FLOAT_EQ(runner.config_.min_p, 0.15f);
+}
+
+TEST(BaseLLMRunnerTest, SetRepetitionPenaltyUpdatesConfig) {
+  StubRunner runner(nullptr, "dummy");
+  runner.set_repetition_penalty(1.05f);
+  EXPECT_FLOAT_EQ(runner.config_.repetition_penalty, 1.05f);
+}
```
