Commit 4f8bdd0
fix/feat!: LLMs context management (software-mansion#819)
## Description
This PR fixes a few bugs related to LLMs that were caused by mixing two
approaches: functional (we pass the whole message history each time)
and stateful (we keep `pos_` in the runner, representing the current
position in the KV cache). This resulted in 3 bugs:
- broken KV cache for reasoning models: in the runner, we counted the
tokens generated during reasoning and included them in the KV cache
(`pos_ += num_generated_tokens`), but in subsequent turns the Jinja
template removed these reasoning tokens from the message history. As a
result, the KV cache was incoherent.
- duplicated tokens in the KV cache: we were passing the whole message
history to the runner (functional approach) while also appending all
tokens (prompt and generated) to the KV cache (whose position is
represented by `pos_`). As a result, tokens were "duplicated" in the KV
cache and we ran out of available tokens very quickly (exceeding
`context_window_length`).
- stateful TS functional API: even though our `generate()` method is
described as functional, it kept internal state in the runner (e.g. `pos_`).

These bugs were fixed by resetting the runner before each generation,
which makes it truly functional: old messages are prefilled, and the KV
cache can still be used during the generation phase.
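
The effect of resetting before each generation can be illustrated with a toy model. This is only a sketch: `ToyRunner`, `run()`, and the token counts are hypothetical stand-ins, not the actual runner code; only the idea of a `pos_`-like counter that is reset on every call comes from the description above.

```typescript
// Toy model of the runner. `pos` stands in for the real `pos_`
// (the current position in the KV cache). All names are hypothetical.
class ToyRunner {
  pos = 0;

  reset(): void {
    this.pos = 0;
  }

  // Prefill the whole history, then decode; both phases advance the position.
  run(promptTokens: number, generatedTokens: number): number {
    this.pos += promptTokens; // prefill of the full message history
    this.pos += generatedTokens; // decode phase still uses the cache
    return this.pos;
  }
}

// Functional wrapper: the result depends only on the arguments,
// never on what earlier calls left behind in the runner.
function generate(
  runner: ToyRunner,
  promptTokens: number,
  generatedTokens: number
): number {
  runner.reset();
  return runner.run(promptTokens, generatedTokens);
}

const runner = new ToyRunner();
const firstTurnPos = generate(runner, 335, 269);
const secondTurnPos = generate(runner, 872, 372);
console.log(firstTurnPos, secondTurnPos);
```

Because the position starts from zero on every call, the second turn ends at the size of its own prompt plus its own output, rather than stacking on top of the first turn.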
Additionally, this PR adds `ContextStrategy` to the `ChatConfig` interface,
so it is now possible to define a custom strategy (or use one of the
already implemented ones) for managing context (e.g. naive, message-count-based,
sliding window). This gives us more flexibility, and users can decide
what's best for their use case. From now on, `SlidingWindowContextStrategy`
is also configured as the default.
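
To make the three kinds of strategy concrete, here is a minimal sketch. The `Message` and `ToyContextStrategy` shapes, the `buildContext` method, the class names, and the whitespace-based token counter are all assumptions made for illustration; the real `ContextStrategy` interface in this PR may look different.

```typescript
// Hypothetical shapes for illustration only; not the library's real types.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

type TokenCounter = (message: Message) => number;

interface ToyContextStrategy {
  // Returns the part of `history` that should be sent to the model.
  buildContext(history: Message[], countTokens: TokenCounter): Message[];
}

// Naive: always send the full history.
class ToyNaiveStrategy implements ToyContextStrategy {
  buildContext(history: Message[]): Message[] {
    return history;
  }
}

// Message-count based: keep only the newest `maxMessages` messages.
class ToyMessageCountStrategy implements ToyContextStrategy {
  private readonly maxMessages: number;
  constructor(maxMessages: number) {
    this.maxMessages = maxMessages;
  }
  buildContext(history: Message[]): Message[] {
    return this.maxMessages > 0 ? history.slice(-this.maxMessages) : [];
  }
}

// Sliding window: keep the newest messages that fit in a token budget.
class ToySlidingWindowStrategy implements ToyContextStrategy {
  private readonly maxTokens: number;
  constructor(maxTokens: number) {
    this.maxTokens = maxTokens;
  }
  buildContext(history: Message[], countTokens: TokenCounter): Message[] {
    const kept: Message[] = [];
    let used = 0;
    // Walk from newest to oldest and stop at the first message that does not fit.
    for (const message of [...history].reverse()) {
      const cost = countTokens(message);
      if (used + cost > this.maxTokens) break;
      kept.unshift(message);
      used += cost;
    }
    return kept;
  }
}

// Stand-in for a tokenizer: counts whitespace-separated words.
const countWords: TokenCounter = (m) =>
  m.content.split(/\s+/).filter(Boolean).length;

const history: Message[] = [
  { role: 'user', content: 'a b c' },
  { role: 'assistant', content: 'd e' },
  { role: 'user', content: 'f g h i' },
  { role: 'assistant', content: 'j' },
];

const windowed = new ToySlidingWindowStrategy(5).buildContext(history, countWords);
const lastThree = new ToyMessageCountStrategy(3).buildContext(history, countWords);
const everything = new ToyNaiveStrategy().buildContext(history, countWords);
console.log(windowed.length, lastThree.length, everything.length);
```

Keeping the strategy behind a single method means the caller can swap policies without touching the generation code, which is the flexibility the paragraph above describes.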
### Introduces a breaking change?
- [x] Yes
- [ ] No
These changes will not break anything as long as the max number of
messages is not modified (I removed `contextWindowLength` from
`ChatConfig` and replaced it with `contextStrategy`).
### Type of change
- [x] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)
### Tested on
- [x] iOS
- [x] Android
### Testing instructions
Run the example LLM app, open the ExecuTorch logs (for example,
`adb logcat | grep -i "executorch"`), and check that the token counts
are properly aligned and that `pos_` is correct.
To test different context management strategies, change
`contextStrategy` in the LLM app and modify the model configuration.
### Related issues
software-mansion#776
### Checklist
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
### Additional notes
Position in the KV cache, number of prompt tokens, and number of
generated tokens for both non-reasoning and reasoning models BEFORE the
changes:
LLAMA 3.2 1B SPINQUANT (without reasoning)
| pos_ | Prompt tokens | Generated tokens |
|------------------|---------------|------------------|
| 0 | 335 | 269 |
| 604=269+335 | 872 | 372 |
| 1848=604+872+372 | 1513 | CRASH |
QWEN 3.0 0.6B QUANTIZED (with reasoning)
| pos_ | Prompt tokens | Generated tokens |
|------------------|---------------|------------------|
| 0 | 309 | 457 |
| 766=309+457 | 617 (<766!) | 192 |
| 1575=766+617+192 | 925 (<1575!) | CRASH |

1 parent 31eca42 commit 4f8bdd0
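
The `pos_` values in the tables above follow from simple accumulation, which the sketch below reproduces. The `Turn` type and both function names are hypothetical; the token counts are taken from the tables. It also flags turns where the prompt is shorter than the cached position, which is the incoherence marked with `!` in the reasoning-model table.

```typescript
interface Turn {
  promptTokens: number;
  generatedTokens: number;
}

// Old behaviour: `pos_` was never reset, so every completed turn added its
// prompt and generated tokens on top of the previous position.
// Returns the position before each turn, plus the position after the last one.
function oldPositions(turns: Turn[]): number[] {
  const positions: number[] = [0];
  let pos = 0;
  for (const turn of turns) {
    pos += turn.promptTokens + turn.generatedTokens;
    positions.push(pos);
  }
  return positions;
}

// A prompt shorter than the cached position means the cache holds tokens
// that are no longer part of the message history.
function promptShorterThanCache(promptTokens: number, pos: number): boolean {
  return promptTokens < pos;
}

// The two completed turns from each table (the third turn crashed).
const llamaTurns: Turn[] = [
  { promptTokens: 335, generatedTokens: 269 },
  { promptTokens: 872, generatedTokens: 372 },
];
const qwenTurns: Turn[] = [
  { promptTokens: 309, generatedTokens: 457 },
  { promptTokens: 617, generatedTokens: 192 },
];

const llamaPositions = oldPositions(llamaTurns);
const qwenPositions = oldPositions(qwenTurns);
console.log(llamaPositions, qwenPositions);
console.log(promptShorterThanCache(617, qwenPositions[1]));
```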
271 files changed
Lines changed: 1474 additions & 915 deletions
File tree
- docs
- docs
- 03-hooks/01-natural-language-processing
- 04-typescript-api/01-natural-language-processing
- 06-api-reference
- classes
- enumerations
- functions
- interfaces
- react-native-executorch/namespaces/ResourceFetcherUtils/functions
- type-aliases
- variables
- packages/react-native-executorch
- common
- rnexecutorch
- host_objects
- models/llm
- runner
- src
- constants
- controllers
- types
- utils/llms/context_strategy
- skills/react-native-executorch/references