Commit 6b9eebb
Abhinay Kukkadapu
Fix QNN runner KV cache bitwidth detection in Android JNI
Summary:
The QNN runner in the Android JNI layer was hardcoded to use
Runner<uint16_t>, but models can be exported with either 8-bit or
16-bit KV caches. This mismatch caused the KV cache data to be
misinterpreted, resulting in gibberish output in the Android demo app
while the same model worked correctly via the CLI runner.
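
To picture the failure mode, here is a minimal sketch (not the runner's actual code) of what happens when a buffer of 8-bit values is read with a 16-bit element type. The helper name view_as is hypothetical and exists only for this illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a raw byte buffer into a vector of T,
// i.e. interpret the same bytes with a different element width.
template <typename T>
std::vector<T> view_as(const std::vector<uint8_t>& bytes) {
  std::vector<T> out(bytes.size() / sizeof(T));
  std::memcpy(out.data(), bytes.data(), out.size() * sizeof(T));
  return out;
}
```

Reading the four bytes {1, 2, 3, 4} with view_as<uint8_t> yields four elements, while view_as<uint16_t> yields two elements, each built from a pair of unrelated 8-bit values. A cache read this way contains the right bytes but the wrong numbers.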
This change mirrors the dynamic KV bitwidth detection already present
in qnn_llama_runner.cpp by querying the model's get_kv_io_bit_width
method and instantiating the correct Runner<uint8_t> or
Runner<uint16_t> accordingly. It also passes temperature_ to the Runner
constructor, which was previously omitted.
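
The dispatch described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: IRunnerSketch, RunnerSketch, and make_runner are hypothetical names, and kv_bit_width stands in for whatever value the model's get_kv_io_bit_width method reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical common interface so both instantiations share one handle type.
struct IRunnerSketch {
  virtual ~IRunnerSketch() = default;
  virtual std::size_t kv_element_bytes() const = 0;
  virtual float temperature() const = 0;
};

// Hypothetical stand-in for Runner<T>; KvT is the KV cache element type.
template <typename KvT>
class RunnerSketch : public IRunnerSketch {
 public:
  explicit RunnerSketch(float temperature) : temperature_(temperature) {}
  std::size_t kv_element_bytes() const override { return sizeof(KvT); }
  float temperature() const override { return temperature_; }

 private:
  float temperature_;
};

// Choose the template instantiation from the bit width reported by the model
// (assumed here to be 8 or 16) instead of hardcoding uint16_t.
inline std::unique_ptr<IRunnerSketch> make_runner(int kv_bit_width,
                                                  float temperature) {
  switch (kv_bit_width) {
    case 8:
      return std::make_unique<RunnerSketch<uint8_t>>(temperature);
    case 16:
      return std::make_unique<RunnerSketch<uint16_t>>(temperature);
    default:
      throw std::invalid_argument("unsupported KV bit width: " +
                                  std::to_string(kv_bit_width));
  }
}
```

In this sketch an unexpected width is rejected outright. Whether the real JNI layer falls back to a default width instead is not stated in this message.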
Fixes #18571
Closes #17622
Test Plan:
- Built Android AAR with QNN support (SDK 2.37) — jni_layer_llama.cpp
compiles cleanly with both Runner<uint8_t> and Runner<uint16_t>
template instantiations
- Unit tests pass (gradlew testDebugUnitTest)

1 parent e0e10cc commit 6b9eebb
1 file changed
Lines changed: 35 additions & 7 deletions
Diff content not captured. One hunk: original lines 206-212 removed (7 lines), new lines 206-240 added (35 lines); context lines 203-205 and 213-215 (new 241-243) unchanged.