Commit 44a91bf
Arm backend: Enable and support KV cache on Llama (pytorch#20026)
- Run llama with use_kv_cache option
- Add LlamaPositionalAdapter to handle input_pos mismatch
- Extract USER_OUTPUT in arm test pipeline in order to avoid irrelevant
cache data being accidentally analysed against the ref model
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils
@Sebastian-Larsson @robell @rascani
Signed-off-by: Christoffer J.L <christoffer.johanssonlundqvist@arm.com>

1 parent e0b6574 commit 44a91bf
2 files changed
Lines changed: 28 additions & 1 deletion
The diff line contents and file names were not preserved; only the hunk positions survive.

| File | Diff line numbers | Change |
|---|---|---|
| File 1 (name not preserved) | 37 | 1 line replaced (1 deletion, 1 addition) |
| File 1 (name not preserved) | 64-72 | 9 lines added |
| File 1 (name not preserved) | 166 | 1 line added |
| File 1 (name not preserved) | 175-179 | 5 lines added |
| File 2 (name not preserved) | 644-655 | 12 lines added |
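The commit message says the Arm test pipeline now extracts USER_OUTPUT so that cache data is not compared against the reference model. The actual change is not visible in this page, so the following is only a minimal sketch of that idea. It assumes an exported KV-cache model returns its cache updates alongside the logits and that each output is tagged with a kind. `OutputKind` here is a local stand-in mirroring two member names of `torch.export.graph_signature.OutputKind`; `OutputSpec` and `extract_user_outputs` are hypothetical names chosen for illustration, not the code added by this commit.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, List, Sequence


class OutputKind(Enum):
    """Local stand-in mirroring two members of torch.export's OutputKind."""

    USER_OUTPUT = auto()
    BUFFER_MUTATION = auto()


@dataclass(frozen=True)
class OutputSpec:
    """Hypothetical, reduced output spec: just a kind and a name."""

    kind: OutputKind
    name: str


def extract_user_outputs(
    specs: Sequence[OutputSpec], outputs: Sequence[Any]
) -> List[Any]:
    """Keep only the outputs whose spec is tagged USER_OUTPUT.

    Mutated buffers (for example KV-cache tensors) are dropped so they are
    never compared against the reference model's outputs.
    """
    if len(specs) != len(outputs):
        raise ValueError("expected exactly one spec per output")
    return [
        out
        for spec, out in zip(specs, outputs)
        if spec.kind is OutputKind.USER_OUTPUT
    ]


# Assumed output layout for illustration: cache updates first, logits last.
specs = [
    OutputSpec(OutputKind.BUFFER_MUTATION, "k_cache"),
    OutputSpec(OutputKind.BUFFER_MUTATION, "v_cache"),
    OutputSpec(OutputKind.USER_OUTPUT, "logits"),
]
outputs = ["<k_cache data>", "<v_cache data>", "<logits>"]

user_outputs = extract_user_outputs(specs, outputs)
```

Filtering by kind rather than by position keeps the comparison correct even if the number of cache buffers changes, which is one plausible reason to key the extraction on USER_OUTPUT.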