Commit 85ab161
Integrate attention sink into ET LLM export and runner
Summary:
Add custom op support, export pipeline integration, and C++ runner fixes
for the attention sink ring buffer implementation.
- CustomKVCacheWithAttentionSink: custom op variant using update_cache_with_indices
  for scatter-write performance. Replaces KVCacheWithAttentionSink during export.
- CustomRingKVCache replacement: handle RingKVCache -> CustomRingKVCache in the
  replacement pass, and set SDPACustom.use_attention_mask=True for ring buffer models.
- Export transform ordering: replace SDPA before KV cache so that
  _replace_kv_cache_with_custom_kv_cache can set use_attention_mask=True on the
  already-existing SDPACustom (previously the ordering was reversed, causing the
  mask flag to be overwritten by a new SDPACustom).
- C++ runner: add max_seq_len prefill check; make context length check conditional
  for sliding window models (max_seq_len < max_context_len) since they handle
  position wrapping internally via ring buffer.
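
The sketch below illustrates, in plain Python, two ideas from the summary above: how an attention-sink ring buffer might map absolute token positions to cache slots (the indices a scatter-write would target), and how the runner's conditional length checks could look. It is not taken from the commit. The function names (`sink_ring_slots`, `check_prefill`), the parameters `sink_size`, `window_size`, and `start_pos`, and the exact comparison operators are assumptions made for illustration only.

```python
def sink_ring_slots(start_pos: int, seq_len: int,
                    sink_size: int, window_size: int) -> list[int]:
    """Map absolute token positions to cache slots (hypothetical helper).

    Assumed layout: the first `sink_size` slots hold the sink tokens and are
    never overwritten; the next `window_size` slots form a ring buffer.
    """
    slots = []
    for pos in range(start_pos, start_pos + seq_len):
        if pos < sink_size:
            # Sink tokens keep their original slot.
            slots.append(pos)
        else:
            # Later tokens wrap around inside the ring region.
            slots.append(sink_size + (pos - sink_size) % window_size)
    return slots


def check_prefill(num_prompt_tokens: int, pos: int,
                  max_seq_len: int, max_context_len: int) -> None:
    """Validate a prefill request (hypothetical mirror of the runner checks)."""
    # The prompt must fit in a single forward pass.
    if num_prompt_tokens > max_seq_len:
        raise ValueError("prompt longer than max_seq_len")
    # Per the summary, max_seq_len < max_context_len marks a sliding window
    # model, which wraps positions itself, so the context check is skipped.
    is_sliding_window = max_seq_len < max_context_len
    if not is_sliding_window and pos + num_prompt_tokens > max_context_len:
        raise ValueError("prompt would exceed max_context_len")


if __name__ == "__main__":
    # 2 sink slots plus a 4-slot ring: positions 6..9 reuse slots 2..5.
    print(sink_ring_slots(0, 10, sink_size=2, window_size=4))
```

With 2 sink slots and a 4-slot ring, positions 0 through 9 land in slots `[0, 1, 2, 3, 4, 5, 2, 3, 4, 5]`: the sink slots are written once, and everything after them cycles through the ring region.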
Differential Revision: D1002166861
parent 56d6e4d, commit 85ab161
4 files changed
Lines changed: 183 additions & 13 deletions
File tree
- examples/models/llama
- source_transformation
- extension/llm/runner
Lines changed: 109 additions & 1 deletion
Lines changed: 45 additions & 0 deletions