Commit df2a1b4
[Bugfix] Fix double-counted max_q_seqlen in decode delta kv_seqlens
create_model_inputs_delta / create_model_inputs_delta_valid_only build
kv_seqlens as [seq.num_all_ids + max_q_seqlen]. num_all_ids can be one decode
step stale here -- EngineLoop prefetches the next inputs before
_finish_forward_output() advances the sequence -- so the +max_q_seqlen recovers
this forward's kv length. But the reductions then added max_q_seqlen a SECOND
time and used batch_size = len(self.running_seqs), which counts
scheduler-dropped invalid seqs:
    sum_kv_seqlen = sum(kv_seqlens) + batch_size * max_q_seqlen
    max_kv_seqlen = max(kv_seqlens) + max_q_seqlen
so both reductions were over-counted: max_kv_seqlen by max_q_seqlen and
sum_kv_seqlen by batch_size * max_q_seqlen. The excess scales with spec/MTP
num_decode_tokens and over-allocates the attention grid and kv-cache resources.
Reduce over kv_seqlens directly; the +max_q_seqlen is already applied once in
the comprehension.
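
To make the arithmetic concrete, here is a minimal sketch of the old and new reductions on toy numbers. The `Seq` class, the helper names, and the values are hypothetical stand-ins rather than the project's actual code; the sketch assumes kv_seqlens is built from two valid sequences while len(self.running_seqs) is 3 because one sequence was dropped by the scheduler.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Seq:
    """Hypothetical stand-in for a scheduler sequence; only num_all_ids is modelled."""
    num_all_ids: int


def build_kv_seqlens(seqs: List[Seq], max_q_seqlen: int) -> List[int]:
    # +max_q_seqlen is applied exactly once here, mirroring the comprehension.
    return [seq.num_all_ids + max_q_seqlen for seq in seqs]


def reduce_buggy(kv_seqlens: List[int], batch_size: int, max_q_seqlen: int) -> Tuple[int, int]:
    # Old behaviour described in the commit message: max_q_seqlen is added again.
    sum_kv = sum(kv_seqlens) + batch_size * max_q_seqlen
    max_kv = max(kv_seqlens) + max_q_seqlen
    return sum_kv, max_kv


def reduce_fixed(kv_seqlens: List[int]) -> Tuple[int, int]:
    # New behaviour: reduce over kv_seqlens directly.
    return sum(kv_seqlens), max(kv_seqlens)


valid_seqs = [Seq(num_all_ids=10), Seq(num_all_ids=20)]
running_count = 3   # assumed len(self.running_seqs), including one dropped seq
max_q_seqlen = 4    # e.g. several spec/MTP decode tokens per step (illustrative)

kv_seqlens = build_kv_seqlens(valid_seqs, max_q_seqlen)
buggy = reduce_buggy(kv_seqlens, running_count, max_q_seqlen)
fixed = reduce_fixed(kv_seqlens)

print("kv_seqlens:", kv_seqlens)
print("buggy (sum, max):", buggy)
print("fixed (sum, max):", fixed)
```

With these numbers the old sum exceeds the fixed one by running_count * max_q_seqlen and the old max by max_q_seqlen, matching the over-count described above.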
Fixes #4024
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 18600ad commit df2a1b4
2 files changed
Lines changed: 35 additions & 4 deletions
| File | Hunks (original file lines) | Additions | Deletions |
|---|---|---|---|
| 1 | 603-611, 650-662 | 12 | 4 |
| 2 | 333-335 | 23 | 0 |