Commit 4a8b7af
[None][feat] Side-stream for MM encoder (#14322)
* Why?
Multimodal context requests currently run their encoder only after they
are scheduled. This can keep the next request's image encoding on the
critical path even when the current iteration already has independent
GPU work that the encoding could overlap with.
* What?
Add an opt-in cross-iteration prefetch path gated by
`TLLM_MM_SIDE_STREAM_MAX_AHEAD`. The executor picks pending multimodal
context requests that are not in flight, moves their inputs to CUDA and
runs the encoder on an auxiliary stream.
This leverages the recently added `MultimodalEncoderMixin`.
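
The selection step described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `PendingRequest`, `max_ahead_from_env`, and `pick_prefetch_candidates` are hypothetical names, and the reading of `TLLM_MM_SIDE_STREAM_MAX_AHEAD` as "number of requests to prefetch ahead, with unset or non-positive meaning disabled" is an assumption. Moving inputs to CUDA and running the encoder on the auxiliary stream are deliberately left out.

```python
import os
from dataclasses import dataclass

# Environment variable name taken from the commit message.
ENV_VAR = "TLLM_MM_SIDE_STREAM_MAX_AHEAD"


@dataclass
class PendingRequest:
    """Hypothetical stand-in for an executor request, not the real class."""
    request_id: int
    is_multimodal: bool
    is_context: bool
    in_flight: bool = False
    encoded: bool = False


def max_ahead_from_env(environ=None) -> int:
    """Parse the opt-in gate.

    Assumed semantics: unset, non-numeric, or non-positive values disable
    the prefetch path (returns 0).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "")
    try:
        return max(int(raw), 0)
    except ValueError:
        return 0


def pick_prefetch_candidates(pending, max_ahead):
    """Return up to `max_ahead` requests eligible for encoder prefetch.

    Eligible means: a multimodal context request that is not in flight
    and has not been encoded yet. Queue order is preserved.
    """
    picked = []
    for req in pending:
        if len(picked) >= max_ahead:
            break
        if (req.is_multimodal and req.is_context
                and not req.in_flight and not req.encoded):
            picked.append(req)
    return picked
```

With the gate at its assumed default of 0, `pick_prefetch_candidates` returns an empty list, so the executor would fall back to encoding after scheduling.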
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
1 parent c25fa74 commit 4a8b7af
9 files changed
Lines changed: 639 additions & 32 deletions
File tree
- tensorrt_llm
- _torch
- models
- pyexecutor
- inputs
- tests/unittest/_torch/multimodal
Lines changed: 280 additions & 28 deletions