[TRTLLM-12339][feat] enable TRTLLM cross attention backend #15345
cascade812 wants to merge 2 commits into
Signed-off-by: Guiju Zhang <guijuz@nvidia.com>
7537f51 to 46ef1af
/bot run --disable-fail-fast
PR_Github #54082 [ run ] triggered by Bot. Commit:
PR_Github #54082 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #54328 [ run ] triggered by Bot. Commit:
PR_Github #54328 [ run ] completed with state
/bot run
PR_Github #54380 [ run ] triggered by Bot. Commit:
PR_Github #54380 [ run ] completed with state
Description
Split out the attention operator and TRTLLM attention backend changes from #13919 to reduce frequent conflicts with main and make CI validation easier for this smaller, self-contained scope.
This PR intentionally keeps the change self-contained:
- thop.attention and its nanobind signature for cross-attention and relative-attention-bias inputs
- No module, executor, model, or LLM API caller changes are included in this split.
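For reviewers who want a mental model of the two new kinds of input, here is a minimal single-head reference sketch of cross-attention with an optional additive bias. It is illustrative only: the function name `cross_attention_reference`, its argument layout, and the choice to apply the bias to the scaled scores are assumptions made for this example, not the `thop.attention` signature or the TRTLLM kernel behavior.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_reference(q, k, v, bias=None):
    """Hypothetical single-head reference, not the TRTLLM implementation.

    q:    [q_len][head_dim]   queries (e.g. from the decoder)
    k, v: [kv_len][head_dim]  keys/values (e.g. from the encoder output)
    bias: optional [q_len][kv_len] additive bias on the attention scores
    """
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for i, q_row in enumerate(q):
        # Scaled dot product of one query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if bias is not None:
            # Assumption: the bias is added to the scores before softmax.
            scores = [s + b for s, b in zip(scores, bias[i])]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v_row[d] for w, v_row in zip(weights, v))
                    for d in range(len(v[0]))])
    return out
```

The point of the sketch is that the query length and the key/value length are independent in cross-attention, so an operator signature has to carry the encoder-side tensors and lengths separately from the decoder-side ones.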
Summary by CodeRabbit