[TRTLLM-12339][feat] enable TRTLLM cross attention backend by cascade812 · Pull Request #15345 · NVIDIA/TensorRT-LLM

cascade812 · 2026-06-14T04:02:10Z

Description

Split out the attention operator and TRTLLM attention backend changes from #13919 to reduce frequent conflicts with main and make CI validation easier for this smaller, self-contained scope.

This PR intentionally keeps the change self-contained:

- wires thop.attention and its nanobind signature for cross-attention and relative-attention-bias inputs
- enables the TRTLLM backend path for cross-attention metadata, including Q padding, cross K/V forwarding, and beam-width handling
- makes trtllm-gen decline cross-attention so cross requests use the THOP path
- adds only the small backend forward-args fields required by the TRTLLM backend

No module, executor, model, or LLM API caller changes are included in this split.
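
For reviewers, the fallback described in the third bullet can be pictured roughly as below. This is only an illustrative sketch, not code from this PR: `AttentionRequest`, `trtllm_gen_supports`, and `select_attention_path` are hypothetical names, and the real check may consider more conditions than the cross-attention flag.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AttentionRequest:
    """Hypothetical stand-in for the attention forward arguments."""
    is_cross: bool = False
    # Placeholder for the optional cross K/V input (encoder keys/values).
    cross_kv: Optional[Sequence[float]] = None


def trtllm_gen_supports(req: AttentionRequest) -> bool:
    # Assumption: trtllm-gen declines every cross-attention request.
    return not req.is_cross


def select_attention_path(req: AttentionRequest) -> str:
    """Return the name of the path that would handle the request."""
    if trtllm_gen_supports(req):
        return "trtllm-gen"
    # Cross-attention requests fall through to the THOP path.
    return "thop"


self_attn = AttentionRequest()
cross_attn = AttentionRequest(is_cross=True, cross_kv=[0.0, 1.0])
print(select_attention_path(self_attn))   # trtllm-gen
print(select_attention_path(cross_attn))  # thop
```

Keeping the decision in a single predicate means self-attention requests are unaffected, which matches the intent of keeping this split self-contained.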

Summary by CodeRabbit

New Features
- Added cross-attention support with optional cross-key-value tensor inputs.
- Added optional relative attention bias with configurable maximum distance parameter.
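
As background on what a maximum-distance parameter typically controls, here is a minimal sketch of T5-style relative position bucketing. It assumes the relative attention bias in this PR follows that scheme, which the description does not confirm; `relative_position_bucket` and its defaults are illustrative and are not taken from the TensorRT-LLM code.

```python
import math


def relative_position_bucket(relative_position: int,
                             bidirectional: bool = True,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> int:
    """Map a (key_pos - query_pos) offset to a bias bucket index (T5 style)."""
    bucket = 0
    if bidirectional:
        # Half of the buckets serve positive offsets, half non-positive ones.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        relative_position = abs(relative_position)
    else:
        # Causal case: only offsets to earlier positions are distinguished.
        relative_position = -min(relative_position, 0)

    # Small offsets each get their own bucket.
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        return bucket + relative_position

    # Larger offsets share logarithmically sized buckets up to max_distance;
    # anything beyond max_distance lands in the last bucket.
    scaled = (math.log(relative_position / max_exact)
              / math.log(max_distance / max_exact)
              * (num_buckets - max_exact))
    return bucket + min(max_exact + int(scaled), num_buckets - 1)


for offset in (0, -1, 1, -1000, 1000):
    print(offset, relative_position_bucket(offset))
```

Under this scheme, offsets beyond `max_distance` all share the final bucket, so the bias table stays a fixed size regardless of sequence length.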

Signed-off-by: Guiju Zhang <guijuz@nvidia.com>

cascade812 · 2026-06-14T04:04:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-14T04:11:03Z

PR_Github #54082 [ run ] triggered by Bot. Commit: 46ef1af Link to invocation

tensorrt-cicd · 2026-06-14T08:39:55Z

PR_Github #54082 [ run ] completed with state SUCCESS. Commit: 46ef1af
/LLM/main/L0_MergeRequest_PR pipeline #43166 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

cascade812 · 2026-06-15T16:32:45Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-15T16:39:01Z

PR_Github #54328 [ run ] triggered by Bot. Commit: 3dfaeac Link to invocation

tensorrt-cicd · 2026-06-15T23:46:03Z

PR_Github #54328 [ run ] completed with state FAILURE. Commit: 3dfaeac
/LLM/main/L0_MergeRequest_PR pipeline #43399 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

cascade812 · 2026-06-15T23:53:32Z

/bot run

tensorrt-cicd · 2026-06-15T23:59:10Z

PR_Github #54380 [ run ] triggered by Bot. Commit: 3dfaeac Link to invocation

tensorrt-cicd · 2026-06-16T01:14:31Z

PR_Github #54380 [ run ] completed with state SUCCESS. Commit: 3dfaeac
/LLM/main/L0_MergeRequest_PR pipeline #43450 completed with status: 'SUCCESS'

CI Report

Link to invocation

brb-nv

LGTM.

github-actions Bot assigned cascade812 Jun 14, 2026

[TRTLLM-12339][feat] enable TRTLLM cross attention backend

46ef1af

Signed-off-by: Guiju Zhang <guijuz@nvidia.com>

cascade812 force-pushed the codex/split-attention-op-trtllm branch from 7537f51 to 46ef1af on June 14, 2026 04:03

Merge branch 'main' into codex/split-attention-op-trtllm

3dfaeac

cascade812 marked this pull request as ready for review June 16, 2026 01:15

cascade812 requested a review from a team as a code owner June 16, 2026 01:15

cascade812 requested a review from QiJune June 16, 2026 01:15

brb-nv approved these changes Jun 16, 2026

cascade812 wants to merge 2 commits into NVIDIA:main from cascade812:codex/split-attention-op-trtllm


3 participants
