[TRTLLM-12982][chore] relocate `torch_multi_arange` by ixlmar · Pull Request #15416 · NVIDIA/TensorRT-LLM

ixlmar · 2026-06-16T12:34:49Z

Description

Follow-up on #14693 (comment).

Commit 800c7ee is from #15413, which is to be merged before this PR.

Test Coverage

Covered by existing tests

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Improvements
- Encoder CUDA graphs now properly detect multi-item scoring scenarios and fall back to eager execution when necessary.
Refactoring
- Optimized attention metadata to accept multi-item configuration during preparation phase instead of forward pass.
- Reorganized utility functions for improved code maintainability.
Chores
- Updated test infrastructure and file organization.

ixlmar · 2026-06-16T12:41:18Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-16T12:46:50Z

PR_Github #54587 [ run ] triggered by Bot. Commit: faad7dc Link to invocation

brb-nv

LGTM.

coderabbitai · 2026-06-16T18:10:07Z

📝 Walkthrough

Walkthrough

torch_multi_arange (with _AcceptSyncCompute/ACCEPT_SYNC_COMPUTE) is moved from sampling_utils.py to utils.py and all import sites are updated. Separately, multi_item_part_lens is removed from AttentionForwardArgs and from all attention forward signatures, and instead passed as a keyword argument to AttentionMetadata.prepare(), with FlashInfer caching the computed FlashInferMultiItemParams in _multi_item_params for use during plan().

Changes

torch_multi_arange relocation and multi_item_part_lens prepare() refactor

Layer / File(s)	Summary
`torch_multi_arange` relocated to `utils.py`: `tensorrt_llm/_torch/utils.py`, `tensorrt_llm/_torch/pyexecutor/sampling_utils.py`, `tensorrt_llm/_torch/pyexecutor/sampler.py`, `tests/unittest/_torch/test_torch_multi_arange.py`, `tests/integration/test_lists/test-db/l0_a10.yml`, `.pre-commit-config.yaml`, `legacy-files.txt`, `pyproject.toml`, `ruff-legacy.toml`	`_AcceptSyncCompute`, `ACCEPT_SYNC_COMPUTE`, and `torch_multi_arange` are added to `utils.py` and deleted from `sampling_utils.py`; `sampler.py` import is redirected; test, test-list, and lint config entries are updated to the new path.
`AttentionMetadata.prepare()` and `AttentionForwardArgs` contract: `tensorrt_llm/_torch/attention_backend/interface.py`	`AttentionMetadata.prepare()` gains a keyword-only `multi_item_part_lens` parameter; `AttentionForwardArgs` drops its `multi_item_part_lens` field, removing multi-item layout from per-forward args.
Backend `prepare()` enforce/reject `multi_item_part_lens`: `tensorrt_llm/_torch/attention_backend/vanilla.py`, `tensorrt_llm/_torch/attention_backend/star_flashinfer.py`, `tensorrt_llm/_torch/attention_backend/trtllm.py`	`VanillaAttentionMetadata`, `StarAttentionMetadata`, `TrtllmAttentionMetadata`, and `prepare_encoder_only` each add the keyword-only `multi_item_part_lens` parameter and raise `ValueError` when non-`None`; per-forward `ValueError` checks are removed.
FlashInfer metadata caches `multi_item_params` at `prepare()` time: `tensorrt_llm/_torch/attention_backend/flashinfer.py`	`FlashInferAttentionMetadata` gains `_multi_item_params` field and `_process_multi_item_part_lens()` instance method; `prepare()` computes and stores multi-item tensors; `plan()` passes `_multi_item_params` into `PlanParams`; `forward_impl`/`forward()` have `multi_item_part_lens` removed; `metadata.plan()` is wrapped in `nvtx_range`.
`Attention` module removes `multi_item_part_lens` from forward path: `tensorrt_llm/_torch/modules/attention.py`	`_attn_impl`, `forward_impl`, and `forward` drop `multi_item_part_lens` parameters; `AttentionForwardArgs` construction no longer includes it; the RoPE `position_ids` rewrite block for multi-item scoring is deleted.
Executor and LLM API wire `multi_item_part_lens` into `prepare()`: `tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py`, `tensorrt_llm/_torch/pyexecutor/model_engine.py`, `tensorrt_llm/llmapi/llm.py`	`EncoderCUDAGraphRunner` falls back to eager when `multi_item_part_lens` is present; `model_engine._prepare_encoder_inputs` reads and passes `multi_item_part_lens` to `prepare_encoder_only()`/`prepare()`, asserting `None` on CUDA-graph replay; `llm.py encode()` gains `@torch.inference_mode()` and computes CUDA `position_ids` via `torch_multi_arange` for multi-item scoring.
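
To illustrate the contract change summarized in the table, here is a minimal sketch, not the TensorRT-LLM implementation. The class names `SketchCachingMetadata`, `SketchRejectingMetadata`, and `SketchMultiItemParams` are hypothetical, as are the fields of the cached object; only the keyword-only `multi_item_part_lens` parameter, the `_multi_item_params` cache, and the `ValueError` raised by backends without multi-item support are taken from the walkthrough.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMultiItemParams:
    """Hypothetical stand-in for values a backend derives once per batch."""
    prefix_lens: list
    max_item_lens: list


class SketchCachingMetadata:
    """Backend that supports multi-item scoring: compute in prepare(), reuse in plan()."""

    def __init__(self) -> None:
        self._multi_item_params: Optional[SketchMultiItemParams] = None

    def prepare(self, *, multi_item_part_lens=None) -> None:
        if multi_item_part_lens is None:
            # Clear any state left over from a previous batch.
            self._multi_item_params = None
            return
        self._multi_item_params = SketchMultiItemParams(
            prefix_lens=[parts[0] for parts in multi_item_part_lens],
            max_item_lens=[max(parts[1:]) for parts in multi_item_part_lens],
        )

    def plan(self) -> dict:
        # The forward path no longer receives multi_item_part_lens; it only
        # sees what prepare() cached.
        return {"multi_item_params": self._multi_item_params}


class SketchRejectingMetadata:
    """Backend without multi-item support: reject at prepare() time."""

    def prepare(self, *, multi_item_part_lens=None) -> None:
        if multi_item_part_lens is not None:
            raise ValueError("multi_item_part_lens is not supported by this backend")
```

Rejecting in `prepare()` rather than in `forward()` means an unsupported configuration fails once per batch, before any layer runs.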

Sequence Diagram(s)

sequenceDiagram
    participant encode as llm.encode()
    participant model_engine as _prepare_encoder_inputs
    participant cuda_runner as EncoderCUDAGraphRunner
    participant metadata as FlashInferAttentionMetadata
    participant plan as FlashInferAttentionMetadata.plan()

    encode->>encode: compute position_ids via torch_multi_arange
    encode->>model_engine: inputs (multi_item_part_lens, position_ids)
    model_engine->>cuda_runner: maybe_get_cuda_graph(inputs)
    cuda_runner-->>model_engine: (None, None) — fallback to eager
    model_engine->>metadata: prepare(multi_item_part_lens=...)
    metadata->>metadata: _process_multi_item_part_lens() → _multi_item_params
    model_engine->>plan: plan(...)
    plan->>plan: PlanParams(multi_item_params=_multi_item_params)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14693: Introduced the original multi-item scoring support via multi_item_part_lens in the FlashInfer backend and AttentionForwardArgs, which this PR refactors by moving the handling from the forward path into prepare().

Suggested reviewers

tburt-nv
Funatiq
brb-nv
chang-l
eopXD

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: relocating the `torch_multi_arange` function as a chore task.
Description check	✅ Passed	The PR description follows the template structure, includes a clear explanation referencing the related PR and commit dependencies, specifies test coverage, and completes the PR checklist.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/attention_backend/flashinfer.py`:
- Around line 744-755: The code accesses req_part_lens[0] and req_part_lens[1:]
without validating that each req_part_lens in multi_item_part_lens has the
required structure, which can cause IndexError or ValueError when constructing
tensors for malformed entries like empty lists or lists with only a prefix_len.
Before constructing the prefix_len_ptr and max_item_len_ptr tensors, add
validation to ensure each req_part_lens has at least two elements (one for
prefix_len and at least one for scored items), and raise an API-level ValueError
with a descriptive message if any request part list fails this validation.
- Around line 762-770: The zip() call combining multi_item_part_lens and
token_pos_in_items_raw_lens needs to add strict=True parameter to document that
these iterables have the same length, which resolves the B905 lint finding.
Additionally, replace the list concatenation in the innermost for loop
(req_part_lens[1:] + [token_pos_in_items_len - token_pos_in_items_raw_len]) with
iterable unpacking syntax instead to resolve the RUF005 lint finding.

In `@tensorrt_llm/_torch/utils.py`:
- Around line 574-580: The variable repeats is initialized as an alias to the
ends tensor, and when starts is None, this alias is never broken before the
in-place multiplication operation repeats *= steps.sign() on line 579. This
mutates the caller's ends tensor. Fix this by using out-of-place arithmetic for
the repeat count calculation: instead of the in-place multiplication repeats *=
steps.sign(), use repeats = repeats * steps.sign() to create a new tensor and
avoid mutating the input.
- Around line 584-602: The prev_range_ends calculation using range_ends.roll(1)
doesn't account for empty ranges where repeats == 0. When a range is empty, its
nominal end value should not be used as the previous range end for the next
range; instead, the end of the last non-empty range should be carried forward.
Modify the logic that computes prev_range_ends to propagate the previous
non-empty range's end value through empty ranges, ensuring that jumps
calculations correctly reflect transitions only between actual non-empty ranges.
- Around line 541-557: Replace the assert statements in the function that
validates dtype, shape, and device compatibility between ends, steps, and starts
parameters with explicit ValueError exceptions that include descriptive error
messages. Additionally, add validation at the function entry to ensure that all
input tensors (starts, ends, and steps) are 1-D tensors, raising ValueError if
they are not, since the implementation later uses unsqueeze and torch.cat
operations that expect 1-D inputs.

In `@tensorrt_llm/llmapi/llm.py`:
- Around line 904-932: The code does not sufficiently validate the structure of
multi_item_part_lens before constructing starts_cuda and ends_cuda, allowing
malformed inputs like [prefix_len] with no item lengths to pass through and fail
later in FlashInfer. Add validation before the torch.tensor calls that construct
starts_cuda and ends_cuda to ensure that each multi_item_part_lens in
batch_multi_item_part_lens has length greater than 1 (meaning at least one item
length in addition to the prefix length) and that all length values are
non-negative. Reject the inputs early with a clear error message if these
conditions are not met.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 89ea82b0-3e34-43c9-bc70-8761dbd903f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0a03e and faad7dc.

📒 Files selected for processing (18)

.pre-commit-config.yaml
legacy-files.txt
pyproject.toml
ruff-legacy.toml
tensorrt_llm/_torch/attention_backend/flashinfer.py
tensorrt_llm/_torch/attention_backend/interface.py
tensorrt_llm/_torch/attention_backend/star_flashinfer.py
tensorrt_llm/_torch/attention_backend/trtllm.py
tensorrt_llm/_torch/attention_backend/vanilla.py
tensorrt_llm/_torch/modules/attention.py
tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
tensorrt_llm/_torch/pyexecutor/model_engine.py
tensorrt_llm/_torch/pyexecutor/sampler.py
tensorrt_llm/_torch/pyexecutor/sampling_utils.py
tensorrt_llm/_torch/utils.py
tensorrt_llm/llmapi/llm.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/_torch/test_torch_multi_arange.py

💤 Files with no reviewable changes (2)

tensorrt_llm/_torch/pyexecutor/sampling_utils.py
tensorrt_llm/_torch/modules/attention.py

coderabbitai · 2026-06-16T18:10:11Z

+        prefix_len_ptr = torch.tensor(
+            [req_part_lens[0] for req_part_lens in multi_item_part_lens],
+            pin_memory=prefer_pinned(),
+            dtype=torch.uint32,
+        ).to(device=device, non_blocking=True)
+        token_pos_in_items_raw_lens = [  # 'raw' lengths before padding
+            sum(req_part_lens[1:]) + len(req_part_lens)
+            for req_part_lens in multi_item_part_lens
+        ]
+        token_pos_in_items_len = max(token_pos_in_items_raw_lens)
+        max_item_len_ptr = torch.tensor(
+            [max(req_part_lens[1:]) for req_part_lens in multi_item_part_lens],


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate each request’s part list before indexing it.

req_part_lens[0] and max(req_part_lens[1:]) will raise IndexError/ValueError for malformed entries such as [] or [prefix_len], instead of the API-level ValueError used by the surrounding validation. Please reject entries with fewer than one scored item before constructing tensors.

Proposed validation

         if len(multi_item_part_lens) != self.num_contexts:
             raise ValueError(
                 "\"multi_item_part_lens\" needs to be provided for all requests."
             )
+        if any(len(req_part_lens) < 2 for req_part_lens in multi_item_part_lens):
+            raise ValueError(
+                "\"multi_item_part_lens\" entries must contain a prefix length "
+                "followed by at least one item length."
+            )
         prefix_len_ptr = torch.tensor(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/attention_backend/flashinfer.py` around lines 744 - 755, The code accesses req_part_lens[0] and req_part_lens[1:] without validating that each req_part_lens in multi_item_part_lens has the required structure, which can cause IndexError or ValueError when constructing tensors for malformed entries like empty lists or lists with only a prefix_len. Before constructing the prefix_len_ptr and max_item_len_ptr tensors, add validation to ensure each req_part_lens has at least two elements (one for prefix_len and at least one for scored items), and raise an API-level ValueError with a descriptive message if any request part list fails this validation.
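
For readers following this thread, the per-request quantities in the quoted hunk can be sketched in plain Python, with the proposed structural check applied first. This is an illustration under assumptions: `summarize_part_lens` is a hypothetical helper, and it returns lists where the PR builds tensors.

```python
def summarize_part_lens(multi_item_part_lens):
    """Validate each request's part list, then derive the per-request values.

    Each entry is expected to look like [prefix_len, item_len_1, item_len_2, ...].
    """
    for index, req_part_lens in enumerate(multi_item_part_lens):
        if len(req_part_lens) < 2:
            raise ValueError(
                f"request {index}: expected a prefix length followed by at "
                f"least one item length, got {list(req_part_lens)!r}"
            )

    prefix_lens = [parts[0] for parts in multi_item_part_lens]
    # 'Raw' length before padding: all item tokens plus one slot per part.
    raw_lens = [sum(parts[1:]) + len(parts) for parts in multi_item_part_lens]
    max_item_lens = [max(parts[1:]) for parts in multi_item_part_lens]
    padded_len = max(raw_lens)
    return prefix_lens, raw_lens, max_item_lens, padded_len
```

Without the check, an entry such as `[]` fails on `parts[0]` with `IndexError`, and `[prefix_len]` fails on `max(parts[1:])` with a `ValueError` whose message refers to an empty sequence rather than to the malformed input.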

coderabbitai · 2026-06-16T18:10:11Z

+        range_ends = torch.tensor(
+            [
+                item_len + 1
+                for req_part_lens, token_pos_in_items_raw_len in zip(
+                    multi_item_part_lens, token_pos_in_items_raw_lens)
+                for item_len in (
+                    req_part_lens[1:] +
+                    [token_pos_in_items_len - token_pos_in_items_raw_len])
+            ],


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ruff check tensorrt_llm/_torch/attention_backend/flashinfer.py --select B905,RUF005

Repository: NVIDIA/TensorRT-LLM

Length of output: 4828

🏁 Script executed:

#!/bin/bash
# Read the file to see the exact code context and verify the proposed fix
sed -n '760,775p' tensorrt_llm/_torch/attention_backend/flashinfer.py | cat -n

Repository: NVIDIA/TensorRT-LLM

Length of output: 955

Fix the Ruff B905/RUF005 findings in the range construction.

The two iterables passed to zip() are derived together, so strict=True documents that invariant and satisfies B905. Replace the list concatenation with iterable unpacking to resolve RUF005.

Proposed lint fix

         range_ends = torch.tensor(
             [
                 item_len + 1
                 for req_part_lens, token_pos_in_items_raw_len in zip(
-                    multi_item_part_lens, token_pos_in_items_raw_lens)
+                    multi_item_part_lens, token_pos_in_items_raw_lens, strict=True)
                 for item_len in (
-                    req_part_lens[1:] +
-                    [token_pos_in_items_len - token_pos_in_items_raw_len])
+                    *req_part_lens[1:],
+                    token_pos_in_items_len - token_pos_in_items_raw_len,
+                )
             ],

🧰 Tools

🪛 Ruff (0.15.17)

[warning] 765-766: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

[warning] 768-769: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/attention_backend/flashinfer.py` around lines 762 - 770, The zip() call combining multi_item_part_lens and token_pos_in_items_raw_lens needs to add strict=True parameter to document that these iterables have the same length, which resolves the B905 lint finding. Additionally, replace the list concatenation in the innermost for loop (req_part_lens[1:] + [token_pos_in_items_len - token_pos_in_items_raw_len]) with iterable unpacking syntax instead to resolve the RUF005 lint finding.

Sources: Coding guidelines, Linters/SAST tools

coderabbitai · 2026-06-16T18:10:11Z

+    """Efficiently compute torch.cat([torch.arange(b, e, d) for b, e, d in zip(starts, ends, steps)]).
+
+    Starts, ends, steps need to share dtype and shape. Invalid ranges like range(1, 2, -1) are
+    silently discarded. 'steps' defaults to 1 and 'starts' defaults to 0.
+
+    Provide 'output_length' to avoid synchronization when using device tensors or pass
+    `ACCEPT_SYNC_COMPUTE` to explicitly accept the possibility of a device sync (for device tensors)
+    or when tensors are known to reside on the host.
+    """
+    if steps is not None:
+        assert ends.dtype == steps.dtype
+        assert ends.shape == steps.shape
+        assert ends.device == steps.device
+    if starts is not None:
+        assert ends.dtype == starts.dtype
+        assert ends.shape == starts.shape
+        assert ends.device == starts.device


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Demonstrate why assert should not guard public input validation.
python -O - <<'PY'
def validate():
    assert False, "this is skipped with -O"
validate()
print("assertions are skipped under optimized mode")
PY

Repository: NVIDIA/TensorRT-LLM

Length of output: 107

🏁 Script executed:

#!/bin/bash
# Check if the file exists and read the relevant lines
if [ -f "tensorrt_llm/_torch/utils.py" ]; then
  echo "=== File exists, reading lines 530-570 for context ==="
  sed -n '530,570p' tensorrt_llm/_torch/utils.py | cat -n
else
  echo "File not found, searching for it"
  find . -name "utils.py" -path "*/_torch/*" 2>/dev/null
fi

Repository: NVIDIA/TensorRT-LLM

Length of output: 2157

🏁 Script executed:

#!/bin/bash
# Get the function signature and full implementation
echo "=== Searching for torch_multi_arange function ==="
rg -A 50 "def torch_multi_arange" tensorrt_llm/_torch/utils.py --max-count=1

Repository: NVIDIA/TensorRT-LLM

Length of output: 2367

🏁 Script executed:

#!/bin/bash
# Read more of the function to see the full implementation
sed -n '541,650p' tensorrt_llm/_torch/utils.py | cat -n

Repository: NVIDIA/TensorRT-LLM

Length of output: 4090

🏁 Script executed:

#!/bin/bash
# Check if there's any ndim validation elsewhere in the function or if unsqueeze is used
rg "unsqueeze|\.ndim|\.dim\(\)" tensorrt_llm/_torch/utils.py -B 2 -A 2 | head -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 589

Replace assert statements with explicit ValueError in public input validation.

The function uses assert to validate public inputs (dtype, shape, device compatibility), which are silently skipped under Python's optimized mode (-O flag). Additionally, the implementation expects 1-D range-parameter tensors (operations on lines 63 and 71 use unsqueeze(-1) followed by torch.cat(..., dim=1)), but this constraint is not validated at the function entry.

Per coding guidelines, use built-in exception types for public input validation. Raise ValueError instead of assert, and validate that input tensors are 1-D.

Suggested fix

     or when tensors are known to reside on the host.
     """
+    if ends.ndim != 1:
+        raise ValueError(f"ends must be a 1-D tensor, got shape {tuple(ends.shape)}")
+
+    def _validate_like_ends(name: str, tensor: torch.Tensor) -> None:
+        if ends.dtype != tensor.dtype:
+            raise ValueError(f"{name}.dtype must match ends.dtype")
+        if ends.shape != tensor.shape:
+            raise ValueError(f"{name}.shape must match ends.shape")
+        if ends.device != tensor.device:
+            raise ValueError(f"{name}.device must match ends.device")
+
     if steps is not None:
-        assert ends.dtype == steps.dtype
-        assert ends.shape == steps.shape
-        assert ends.device == steps.device
+        _validate_like_ends("steps", steps)
     if starts is not None:
-        assert ends.dtype == starts.dtype
-        assert ends.shape == starts.shape
-        assert ends.device == starts.device
+        _validate_like_ends("starts", starts)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/utils.py` around lines 541 - 557, Replace the assert statements in the function that validates dtype, shape, and device compatibility between ends, steps, and starts parameters with explicit ValueError exceptions that include descriptive error messages. Additionally, add validation at the function entry to ensure that all input tensors (starts, ends, and steps) are 1-D tensors, raising ValueError if they are not, since the implementation later uses unsqueeze and torch.cat operations that expect 1-D inputs.

Source: Coding guidelines
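
As a reference for the semantics in the quoted docstring, here is a plain-Python sketch that operates on lists rather than tensors and validates its inputs with `ValueError` instead of `assert`. `multi_arange_reference` is a hypothetical name; it mirrors only the documented behavior (concatenated ranges, `starts` defaulting to 0, `steps` defaulting to 1, invalid ranges discarded), not the vectorized implementation.

```python
from itertools import chain


def multi_arange_reference(ends, starts=None, steps=None):
    """Concatenate range(b, e, d) for each (start, end, step) triple."""
    count = len(ends)
    starts = [0] * count if starts is None else starts
    steps = [1] * count if steps is None else steps
    # Explicit checks stay active under `python -O`, unlike assert statements.
    if len(starts) != count:
        raise ValueError("starts must have the same length as ends")
    if len(steps) != count:
        raise ValueError("steps must have the same length as ends")
    return list(
        chain.from_iterable(
            # An invalid range such as range(1, 2, -1) is empty and so
            # contributes no elements.
            range(start, end, step)
            for start, end, step in zip(starts, ends, steps)
        )
    )
```

A reference like this is mainly useful as a test oracle: a vectorized implementation can be compared against it on small inputs, including empty and reversed ranges.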

coderabbitai · 2026-06-16T18:10:11Z

+    repeats = ends  # number of elements in each range
+    if starts is not None:
+        repeats = repeats.clone()
+        repeats -= starts
+    if steps is not None:
+        repeats *= steps.sign()
+        steps_abs = steps.abs()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid mutating the caller’s ends tensor.

When starts is None and steps is provided, repeats aliases ends, so Line 579 modifies the input tensor in place. Use out-of-place arithmetic for the repeat count.

Suggested fix

-    repeats = ends  # number of elements in each range
-    if starts is not None:
-        repeats = repeats.clone()
-        repeats -= starts
+    repeats = ends - starts if starts is not None else ends
     if steps is not None:
-        repeats *= steps.sign()
+        repeats = repeats * steps.sign()
         steps_abs = steps.abs()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/utils.py` around lines 574 - 580, The variable repeats is initialized as an alias to the ends tensor, and when starts is None, this alias is never broken before the in-place multiplication operation repeats *= steps.sign() on line 579. This mutates the caller's ends tensor. Fix this by using out-of-place arithmetic for the repeat count calculation: instead of the in-place multiplication repeats *= steps.sign(), use repeats = repeats * steps.sign() to create a new tensor and avoid mutating the input.
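
The aliasing described in this comment can be reproduced in isolation. The following is a minimal sketch on CPU tensors with made-up values, not the PR's code path; it only contrasts in-place and out-of-place multiplication on a name bound to the caller's tensor.

```python
import torch

steps = torch.tensor([-1, 1])

# In-place: `repeats` is just another name for the caller's tensor.
ends_in_place = torch.tensor([3, 5])
repeats = ends_in_place
repeats *= steps.sign()
mutated = ends_in_place.tolist()  # the caller's tensor changed as well

# Out-of-place: the multiplication allocates a new tensor.
ends_out_of_place = torch.tensor([3, 5])
repeats = ends_out_of_place * steps.sign()
preserved = ends_out_of_place.tolist()  # the caller's tensor is untouched
```

Calling `.clone()` before the in-place operation would also break the alias; the out-of-place form does so without a separate copy step.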

coderabbitai · 2026-06-16T18:10:11Z

+    range_ends = repeats - 1  # last element in each range
+    if steps is not None:
+        range_ends *= steps
+    if starts is not None:
+        range_ends += starts
+    prev_range_ends = range_ends.roll(
+        1)  # last element in preceding range (or 0)
+    prev_range_ends[0].fill_(0)
+    ones = torch.ones((), dtype=ends.dtype, device=ends.device)
+    zeros = torch.zeros((), dtype=ends.dtype, device=ends.device)
+    if steps is None:
+        steps = ones.broadcast_to(ends.shape)
+    jumps = -prev_range_ends  # delta from one range to the next
+    if starts is not None:
+        jumps += starts
+    #     NB: Apply correction for empty ranges
+    jumps_corrections = torch.where(repeats == 0, jumps,
+                                    zeros).cumsum(0, dtype=ends.dtype)
+    jumps += jumps_corrections


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Carry forward the previous non-empty range end.

Empty ranges are skipped in seq_repeats, but prev_range_ends still uses their nominal start - step end. For example, default starts/steps with ends=[0, 3] yields [1, 2, 3] instead of [0, 1, 2].

Suggested direction

     range_ends = repeats - 1  # last element in each range
     if steps is not None:
         range_ends *= steps
-    if starts is not None:
-        range_ends += starts
-    prev_range_ends = range_ends.roll(
-        1)  # last element in preceding range (or 0)
-    prev_range_ends[0].fill_(0)
     ones = torch.ones((), dtype=ends.dtype, device=ends.device)
     zeros = torch.zeros((), dtype=ends.dtype, device=ends.device)
+    start_values = starts if starts is not None else zeros.broadcast_to(ends.shape)
+    range_ends += start_values
+
+    non_empty = repeats > 0
+    range_indices = torch.arange(ends.numel(), device=ends.device)
+    last_non_empty = torch.where(
+        non_empty, range_indices, range_indices.new_full((), -1)
+    ).cummax(0).values
+    prev_non_empty = last_non_empty.roll(1)
+    prev_non_empty[0].fill_(-1)
+    prev_range_ends = torch.where(
+        prev_non_empty >= 0,
+        range_ends[prev_non_empty.clamp(min=0)],
+        zeros,
+    )
     if steps is None:
         steps = ones.broadcast_to(ends.shape)
-    jumps = -prev_range_ends  # delta from one range to the next
-    if starts is not None:
-        jumps += starts
-    #     NB: Apply correction for empty ranges
-    jumps_corrections = torch.where(repeats == 0, jumps,
-                                    zeros).cumsum(0, dtype=ends.dtype)
-    jumps += jumps_corrections
+    jumps = start_values - prev_range_ends

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/utils.py` around lines 584 - 602, The prev_range_ends calculation using range_ends.roll(1) doesn't account for empty ranges where repeats == 0. When a range is empty, its nominal end value should not be used as the previous range end for the next range; instead, the end of the last non-empty range should be carried forward. Modify the logic that computes prev_range_ends to propagate the previous non-empty range's end value through empty ranges, ensuring that jumps calculations correctly reflect transitions only between actual non-empty ranges.
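
The "carry forward" step in the suggested direction is a running maximum over the indices of non-empty ranges, shifted by one position. A plain-Python sketch of that idea follows; `prev_non_empty_index` is a hypothetical helper, with `itertools.accumulate(..., max)` playing the role of `cummax` and list slicing the role of `roll(1)` plus the fill of the first element.

```python
from itertools import accumulate


def prev_non_empty_index(repeats):
    """For each range, the index of the closest preceding non-empty range, or -1."""
    if not repeats:
        return []
    # Mark empty ranges with -1 so they never win the running maximum.
    marked = [index if count > 0 else -1 for index, count in enumerate(repeats)]
    last_non_empty = list(accumulate(marked, max))
    # Shift right by one: a range must not see itself, and the first range
    # has no predecessor.
    return [-1, *last_non_empty[:-1]]
```

With these indices, each range can look up the end of the last range that actually produced elements, and `-1` marks the case where no such range exists.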

coderabbitai · 2026-06-16T18:10:11Z

+            # Scoring items have overlapping position IDs. Position IDs of delimiters
+            # are irrelevant.
+            starts_cuda = torch.tensor(
+                [
+                    start for multi_item_part_lens in batch_multi_item_part_lens
+                    for start in [0] + [multi_item_part_lens[0]] *
+                    (len(multi_item_part_lens) - 1)
+                ],
+                pin_memory=prefer_pinned(),
+                dtype=torch.int32,
+            ).to("cuda", non_blocking=True)  # uses current device
+            ends_cuda = torch.tensor(
+                [
+                    end + 1
+                    for multi_item_part_lens in batch_multi_item_part_lens
+                    for end in [multi_item_part_lens[0]] + [
+                        multi_item_part_lens[0] + item_len
+                        for item_len in multi_item_part_lens[1:]
+                    ]
+                ],
+                pin_memory=prefer_pinned(),
+                dtype=torch.int32,
+            ).to("cuda", non_blocking=True)
+            position_ids_cuda = torch_multi_arange(
+                starts=starts_cuda,
+                ends=ends_cuda,
+                output_length=len(flat_token_ids),
+            )
+            forward_inputs["position_ids"] = position_ids_cuda


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate multi_item_part_lens shape before generating CUDA position IDs.

The length check accepts malformed inputs such as [prefix_len] with no item lengths, so this block can build position_ids successfully while FlashInfer later fails in _process_multi_item_part_lens() on max(req_part_lens[1:]). Reject lists without at least one item and negative lengths before constructing starts_cuda/ends_cuda.

Proposed validation tightening

             prompt_token_ids = inp_tok["prompt_token_ids"]
             if multi_item_part_lens is not None:
+                if len(multi_item_part_lens) < 2:
+                    raise ValueError(
+                        '"multi_item_part_lens" must contain a prefix length '
+                        "followed by at least one item length"
+                    )
+                if any(part_len < 0 for part_len in multi_item_part_lens):
+                    raise ValueError(
+                        '"multi_item_part_lens" must not contain negative lengths'
+                    )
                 # validate lengths
                 if sum(multi_item_part_lens) + len(
                         multi_item_part_lens) != len(prompt_token_ids):
                     raise ValueError(
                         "\"multi_item_part_lens\" inconsistent with prompt length"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/llmapi/llm.py` around lines 904 - 932, The code does not sufficiently validate the structure of multi_item_part_lens before constructing starts_cuda and ends_cuda, allowing malformed inputs like [prefix_len] with no item lengths to pass through and fail later in FlashInfer. Add validation before the torch.tensor calls that construct starts_cuda and ends_cuda to ensure that each multi_item_part_lens in batch_multi_item_part_lens has length greater than 1 (meaning at least one item length in addition to the prefix length) and that all length values are non-negative. Reject the inputs early with a clear error message if these conditions are not met.
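
To make the position-ID layout in the quoted hunk concrete, here is a plain-Python sketch for a single request. `multi_item_position_ids` is a hypothetical helper that builds the same `starts` and `ends` as the hunk, expands them with `range` instead of `torch_multi_arange`, and applies the two checks proposed in this comment.

```python
from itertools import chain


def multi_item_position_ids(part_lens):
    """Position IDs for one request laid out as [prefix_len, item_len, ...]."""
    if len(part_lens) < 2:
        raise ValueError("expected a prefix length and at least one item length")
    if any(part_len < 0 for part_len in part_lens):
        raise ValueError("lengths must not be negative")

    prefix_len, *item_lens = part_lens
    # The prefix starts at 0; every item restarts at prefix_len, so the
    # items share overlapping position IDs.
    starts = [0] + [prefix_len] * len(item_lens)
    # Each range is one element longer than its part, which accounts for
    # the delimiter token that accompanies every part.
    ends = [prefix_len + 1] + [prefix_len + item_len + 1 for item_len in item_lens]
    return list(chain.from_iterable(range(s, e) for s, e in zip(starts, ends)))
```

For `[3, 2, 1]` this gives `[0, 1, 2, 3, 3, 4, 5, 3, 4]`: nine IDs, matching `sum(part_lens) + len(part_lens)`, which is the same quantity the existing length check compares against the prompt length.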

tensorrt-cicd · 2026-06-16T20:24:11Z

PR_Github #54587 [ run ] completed with state FAILURE. Commit: faad7dc
/LLM/main/L0_MergeRequest_PR pipeline #43630 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

juney-nvidia

Approved

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

github-actions Bot assigned ixlmar Jun 16, 2026

ixlmar requested a review from Funatiq June 16, 2026 12:35

Funatiq approved these changes Jun 16, 2026

tburt-nv approved these changes Jun 16, 2026

brb-nv approved these changes Jun 16, 2026

ixlmar marked this pull request as ready for review June 16, 2026 17:55

ixlmar requested review from a team as code owners June 16, 2026 17:55

ixlmar requested review from HuiGao-NV, ZhanruiSunCh, schetlur-nv, yiqingy0 and yuxianq June 16, 2026 17:55

coderabbitai Bot reviewed Jun 16, 2026

yuxianq reviewed Jun 17, 2026

Comment thread legacy-files.txt Outdated

yuxianq reviewed Jun 17, 2026

Comment thread tensorrt_llm/_torch/attention_backend/interface.py Outdated

MartinMarciniszyn approved these changes Jun 17, 2026

juney-nvidia approved these changes Jun 17, 2026

ixlmar removed request for HuiGao-NV, schetlur-nv and yiqingy0 June 17, 2026 09:37

ixlmar removed the request for review from ZhanruiSunCh June 17, 2026 09:37

chore: relocate torch_multi_arange

6cf0c06

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

ixlmar force-pushed the chore/move-torch-multi-arange branch from faad7dc to 6cf0c06 Compare June 24, 2026 10:10

fix: sort legacy-files.txt

3f1df73

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

yuxianq approved these changes Jun 24, 2026

8 participants
