[TRTLLM-12982][chore] relocate torch_multi_arange#15416

Open
ixlmar wants to merge 2 commits into NVIDIA:main from ixlmar:chore/move-torch-multi-arange

@ixlmar ixlmar commented Jun 16, 2026

Collaborator

Description

Follow-up on #14693 (comment).

Commit 800c7ee is from #15413, which is to be merged before this PR.

Test Coverage

Covered by existing tests

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

  • Improvements

    • Encoder CUDA graphs now properly detect multi-item scoring scenarios and fall back to eager execution when necessary.
  • Refactoring

    • Optimized attention metadata to accept multi-item configuration during preparation phase instead of forward pass.
    • Reorganized utility functions for improved code maintainability.
  • Chores

    • Updated test infrastructure and file organization.

@ixlmar ixlmar requested a review from Funatiq June 16, 2026 12:35

ixlmar commented Jun 16, 2026

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54587 [ run ] triggered by Bot. Commit: faad7dc Link to invocation

@brb-nv brb-nv left a comment

Collaborator

LGTM.

@ixlmar ixlmar marked this pull request as ready for review June 16, 2026 17:55
@ixlmar ixlmar requested review from a team as code owners June 16, 2026 17:55

coderabbitai Bot commented Jun 16, 2026

Contributor

📝 Walkthrough

torch_multi_arange (with _AcceptSyncCompute/ACCEPT_SYNC_COMPUTE) is moved from sampling_utils.py to utils.py and all import sites are updated. Separately, multi_item_part_lens is removed from AttentionForwardArgs and from all attention forward signatures, and instead passed as a keyword argument to AttentionMetadata.prepare(), with FlashInfer caching the computed FlashInferMultiItemParams in _multi_item_params for use during plan().

Changes

torch_multi_arange relocation and multi_item_part_lens prepare() refactor

Layer / File(s) and Summary

  • torch_multi_arange relocated to utils.py
    Files: tensorrt_llm/_torch/utils.py, tensorrt_llm/_torch/pyexecutor/sampling_utils.py, tensorrt_llm/_torch/pyexecutor/sampler.py, tests/unittest/_torch/test_torch_multi_arange.py, tests/integration/test_lists/test-db/l0_a10.yml, .pre-commit-config.yaml, legacy-files.txt, pyproject.toml, ruff-legacy.toml
    Summary: _AcceptSyncCompute, ACCEPT_SYNC_COMPUTE, and torch_multi_arange are added to utils.py and deleted from sampling_utils.py; sampler.py import is redirected; test, test-list, and lint config entries are updated to the new path.

  • AttentionMetadata.prepare() and AttentionForwardArgs contract
    Files: tensorrt_llm/_torch/attention_backend/interface.py
    Summary: AttentionMetadata.prepare() gains a keyword-only multi_item_part_lens parameter; AttentionForwardArgs drops its multi_item_part_lens field, removing multi-item layout from per-forward args.

  • Backend prepare() enforce/reject multi_item_part_lens
    Files: tensorrt_llm/_torch/attention_backend/vanilla.py, tensorrt_llm/_torch/attention_backend/star_flashinfer.py, tensorrt_llm/_torch/attention_backend/trtllm.py
    Summary: VanillaAttentionMetadata, StarAttentionMetadata, TrtllmAttentionMetadata, and prepare_encoder_only each add the keyword-only multi_item_part_lens parameter and raise ValueError when non-None; per-forward ValueError checks are removed.

  • FlashInfer metadata caches multi_item_params at prepare() time
    Files: tensorrt_llm/_torch/attention_backend/flashinfer.py
    Summary: FlashInferAttentionMetadata gains _multi_item_params field and _process_multi_item_part_lens() instance method; prepare() computes and stores multi-item tensors; plan() passes _multi_item_params into PlanParams; forward_impl/forward() have multi_item_part_lens removed; metadata.plan() is wrapped in nvtx_range.

  • Attention module removes multi_item_part_lens from forward path
    Files: tensorrt_llm/_torch/modules/attention.py
    Summary: _attn_impl, forward_impl, and forward drop multi_item_part_lens parameters; AttentionForwardArgs construction no longer includes it; the RoPE position_ids rewrite block for multi-item scoring is deleted.

  • Executor and LLM API wire multi_item_part_lens into prepare()
    Files: tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py, tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/llmapi/llm.py
    Summary: EncoderCUDAGraphRunner falls back to eager when multi_item_part_lens is present; model_engine._prepare_encoder_inputs reads and passes multi_item_part_lens to prepare_encoder_only()/prepare(), asserting None on CUDA-graph replay; llm.py encode() gains @torch.inference_mode() and computes CUDA position_ids via torch_multi_arange for multi-item scoring.
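
The prepare()-time contract summarized above (a keyword-only multi_item_part_lens argument that some backends reject and one backend caches for plan()) might look roughly like the following. This is a minimal sketch on plain Python lists: the class names SketchMetadata, RejectingMetadata, and CachingMetadata and the cached dictionary layout are hypothetical stand-ins, not the actual TensorRT-LLM classes or tensors.

```python
from typing import Optional, Sequence

PartLens = Sequence[Sequence[int]]


class SketchMetadata:
    """Hypothetical stand-in for an attention metadata base class."""

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        raise NotImplementedError


class RejectingMetadata(SketchMetadata):
    """Backend without multi-item support: reject once, at prepare() time."""

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        if multi_item_part_lens is not None:
            raise ValueError("multi_item_part_lens is not supported by this backend")


class CachingMetadata(SketchMetadata):
    """Backend that derives per-batch parameters in prepare() and reuses them in plan()."""

    def __init__(self) -> None:
        self._multi_item_params: Optional[dict] = None

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        if multi_item_part_lens is None:
            self._multi_item_params = None
            return
        # Illustrative layout only: each entry is [prefix_len, item_len, ...].
        self._multi_item_params = {
            "prefix_lens": [lens[0] for lens in multi_item_part_lens],
            "max_item_lens": [max(lens[1:]) for lens in multi_item_part_lens],
        }

    def plan(self) -> Optional[dict]:
        # plan() only reads what prepare() cached; forward() needs no layout argument.
        return self._multi_item_params
```

With this shape, an unsupported backend fails before any forward pass runs, and the per-forward argument list stays free of batch-layout data.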

Sequence Diagram(s)

sequenceDiagram
    participant encode as llm.encode()
    participant model_engine as _prepare_encoder_inputs
    participant cuda_runner as EncoderCUDAGraphRunner
    participant metadata as FlashInferAttentionMetadata
    participant plan as FlashInferAttentionMetadata.plan()

    encode->>encode: compute position_ids via torch_multi_arange
    encode->>model_engine: inputs (multi_item_part_lens, position_ids)
    model_engine->>cuda_runner: maybe_get_cuda_graph(inputs)
    cuda_runner-->>model_engine: (None, None) — fallback to eager
    model_engine->>metadata: prepare(multi_item_part_lens=...)
    metadata->>metadata: _process_multi_item_part_lens() → _multi_item_params
    model_engine->>plan: plan(...)
    plan->>plan: PlanParams(multi_item_params=_multi_item_params)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#14693: Introduced the original multi-item scoring support via multi_item_part_lens in the FlashInfer backend and AttentionForwardArgs, which this PR refactors by moving the handling from the forward path into prepare().

Suggested reviewers

  • tburt-nv
  • Funatiq
  • brb-nv
  • chang-l
  • eopXD
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 41.38% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately describes the main change: relocating the torch_multi_arange function as a chore task.
  • Description check (✅ Passed): The PR description follows the template structure, includes a clear explanation referencing the related PR and commit dependencies, specifies test coverage, and completes the PR checklist.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/attention_backend/flashinfer.py`:
- Around line 744-755: The code accesses req_part_lens[0] and req_part_lens[1:]
without validating that each req_part_lens in multi_item_part_lens has the
required structure, which can cause IndexError or ValueError when constructing
tensors for malformed entries like empty lists or lists with only a prefix_len.
Before constructing the prefix_len_ptr and max_item_len_ptr tensors, add
validation to ensure each req_part_lens has at least two elements (one for
prefix_len and at least one for scored items), and raise an API-level ValueError
with a descriptive message if any request part list fails this validation.
- Around line 762-770: The zip() call combining multi_item_part_lens and
token_pos_in_items_raw_lens needs to add strict=True parameter to document that
these iterables have the same length, which resolves the B905 lint finding.
Additionally, replace the list concatenation in the innermost for loop
(req_part_lens[1:] + [token_pos_in_items_len - token_pos_in_items_raw_len]) with
iterable unpacking syntax instead to resolve the RUF005 lint finding.

In `@tensorrt_llm/_torch/utils.py`:
- Around line 574-580: The variable repeats is initialized as an alias to the
ends tensor, and when starts is None, this alias is never broken before the
in-place multiplication operation repeats *= steps.sign() on line 579. This
mutates the caller's ends tensor. Fix this by using out-of-place arithmetic for
the repeat count calculation: instead of the in-place multiplication repeats *=
steps.sign(), use repeats = repeats * steps.sign() to create a new tensor and
avoid mutating the input.
- Around line 584-602: The prev_range_ends calculation using range_ends.roll(1)
doesn't account for empty ranges where repeats == 0. When a range is empty, its
nominal end value should not be used as the previous range end for the next
range; instead, the end of the last non-empty range should be carried forward.
Modify the logic that computes prev_range_ends to propagate the previous
non-empty range's end value through empty ranges, ensuring that jumps
calculations correctly reflect transitions only between actual non-empty ranges.
- Around line 541-557: Replace the assert statements in the function that
validates dtype, shape, and device compatibility between ends, steps, and starts
parameters with explicit ValueError exceptions that include descriptive error
messages. Additionally, add validation at the function entry to ensure that all
input tensors (starts, ends, and steps) are 1-D tensors, raising ValueError if
they are not, since the implementation later uses unsqueeze and torch.cat
operations that expect 1-D inputs.

In `@tensorrt_llm/llmapi/llm.py`:
- Around line 904-932: The code does not sufficiently validate the structure of
multi_item_part_lens before constructing starts_cuda and ends_cuda, allowing
malformed inputs like [prefix_len] with no item lengths to pass through and fail
later in FlashInfer. Add validation before the torch.tensor calls that construct
starts_cuda and ends_cuda to ensure that each multi_item_part_lens in
batch_multi_item_part_lens has length greater than 1 (meaning at least one item
length in addition to the prefix length) and that all length values are
non-negative. Reject the inputs early with a clear error message if these
conditions are not met.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 89ea82b0-3e34-43c9-bc70-8761dbd903f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0a03e and faad7dc.

📒 Files selected for processing (18)
  • .pre-commit-config.yaml
  • legacy-files.txt
  • pyproject.toml
  • ruff-legacy.toml
  • tensorrt_llm/_torch/attention_backend/flashinfer.py
  • tensorrt_llm/_torch/attention_backend/interface.py
  • tensorrt_llm/_torch/attention_backend/star_flashinfer.py
  • tensorrt_llm/_torch/attention_backend/trtllm.py
  • tensorrt_llm/_torch/attention_backend/vanilla.py
  • tensorrt_llm/_torch/modules/attention.py
  • tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/pyexecutor/sampler.py
  • tensorrt_llm/_torch/pyexecutor/sampling_utils.py
  • tensorrt_llm/_torch/utils.py
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/test_lists/test-db/l0_a10.yml
  • tests/unittest/_torch/test_torch_multi_arange.py
💤 Files with no reviewable changes (2)
  • tensorrt_llm/_torch/pyexecutor/sampling_utils.py
  • tensorrt_llm/_torch/modules/attention.py

Comment on lines +744 to +755
        prefix_len_ptr = torch.tensor(
            [req_part_lens[0] for req_part_lens in multi_item_part_lens],
            pin_memory=prefer_pinned(),
            dtype=torch.uint32,
        ).to(device=device, non_blocking=True)
        token_pos_in_items_raw_lens = [  # 'raw' lengths before padding
            sum(req_part_lens[1:]) + len(req_part_lens)
            for req_part_lens in multi_item_part_lens
        ]
        token_pos_in_items_len = max(token_pos_in_items_raw_lens)
        max_item_len_ptr = torch.tensor(
            [max(req_part_lens[1:]) for req_part_lens in multi_item_part_lens],

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate each request’s part list before indexing it.

req_part_lens[0] and max(req_part_lens[1:]) will raise IndexError/ValueError for malformed entries such as [] or [prefix_len], instead of the API-level ValueError used by the surrounding validation. Please reject entries with fewer than one scored item before constructing tensors.

Proposed validation
         if len(multi_item_part_lens) != self.num_contexts:
             raise ValueError(
                 "\"multi_item_part_lens\" needs to be provided for all requests."
             )
+        if any(len(req_part_lens) < 2 for req_part_lens in multi_item_part_lens):
+            raise ValueError(
+                "\"multi_item_part_lens\" entries must contain a prefix length "
+                "followed by at least one item length."
+            )
 
         prefix_len_ptr = torch.tensor(
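
To illustrate the proposed check outside the backend, here is a minimal sketch on plain Python lists. The helper name validate_part_lens and its num_contexts parameter (standing in for self.num_contexts) are assumptions made for this example; the error messages follow the wording in the diff above.

```python
from typing import Sequence


def validate_part_lens(multi_item_part_lens: Sequence[Sequence[int]],
                       num_contexts: int) -> None:
    """Hypothetical helper mirroring the proposed checks, without any tensors."""
    if len(multi_item_part_lens) != num_contexts:
        raise ValueError(
            '"multi_item_part_lens" needs to be provided for all requests.')
    for req_part_lens in multi_item_part_lens:
        # Each entry must look like [prefix_len, item_len, ...]; otherwise
        # req_part_lens[0] raises IndexError or max(req_part_lens[1:]) raises
        # ValueError with an unhelpful message.
        if len(req_part_lens) < 2:
            raise ValueError(
                '"multi_item_part_lens" entries must contain a prefix length '
                "followed by at least one item length.")


validate_part_lens([[3, 2, 1], [5, 4]], num_contexts=2)  # well-formed: no error
```

Entries such as [] or [3] are rejected up front with the same exception type that the surrounding validation already uses.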

Comment on lines +762 to +770
        range_ends = torch.tensor(
            [
                item_len + 1
                for req_part_lens, token_pos_in_items_raw_len in zip(
                    multi_item_part_lens, token_pos_in_items_raw_lens)
                for item_len in (
                    req_part_lens[1:] +
                    [token_pos_in_items_len - token_pos_in_items_raw_len])
            ],

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ruff check tensorrt_llm/_torch/attention_backend/flashinfer.py --select B905,RUF005

Repository: NVIDIA/TensorRT-LLM

Length of output: 4828


🏁 Script executed:

#!/bin/bash
# Read the file to see the exact code context and verify the proposed fix
sed -n '760,775p' tensorrt_llm/_torch/attention_backend/flashinfer.py | cat -n

Repository: NVIDIA/TensorRT-LLM

Length of output: 955


Fix the Ruff B905/RUF005 findings in the range construction.

The two iterables passed to zip() are derived together, so strict=True documents that invariant and satisfies B905. Replace the list concatenation with iterable unpacking to resolve RUF005.

Proposed lint fix
         range_ends = torch.tensor(
             [
                 item_len + 1
                 for req_part_lens, token_pos_in_items_raw_len in zip(
-                    multi_item_part_lens, token_pos_in_items_raw_lens)
+                    multi_item_part_lens, token_pos_in_items_raw_lens, strict=True)
                 for item_len in (
-                    req_part_lens[1:] +
-                    [token_pos_in_items_len - token_pos_in_items_raw_len])
+                    *req_part_lens[1:],
+                    token_pos_in_items_len - token_pos_in_items_raw_len,
+                )
             ],
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 765-766: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


[warning] 768-769: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

Sources: Coding guidelines, Linters/SAST tools

Comment on lines +541 to +557
    """Efficiently compute torch.cat([torch.arange(b, e, d) for b, e, d in zip(starts, ends, steps)]).

    Starts, ends, steps need to share dtype and shape. Invalid ranges like range(1, 2, -1) are
    silently discarded. 'steps' defaults to 1 and 'starts' defaults to 0.

    Provide 'output_length' to avoid synchronization when using device tensors or pass
    `ACCEPT_SYNC_COMPUTE` to explicitly accept the possibility of a device sync (for device tensors)
    or when tensors are known to reside on the host.
    """
    if steps is not None:
        assert ends.dtype == steps.dtype
        assert ends.shape == steps.shape
        assert ends.device == steps.device
    if starts is not None:
        assert ends.dtype == starts.dtype
        assert ends.shape == starts.shape
        assert ends.device == starts.device

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Demonstrate why assert should not guard public input validation.
python -O - <<'PY'
def validate():
    assert False, "this is skipped with -O"

validate()
print("assertions are skipped under optimized mode")
PY

Repository: NVIDIA/TensorRT-LLM

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Check if the file exists and read the relevant lines
if [ -f "tensorrt_llm/_torch/utils.py" ]; then
  echo "=== File exists, reading lines 530-570 for context ==="
  sed -n '530,570p' tensorrt_llm/_torch/utils.py | cat -n
else
  echo "File not found, searching for it"
  find . -name "utils.py" -path "*/_torch/*" 2>/dev/null
fi

Repository: NVIDIA/TensorRT-LLM

Length of output: 2157


🏁 Script executed:

#!/bin/bash
# Get the function signature and full implementation
echo "=== Searching for torch_multi_arange function ==="
rg -A 50 "def torch_multi_arange" tensorrt_llm/_torch/utils.py --max-count=1

Repository: NVIDIA/TensorRT-LLM

Length of output: 2367


🏁 Script executed:

#!/bin/bash
# Read more of the function to see the full implementation
sed -n '541,650p' tensorrt_llm/_torch/utils.py | cat -n

Repository: NVIDIA/TensorRT-LLM

Length of output: 4090


🏁 Script executed:

#!/bin/bash
# Check if there's any ndim validation elsewhere in the function or if unsqueeze is used
rg "unsqueeze|\.ndim|\.dim\(\)" tensorrt_llm/_torch/utils.py -B 2 -A 2 | head -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 589


Replace assert statements with explicit ValueError in public input validation.

The function validates its public inputs (dtype, shape, and device compatibility) with assert statements, which Python skips in optimized mode (-O flag). Additionally, the implementation expects 1-D range-parameter tensors (operations on lines 63 and 71 use unsqueeze(-1) followed by torch.cat(..., dim=1)), but this constraint is not validated at the function entry.

Per coding guidelines, use built-in exception types for public input validation. Raise ValueError instead of assert, and validate that input tensors are 1-D.

Suggested fix
     or when tensors are known to reside on the host.
     """
+    if ends.ndim != 1:
+        raise ValueError(f"ends must be a 1-D tensor, got shape {tuple(ends.shape)}")
+
+    def _validate_like_ends(name: str, tensor: torch.Tensor) -> None:
+        if ends.dtype != tensor.dtype:
+            raise ValueError(f"{name}.dtype must match ends.dtype")
+        if ends.shape != tensor.shape:
+            raise ValueError(f"{name}.shape must match ends.shape")
+        if ends.device != tensor.device:
+            raise ValueError(f"{name}.device must match ends.device")
+
     if steps is not None:
-        assert ends.dtype == steps.dtype
-        assert ends.shape == steps.shape
-        assert ends.device == steps.device
+        _validate_like_ends("steps", steps)
     if starts is not None:
-        assert ends.dtype == starts.dtype
-        assert ends.shape == starts.shape
-        assert ends.device == starts.device
+        _validate_like_ends("starts", starts)

Source: Coding guidelines
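
The difference between the two validation styles can be reproduced in a single process by compiling the same source at two optimization levels. This is a minimal sketch using the built-in compile() with its optimize argument (level 1 corresponds to running with -O); the function names check_with_assert and check_with_raise are made up for the example.

```python
SOURCE = '''
def check_with_assert(n):
    assert n >= 0, "n must be non-negative"
    return n


def check_with_raise(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    return n
'''


def load(optimize: int) -> dict:
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace: dict = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace


def rejects(func, value) -> bool:
    """Return True if func(value) raises AssertionError or ValueError."""
    try:
        func(value)
    except (AssertionError, ValueError):
        return True
    return False


debug = load(optimize=0)      # asserts are kept
optimized = load(optimize=1)  # asserts are stripped, as with -O

print(rejects(debug["check_with_assert"], -1))      # prints True
print(rejects(optimized["check_with_assert"], -1))  # prints False
print(rejects(optimized["check_with_raise"], -1))   # prints True
```

Only the explicit raise keeps rejecting bad input once assertions are compiled out.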

Comment on lines +574 to +580
    repeats = ends  # number of elements in each range
    if starts is not None:
        repeats = repeats.clone()
        repeats -= starts
    if steps is not None:
        repeats *= steps.sign()
        steps_abs = steps.abs()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid mutating the caller’s ends tensor.

When starts is None and steps is provided, repeats aliases ends, so Line 579 modifies the input tensor in place. Use out-of-place arithmetic for the repeat count.

Suggested fix
-    repeats = ends  # number of elements in each range
-    if starts is not None:
-        repeats = repeats.clone()
-        repeats -= starts
+    repeats = ends - starts if starts is not None else ends
     if steps is not None:
-        repeats *= steps.sign()
+        repeats = repeats * steps.sign()
         steps_abs = steps.abs()
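
The aliasing hazard does not depend on PyTorch. The sketch below reproduces it with a tiny stand-in class, Vec, which is purely hypothetical and only mimics the relevant behavior: `*=` mutates the object in place, while `*` allocates a new one.

```python
class Vec:
    """Tiny stand-in for a tensor: `*=` mutates in place, `*` returns a new Vec."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        return Vec(a * b for a, b in zip(self.values, other.values))

    def __imul__(self, other):
        self.values = [a * b for a, b in zip(self.values, other.values)]
        return self


def repeats_in_place(ends, signs):
    repeats = ends   # alias, not a copy
    repeats *= signs  # mutates the caller's object through the alias
    return repeats


def repeats_out_of_place(ends, signs):
    repeats = ends
    repeats = repeats * signs  # rebinds the local name to a new object
    return repeats


signs = Vec([1, -1])

ends = Vec([3, 5])
repeats_out_of_place(ends, signs)
print(ends.values)  # prints [3, 5]

ends = Vec([3, 5])
repeats_in_place(ends, signs)
print(ends.values)  # prints [3, -5]
```

The out-of-place variant costs one extra allocation but leaves the caller's input untouched.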

Comment on lines +584 to +602
    range_ends = repeats - 1  # last element in each range
    if steps is not None:
        range_ends *= steps
    if starts is not None:
        range_ends += starts
    prev_range_ends = range_ends.roll(
        1)  # last element in preceding range (or 0)
    prev_range_ends[0].fill_(0)
    ones = torch.ones((), dtype=ends.dtype, device=ends.device)
    zeros = torch.zeros((), dtype=ends.dtype, device=ends.device)
    if steps is None:
        steps = ones.broadcast_to(ends.shape)
    jumps = -prev_range_ends  # delta from one range to the next
    if starts is not None:
        jumps += starts
    #     NB: Apply correction for empty ranges
    jumps_corrections = torch.where(repeats == 0, jumps,
                                    zeros).cumsum(0, dtype=ends.dtype)
    jumps += jumps_corrections

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Carry forward the previous non-empty range end.

Empty ranges are skipped in seq_repeats, but prev_range_ends still uses their nominal end value (start - step). For example, with default starts/steps, ends=[0, 3] yields [1, 2, 3] instead of [0, 1, 2].

Suggested direction
     range_ends = repeats - 1  # last element in each range
     if steps is not None:
         range_ends *= steps
-    if starts is not None:
-        range_ends += starts
-    prev_range_ends = range_ends.roll(
-        1)  # last element in preceding range (or 0)
-    prev_range_ends[0].fill_(0)
     ones = torch.ones((), dtype=ends.dtype, device=ends.device)
     zeros = torch.zeros((), dtype=ends.dtype, device=ends.device)
+    start_values = starts if starts is not None else zeros.broadcast_to(ends.shape)
+    range_ends += start_values
+
+    non_empty = repeats > 0
+    range_indices = torch.arange(ends.numel(), device=ends.device)
+    last_non_empty = torch.where(
+        non_empty, range_indices, range_indices.new_full((), -1)
+    ).cummax(0).values
+    prev_non_empty = last_non_empty.roll(1)
+    prev_non_empty[0].fill_(-1)
+    prev_range_ends = torch.where(
+        prev_non_empty >= 0,
+        range_ends[prev_non_empty.clamp(min=0)],
+        zeros,
+    )
     if steps is None:
         steps = ones.broadcast_to(ends.shape)
-    jumps = -prev_range_ends  # delta from one range to the next
-    if starts is not None:
-        jumps += starts
-    #     NB: Apply correction for empty ranges
-    jumps_corrections = torch.where(repeats == 0, jumps,
-                                    zeros).cumsum(0, dtype=ends.dtype)
-    jumps += jumps_corrections
+    jumps = start_values - prev_range_ends
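
When checking empty-range cases like the one above, a slow pure-Python oracle for the documented semantics (concatenating one arange per start/end/step triple) can be handy. The function name multi_arange_reference is hypothetical, and the sketch works on plain integers rather than tensors, so it says nothing about device synchronization or performance.

```python
from itertools import chain, repeat


def multi_arange_reference(ends, starts=None, steps=None):
    """Reference oracle: concatenate range(b, e, d) for each (start, end, step) triple.

    'starts' defaults to 0 and 'steps' defaults to 1, as in the docstring quoted
    earlier. Ranges that are empty, such as range(1, 2, -1), contribute nothing.
    """
    starts = repeat(0) if starts is None else starts
    steps = repeat(1) if steps is None else steps
    return list(
        chain.from_iterable(
            range(b, e, d) for b, e, d in zip(starts, ends, steps)))


print(multi_arange_reference([0, 3]))     # prints [0, 1, 2]
print(multi_arange_reference([2, 0, 2]))  # prints [0, 1, 0, 1]
```

Comparing the vectorized implementation against such an oracle on inputs with leading, trailing, and consecutive empty ranges would confirm or refute the reported behavior.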

Comment thread tensorrt_llm/llmapi/llm.py Outdated
Comment on lines +904 to +932
# Scoring items have overlapping position IDs. Position IDs of delimiters
# are irrelevant.
starts_cuda = torch.tensor(
    [
        start for multi_item_part_lens in batch_multi_item_part_lens
        for start in [0] + [multi_item_part_lens[0]] *
        (len(multi_item_part_lens) - 1)
    ],
    pin_memory=prefer_pinned(),
    dtype=torch.int32,
).to("cuda", non_blocking=True)  # uses current device
ends_cuda = torch.tensor(
    [
        end + 1
        for multi_item_part_lens in batch_multi_item_part_lens
        for end in [multi_item_part_lens[0]] + [
            multi_item_part_lens[0] + item_len
            for item_len in multi_item_part_lens[1:]
        ]
    ],
    pin_memory=prefer_pinned(),
    dtype=torch.int32,
).to("cuda", non_blocking=True)
position_ids_cuda = torch_multi_arange(
    starts=starts_cuda,
    ends=ends_cuda,
    output_length=len(flat_token_ids),
)
forward_inputs["position_ids"] = position_ids_cuda

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate multi_item_part_lens shape before generating CUDA position IDs.

The length check accepts malformed inputs such as [prefix_len] with no item lengths, so this block can build position_ids successfully while FlashInfer later fails in _process_multi_item_part_lens() on max(req_part_lens[1:]). Reject lists without at least one item and negative lengths before constructing starts_cuda/ends_cuda.

Proposed validation tightening
                 prompt_token_ids = inp_tok["prompt_token_ids"]
                 if multi_item_part_lens is not None:
+                    if len(multi_item_part_lens) < 2:
+                        raise ValueError(
+                            '"multi_item_part_lens" must contain a prefix length '
+                            "followed by at least one item length"
+                        )
+                    if any(part_len < 0 for part_len in multi_item_part_lens):
+                        raise ValueError(
+                            '"multi_item_part_lens" must not contain negative lengths'
+                        )
                     # validate lengths
                     if sum(multi_item_part_lens) + len(
                             multi_item_part_lens) != len(prompt_token_ids):
                         raise ValueError(
                             "\"multi_item_part_lens\" inconsistent with prompt length"
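
The position-ID layout built by the starts/ends comprehensions above can be followed on a single request with plain lists. The sketch below uses a hypothetical helper, multi_item_position_ids, and assumes (as the length check sum + len == prompt length suggests) that every part, including the prefix, is followed by exactly one delimiter token.

```python
def multi_item_position_ids(multi_item_part_lens):
    """Hypothetical per-request version of the starts/ends construction.

    Input is [prefix_len, item_len, ...]. The prefix counts up from 0; every
    item restarts at prefix_len, so items share overlapping position IDs.
    """
    prefix_len, *item_lens = multi_item_part_lens
    starts = [0] + [prefix_len] * len(item_lens)
    # "+ 1" makes room for the delimiter that follows each part.
    ends = [prefix_len + 1] + [prefix_len + item_len + 1 for item_len in item_lens]
    return [pos for start, end in zip(starts, ends) for pos in range(start, end)]


part_lens = [3, 2, 1]
position_ids = multi_item_position_ids(part_lens)
print(position_ids)  # prints [0, 1, 2, 3, 3, 4, 5, 3, 4]
print(len(position_ids) == sum(part_lens) + len(part_lens))  # prints True
```

A list with only a prefix length, such as [3], still produces position IDs here, which matches the observation that the malformed input is only caught later; the early checks proposed above close that gap.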

@tensorrt-cicd

Collaborator

PR_Github #54587 [ run ] completed with state FAILURE. Commit: faad7dc
/LLM/main/L0_MergeRequest_PR pipeline #43630 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Comment thread legacy-files.txt Outdated
Comment thread tensorrt_llm/_torch/attention_backend/interface.py Outdated

@juney-nvidia juney-nvidia left a comment

Collaborator

Approved

@ixlmar ixlmar removed the request for review from ZhanruiSunCh June 17, 2026 09:37
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
@ixlmar ixlmar force-pushed the chore/move-torch-multi-arange branch from faad7dc to 6cf0c06 Compare June 24, 2026 10:10
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

8 participants