Skip to content

[None][perf] Improve Qwen3-VL Preprocessing Perf#15598

Open
aswinvisva wants to merge 19 commits into
NVIDIA:mainfrom
aswinvisva:avisva/cosmos3-vlm-preproc-perf
Open

[None][perf] Improve Qwen3-VL Preprocessing Perf#15598
aswinvisva wants to merge 19 commits into
NVIDIA:mainfrom
aswinvisva:avisva/cosmos3-vlm-preproc-perf

Conversation

@aswinvisva

@aswinvisva aswinvisva commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Improved media loading and video handling for faster, more reliable processing of local files, URLs, and in-memory content.
    • Added support for more efficient server-side preprocessing with dedicated worker threads.
  • Bug Fixes

    • Reduced the chance of media decoding delays by moving blocking work off the main async path.
    • Improved handling of video frame sampling so preprocessed inputs are treated more consistently.
    • Made video input handling from bytes more robust, including audio extraction from in-memory content.

Description

A few improvements to the preprocessing speed of Qwen3-VL (and other models should benefit too)

  • [uvloop](https://uvloop.readthedocs.io/) asyncio event loop backend - already default in vllm and is 2-4 faster which benefits us when event loop is under a huge queue of pending events - this is just a simple drop-in replacement that makes the largest diff
  • separate input processor pool - again already done in vllm, and benefits by having a dedicated pool for the heavy HF processor rather than using the default shared asyncio thread pool - for both completion and chat endpoints
  • dedicated media IO load pool - same as above, but for media IO loading - which is very slow for videos
  • Skip tempfile write for cv2.VideoCapture and pass the BytesIO directly - skips the file IO which is slower especially when under concurrent load
  • Don't return a list of pytorch tensors from _load_video_by_cv2 via python list comprehension (calls torch.from_numpy for every request) - instead return the uint8 numpy frames and let the HF processor's do_rescale=True path do the rescale + permute as a single vectorized op on the stacked (N, H, W, 3) array
  • Skip the HF processor frame subsampling by passing do_sample_frames=False via mm_processing_kwargs so we don't duplicate the work (Media IO already does this work upstream)
  • memoize the call to merge_kwargs in preprocessing_qwen3vl. This function is a pure function (i.e same args produce same output), and is called for every input processor invocation - and was taking 76ms under load. Simply memoize the output for the args passed

Overall perf improvements:

benchmark: 30s 374×374 libx264 mp4 video (moving_shapes synth) + 50 text tokens → ISL ≈ 2.3k–13.1k depending on fps, OSL ∈ {1, 100}, fps ∈ {1, 2, 4, 8}, concurrency ∈ {1, 8, 16, 32, 64, 128, 256}

Cosmos3-Super-Reasoner

GPU Precision median Δ tok/s/GPU
B200 BF16 +7.4%
B200 FP8 +22.9%
B200 NVFP4 +34.4%
H200 BF16 +12.6%
H200 FP8 +25.3%
All all +19.4%

Cosmos3-Nano-Reasoner

GPU Precision median Δ tok/s/GPU
B200 BF16 +43.2%
B200 FP8 +74.4%
B200 NVFP4 +76.4%
H200 BF16 +37.9%
H200 FP8 +52.7%
All all +49.8%

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds dedicated ThreadPoolExecutor instances for blocking media decoding (media_io.py) and HuggingFace input preprocessing (openai_server.py), replacing asyncio.to_thread. Refactors _load_video_by_cv2 for in-memory OpenCV decoding without temp files and changes format="pt" to return uint8 numpy frames. Installs uvloop when available. Adds a memoizing cache for Qwen2VL processor _merge_kwargs and changes Qwen3VL video kwargs to set do_sample_frames=False.

Changes

Media I/O & Async Threading

Layer / File(s) Summary
Dedicated media decoding thread pool
tensorrt_llm/inputs/media_io.py
Adds _MEDIA_LOAD_EXECUTOR (a module-level ThreadPoolExecutor) and _media_load_run helper; refactors BaseMediaIO.async_load to dispatch all blocking load calls through this executor instead of asyncio.to_thread.
In-memory OpenCV video decoding and frame format
tensorrt_llm/inputs/media_io.py
Updates _load_video_by_cv2 to decode from in-memory bytes via stream-buffered OpenCV backends with BytesIO (temp file fallback). Changes format="pt" to return uint8 RGB numpy arrays. Fixes audio extraction to wrap bytes in a fresh BytesIO. Changes VideoMediaIO.load_bytes to pass bytes directly without writing a temp file.
Dedicated HF preprocessing executor in OpenAI server
tensorrt_llm/serve/openai_server.py
Adds _INPUT_PROC_EXECUTOR (sized by TRTLLM_INPUT_PROC_WORKERS, default 8) and replaces asyncio.to_thread in openai_chat and openai_completion with loop.run_in_executor(_INPUT_PROC_EXECUTOR, functools.partial(...)).
Optional uvloop event loop policy
tensorrt_llm/commands/serve.py
Installs uvloop as the asyncio event loop policy at import time if the package is available; silently skips on ImportError.

VLM Processor Fixes

Layer / File(s) Summary
Qwen2VL _merge_kwargs memoization
tensorrt_llm/_torch/models/modeling_qwen2vl.py
Adds _install_merge_kwargs_cache to monkey-patch processor._merge_kwargs with a deep-copy-backed memoizing wrapper keyed by repr(sorted(kwargs.items())), guarded against double-install. Called from Qwen2VLInputProcessorBase.__init__.
Qwen3VL video preprocessing kwargs
tensorrt_llm/_torch/models/modeling_qwen3vl.py
Replaces conditional fps=None injection with unconditional do_sample_frames=False in proc_kwargs, forwarding all other mm_processor_kwargs except num_frames and fps.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tcherckez-nvidia
  • MrGeva
  • galagam
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title follows the required [None][type] format and clearly summarizes the main Qwen3-VL preprocessing performance work.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description explains the issue, solution, and checklist well, though the Test Coverage section is left empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
tensorrt_llm/inputs/media_io.py (1)

50-55: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Annotate _media_load_run to preserve caller result types.

Every async media load now goes through this helper, but the untyped signature erases _MediaT to Any. Use ParamSpec/TypeVar so type checking follows load_bytes/load_base64/load_file.

Suggested typing shape
+from collections.abc import Callable
+from typing import ParamSpec
+
+_P = ParamSpec("_P")
+_R = TypeVar("_R")
+
-async def _media_load_run(fn, *args, **kwargs):
+async def _media_load_run(
+    fn: Callable[_P, _R], *args: _P.args, **kwargs: _P.kwargs
+) -> _R:

As per coding guidelines, “Always annotate functions. Make the return type None if the function does not return anything.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/inputs/media_io.py` around lines 50 - 55, The helper
`_media_load_run` is currently untyped, which causes the caller-specific media
result type to degrade to Any across `load_bytes`, `load_base64`, and
`load_file`. Update `_media_load_run` to use `ParamSpec` for the callable
arguments and a `TypeVar` for the return value so the async wrappers preserve
their concrete `_MediaT` result types. Keep the typing consistent with the
existing media-loading helpers in `media_io.py` and ensure the annotation
reflects that the function returns the awaited result from `run_in_executor`.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py`:
- Around line 100-103: The cache-key fallback in the key-building logic is
catching too broadly in the function that computes
`repr(sorted(kwargs.items()))`, which can mask unrelated bugs and violates the
repo’s exception-handling rule. Narrow the `except Exception` in that cache-key
construction path to only the specific failure(s) expected from sorting or repr
generation, and keep the fallback to `orig(*args, **kwargs)` only for those
concrete cases so other processor regressions still fail visibly.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py`:
- Around line 262-265: The Qwen3VL multimodal preprocessing in the `proc_kwargs`
merge still allows `mm_processor_kwargs` to override `do_sample_frames`, which
can re-enable HF resampling. Update the `modeling_qwen3vl` merge logic so
`do_sample_frames` remains fixed to False and is never copied from
`mm_processor_kwargs`, while still preserving the other allowed keys like
`num_frames` and `fps`.

In `@tensorrt_llm/inputs/media_io.py`:
- Around line 450-459: The frame return contract for the default format path is
inconsistent: `VideoData.frames` and `get_num_tokens_per_video` still assume PIL
images or torch tensors, but `media_io.py` now returns HWC uint8 ndarrays from
the `format="pt"` flow. Update the shared `VideoData` type/doc and any consumers
like `get_num_tokens_per_video` to recognize numpy frames (using shape-based
dimensions instead of `.height`/`.width`), or change the
`load_video`/frame-return path so `format="pt"` continues to yield tensors and
add a separate explicit numpy format.
- Around line 389-397: The fallback video-loading path that uses
tempfile.NamedTemporaryFile(delete=False) leaves behind temporary .mp4 files;
update the media I/O flow so the temp path is unlinked after all processing is
finished, not immediately after opening the VideoCapture. Make the cleanup
happen after any optional audio extraction that still depends on the video
filename, and apply the same fix in both fallback sites in media_io.py so the
byte-loaded video tempfiles are always removed.
- Around line 383-384: Narrow the fallback in the backend-probing block of
media_io.py: the broad Exception handler around the OpenCV API/registry probe is
masking unrelated bugs. Update the exception handling in that probe to only
catch the expected availability failures, specifically AttributeError and
cv2.error when backend queries fail, while keeping the tempfile fallback logic
unchanged.

In `@tensorrt_llm/serve/openai_server.py`:
- Around line 115-118: Validate TRTLLM_INPUT_PROC_WORKERS before constructing
_INPUT_PROC_EXECUTOR in openai_server.py so startup fails with a clear
operator-friendly message instead of crashing on zero, negative, or non-integer
values. Update the _INPUT_PROC_WORKERS initialization to reject invalid values
and ensure ThreadPoolExecutor only receives a positive worker count, keeping the
fix near the existing module-level executor setup.

---

Nitpick comments:
In `@tensorrt_llm/inputs/media_io.py`:
- Around line 50-55: The helper `_media_load_run` is currently untyped, which
causes the caller-specific media result type to degrade to Any across
`load_bytes`, `load_base64`, and `load_file`. Update `_media_load_run` to use
`ParamSpec` for the callable arguments and a `TypeVar` for the return value so
the async wrappers preserve their concrete `_MediaT` result types. Keep the
typing consistent with the existing media-loading helpers in `media_io.py` and
ensure the annotation reflects that the function returns the awaited result from
`run_in_executor`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 446e6a41-d8ed-4df8-9012-a296acd412b6

📥 Commits

Reviewing files that changed from the base of the PR and between eb0cbdb and 08b42ad.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/_torch/models/modeling_qwen3vl.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/inputs/media_io.py
  • tensorrt_llm/serve/openai_server.py

Comment thread tensorrt_llm/_torch/models/modeling_qwen2vl.py Outdated
Comment thread tensorrt_llm/_torch/models/modeling_qwen3vl.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/serve/openai_server.py Outdated
@aswinvisva aswinvisva force-pushed the avisva/cosmos3-vlm-preproc-perf branch from 08b42ad to 2b7db68 Compare June 24, 2026 16:22
Comment thread tensorrt_llm/_torch/models/modeling_qwen2vl.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py
Comment thread tensorrt_llm/inputs/media_io.py
Comment thread tensorrt_llm/_torch/models/modeling_qwen2vl.py Outdated
Comment thread tensorrt_llm/commands/serve.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py
@aswinvisva aswinvisva requested a review from a team as a code owner June 24, 2026 20:13
@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

Comment thread tensorrt_llm/commands/serve.py Outdated
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55605 [ run ] triggered by Bot. Commit: 6db51a6 Link to invocation

PAD_INDEX = -100 # NOTE: refer to https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py#L269


def _install_merge_kwargs_cache(processor) -> None:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-evaluating the perf benchmarks with everything included, this may be removable - it doesn't seem to make a huge difference when removing it. Will test in Jet and report back

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55618 [ run ] triggered by Bot. Commit: d9294c9 Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55605 [ run ] completed with state ABORTED. Commit: 6db51a6

Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55618 [ run ] completed with state FAILURE. Commit: d9294c9
/LLM/main/L0_MergeRequest_PR pipeline #44535 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Comment thread tensorrt_llm/_torch/models/modeling_qwen2vl.py Outdated
Comment thread tensorrt_llm/_torch/models/modeling_qwen3vl.py Outdated
# already sampled to the target frame count, so letting a caller flip
# it back to True would re-enter the HF sample_frames branch on
# already-sampled input and break the contract this patch establishes.
proc_kwargs = {"do_sample_frames": False}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm...

  1. are we sure the sampling is the same in media IO vs in the processor?
  2. is it possible to gate this on whether the num_frames / fps are different than what we already sampled?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to the logic we discussed in person - let me know what you think

Comment thread tensorrt_llm/commands/serve.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
Comment thread tensorrt_llm/inputs/media_io.py Outdated
if _tmp_path is not None:
try:
os.unlink(_tmp_path)
except OSError:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't know how I feel about this. Could we move some of the logic in lines 426-439 to the caller VideoMediaIO.load_bytes? We could use a contextlib.ExitStack to conditionally push a tempfile.NamedTemporaryFile(..., delete=True) into it.

If not, I would still vote to use it here to rely on NamedTemporaryFile to do the cleanup, it's just a bit less readable in that we add one more indent level for the context manager.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to move this upstream and use context managers to handle the cleanup rather than doing it here

Comment thread tensorrt_llm/inputs/media_io.py Outdated
)


async def _media_load_run(fn, *args, **kwargs):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think _load_media might be better?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to run_in_executor since it's mostly just wrapping a blocking function as an async by offloading to the threadpoolexecutor - let me know what you think

Comment thread tensorrt_llm/serve/openai_server.py Outdated
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57213 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57212 [ run ] completed with state ABORTED. Commit: be3b69a

Link to invocation

@aswinvisva aswinvisva added the api-compatible Accepted LLM API contract change that is backwards-compatible label Jul 2, 2026
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57213 [ run ] completed with state SUCCESS. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #45985 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57230 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57230 [ run ] completed with state SUCCESS. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #45997 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57253 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57253 [ run ] completed with state FAILURE. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #46020 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

3 similar comments
@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57266 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57266 [ run ] completed with state FAILURE. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #46029 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

5 similar comments
@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57274 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57274 [ run ] completed with state FAILURE. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #46036 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@aswinvisva

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57387 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57388 [ run ] triggered by Bot. Commit: be3b69a Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57387 [ run ] completed with state ABORTED. Commit: be3b69a

Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #57388 [ run ] completed with state FAILURE. Commit: be3b69a
/LLM/main/L0_MergeRequest_PR pipeline #46136 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-compatible Accepted LLM API contract change that is backwards-compatible

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants