[None][perf] set ncclConfig graphUsageMode=1 on communicator init by nv-lschneider · Pull Request #13511 · NVIDIA/TensorRT-LLM

nv-lschneider · 2026-04-27T15:24:29Z

Summary by CodeRabbit

Refactor
- Improved NCCL communicator initialization with enhanced configuration parameters.

Description

This sets the NCCL graph mixing support to a single graph.
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html#c.ncclConfig_t.graphUsageMode
This eliminates an overhead before NCCL calls within captured graphs.

The risk is that this shifts the responsibility to not overlap NCCL operations within graphs, with either NCCL operations in other graphs or uncaptured calls.
From what I can see it is safe to do so today, but there is a risk to shift to this behavior.
Careful testing will be necessary.

Test Coverage

All kinds of parallel usage modes should be tested.
Starting with CI.

PR Checklist

Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

nv-lschneider · 2026-04-27T15:25:34Z

/bot run --add-multi-gpu --disable-fail-fast

coderabbitai · 2026-04-27T15:28:26Z

📝 Walkthrough

Walkthrough

NCCL communicator initialization in the common utilities is updated to use the ncclCommInitRankConfig API instead of ncclCommInitRank. A configuration object with graph usage mode enabled is created and passed to the initialization call, modifying the communicator setup parameters without changing surrounding logic.

Changes

Cohort / File(s)	Summary
NCCL Communicator Configuration `cpp/tensorrt_llm/common/opUtils.cpp`	Updated `getComm` to initialize NCCL communicators using `ncclCommInitRankConfig` with explicit configuration enabling `graphUsageMode = 1`, replacing the previous `ncclCommInitRank` call.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description explains the change and its intent clearly, but is missing the required ticket/issue format in the title and incomplete sections.	Add proper ticket format to title (e.g., [TRTLLM-XXXX], [`#XXXX`], or [https://nvbugs/XXXXXXX]). Complete missing sections: PR title with type indicator, and verify test coverage details are sufficient.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: setting ncclConfig graphUsageMode=1 during communicator initialization for performance improvement.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/common/opUtils.cpp (1)
2-2: ⚠️ Potential issue | 🟠 Major

Update the NVIDIA copyright year in this modified file.

This file is modified in this PR, but the header still ends at 2024. Please update it to include 2026.
Proposed fix
- * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines, All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification and Include NVIDIA copyright header on all new files; update year on modified files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/common/opUtils.cpp` at line 2, Update the NVIDIA copyright
header in cpp/tensorrt_llm/common/opUtils.cpp to reflect the latest modification
year by changing the trailing year range that currently ends with "2024" so it
includes "2026" (i.e., update the header comment line containing "NVIDIA
CORPORATION & AFFILIATES. All rights reserved." to show 2026 as the latest
year).

🧹 Nitpick comments (1)

cpp/tensorrt_llm/common/opUtils.cpp (1)
166-166: Consider inline parameter comments for this NCCL call.

The argument list is non-obvious; inline /*paramName=*/ comments would make this call easier to audit and maintain.
Proposed refactor
-    NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(), group.size(), id, groupRank, &config));
+    NCCLCHECK_THROW(ncclCommInitRankConfig(
+        /*comm=*/ncclComm.get(), /*nranks=*/group.size(), /*commId=*/id, /*myrank=*/groupRank, /*config=*/&config));
As per coding guidelines, In C++ function calls with non-obvious parameters, use inline C comments with the format /*paramName=*/ to document parameters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/common/opUtils.cpp` at line 166, The NCCL call
NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(), group.size(), id,
groupRank, &config)); has non-obvious positional arguments; update this call to
use inline parameter comments for clarity by annotating each argument with the
parameter name using the /*paramName=*/ style (e.g., /*comm=*/ ncclComm.get(),
/*nranks=*/ group.size(), /*uid=*/ id, /*rank=*/ groupRank, /*config=*/ &config)
so reviewers can immediately see what each value maps to in
ncclCommInitRankConfig.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Around line 164-165: Replace the magic literal 1 assigned to
config.graphUsageMode with a named C++ constant: add a file-local or header
constexpr int kGraphUsageMode = 1 (use camelCase with k prefix) and then set
config.graphUsageMode = kGraphUsageMode; update any nearby comments to reflect
the meaning if needed and use the same kGraphUsageMode symbol to make intent
explicit in ncclConfig_t config initialization code (refer to
config.graphUsageMode and NCCL_CONFIG_INITIALIZER).

---

Outside diff comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Line 2: Update the NVIDIA copyright header in
cpp/tensorrt_llm/common/opUtils.cpp to reflect the latest modification year by
changing the trailing year range that currently ends with "2024" so it includes
"2026" (i.e., update the header comment line containing "NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to show 2026 as the latest year).

---

Nitpick comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Line 166: The NCCL call NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(),
group.size(), id, groupRank, &config)); has non-obvious positional arguments;
update this call to use inline parameter comments for clarity by annotating each
argument with the parameter name using the /*paramName=*/ style (e.g., /*comm=*/
ncclComm.get(), /*nranks=*/ group.size(), /*uid=*/ id, /*rank=*/ groupRank,
/*config=*/ &config) so reviewers can immediately see what each value maps to in
ncclCommInitRankConfig.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df33c482-8b74-4693-b35c-cef7af0d6132

📥 Commits

Reviewing files that changed from the base of the PR and between 450122e and be0f60d.

📒 Files selected for processing (1)

cpp/tensorrt_llm/common/opUtils.cpp

tensorrt-cicd · 2026-04-27T15:33:04Z

PR_Github #45743 [ run ] triggered by Bot. Commit: be0f60d Link to invocation

tensorrt-cicd · 2026-04-28T15:33:52Z

PR_Github #45743 [ run ] completed with state ABORTED. Commit: be0f60d

Link to invocation

nv-lschneider · 2026-04-28T18:45:04Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-04-28T18:51:26Z

PR_Github #45976 [ run ] triggered by Bot. Commit: be0f60d Link to invocation

tensorrt-cicd · 2026-04-29T18:52:16Z

PR_Github #45976 [ run ] completed with state ABORTED. Commit: be0f60d

Link to invocation

nv-lschneider · 2026-06-23T21:06:31Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-06-23T21:13:31Z

PR_Github #55329 [ run ] triggered by Bot. Commit: c501cb2 Link to invocation

nv-lschneider · 2026-06-23T22:19:12Z

We might want to change this:
./docs/source/legacy/advanced/disaggregated-service.md:* NCCL_GRAPH_MIXING_SUPPORT: With the default value 1, the CUDA driver may create too many CUDA streams while working with one CUDA graph, leading to performance drop. Setting it to 0 will reduce the number of CUDA streams, but please make sure there are no other NCCL ops outside the one CUDA graph, otherwise it's unsafe.

I can do that tomorrow.
The statement is actually also only half true.

tensorrt-cicd · 2026-06-24T04:54:34Z

PR_Github #55329 [ run ] completed with state FAILURE. Commit: c501cb2
/LLM/main/L0_MergeRequest_PR pipeline #44280 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

nv-lschneider · 2026-06-24T18:07:15Z

/bot run --add-multi-gpu --disable-fail-fast

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

nv-lschneider · 2026-06-24T20:10:36Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-06-24T20:17:08Z

PR_Github #55575 [ run ] triggered by Bot. Commit: c2afce6 Link to invocation

tensorrt-cicd · 2026-06-25T03:35:40Z

PR_Github #55575 [ run ] completed with state FAILURE. Commit: c2afce6
/LLM/main/L0_MergeRequest_PR pipeline #44493 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned nv-lschneider Apr 27, 2026

nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from 25e1717 to be0f60d Compare April 27, 2026 15:24

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread cpp/tensorrt_llm/common/opUtils.cpp

nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from be0f60d to c501cb2 Compare June 23, 2026 20:56

nv-lschneider requested a review from Tabrizian June 23, 2026 21:24

Tabrizian approved these changes Jun 23, 2026

View reviewed changes

nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from c501cb2 to 08e84e5 Compare June 24, 2026 18:05

nv-lschneider requested a review from a team as a code owner June 24, 2026 18:05

nv-lschneider requested review from QiJune and arysef June 24, 2026 18:05

nv-lschneider added 2 commits June 24, 2026 15:10

[None][perf] set ncclConfig graphUsageMode=1 on communicator init

7adbe8c

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

[None][doc] document NCCL graph mixing default

c2afce6

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from 08e84e5 to c2afce6 Compare June 24, 2026 20:10

Uh oh!

Conversation

nv-lschneider commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

nv-lschneider commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tensorrt-cicd commented Apr 27, 2026

Uh oh!

tensorrt-cicd commented Apr 28, 2026

Uh oh!

nv-lschneider commented Apr 28, 2026

Uh oh!

tensorrt-cicd commented Apr 28, 2026

Uh oh!

tensorrt-cicd commented Apr 29, 2026

Uh oh!

nv-lschneider commented Jun 23, 2026

Uh oh!

tensorrt-cicd commented Jun 23, 2026

Uh oh!

nv-lschneider commented Jun 23, 2026

Uh oh!

tensorrt-cicd commented Jun 24, 2026

Uh oh!

nv-lschneider commented Jun 24, 2026

Uh oh!

nv-lschneider commented Jun 24, 2026

Uh oh!

tensorrt-cicd commented Jun 24, 2026

Uh oh!

tensorrt-cicd commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nv-lschneider commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading