Skip to content

[None][perf] set ncclConfig graphUsageMode=1 on communicator init#13511

Open
nv-lschneider wants to merge 2 commits into
NVIDIA:mainfrom
nv-lschneider:lschneider/nccl-graph-usage-mode
Open

[None][perf] set ncclConfig graphUsageMode=1 on communicator init#13511
nv-lschneider wants to merge 2 commits into
NVIDIA:mainfrom
nv-lschneider:lschneider/nccl-graph-usage-mode

Conversation

@nv-lschneider

@nv-lschneider nv-lschneider commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • Refactor
    • Improved NCCL communicator initialization with enhanced configuration parameters.

Description

This sets the NCCL graph mixing support to a single graph.
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html#c.ncclConfig_t.graphUsageMode
This eliminates an overhead before NCCL calls within captured graphs.

The risk is that this shifts the responsibility to not overlap NCCL operations within graphs, with either NCCL operations in other graphs or uncaptured calls.
From what I can see it is safe to do so today, but there is a risk to shift to this behavior.
Careful testing will be necessary.

Test Coverage

All kinds of parallel usage modes should be tested.
Starting with CI.

PR Checklist

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@nv-lschneider nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from 25e1717 to be0f60d Compare April 27, 2026 15:24
@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

NCCL communicator initialization in the common utilities is updated to use the ncclCommInitRankConfig API instead of ncclCommInitRank. A configuration object with graph usage mode enabled is created and passed to the initialization call, modifying the communicator setup parameters without changing surrounding logic.

Changes

Cohort / File(s) Summary
NCCL Communicator Configuration
cpp/tensorrt_llm/common/opUtils.cpp
Updated getComm to initialize NCCL communicators using ncclCommInitRankConfig with explicit configuration enabling graphUsageMode = 1, replacing the previous ncclCommInitRank call.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description explains the change and its intent clearly, but is missing the required ticket/issue format in the title and incomplete sections. Add proper ticket format to title (e.g., [TRTLLM-XXXX], [#XXXX], or [https://nvbugs/XXXXXXX]). Complete missing sections: PR title with type indicator, and verify test coverage details are sufficient.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: setting ncclConfig graphUsageMode=1 during communicator initialization for performance improvement.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/common/opUtils.cpp (1)

2-2: ⚠️ Potential issue | 🟠 Major

Update the NVIDIA copyright year in this modified file.

This file is modified in this PR, but the header still ends at 2024. Please update it to include 2026.

Proposed fix
- * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification and Include NVIDIA copyright header on all new files; update year on modified files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/common/opUtils.cpp` at line 2, Update the NVIDIA copyright
header in cpp/tensorrt_llm/common/opUtils.cpp to reflect the latest modification
year by changing the trailing year range that currently ends with "2024" so it
includes "2026" (i.e., update the header comment line containing "NVIDIA
CORPORATION & AFFILIATES. All rights reserved." to show 2026 as the latest
year).
🧹 Nitpick comments (1)
cpp/tensorrt_llm/common/opUtils.cpp (1)

166-166: Consider inline parameter comments for this NCCL call.

The argument list is non-obvious; inline /*paramName=*/ comments would make this call easier to audit and maintain.

Proposed refactor
-    NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(), group.size(), id, groupRank, &config));
+    NCCLCHECK_THROW(ncclCommInitRankConfig(
+        /*comm=*/ncclComm.get(), /*nranks=*/group.size(), /*commId=*/id, /*myrank=*/groupRank, /*config=*/&config));

As per coding guidelines, In C++ function calls with non-obvious parameters, use inline C comments with the format /*paramName=*/ to document parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/common/opUtils.cpp` at line 166, The NCCL call
NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(), group.size(), id,
groupRank, &config)); has non-obvious positional arguments; update this call to
use inline parameter comments for clarity by annotating each argument with the
parameter name using the /*paramName=*/ style (e.g., /*comm=*/ ncclComm.get(),
/*nranks=*/ group.size(), /*uid=*/ id, /*rank=*/ groupRank, /*config=*/ &config)
so reviewers can immediately see what each value maps to in
ncclCommInitRankConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Around line 164-165: Replace the magic literal 1 assigned to
config.graphUsageMode with a named C++ constant: add a file-local or header
constexpr int kGraphUsageMode = 1 (use camelCase with k prefix) and then set
config.graphUsageMode = kGraphUsageMode; update any nearby comments to reflect
the meaning if needed and use the same kGraphUsageMode symbol to make intent
explicit in ncclConfig_t config initialization code (refer to
config.graphUsageMode and NCCL_CONFIG_INITIALIZER).

---

Outside diff comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Line 2: Update the NVIDIA copyright header in
cpp/tensorrt_llm/common/opUtils.cpp to reflect the latest modification year by
changing the trailing year range that currently ends with "2024" so it includes
"2026" (i.e., update the header comment line containing "NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to show 2026 as the latest year).

---

Nitpick comments:
In `@cpp/tensorrt_llm/common/opUtils.cpp`:
- Line 166: The NCCL call NCCLCHECK_THROW(ncclCommInitRankConfig(ncclComm.get(),
group.size(), id, groupRank, &config)); has non-obvious positional arguments;
update this call to use inline parameter comments for clarity by annotating each
argument with the parameter name using the /*paramName=*/ style (e.g., /*comm=*/
ncclComm.get(), /*nranks=*/ group.size(), /*uid=*/ id, /*rank=*/ groupRank,
/*config=*/ &config) so reviewers can immediately see what each value maps to in
ncclCommInitRankConfig.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df33c482-8b74-4693-b35c-cef7af0d6132

📥 Commits

Reviewing files that changed from the base of the PR and between 450122e and be0f60d.

📒 Files selected for processing (1)
  • cpp/tensorrt_llm/common/opUtils.cpp

Comment thread cpp/tensorrt_llm/common/opUtils.cpp
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45743 [ run ] triggered by Bot. Commit: be0f60d Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45743 [ run ] completed with state ABORTED. Commit: be0f60d

Link to invocation

@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45976 [ run ] triggered by Bot. Commit: be0f60d Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45976 [ run ] completed with state ABORTED. Commit: be0f60d

Link to invocation

@nv-lschneider nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from be0f60d to c501cb2 Compare June 23, 2026 20:56
@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55329 [ run ] triggered by Bot. Commit: c501cb2 Link to invocation

@nv-lschneider nv-lschneider requested a review from Tabrizian June 23, 2026 21:24
@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

We might want to change this:
./docs/source/legacy/advanced/disaggregated-service.md:* NCCL_GRAPH_MIXING_SUPPORT: With the default value 1, the CUDA driver may create too many CUDA streams while working with one CUDA graph, leading to performance drop. Setting it to 0 will reduce the number of CUDA streams, but please make sure there are no other NCCL ops outside the one CUDA graph, otherwise it's unsafe.

I can do that tomorrow.
The statement is actually also only half true.

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55329 [ run ] completed with state FAILURE. Commit: c501cb2
/LLM/main/L0_MergeRequest_PR pipeline #44280 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@nv-lschneider nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from c501cb2 to 08e84e5 Compare June 24, 2026 18:05
@nv-lschneider nv-lschneider requested a review from a team as a code owner June 24, 2026 18:05
@nv-lschneider nv-lschneider requested review from QiJune and arysef June 24, 2026 18:05
@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
@nv-lschneider nv-lschneider force-pushed the lschneider/nccl-graph-usage-mode branch from 08e84e5 to c2afce6 Compare June 24, 2026 20:10
@nv-lschneider

Copy link
Copy Markdown
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55575 [ run ] triggered by Bot. Commit: c2afce6 Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #55575 [ run ] completed with state FAILURE. Commit: c2afce6
/LLM/main/L0_MergeRequest_PR pipeline #44493 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants