cp: [perf] fix: guard cuda_graph_scope validation against None (3249) into r0.4.0#3262
cp: [perf] fix: guard cuda_graph_scope validation against None (3249) into r0.4.0#3262
[perf] fix: guard cuda_graph_scope validation against None (3249) into r0.4.0#3262Conversation
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
|
/ok to test 0d4dddc |
📝 WalkthroughWalkthroughMoved CUDA graph scope validation logic from conditional CLI-argument handling to post-override validation phase, ensuring invalid scope values in the effective recipe configuration are caught regardless of whether CLI arguments are provided. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/performance/utils/overrides.py`:
- Around line 119-126: The TE scope validation in the overrides.py block uses
effective_scope = getattr(recipe.model, "cuda_graph_scope", None) but doesn't
normalize it, causing strings or enum values to be iterated per-character;
update the validation to normalize effective_scope into a list of simple strings
before checking against valid_te_scopes: if effective_scope is a str convert to
[effective_scope], if items are enum-like objects convert each to its .value or
str representation, then assert the normalized list is non-empty and that every
element is in valid_te_scopes (use the existing valid_te_scopes list and the
same assert message).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2c36def4-8a59-40dd-8a29-16b4081fe969
📒 Files selected for processing (1)
scripts/performance/utils/overrides.py
| # Validate post-override state so we check the effective recipe value, | ||
| # not the raw CLI input (which may be None when the recipe default is used). | ||
| if recipe.model.cuda_graph_impl == "transformer_engine": | ||
| valid_te_scopes = ["attn", "mlp", "moe", "moe_router", "moe_preprocess", "mamba"] | ||
| effective_scope = getattr(recipe.model, "cuda_graph_scope", None) | ||
| assert effective_scope is not None and all(scope in valid_te_scopes for scope in effective_scope), ( | ||
| f"Invalid cuda graph scope: {effective_scope}. Valid options are: {valid_te_scopes}" | ||
| ) |
There was a problem hiding this comment.
Normalize effective_scope before TE scope validation to avoid false failures.
At Line 124, validation assumes effective_scope is a list of plain strings. But cuda_graph_scope is consumed elsewhere as str | list | single value and may contain enum-like entries, so valid configs can fail here (e.g., string input iterates per-character).
💡 Proposed fix
if recipe.model.cuda_graph_impl == "transformer_engine":
valid_te_scopes = ["attn", "mlp", "moe", "moe_router", "moe_preprocess", "mamba"]
effective_scope = getattr(recipe.model, "cuda_graph_scope", None)
- assert effective_scope is not None and all(scope in valid_te_scopes for scope in effective_scope), (
- f"Invalid cuda graph scope: {effective_scope}. Valid options are: {valid_te_scopes}"
- )
+ assert effective_scope is not None, "cuda_graph_scope must not be None when cuda_graph_impl=transformer_engine"
+ if isinstance(effective_scope, str):
+ effective_scope = effective_scope.split(",") if effective_scope else []
+ elif not isinstance(effective_scope, list):
+ effective_scope = [effective_scope]
+
+ normalized_scope = []
+ for scope in effective_scope:
+ scope_name = getattr(scope, "value", str(scope))
+ if scope_name.startswith("CudaGraphScope."):
+ scope_name = scope_name.split(".", 1)[1]
+ normalized_scope.append(scope_name)
+
+ assert all(scope in valid_te_scopes for scope in normalized_scope), (
+ f"Invalid cuda graph scope: {effective_scope}. Valid options are: {valid_te_scopes}"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Validate post-override state so we check the effective recipe value, | |
| # not the raw CLI input (which may be None when the recipe default is used). | |
| if recipe.model.cuda_graph_impl == "transformer_engine": | |
| valid_te_scopes = ["attn", "mlp", "moe", "moe_router", "moe_preprocess", "mamba"] | |
| effective_scope = getattr(recipe.model, "cuda_graph_scope", None) | |
| assert effective_scope is not None and all(scope in valid_te_scopes for scope in effective_scope), ( | |
| f"Invalid cuda graph scope: {effective_scope}. Valid options are: {valid_te_scopes}" | |
| ) | |
| # Validate post-override state so we check the effective recipe value, | |
| # not the raw CLI input (which may be None when the recipe default is used). | |
| if recipe.model.cuda_graph_impl == "transformer_engine": | |
| valid_te_scopes = ["attn", "mlp", "moe", "moe_router", "moe_preprocess", "mamba"] | |
| effective_scope = getattr(recipe.model, "cuda_graph_scope", None) | |
| assert effective_scope is not None, "cuda_graph_scope must not be None when cuda_graph_impl=transformer_engine" | |
| if isinstance(effective_scope, str): | |
| effective_scope = effective_scope.split(",") if effective_scope else [] | |
| elif not isinstance(effective_scope, list): | |
| effective_scope = [effective_scope] | |
| normalized_scope = [] | |
| for scope in effective_scope: | |
| scope_name = getattr(scope, "value", str(scope)) | |
| if scope_name.startswith("CudaGraphScope."): | |
| scope_name = scope_name.split(".", 1)[1] | |
| normalized_scope.append(scope_name) | |
| assert all(scope in valid_te_scopes for scope in normalized_scope), ( | |
| f"Invalid cuda graph scope: {effective_scope}. Valid options are: {valid_te_scopes}" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/utils/overrides.py` around lines 119 - 126, The TE scope
validation in the overrides.py block uses effective_scope =
getattr(recipe.model, "cuda_graph_scope", None) but doesn't normalize it,
causing strings or enum values to be iterated per-character; update the
validation to normalize effective_scope into a list of simple strings before
checking against valid_te_scopes: if effective_scope is a str convert to
[effective_scope], if items are enum-like objects convert each to its .value or
str representation, then assert the normalized list is non-empty and that every
element is in valid_te_scopes (use the existing valid_te_scopes list and the
same assert message).
|
will merge for the patch |
beep boop [🤖]: Hi @yaoyu-33 👋,
Summary by CodeRabbit