[For RL] Keep attrs after folding weight and fix empty extra state for Megatron#779
Conversation
Signed-off-by: Meng Xin <mxin@nvidia.com>
Codecov Report

❌ Patch coverage is below project coverage. Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
- Coverage   72.16%   72.08%   -0.08%
==========================================
  Files         210      210
  Lines       23522    23545      +23
==========================================
- Hits        16974    16973       -1
- Misses       6548     6572      +24
```

☔ View full report in Codecov by Sentry.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/quantization/plugins/vllm.py (1)

231-248: ⚠️ Potential issue | 🟠 Major

`keep_attrs` is accepted but ignored in the vLLM fold path. Line 231 adds `keep_attrs`, but the method never branches on it, so the new flag is ineffective for these modules and inconsistent with the base fold behavior.

💡 Proposed fix

```diff
 @torch.no_grad()
 def fold_weight(self, keep_attrs: bool = False):
@@
         self.w13_weight_quantizer.disable()
@@
         self.w2_weight_quantizer.disable()
+
+        if not keep_attrs:
+            for quantizer in (self.w13_weight_quantizer, self.w2_weight_quantizer):
+                for attr_name in ("_pre_quant_scale", "_amax"):
+                    if hasattr(quantizer, attr_name):
+                        delattr(quantizer, attr_name)
         torch.cuda.empty_cache()
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In `modelopt/torch/quantization/plugins/vllm.py` around lines 231-248, the `fold_weight` method currently ignores the `keep_attrs` flag. Keep the folding loop as is, but after folding each weight (`self.w13_weight` and `self.w2_weight` via `self.w13_weight_quantizer` and `self.w2_weight_quantizer`), only call `.disable()` and delete (or set to None) the corresponding quantizer attributes when `keep_attrs` is False, so the behavior matches the base `fold_weight` semantics; if `keep_attrs` is True, do not disable or remove them. Ensure `torch.cuda.empty_cache()` remains after folding.

modelopt/torch/utils/dataset_utils.py (1)
165-176: ⚠️ Potential issue | 🟠 Major

`.jsonl.gz` is documented but not actually supported. Line 165 documents `.jsonl.gz`, but line 175 only checks `.jsonl`. Compressed JSONL inputs will bypass the new code path and fail as unsupported dataset names.

💡 Proposed fix

```diff
 import copy
+import gzip
 import json
@@
 def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
@@
-    with open(jsonl_path, encoding="utf-8") as f:
+    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
+    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
@@
-    if dataset_name.endswith(".jsonl"):
+    if dataset_name.endswith((".jsonl", ".jsonl.gz")):
         return _get_jsonl_text_samples(dataset_name, num_samples)
```

Also applies to: 261-263

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In `modelopt/torch/utils/dataset_utils.py` around lines 165-176, the code documents support for `.jsonl.gz` but only checks for `.jsonl`; update the `dataset_name` checks that call `_get_jsonl_text_samples` to also detect `.jsonl.gz` (e.g., `endswith((".jsonl", ".jsonl.gz"))`) and ensure `_get_jsonl_text_samples` can read gzipped files (or add a gzip-aware wrapper) so compressed JSONL is handled; apply the same fix to the other occurrence that currently only checks `.jsonl` (the block where `dataset_name` is re-evaluated later).
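A standalone sketch of the gzip-aware JSONL reading suggested in the fix above. `read_jsonl_samples` is an illustrative stand-in for modelopt's `_get_jsonl_text_samples`, not the actual implementation; the `"text"` field name is assumed from context.

```python
import gzip
import json
import os
import tempfile


def read_jsonl_samples(jsonl_path: str, num_samples: int) -> list[str]:
    """Read up to num_samples "text" fields from a plain or gzipped JSONL file."""
    # gzip.open in "rt" mode yields decoded text lines, matching plain open()
    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
    samples: list[str] = []
    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if len(samples) >= num_samples:
                break
            line = line.strip()
            if line:
                samples.append(json.loads(line)["text"])
    return samples


if __name__ == "__main__":
    # Round-trip check: write a small .jsonl.gz, then read it back.
    rows = [{"text": f"sample {i}"} for i in range(5)]
    with tempfile.TemporaryDirectory() as d:
        gz_path = os.path.join(d, "data.jsonl.gz")
        with gzip.open(gz_path, mode="wt", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        print(read_jsonl_samples(gz_path, 3))  # ['sample 0', 'sample 1', 'sample 2']
```

Dispatching on the `.gz` suffix keeps one code path for both formats, since `gzip.open(..., "rt")` returns a line-iterable text stream just like `open`.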
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- modelopt/torch/opt/plugins/megatron.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/quantization/nn/modules/quant_linear.py
- modelopt/torch/quantization/nn/modules/quant_module.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/plugins/vllm.py
- modelopt/torch/utils/dataset_utils.py
- modelopt/torch/utils/plugins/megatron_generate.py
What does this PR do?

Type of change: improvement

Overview:

Usage

`mtq.fold_weight(keep_attrs=True)` will keep quantizer attrs after folding weight.

Testing
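A minimal toy sketch of the `keep_attrs` semantics described above: folding bakes quantization into the weights and, by default, drops calibration attributes; `keep_attrs=True` preserves them (useful when they are needed again later, e.g. in RL-style workflows). `ToyQuantizer` and this `fold_weight` are illustrative stand-ins, not the modelopt implementation.

```python
class ToyQuantizer:
    """Schematic stand-in for a weight quantizer holding calibration state."""

    def __init__(self, amax: float):
        self._amax = amax  # calibration statistic (absolute max)
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False


def fold_weight(weights: list[float], quantizer: ToyQuantizer,
                keep_attrs: bool = False) -> list[float]:
    """Simulate folding int8-style fake quantization into the weights."""
    scale = quantizer._amax / 127.0
    folded = [round(w / scale) * scale for w in weights]  # quantize once, permanently
    quantizer.disable()
    if not keep_attrs:
        # Default behavior: drop calibration attrs to free memory.
        del quantizer._amax
    # keep_attrs=True leaves _amax in place for later re-use.
    return folded


q = ToyQuantizer(amax=1.0)
folded = fold_weight([0.5, -0.25], q, keep_attrs=True)
print(hasattr(q, "_amax"))  # True
```

With the default `keep_attrs=False`, the same call would delete `_amax`, matching the pre-existing fold behavior.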
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes