feat(pt): add warmup_ratio setting to set warmup steps conveniently #5134
njzjz merged 3 commits into deepmodeling:master from
Conversation
📝 Walkthrough
Added two warm-up options: `warmup_ratio` and `warmup_start_factor`.
Actionable comments posted: 1
In `deepmd/pt/train/training.py`:
- Around lines 418-426: `warmup_ratio` is used to compute `self.warmup_steps` without validation and with `int()` truncation. Add a validation step: if `warmup_ratio` is provided, ensure `0.0 <= warmup_ratio < 1.0` and raise a clear `ValueError` mentioning `warmup_ratio` when it is out of range. Compute `self.warmup_steps` with `round(warmup_ratio * self.num_steps)` (or keep `int()` but add a comment if truncation is intentional), and if the result is 0 while `warmup_ratio > 0`, set `self.warmup_steps = 1` to avoid producing zero warmup steps. Also update the error message used by the existing `warmup_steps` assertion to mention `warmup_ratio` when raising.
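The fix described in the agent prompt can be sketched as a standalone helper. This is a minimal illustration, not the code merged in the PR; the function name `resolve_warmup_steps` and its plain arguments stand in for the trainer attributes in `training.py`:

```python
def resolve_warmup_steps(warmup_ratio: float, num_steps: int) -> int:
    """Validate warmup_ratio and convert it into a warmup step count."""
    if not 0.0 <= warmup_ratio < 1.0:
        raise ValueError(f"warmup_ratio must be in [0, 1), got {warmup_ratio}")
    # round() instead of int() so a value like 0.0999 * 1000 rounds to 100
    # rather than truncating to 99
    warmup_steps = round(warmup_ratio * num_steps)
    if warmup_steps == 0 and warmup_ratio > 0:
        # a tiny positive ratio should still produce at least one warmup step
        warmup_steps = 1
    return warmup_steps
```

With this shape, an out-of-range ratio fails fast at configuration time instead of producing a silently wrong schedule.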
🧹 Nitpick comments (1)
deepmd/utils/argcheck.py (1)
3344-3349: Consider adding range validation for `warmup_ratio` in the argument schema.
The `warmup_ratio` parameter lacks explicit range constraints. Based on the implementation in `training.py`, valid values should be in the range `[0, 1)` to ensure the computed `warmup_steps` doesn't exceed `num_steps`. Adding validation here would provide clearer error messages during configuration validation rather than at runtime.
♻️ Suggested improvement with validation
While the dargs `Argument` class may not directly support min/max validation for float types, you could document the expected range more explicitly and consider adding a validation function:

```diff
 Argument(
     "warmup_ratio",
     float,
     optional=True,
-    doc=doc_only_pt_supported + doc_warmup_ratio,
+    doc=doc_only_pt_supported + doc_warmup_ratio + " Valid range: [0, 1).",
 ),
```

Alternatively, if custom validation is supported, add explicit bounds checking to fail fast during configuration parsing.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/pt/train/training.py
- deepmd/utils/argcheck.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Agent
- GitHub Check: Test Python (6, 3.13)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (3, 3.13)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test C++ (false, false, false, true)
Pull request overview
This PR adds a convenient warmup_ratio parameter for PyTorch training that allows users to specify warmup steps as a ratio of total training steps, rather than an absolute number.
Key changes:
- Added a `warmup_ratio` configuration parameter that calculates warmup steps as a fraction of `numb_steps`
- Implemented precedence logic where `warmup_steps` takes priority over `warmup_ratio` when both are specified
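The precedence between the two options can be sketched as follows. This is an illustrative reduction of the logic, not the merged code; the helper name `pick_warmup_steps` is invented, while the dictionary keys match the config options:

```python
def pick_warmup_steps(training_params: dict, num_steps: int) -> int:
    """Explicit warmup_steps wins; warmup_ratio is the fallback; default is 0."""
    warmup_steps = training_params.get("warmup_steps")
    if warmup_steps is not None:
        return warmup_steps
    warmup_ratio = training_params.get("warmup_ratio")
    if warmup_ratio is not None:
        # ratio is interpreted relative to the total number of training steps
        return int(warmup_ratio * num_steps)
    return 0
```

If both keys are present, `warmup_ratio` is simply ignored, which matches the precedence described above.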
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| deepmd/utils/argcheck.py | Adds documentation and argument definition for the new warmup_ratio parameter, marked as PyTorch-only |
| deepmd/pt/train/training.py | Implements the warmup_ratio logic by calculating warmup_steps = int(warmup_ratio * num_steps) when warmup_steps is not explicitly set |
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/train/training.py (2)
424-425: Consider clarifying why the upper bound is exclusive.
The validation correctly enforces `warmup_ratio < 1`, but the error message doesn't explain why 1.0 is excluded. Users might wonder why the full range isn't `[0, 1]`.
📝 Suggested improvement

```diff
 if not 0 <= warmup_ratio < 1:
-    raise ValueError(f"warmup_ratio must be in [0, 1), got {warmup_ratio}")
+    raise ValueError(
+        f"warmup_ratio must be in [0, 1) to leave steps for training, got {warmup_ratio}"
+    )
```

This also addresses the Ruff TRY003 style hint by breaking the message into multiple lines.
435-435: Add validation for `warmup_start_factor` to prevent unexpected behavior.
The `warmup_start_factor` parameter lacks range validation. While the most common use case is `[0, 1]` (starting from 0% to 100% of the target learning rate), negative or > 1 values could cause unexpected learning rate schedules.
🛡️ Proposed validation
Add validation after line 435:

```python
self.warmup_start_factor = training_params.get("warmup_start_factor", 0.0)
if not 0 <= self.warmup_start_factor <= 1:
    log.warning(
        f"warmup_start_factor is typically in [0, 1], got {self.warmup_start_factor}. "
        f"This may result in unusual learning rate schedules."
    )
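The proposed check can be exercised in isolation. This standalone variant is a sketch: the function name is hypothetical, and it uses the standard-library `warnings` module in place of the trainer's logger:

```python
import warnings


def validate_warmup_start_factor(factor: float) -> float:
    """Warn on values outside the conventional [0, 1] range, but keep them."""
    if not 0.0 <= factor <= 1.0:
        warnings.warn(
            f"warmup_start_factor is typically in [0, 1], got {factor}; "
            "this may result in an unusual learning rate schedule."
        )
    return factor
```

Whether to warn or raise is a policy choice: warning preserves flexibility for deliberate out-of-range schedules, while raising catches typos earlier.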
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/pt/train/training.py
- deepmd/utils/argcheck.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/utils/argcheck.py
🧰 Additional context used
🪛 Ruff (0.14.10)
deepmd/pt/train/training.py
425-425: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: Test Python (9, 3.10)
- GitHub Check: Test Python (10, 3.10)
- GitHub Check: Test Python (12, 3.13)
- GitHub Check: Test Python (9, 3.13)
- GitHub Check: Test Python (11, 3.13)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Test Python (10, 3.13)
- GitHub Check: Test Python (7, 3.13)
- GitHub Check: Test Python (8, 3.13)
- GitHub Check: Test Python (8, 3.10)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (7, 3.10)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (2)
deepmd/pt/train/training.py (2)
419-434: LGTM! Warmup configuration logic is well-structured.
The precedence (explicit `warmup_steps` → `warmup_ratio` → default 0) is clear and correctly implements the feature. The validation and warning for truncation are helpful for users.
685-691: LGTM! Linear warmup implementation is mathematically correct.
The modified warmup schedule correctly interpolates the learning rate multiplier from `warmup_start_factor` to `1.0` over `warmup_steps` steps. The division by zero is protected by the conditional check, and the formula produces the expected linear ramp.
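The schedule the review describes can be written as a pure function. This is a sketch of the interpolation only (the real implementation lives in the LR lambda in `training.py`, and the function name here is invented):

```python
def warmup_factor(step: int, warmup_steps: int, start_factor: float = 0.0) -> float:
    """Linear LR multiplier: ramps from start_factor to 1.0 over warmup_steps."""
    if warmup_steps > 0 and step < warmup_steps:
        # linear interpolation between start_factor (step 0) and 1.0
        return start_factor + (1.0 - start_factor) * step / warmup_steps
    return 1.0  # after warmup (or with no warmup) the base schedule takes over
```

The `warmup_steps > 0` guard is what protects the division, matching the review's note about the conditional check.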
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5134      +/-   ##
==========================================
- Coverage   82.15%   81.93%   -0.22%
==========================================
  Files         709      712       +3
  Lines       72468    72900     +432
  Branches     3616     3617       +1
==========================================
+ Hits        59535    59733     +198
- Misses      11769    12003     +234
  Partials     1164     1164

☔ View full report in Codecov by Sentry.