Unify scalar optimizers.#210
Conversation
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
|
@greptile |
Greptile SummaryThis PR consolidates four previously independent scalar optimizer classes (Lion, LaProp, Signum, SimplifiedAdEMAMix) behind a shared
Confidence Score: 5/5Safe to merge — the refactoring is a faithful mechanical consolidation with no logic changes to any optimizer's update math. The step loop, state initialization, and validation were mechanically lifted into the shared base, and the diff confirms the behaviour is preserved: same update equation ordering (pre-norm capture → weight decay → update fn → post-norm rescale for frob_normalize), same error messages, same defaults. The only net-new behaviour is that Lion now tracks a step counter it ignores, which is intentional and documented. The mixin-based test consolidation preserves all meaningful assertions from the deleted files and adds cross-optimizer smoke coverage for Signum and SimplifiedAdEMAMix. No files require special attention. Important Files Changed
Reviews (3): Last reviewed commit: "add one more test and adjust logging lev..." | Re-trigger Greptile |
|
/ok to test 552d23b |
Test Results 50 files - 4 120 suites +4 1m 28s ⏱️ +5s Results for commit fa638f5. ± Comparison against base commit 3e86f89. This pull request removes 31 and adds 55 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
Signed-off-by: Hao Wu <skyw@nvidia.com>
|
/ok to test 753808b |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Signed-off-by: Hao Wu <skyw@nvidia.com>
|
/ok to test fa638f5 |
| state = self.state[p] | ||
| if len(state) == 0: | ||
| for key in self.state_keys: | ||
| state[key] = torch.zeros_like(p.data) |
There was a problem hiding this comment.
maybe add warning that states are always the same shape as p?
many optims like Adafactor do not have that
There was a problem hiding this comment.
I consider that is implied by scalar optimizer, everything is element wise so states can only have same shape.
I think a note in the docstring would be enough?
Provide unified optimizer class for different scalar update functions.
Keep the detailed docstring in update function and only have very light docstring for the wrapper class.