
Standardize weight_decay_mask naming and add weight decay #1635

Open

rdyro wants to merge 2 commits into google-deepmind:main from rdyro:alias-weight-decay-unify

Conversation

@rdyro (Collaborator) commented Mar 20, 2026

Test PR co-written with Gemini.

Addresses inconsistencies where the mask parameter for weight decay was named `mask` instead of `weight_decay_mask` in optimizers like adamw, nadamw, adan, lion, lamb, and adamaxw. Additionally, missing decoupled weight decay support (with `weight_decay` and `weight_decay_mask`) was added to multiple base optimizers including adabelief, adagrad, amsgrad, radam, rmsprop, sgd, sm3, yogi, optimistic_gradient_descent, optimistic_adam, and optimistic_adam_v2. The novograd alias continues to use its internal decoupled approach.
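For reference, a minimal sketch of a call under the standardized naming. The keyword change (`mask` → `weight_decay_mask`) is what this PR proposes; the mask function below is a common convention (decay only parameters with more than one dimension) and is illustrative, not part of the diff:

```python
import jax
import optax

def decay_mask_fn(params):
    # Common convention: decay weight matrices only; skip 1-D
    # parameters such as biases and norm scales.
    return jax.tree_util.tree_map(lambda p: p.ndim > 1, params)

tx = optax.adamw(
    learning_rate=1e-3,
    weight_decay=1e-4,
    weight_decay_mask=decay_mask_fn,  # previously `mask`
)
```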

Optimizers With Weight Decay

Designed with explicit decoupled weight decay from the start:

  • adamw
  • nadamw
  • adamaxw

Already have weight decay added:

  • adan
  • lion
  • lamb
  • adadelta
  • adafactor (named weight_decay_rate; see the sketch after this list)
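
adafactor's divergent keyword, for comparison; a sketch assuming the current optax signature:

```python
import optax

# adafactor names its decay rate `weight_decay_rate` (with a matching
# `weight_decay_mask`), one of the naming inconsistencies this PR catalogues.
tx = optax.adafactor(learning_rate=1e-3, weight_decay_rate=1e-4)
```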

Newly updated to standard decoupled weight decay (a composition sketch follows the list):

  • adabelief
  • adagrad
  • amsgrad
  • radam
  • rmsprop
  • sgd
  • sm3
  • yogi
  • optimistic_gradient_descent
  • optimistic_adam
  • optimistic_adam_v2
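
Decoupled here means the decay term is added after the optimizer's own scaling but before the learning-rate step, as in adamw. A minimal sketch of what the added support amounts to, expressed with existing optax transforms and using an SGD-with-momentum configuration as the example (the exact composition in the diff may differ):

```python
import optax

wd = 1e-4
learning_rate = 0.1

tx = optax.chain(
    optax.trace(decay=0.9),                       # SGD momentum
    optax.add_decayed_weights(wd, mask=None),     # decoupled weight decay
    optax.scale_by_learning_rate(learning_rate),  # final -lr * update
)
```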

Apply weight decay inside their own transformation:

  • novograd
  • lars

Optimizers Without Weight Decay

Left without weight decay so they remain the unmodified base algorithms; kept that way for backward compatibility with their "W" counterparts.

  • adam
  • nadam
  • adamax

Modify gradient signs (norm-agnostic):
Weight decay could behave unpredictably since the gradient magnitude is ignored (see the sketch after this list).

  • sign_sgd
  • signum
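
To make the concern concrete, the update rules written out (illustrative, not from the diff): sign-based methods move every coordinate by exactly the learning rate, so a decoupled decay term would reintroduce a magnitude-dependent component into an otherwise magnitude-free update.

```latex
% sign_sgd: every coordinate moves by exactly \eta, regardless of |g_t|.
\theta_{t+1} = \theta_t - \eta \,\operatorname{sign}(g_t)

% With decoupled decay, the step would mix a magnitude-dependent term
% \lambda \theta_t into the otherwise fixed-magnitude update:
\theta_{t+1} = \theta_t - \eta \left( \operatorname{sign}(g_t) + \lambda \theta_t \right)
```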

Niche algorithms or raw-gradient wrappers:
Weight decay is not a standard practice for these methods.

  • noisy_sgd
  • rprop
  • polyak_sgd
  • lbfgs
  • fromage

@rdyro force-pushed the alias-weight-decay-unify branch from 670f1d3 to 6ed973a on March 20, 2026 05:09
@rdyro force-pushed the alias-weight-decay-unify branch from 988573e to 4cf37de on March 20, 2026 06:06
