Fix LayerNorm Scaling implementation by davidgonmar · Pull Request #698 · allenai/OLMo-core

davidgonmar · 2026-05-29T20:32:47Z

Issue

The current LayerNorm Scaling implementation fails silently in the training pipeline: no error is raised, but the scale applied during training is wrong.

The scale is registered as a buffer and given its value in the __init__ method of LayerNormScaledTransformerBlock. However, model initialization in src/olmo_core/nn/transformer/model.py calls to_empty before initializing each module’s weights individually. to_empty reallocates every parameter and buffer as uninitialized storage, and nothing refills ln_scale afterwards, so its contents are lost and training runs with an incorrect scale.
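The failure mode can be reproduced in isolation. The sketch below is a minimal illustration, not the OLMo-core code: BufferScaledBlock is a hypothetical stand-in for LayerNormScaledTransformerBlock, and the 1/sqrt(layer index) value is assumed here only to have something concrete to store.

```python
import torch
import torch.nn as nn


class BufferScaledBlock(nn.Module):
    """Hypothetical stand-in: stores the scale as a buffer set only in __init__."""

    def __init__(self, layer_idx: int) -> None:
        super().__init__()
        # Assumed scale value for illustration: 1 / sqrt(1-based layer index).
        scale = 1.0 / (layer_idx + 1) ** 0.5
        self.register_buffer("ln_scale", torch.tensor(scale))


block = BufferScaledBlock(layer_idx=3)
before = block.ln_scale  # keep a reference to the tensor written in __init__

# to_empty replaces every parameter and buffer with a freshly allocated,
# uninitialized tensor of the same shape and dtype. No values are copied.
block.to_empty(device="cpu")
after = block.ln_scale

# The buffer is now a different tensor; its contents are whatever happened
# to be in memory, so the value written in __init__ cannot be relied on.
replaced = after is not before
```

Because the new storage is uninitialized, the buffer may hold any value afterwards, which is why the bug does not raise an error.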

Fix

There are two possible fixes. The first keeps the buffer and adds a custom initialization function that restores its value after to_empty. The second stores the scale as a plain Python scalar, which to_empty does not touch. The second is simpler and is the approach taken in this PR.
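A rough sketch of the scalar approach follows. It is not the diff from this PR: ScalarScaledNorm, d_model and layer_idx are hypothetical names, and the 1/sqrt(layer index) formula is an assumption made for the example.

```python
import math

import torch
import torch.nn as nn


class ScalarScaledNorm(nn.Module):
    """Hypothetical sketch: the scale is a plain Python float, not a buffer."""

    def __init__(self, d_model: int, layer_idx: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # A plain attribute is neither a parameter nor a buffer, so
        # to_empty leaves it alone. Formula assumed for illustration.
        self.ln_scale = 1.0 / math.sqrt(layer_idx + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x) * self.ln_scale


block = ScalarScaledNorm(d_model=8, layer_idx=3)
block.to_empty(device="cpu")

# The norm's weight and bias are uninitialized now and must be re-initialized;
# the scalar needs no such step.
block.norm.reset_parameters()

out = block(torch.randn(2, 8))
buffer_names = [name for name, _ in block.named_buffers()]
```

One consequence of this design is that the scale no longer appears in the state dict. Since it is derived from the layer index at construction time, it is recomputed whenever the model is rebuilt.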

fix lns

b302088

davidgonmar wants to merge 1 commit into allenai:main from davidgonmar:fix-layernormscaling