Skip to content

feat(nn): add configurable norm_layer to Mlp#1702

Open
Thabhelo wants to merge 2 commits into
NVIDIA:mainfrom
Thabhelo:feat/mlp-configurable-norm-layer
Open

feat(nn): add configurable norm_layer to Mlp#1702
Thabhelo wants to merge 2 commits into
NVIDIA:mainfrom
Thabhelo:feat/mlp-configurable-norm-layer

Conversation

@Thabhelo
Copy link
Copy Markdown

@Thabhelo Thabhelo commented Jun 6, 2026

PhysicsNeMo Pull Request

Description

Adds a norm_layer parameter to physicsnemo.nn.Mlp so normalization type is configurable instead of being limited to BatchNorm1d via use_batchnorm.

Supported options:

  • norm_layer="batchnorm", "layernorm", or "te_layernorm"
  • norm_layer=nn.LayerNorm or get_layer_norm_class() for TE-aware LayerNorm
  • Any callable factory invoked as norm_layer(out_features)

use_batchnorm=True remains backward compatible and is mutually exclusive with norm_layer.

Closes #1451

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards. (N/A: physicsnemo.nn layer change, not physicsnemo.models.)

Notes

  • norm_layer is documented in the Mlp class docstring and doctest examples; Sphinx exposes it via docs/api/nn/layers/fully_connected_mlp.rst automodule.

Dependencies

None.

Test plan

  • pytest test/nn/module/test_mlp_layers.py -v
  • pytest --doctest-modules physicsnemo/nn/module/mlp_layers.py
  • pre-commit hooks on changed files (ruff, interrogate, markdownlint, license, import-linter)

Add a norm_layer parameter supporting LayerNorm, TE-aware norms, and
custom factories while keeping use_batchnorm backward compatible.

Closes NVIDIA#1451

Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
@Thabhelo Thabhelo requested a review from loliverhennigh as a code owner June 6, 2026 02:56
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 6, 2026

Greptile Summary

This PR adds a norm_layer parameter to physicsnemo.nn.Mlp, replacing the hard-coded BatchNorm1d path with a flexible factory resolved by the new _resolve_norm_factory helper. The use_batchnorm flag is preserved for backward compatibility and is mutually exclusive with the new parameter.

  • _resolve_norm_factory accepts a string alias (\"batchnorm\", \"layernorm\", \"te_layernorm\"), a class, or any callable factory; unknown strings raise ValueError and a pre-instantiated nn.Module is explicitly rejected.
  • Both \"layernorm\" and \"te_layernorm\" resolve to exactly the same get_layer_norm_class() factory with no behavioral distinction — \"te_layernorm\" does not enforce TE availability and silently falls back to nn.LayerNorm, making the two strings undifferentiated aliases despite the docstring listing them separately.

Important Files Changed

Filename Overview
physicsnemo/nn/module/mlp_layers.py Adds norm_layer parameter and _resolve_norm_factory helper; "layernorm" and "te_layernorm" strings are silent aliases (both call get_layer_norm_class()), which makes the "te_layernorm" option misleading
test/nn/module/test_mlp_layers.py Good coverage of new norm_layer paths; no test exercises the "te_layernorm" string directly to verify it differs from "layernorm"
CHANGELOG.md Entry added under the correct section with accurate description of the new norm_layer parameter and backward-compatibility note

Reviews (1): Last reviewed commit: "feat(nn): add configurable norm_layer to..." | Re-trigger Greptile

Comment thread physicsnemo/nn/module/mlp_layers.py Outdated
Comment thread physicsnemo/nn/module/mlp_layers.py
Split layernorm (PyTorch LayerNorm) from te_layernorm (requires TE and
CUDA), store norm_layer on the module, and extend tests accordingly.

Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
@Thabhelo
Copy link
Copy Markdown
Author

Thabhelo commented Jun 6, 2026

Okay cool, I have addressed the feedback in the latest push:

  • "layernorm" now maps to PyTorch nn.LayerNorm; "te_layernorm" requires transformer_engine and CUDA (raises otherwise).
  • get_layer_norm_class() remains available as a user-supplied factory for TE-aware auto selection.
  • self.norm_layer is stored on Mlp for introspection.

Added tests for the TE string path and the stored attribute, too.

@Thabhelo
Copy link
Copy Markdown
Author

Thabhelo commented Jun 6, 2026

Greptile feedback from the first review is addressed in 8896888. Replies are on the inline threads. docs come from the updated Mlp docstrings via Sphinx automodule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🚀[FEA]: Make Mlp layer normalization type configurable

1 participant