
NNX migration prep (4/N): sharding tools and Linen<->NNX checkpoint utilities#3525

Closed
ecnal-cienet wants to merge 2 commits into main from feat/nnx-linen-converter-and-sharding-tools
Conversation

ecnal-cienet (Collaborator) commented Mar 31, 2026

NNX Migration Route Map

  1. ✅ Add NNX scaffolding: pure_nnx flag, init_state_fn, TrainStateNNX, NNX utils. Linen workflow unchanged. (PR #3427)
  2. ✅ NNX sharding utilities: get_abstract_state_nnx, get_named_sharding_nnx, set_named_sharding_nnx, get_partition_spec_nnx, get_mesh_from_config. (PR #3470)
  3. ✅ NNX fully supported end-to-end: TrainStateNNX, model creation, gradient accumulation, checkpointing, and training loop dispatch. (PR #3500)
  4. [This PR] NNX sharding diagnostics and bidirectional Linen↔NNX checkpoint conversion utilities. (PR #3525)
  5. ❌ NNX post-training fixes: MultimodalInput unpacking, scalar LR guard, nested NNX transform workaround.
  6. ❌ Enable NNX by default; fix unit and integration test failures.
  7. ❌ Remove Linen-specific code paths and NNX compatibility flags.

Description

Note: This is the fourth in a series of NNX migration PRs. This PR adds developer tooling to inspect NNX sharding and convert / compare checkpoints across Linen and NNX formats. No training logic is changed.

Sharding diagnostics

  • maxtext_utils.py: print_shardings_params now dispatches on pure_nnx: for NNX models it iterates over the flat nnx.State rather than the Linen params tree.
  • tests/utils/run_sharding_dump.py: run_single_dump() now propagates --pure_nnx=true to the sharding-dump subprocess when the flag is set, enabling NNX sharding dumps without manual flag threading.

Linen ↔ NNX checkpoint converter

src/maxtext/checkpoint_conversion/linen_nnx_converter.py — a standalone CPU-only script that bidirectionally converts Orbax checkpoints between Linen and NNX formats.

Key transformations handled:

| Direction | params tree | opt_state | step | Layer layout |
| --- | --- | --- | --- | --- |
| Linen → NNX | `params/params/<model>` → `model/<model>` + `{value:}` wrappers | remove `params` level from `mu`/`nu` | move inside `optimizer/` | stack `layers_N` arrays → `layers` tensor (axis 1) |
| NNX → Linen | reverse of above | add `params` level | move to top level | unstack `layers` tensor → `layers_N` per-layer arrays |

--direction accepts linen_to_nnx, nnx_to_linen, or auto (detects format from checkpoint keys).
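The layer-layout transform in the table above can be sketched with NumPy. This is an illustrative stand-in, not the converter itself: the real script walks a full Orbax checkpoint tree, while this shows only the per-leaf array transform (stacking scanned layers on axis 1, per the table):

```python
import numpy as np

def stack_layers(per_layer: dict) -> np.ndarray:
  """Linen -> NNX direction: combine layers_0..layers_{N-1} arrays
  into a single stacked tensor on axis 1."""
  n = len(per_layer)
  ordered = [per_layer[f"layers_{i}"] for i in range(n)]
  return np.stack(ordered, axis=1)

def unstack_layers(stacked: np.ndarray) -> dict:
  """NNX -> Linen direction: split the stacked layers tensor
  back into per-layer layers_N arrays."""
  return {f"layers_{i}": np.take(stacked, i, axis=1) for i in range(stacked.shape[1])}
```

Round-tripping through these two functions recovers the original per-layer arrays, which is the property the comparison utility below relies on when normalizing cross-format checkpoints.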

Checkpoint comparison utility

src/maxtext/checkpoint_conversion/compare_linen_nnx_checkpoint.py — compares tree structure, shapes, and optionally values between any two Orbax checkpoints (Linen vs NNX, or same-format). Auto-detects format and applies cross-format normalization (layer axis transposition, {value:} unwrapping, RNG filtering) only when needed.

# Structure + shape comparison (Linen vs NNX)
python compare_linen_nnx_checkpoint.py \
  --ckpt_path_1="gs://bucket/linen_checkpoint/0/items" \
  --ckpt_path_2="gs://bucket/nnx_checkpoint/0/items"

# Value comparison
python compare_linen_nnx_checkpoint.py \
  --ckpt_path_1="gs://bucket/ckpt_a/0/items" \
  --ckpt_path_2="gs://bucket/ckpt_b/0/items" \
  --compare_values --atol=1e-5 --rtol=1e-5
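The value-comparison step can be sketched as a tolerance check over two flat trees. This is a simplified stand-in for the actual utility (function name and the flat `{path: array}` input format are assumptions; the real script loads Orbax checkpoints and applies cross-format normalization first):

```python
import numpy as np

def compare_values(tree_a, tree_b, atol=1e-5, rtol=1e-5):
  """Return (path, reason) pairs for leaves that differ.

  Both inputs are flat {path: array} mappings, assumed already
  normalized to a common format (wrappers unwrapped, layer axes
  transposed, RNG keys filtered out).
  """
  mismatches = []
  for path in sorted(set(tree_a) | set(tree_b)):
    if path not in tree_a or path not in tree_b:
      mismatches.append((path, "missing"))  # structural mismatch
    elif not np.allclose(tree_a[path], tree_b[path], atol=atol, rtol=rtol):
      mismatches.append((path, "value"))    # numeric mismatch beyond tolerance
  return mismatches
```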

Tests

Unit tests:

python3 -m pytest tests/unit/linen_nnx_converter_test.py -v
python3 -m pytest tests/unit/compare_linen_nnx_checkpoint_test.py -v

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch 6 times, most recently from bcd7b07 to f27e4f9 on April 6, 2026 at 19:02
ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch 5 times, most recently from 606baf8 to 9895925 on April 13, 2026 at 14:59
ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch 3 times, most recently from d6627ef to 91535ec on April 16, 2026 at 17:46
ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch from 91535ec to 3f34221 on April 16, 2026 at 22:23
- Add TrainStateNNX (layers/train_state_nnx.py) with checkpoint and unit tests
- Refactor model_creation_utils with create_nnx_abstract_model(); add NNX support to muon_utils
- Add get_abstract_state_nnx() and get_nnx_named_sharding_with_scan_axis() to maxtext_utils.py
- Wire NNX train state into train.py and train_utils.py with pure_nnx dispatch
ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch from 3f34221 to 56d4548 on April 20, 2026 at 13:52
…ison utility

- modify print_shardings_params to support NNX (maxtext_utils.py)
- add --pure_nnx flag to run_sharding_dump.py
- add bidirectional Linen<->NNX checkpoint conversion utility (linen_nnx_converter.py)
- add checkpoint comparison utility for Linen vs NNX validation (compare_linen_nnx_checkpoint.py)
ecnal-cienet force-pushed the feat/nnx-linen-converter-and-sharding-tools branch from 56d4548 to 07431d2 on April 20, 2026 at 13:52