fix: Muon momentum, harden streaming #25

Open
pszemraj wants to merge 36 commits into main from fix/muon-correctness

@pszemraj (Owner) commented Apr 9, 2026

The distributed-support work broke Muon's optimizer semantics in a way that was not obvious: the FSDP2 owner-compute plumbing was fine, but momentum behavior drifted. This PR fixes that, makes streaming pretraining less fragile against transient HF Hub failures, and does some overdue housekeeping.

Changes

Muon optimizer

  • Restored standard Nesterov momentum behavior.
  • Split fused qkv.weight Q/K/V before orthogonalization, then repack into the fused layout.
  • Cleaned up defaults/naming: param_policy: hidden_2d, norm_factor: neobert / norm_factor: muon_reference.
  • Kept the FSDP2 owner-compute path (it was never the problem) and added tests for it.
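The two Muon fixes above (Nesterov momentum, per-projection orthogonalization of the fused QKV weight) can be sketched as follows. This is a minimal illustration under stated assumptions, not NeoBERT's implementation: it assumes the SGD-style Nesterov form (buf = beta * buf + g, update direction g + beta * buf), uses the quintic Newton-Schulz coefficients from the public Muon reference code, and assumes qkv.weight is stacked as [Q; K; V] along dim 0. It omits any norm_factor scaling and the FSDP2 owner-compute path. The names newton_schulz_orthogonalize, orthogonalize_fused_qkv and muon_step are hypothetical.

```python
import torch


def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximate the orthogonal factor U V^T of a 2D matrix via quintic Newton-Schulz."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference code
    x = g.float()
    tall = x.shape[0] > x.shape[1]
    if tall:
        x = x.T  # iterate on the wide orientation so the Gram matrix is the smaller one
    x = x / (x.norm() + eps)  # Frobenius normalization puts all singular values in [0, 1]
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if tall else x


def orthogonalize_fused_qkv(update: torch.Tensor, num_parts: int = 3) -> torch.Tensor:
    """Orthogonalize the Q, K and V row blocks separately, then repack the fused layout."""
    if update.shape[0] % num_parts != 0:
        raise ValueError("fused weight rows must split evenly into Q, K and V")
    blocks = update.chunk(num_parts, dim=0)  # assumed [Q; K; V] stacking along dim 0
    return torch.cat([newton_schulz_orthogonalize(block) for block in blocks], dim=0)


@torch.no_grad()
def muon_step(
    param: torch.Tensor,
    grad: torch.Tensor,
    momentum_buf: torch.Tensor,
    lr: float = 0.02,  # illustrative defaults, not necessarily the repo's
    beta: float = 0.95,
    fused_qkv: bool = False,
) -> None:
    """One Muon update for a single 2D parameter, using SGD-style Nesterov momentum."""
    momentum_buf.mul_(beta).add_(grad)  # buf <- beta * buf + g
    direction = grad.add(momentum_buf, alpha=beta)  # Nesterov look-ahead: g + beta * buf
    if fused_qkv:
        update = orthogonalize_fused_qkv(direction)
    else:
        update = newton_schulz_orthogonalize(direction)
    param.add_(update.to(param.dtype), alpha=-lr)
```

Orthogonalizing the stacked (3d x d) matrix as a whole would give the fused weight orthonormal columns but leave the Q, K and V blocks coupled; splitting first gives each projection its own orthogonalized update, which is presumably the motivation for the repack step.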

Streaming robustness

  • Added retry with backoff for transient HF Hub read failures.
  • Resumes from the last yielded example when the HF iterable dataset supports state restore.
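The retry-and-resume behavior can be sketched as a generator wrapper. This is a minimal illustration, not the repo's RetryingStreamingDataset: the source is assumed to expose state_dict()/load_state_dict() (recent Hugging Face datasets releases provide these on IterableDataset; the toy FlakySource here only mimics that interface), ConnectionError and TimeoutError stand in for transient Hub read errors, and retrying_iter, FlakySource and the backoff constants are hypothetical. The failure counter resets after every successful yield, so isolated failures at different positions each get the full retry budget.

```python
import random
import time
from typing import Any, Callable, Iterator, Protocol


class ResumableSource(Protocol):
    """Hypothetical interface modeled on IterableDataset.state_dict()/load_state_dict()."""

    def __iter__(self) -> Iterator[Any]: ...
    def state_dict(self) -> dict: ...
    def load_state_dict(self, state: dict) -> None: ...


def retrying_iter(
    source: ResumableSource,
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield from source, reopening at the last yielded example after transient errors."""
    state = source.state_dict()  # resume point before anything has been yielded
    failures = 0
    while True:
        source.load_state_dict(state)
        try:
            for example in source:
                state = source.state_dict()  # resume point just past this example
                yield example
                failures = 0  # a successful yield restores the full retry budget
            return
        except retry_on:
            failures += 1
            if failures > max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # exponential backoff with jitter


class FlakySource:
    """Toy resumable source that raises ConnectionError once at each index in fail_at."""

    def __init__(self, items: list, fail_at: set[int]) -> None:
        self.items, self.pos, self.pending = items, 0, set(fail_at)

    def state_dict(self) -> dict:
        return {"pos": self.pos}

    def load_state_dict(self, state: dict) -> None:
        self.pos = state["pos"]

    def __iter__(self) -> Iterator[Any]:
        while self.pos < len(self.items):
            if self.pos in self.pending:
                self.pending.discard(self.pos)
                raise ConnectionError(f"transient read failure at {self.pos}")
            item = self.items[self.pos]
            self.pos += 1
            yield item
```

For example, list(retrying_iter(FlakySource(list(range(8)), {2, 5}), max_retries=1, sleep=lambda s: None)) returns [0, 1, 2, 3, 4, 5, 6, 7] with no duplicates; with max_retries=1 it only completes because the budget resets between the two failures.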

Repo cleanup

  • Moved GLUE validation logic to src/neobert/glue/ and the HF classifier adapter to src/neobert/huggingface/.
  • Split classifier/wrapper code out of model/model.py.
  • Consolidated duplicated checkpoint-loading logic and tokenizer test helpers.
  • Reorganized docs into docs/guides/ + docs/reference/, added a training optimization guide.
  • Test runs now fail on warnings by default.
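The PR does not show how failing on warnings is wired up. With pytest this is commonly done through the filterwarnings ini option set to error, or the -W error command-line flag; both apply the standard-library "error" warning action. A minimal stdlib-only sketch of what that action does (run_with_warnings_as_errors and emits_deprecation are hypothetical helpers):

```python
import warnings
from typing import Callable, TypeVar

T = TypeVar("T")


def emits_deprecation() -> int:
    """Stand-in for library code that still triggers a deprecation warning."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 1


def run_with_warnings_as_errors(fn: Callable[[], T]) -> T:
    """Run fn with every warning promoted to an exception, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # the same "error" action pytest's -W error applies
        return fn()
```

Calling run_with_warnings_as_errors(emits_deprecation) raises DeprecationWarning instead of printing it, which is what makes a test fail rather than pass with noise.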

Still to test

  • Still need real 2-rank FSDP2 manual tests on multi-GPU hardware; the single-node mocks pass, but they are no substitute.

@pszemraj added the bug (Something isn't working) label Apr 9, 2026
@pszemraj self-assigned this Apr 9, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb0ccf6f9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/neobert/pretraining/trainer.py Outdated
User added 3 commits April 9, 2026 14:48
- Remove redundant param_policy normalization/validation in
  _build_param_groups that duplicated MuonClipConfig.__post_init__
- Reset retry counter after each successful yield so isolated transient
  failures at different positions each get the full retry budget
- Remove misleading __len__ on RetryingStreamingDataset that created
  false "has length" signal for DataLoader
- Add test verifying retry budget resets between successful yields
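The first bullet of this commit moves option normalization and validation into a single place. A minimal sketch of that pattern, assuming a dataclass config: MuonConfigSketch, POLICY_FILTERS and build_param_groups are hypothetical, the accepted values are taken from the PR description, and treating hidden_2d as "every 2D parameter except embeddings" is an assumption about its semantics.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


def _hidden_2d(name: str, shape: tuple[int, ...]) -> bool:
    # Assumed semantics of hidden_2d: 2D weight matrices, excluding embeddings.
    return len(shape) == 2 and "embed" not in name


POLICY_FILTERS: dict[str, Callable[[str, tuple[int, ...]], bool]] = {"hidden_2d": _hidden_2d}
NORM_FACTORS = frozenset({"neobert", "muon_reference"})


@dataclass
class MuonConfigSketch:
    """Hypothetical config: the single place where option strings are normalized and checked."""

    param_policy: str = "hidden_2d"  # illustrative defaults, not necessarily the repo's
    norm_factor: str = "neobert"

    def __post_init__(self) -> None:
        self.param_policy = self.param_policy.strip().lower()
        self.norm_factor = self.norm_factor.strip().lower()
        if self.param_policy not in POLICY_FILTERS:
            raise ValueError(f"unknown param_policy {self.param_policy!r}")
        if self.norm_factor not in NORM_FACTORS:
            raise ValueError(f"unknown norm_factor {self.norm_factor!r}")


def build_param_groups(
    named_shapes: Iterable[tuple[str, tuple[int, ...]]], cfg: MuonConfigSketch
) -> dict[str, list[str]]:
    """Trusts cfg as already validated: no second normalization or validation pass here."""
    use_muon = POLICY_FILTERS[cfg.param_policy]
    groups: dict[str, list[str]] = {"muon": [], "adamw": []}
    for name, shape in named_shapes:
        groups["muon" if use_muon(name, shape) else "adamw"].append(name)
    return groups
```

Because __post_init__ runs on construction, any MuonConfigSketch that exists is already valid, so the group builder can index POLICY_FILTERS directly instead of repeating the checks.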
The eval path used _move_batch_to_device with non_blocking=True but
never re-pinned batches, making transfers effectively blocking when
disable_dispatch was true.  Rather than patching the eval path, enable
loader-side pinning unconditionally so every consumer (train loop, eval
loop) gets page-locked batches.  The training path's explicit
_pin_cpu_tensors call remains necessary for gradient-accumulated batches
(torch.cat produces unpinned tensors) and is a no-op otherwise.
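The reasoning in this commit can be sketched in a few lines of PyTorch. This is a hedged illustration, not the repo's code: pin_cpu_tensors and move_batch_to_device are hypothetical stand-ins for _pin_cpu_tensors and _move_batch_to_device, batches are assumed to be dicts of tensors, and pinning is skipped when CUDA is unavailable. It shows loader-side pinning via DataLoader(pin_memory=True) and re-pinning after torch.cat, whose output is a fresh, unpinned allocation; already-pinned tensors are returned unchanged, which is the no-op case.

```python
from typing import Any

import torch
from torch.utils.data import DataLoader


def pin_cpu_tensors(batch: Any) -> Any:
    """Recursively page-lock CPU tensors; already-pinned tensors are returned as-is."""
    if not torch.cuda.is_available():
        return batch  # pinning needs a CUDA runtime; skip on CPU-only hosts
    if isinstance(batch, torch.Tensor):
        if batch.device.type == "cpu" and not batch.is_pinned():
            return batch.pin_memory()  # e.g. the fresh output of torch.cat
        return batch  # no-op: already pinned or already on the device
    if isinstance(batch, dict):
        return {key: pin_cpu_tensors(value) for key, value in batch.items()}
    if isinstance(batch, list):
        return [pin_cpu_tensors(value) for value in batch]
    if isinstance(batch, tuple):
        return tuple(pin_cpu_tensors(value) for value in batch)
    return batch


def move_batch_to_device(batch: Any, device: torch.device) -> Any:
    """non_blocking=True only overlaps host-to-device copies when the source is pinned."""
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=True)
    if isinstance(batch, dict):
        return {key: move_batch_to_device(value, device) for key, value in batch.items()}
    return batch


if __name__ == "__main__":
    data = [{"input_ids": torch.full((4,), i)} for i in range(4)]
    # Loader-side pinning: every consumer (train and eval loops) gets page-locked batches.
    loader = DataLoader(data, batch_size=2, pin_memory=torch.cuda.is_available())
    micro_batches = list(loader)
    # Gradient accumulation concatenates micro-batches; torch.cat output is not pinned.
    merged = {"input_ids": torch.cat([m["input_ids"] for m in micro_batches])}
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batch = move_batch_to_device(pin_cpu_tensors(merged), device)
    print(batch["input_ids"].shape)
```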

@pszemraj (Owner, Author) commented Apr 9, 2026

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1a251a4fe

Comment thread src/neobert/streaming.py Outdated

@pszemraj (Owner, Author) commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d458f47cf4

Comment thread src/neobert/checkpointing.py Outdated

@pszemraj (Owner, Author) commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f66e1065d

Comment thread src/neobert/contrastive/trainer.py
Comment thread src/neobert/checkpointing.py Outdated

@pszemraj (Owner, Author) commented

@codex review

@chatgpt-codex-connector (bot) commented

Codex Review: Didn't find any major issues. Breezy!

@pszemraj requested a review from amazingvince April 13, 2026 15:59