Shuwen check vma by Shuwen-Fang · Pull Request #3411 · AI-Hypercomputer/maxtext

Shuwen-Fang · 2026-03-13T19:31:16Z

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description should include relevant details and context, for example:

why is this change being made,
the problem being solved and any relevant context,
why this is a good solution,
some information about the specific implementation,
shortcomings of the solution and possible future improvements.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Imported from GitHub PR #2831

Migrate the Transformer decoder layer into NNX.

Note: The following models are currently not supported:
- DeepSeek
- Gemma3
- Llama4

Support for these models will be added in a follow-up PR.

Strategy: A `pure_nnx_decoder` flag is added to control whether the NNX or the Linen decoder is used. The initial migration doesn't include NNX pipeline support.

Conducted the following tests. Details are in the [GDoc file](https://docs.google.com/document/d/1NbUP3g5glgbC6bMyt44pwM_vQA1NR7U2rBUzfbTDwSs/edit?pli=1&resourcekey=0-9EUahtzL-hCycdu7l0grhQ&tab=t.htq5367h8au0):
1. Tests with different models, compared against Linen training
2. Golden logits comparison
3. Inference
4. Checkpoint comparison (including TreeStructure comparison)
5. Sharding comparison

TODOs:
- NNX version of unit tests (future PRs)

Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

Copybara import of the project:

--
073e916 by hsuan-lun-chiang <hsuan-lun.chiang@cienet.com>:

Migrate Decoder to NNX

Adding nnx_decoders.py in parallel with decoders.py
1. Duplicate and modify decoders.py in the new file nnx_decoders.py
2. Add a new config, pure_nnx_decoder, to control whether the model uses NNXDecoder; default false for now
3. Modify related code to accommodate the change
4. Add/modify unit tests

Merging this change closes #2831

COPYBARA_INTEGRATE_REVIEW=#2831 from CIeNET-International:feat/Migrate-Decoder-to-NNX 073e916
PiperOrigin-RevId: 884170982
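The flag-based switch between the Linen and NNX decoders can be sketched as follows. This is a minimal sketch, not MaxText code: `DecoderConfig`, `LinenDecoder`, and `select_decoder_cls` are hypothetical names. Only the `pure_nnx_decoder` flag, its `False` default, and the `NNXDecoder` name come from the commit message above.

```python
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    """Hypothetical config holder; only the pure_nnx_decoder field comes from the PR."""

    # Defaults to False, matching "default false for now" in the commit message.
    pure_nnx_decoder: bool = False


class LinenDecoder:
    """Hypothetical stand-in for the existing Linen decoder in decoders.py."""


class NNXDecoder:
    """Hypothetical stand-in for NNXDecoder in nnx_decoders.py."""


def select_decoder_cls(config: DecoderConfig) -> type:
    """Return the decoder implementation selected by the pure_nnx_decoder flag."""
    return NNXDecoder if config.pure_nnx_decoder else LinenDecoder


# The Linen path stays the default; the NNX path is opt-in.
default_cls = select_decoder_cls(DecoderConfig())
nnx_cls = select_decoder_cls(DecoderConfig(pure_nnx_decoder=True))
```

Keeping both implementations side by side behind a single flag, as the PR does with nnx_decoders.py next to decoders.py, lets the NNX path be compared against Linen before it becomes the default.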

Imported from GitHub PR #3449

Move the `install_maxtext_extra_deps` deps directory to `dependencies` after `dependencies` was added to the PyPI package.

This command still works/runs the expected installation:
```
uv pip install -e .[tpu] --resolution=lowest
install_maxtext_tpu_github_deps
install_maxtext_tpu_post_train_extra_deps
```
CI also builds this command.

Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Copybara import of the project:

--
277b08e by Branden Vandermoon <bvandermoon@google.com>:

Move install_maxtext_extra_deps to dependencies directory

Merging this change closes #3449

COPYBARA_INTEGRATE_REVIEW=#3449 from AI-Hypercomputer:bvandermoon-github-deps 277b08e
PiperOrigin-RevId: 886463160

add qwen3-base to configs/types and checkpoint_conversion/param_mapping
add qwen3-base configs to checkpoint_conversion/hf_model_configs
pyink

PiperOrigin-RevId: 886627046

# Description

The FP8 path still uses the Tokamax internal backend APIs. The new `RaggedDotGroupSizes` was introduced ([pull3330](#3330)) for the Tokamax public APIs in the bf16 path, which broke FP8.

# Tests

Benchmarks were run internally.

# Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [X] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [X] I have necessary comments in my code, particularly in hard-to-understand areas.
- [X] I have run end-to-end tests and provided workload links above if applicable.
- [X] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

github-actions · 2026-04-26T16:17:10Z

This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.

Shuwen-Fang requested review from Obliviour, SujeethJinesh, gpolovets1, jacoguzo, mailvijayasingh, mitalisi, notabee and shauryagup as code owners March 25, 2026 19:26

hsuan-lun-chiang and others added 27 commits March 25, 2026 19:31

docs: simplify checkpoint storage flags for Pathways workloads

319992e

Add option to start test_batch in train_rl from a specific index, also add default tokenizer_path for default model

5afca54

Add qwen2 implementation

6740d85

Move tests, rto_setup.sh and preflight.sh to docker image

010816a

Update post-training docs to point to single source of truth for installation

1d1704d

add custom mesh and logical rule support

c6da8da

add qwen3-base variants and qwen3-1.7b

8feb476

Fix src/MaxText references in GPU/runner Dockerfiles

1dbde05

Update moe.py

8c30d30

format

e547f7a

Update moe.py

0deb4cb

split logical names in moe module

3adf889

base.yml formatting auto

3df0d31

moe formatting changes

a5a7cee

check vma changes

99631fd

remove input variable name due to internal change of name

331882c

enable pp with batch split ds

7cd4ede

add another layer of custom vjp

665877f

add new pipeline weight prefetching config

3b46bd5

refactor pr

42b4e8a

retrigger CI

48555d8

Remove post_training_local_dependencies.Dockerfile

8b774cf

update tokamax group sizes for pipeline

84ae8f7

Update user docs to drop config file path

8a2c5a3

Shuwen-Fang force-pushed the shuwen-check-vma branch from 8684153 to 8a2c5a3 March 25, 2026 19:33

github-actions bot added the stale label (automatically applied to stale PRs) Apr 26, 2026

Shuwen-Fang wants to merge 40 commits into main from shuwen-check-vma

Shuwen-Fang commented Mar 13, 2026 (edited by andytwigg)

12 participants
