
Feat/Decoder migration: DeepSeek/Gemma3/Gemma4/Llama4 #3114

Draft: hsuan-lun-chiang wants to merge 1 commit into AI-Hypercomputer:main from CIeNET-International:feat/Migrate-Decoder-And-Tests-to-NNX


@hsuan-lun-chiang (Collaborator) commented Feb 9, 2026

Description

Implements and updates the following models in the NNX decoder, which were not supported in the previous PR #2831:

  • DeepSeek
  • Gemma 3 and Gemma 4
  • Llama4

Tests

Tested with different models and compared against Linen training. Details are in the GDoc file.
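One way to make a Linen/NNX training comparison concrete is to check the per-step losses of the two runs against a tolerance. The sketch below is hypothetical: `find_loss_mismatches` and its default tolerances are illustrative assumptions, not the procedure used for this PR (that is described in the GDoc file). It assumes both runs log one scalar loss per step, with matching seeds, data order, and step counts.

```python
import math


def find_loss_mismatches(linen_losses, nnx_losses, rel_tol=1e-3, abs_tol=1e-6):
    """Return (step, linen_loss, nnx_loss) for every step whose losses differ beyond tolerance.

    Hypothetical helper: assumes one scalar loss per step from each run,
    produced with the same seed, data order, and number of steps.
    """
    if len(linen_losses) != len(nnx_losses):
        raise ValueError(
            f"step count differs: {len(linen_losses)} (Linen) vs {len(nnx_losses)} (NNX)"
        )
    mismatches = []
    for step, (a, b) in enumerate(zip(linen_losses, nnx_losses)):
        # math.isclose passes when |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((step, a, b))
    return mismatches


# Synthetic numbers for illustration only; these are not results from this PR.
print(find_loss_mismatches([3.0, 2.5, 2.0], [3.0, 2.5, 2.1]))  # [(2, 2.0, 2.1)]
```

A relative tolerance is used because the two implementations can legitimately differ by small floating-point amounts; an exact equality check would flag harmless reordering of operations.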

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch from 7faa14d to edbbf29 (February 9, 2026 08:15)
@codecov (Bot) commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 31.34328% with 138 lines in your changes missing coverage. Please review.

File with missing lines                  Patch %   Lines not covered
src/maxtext/layers/nnx_decoders.py       29.53%    119 missing, 17 partial ⚠️
src/maxtext/layers/initializers.py       71.42%    1 missing, 1 partial ⚠️
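As a rough cross-check of this report, the percentages can be turned back into approximate line counts. The sketch assumes Codecov's ratio hits / (hits + misses + partials), so "not covered" is misses plus partials (119 + 17 + 1 + 1 = 138 for the patch), and that each percentage was rounded or truncated from an exact ratio of whole lines. `implied_line_counts` is a hypothetical helper, not part of Codecov or MaxText.

```python
def implied_line_counts(coverage_pct: float, not_covered: int) -> tuple[int, int]:
    """Return (covered, total) lines implied by a Codecov coverage percentage.

    Assumes coverage = hits / (hits + misses + partials), where
    not_covered = misses + partials, and that the percentage was rounded
    or truncated from an exact ratio of whole line counts.
    """
    c = coverage_pct / 100
    # Solve c = h / (h + n) for h, then snap to the nearest whole line.
    hits = round(c * not_covered / (1 - c))
    return hits, hits + not_covered


report = {
    "whole patch": (31.34328, 138),
    "src/maxtext/layers/nnx_decoders.py": (29.53, 119 + 17),
    "src/maxtext/layers/initializers.py": (71.42, 1 + 1),
}
for name, (pct, not_covered) in report.items():
    hits, total = implied_line_counts(pct, not_covered)
    print(f"{name}: {hits}/{total} lines covered")
```

Under these assumptions the patch works out to about 63 of 201 changed lines covered; the counts are estimates recovered from rounded percentages, not values stated in the report.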


hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 9 times, most recently from 8a3e073 to 2f30ac1 (February 12, 2026 11:19)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 6 times, most recently from 1a5740b to 5725403 (February 26, 2026 07:27)
Review comment thread on tests/unit/multi_token_prediction_test.py (outdated)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 12 times, most recently from de4ec11 to 29a9b74 (March 6, 2026 09:31)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 4 times, most recently from 047b91e to 1d3cc0c (March 16, 2026 08:03)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 6 times, most recently from 13cfedf to d00508e (March 23, 2026 08:52)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch from 40f33b8 to c4b5e64 (March 23, 2026 09:40)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 4 times, most recently from 80ebfcb to f0ddf63 (March 25, 2026 10:00)
Review comment thread on tests/unit/nnx_decoder_test.py (outdated)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 3 times, most recently from e1bc3f2 to 21fd4f5 (April 1, 2026 07:18)
hsuan-lun-chiang force-pushed the feat/Migrate-Decoder-And-Tests-to-NNX branch 6 times, most recently from 06657a2 to 53f3052 (April 9, 2026 09:11)
@RissyRan (Collaborator) commented:

Thanks for the change! Before the review, could you ensure those are tested? Thank you!

cc @bvandermoon @entrpn

@hsuan-lun-chiang (Collaborator, Author) commented Apr 16, 2026


Hi @RissyRan, the Linen/NNX comparison logs for Llama4, DeepSeek, and Gemma4 are in the GDoc file. The PR passes all unit tests with the NNX flags enabled (enable_nnx=True and pure_nnx_decoder=True), except for a few cases that need further discussion, which we've documented here. Would you mind taking a look? Thank you!

