[Do Not Merge] Optimizations on Qwen3-Next GatedDeltaNet w/ Kernel & XProf Agent by Rohan-Bierneni · Pull Request #3077 · AI-Hypercomputer/maxtext

Rohan-Bierneni · 2026-02-04T18:08:22Z

Description

Using suggestions from the kernel and XProf agents, we try to improve the Gated Delta Net (GDN) implementation in qwen3.py. We test our changes with the benchmarking script added in this PR, which compares the forward pass, backward pass, full train step, and memory consumption of the baseline GDN implementation against our optimized version in qwen3.py. This let us test changes iteratively and quickly.

To run the benchmark, use the command:

python3 /src/maxtext/scratch_code/benchmark_gdn_optimization.py

Note: run this script on a TPU or GPU VM; on CPU it takes a long time.
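The benchmarking script itself is not reproduced here. As a rough sketch of the kind of comparison it performs, assuming two JAX callables with the same signature (`baseline` and `optimized`, plus the helpers `time_fn`, `temp_bytes`, and `compare`, are all hypothetical names), forward and backward timing and XLA's compile-time memory estimate could be gathered as below. The train-step comparison is omitted, and defining memory reduction as `1 - optimized / baseline` temporary bytes is an assumption, not necessarily how the script computes it.

```python
import time

import jax
import jax.numpy as jnp


def time_fn(fn, *args, iters=10):
    """Mean wall-clock seconds per call of jit(fn), excluding compilation."""
    jitted = jax.jit(fn)
    jax.block_until_ready(jitted(*args))  # first call compiles; not timed
    start = time.perf_counter()
    for _ in range(iters):
        out = jitted(*args)
    jax.block_until_ready(out)  # dispatch is async; wait for the results
    return (time.perf_counter() - start) / iters


def temp_bytes(fn, *args):
    """XLA's compile-time estimate of temporary buffer bytes, or None."""
    stats = jax.jit(fn).lower(*args).compile().memory_analysis()
    return None if stats is None else stats.temp_size_in_bytes


def compare(baseline, optimized, *args):
    """Hypothetical report: speedups and memory reduction of optimized vs baseline."""
    # Backward pass = gradient of a scalar loss w.r.t. the first argument.
    def grad_of(f):
        return jax.grad(lambda x, *rest: jnp.sum(f(x, *rest)))

    report = {
        "fwd_speedup": time_fn(baseline, *args) / time_fn(optimized, *args),
        "bwd_speedup": time_fn(grad_of(baseline), *args)
        / time_fn(grad_of(optimized), *args),
    }
    base_mem = temp_bytes(baseline, *args)
    opt_mem = temp_bytes(optimized, *args)
    # Skip the memory metric if the backend reports nothing (or zero scratch).
    if base_mem and opt_mem is not None:
        report["mem_reduction_pct"] = 100.0 * (1.0 - opt_mem / base_mem)
    return report
```

Timing after `jax.block_until_ready` matters because JAX returns control before device work finishes; without it the loop would mostly measure dispatch overhead.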

So far, the overall improvements on the gated delta rule with Qwen3-Next configs and a 4k sequence length are:

https://paste.googleplex.com/5438820566827008
Forward pass speedup: 2.27x
Train step speedup: 3.75x
Memory reduction: 76.01%
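For context on what is being sped up: the gated delta rule updates a per-head state matrix S token by token, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t, following the formulation in the Gated DeltaNet paper. A naive single-head reference scan, shown as a hypothetical correctness baseline and not as the code in qwen3.py, might look like this:

```python
import jax
import jax.numpy as jnp


def gated_delta_rule_ref(q, k, v, alpha, beta):
    """Token-by-token gated delta rule for one head (hypothetical reference).

    q, k: [T, d_k]; v: [T, d_v]; alpha (decay gate), beta (write strength): [T].
    State S has shape [d_v, d_k]. Returns outputs of shape [T, d_v].
    """
    d_k, d_v = k.shape[-1], v.shape[-1]

    def step(S, inputs):
        q_t, k_t, v_t, a_t, b_t = inputs
        S = a_t * S                             # gated decay of the old state
        S = S - b_t * jnp.outer(S @ k_t, k_t)   # delta rule: erase along k_t
        S = S + b_t * jnp.outer(v_t, k_t)       # write v_t under key k_t
        return S, S @ q_t                       # read out with the query

    S0 = jnp.zeros((d_v, d_k), dtype=q.dtype)
    _, out = jax.lax.scan(step, S0, (q, k, v, alpha, beta))
    return out
```

With alpha = beta = 1 and the same unit key written twice, the erase step removes the first value before the second is written, so each output equals the value just stored. Chunked (WY-style) implementations are designed to produce the same outputs while replacing the per-token outer products with dense matrix multiplies over blocks of tokens.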


Tests

Tested our changes with the benchmarking script and the PR unit tests (the train_compile test for Qwen3-Next).

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-02-04T18:18:51Z

Codecov Report

❌ Patch coverage is 5.81395% with 81 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/models/qwen3.py	5.81%	81 Missing ⚠️


Add backward pass checks & memory checks
Add backward pass & memory consumption checks
Update memory calcs
Optimizations made to GDN impl in qwen3.py (3x speedup)
Update dummy configs to align with q3-next
Update tflops calc to align with WY-optimized GDN
remove mixed precision
Update config for chunk size
update dtype
Add NaN test in backward pass
Fix exploding gradient in gdn
Reintroduce mixed precision
typo in bloat16
typo fixed
convert to float
test pallas kernel for gdn
wrong api name
fix function positional args
fix pallas code
fix tensor indexing error
only optimize forward pass
update pallas code
use float mask
fix function returns
add shardmap to kernel
update with kernel agent suggestions
fix matrix indexing
fix matrix indexing
mask before exp
update benchmarking script
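Several of the commit messages above ("Add NaN test in backward pass", "Fix exploding gradient in gdn", "mask before exp") point at a well-known pitfall in chunked gated linear attention. The sketch below illustrates that pitfall in isolation; it is an assumption about what those commits address, not code from this PR, and both function names are hypothetical. Inside a chunk, the decay between positions i and j is exp(g_i - g_j), where g is the cumulative sum of log-decays. Above the diagonal g_i - g_j is positive and can overflow to inf in float32. Masking after the exp still gives a finite forward result, but the backward pass multiplies a zero cotangent by inf and yields NaN; masking with -inf before the exp avoids the overflow entirely.

```python
import jax
import jax.numpy as jnp


def decay_exp_then_mask(log_alpha):
    """D[i, j] = exp(g_i - g_j) for i >= j, else 0; exponentiates before masking."""
    g = jnp.cumsum(log_alpha)              # cumulative log-decay (decreasing)
    diff = g[:, None] - g[None, :]         # positive above the diagonal
    n = log_alpha.shape[0]
    causal = jnp.arange(n)[:, None] >= jnp.arange(n)[None, :]
    return jnp.where(causal, jnp.exp(diff), 0.0)  # exp can overflow to inf here


def decay_mask_then_exp(log_alpha):
    """Same matrix, but non-causal entries become -inf before exponentiating."""
    g = jnp.cumsum(log_alpha)
    diff = g[:, None] - g[None, :]
    n = log_alpha.shape[0]
    causal = jnp.arange(n)[:, None] >= jnp.arange(n)[None, :]
    return jnp.exp(jnp.where(causal, diff, -jnp.inf))  # exp(-inf) == 0


# Strong decay: g spans 140 nats across the chunk, and exp(140) overflows float32.
log_alpha = jnp.full((8,), -20.0)
grad_naive = jax.grad(lambda la: jnp.sum(decay_exp_then_mask(la)))(log_alpha)
grad_masked = jax.grad(lambda la: jnp.sum(decay_mask_then_exp(la)))(log_alpha)
print(jnp.isnan(grad_naive).any())   # True
print(jnp.isnan(grad_masked).any())  # False
```

Both functions return the same forward matrix; only the gradient differs, which is why a NaN check in the backward pass catches this kind of bug where a forward-only comparison would not.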

github-actions · 2026-04-20T16:30:14Z

This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.

Rohan-Bierneni requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jesselu-google, jiangjy1982, khatwanimohit, parambole, richjames0, shralex, shuningjin, suexu1025 and vipannalla as code owners February 4, 2026 18:08

Rohan-Bierneni force-pushed the rbierneni-test-kernelagent branch from acc1d56 to b03fe07 on March 18, 2026 16:42

Rohan-Bierneni requested review from dipannita08 and igorts-git as code owners March 18, 2026 16:42

Rohan-Bierneni added 4 commits March 18, 2026 17:09

Fixed rebase errors

e25d9bc

Add entire pallas kernel for testing

c702a96

Add correct e2e kernel code

1825950

working kernel in gdn_pallas3.py and updated wrapper in qwen3.py

719d2f2

github-actions added the stale label (automatically applied to stale PRs) on Apr 20, 2026

Rohan-Bierneni wants to merge 5 commits into main from rbierneni-test-kernelagent
