Skip to content

mHC optimization for expansion rate 4 (flag controlled)#3890

Open
dandragona wants to merge 1 commit into
AI-Hypercomputer:mainfrom
dandragona:dandragona/mhc_k4_shortcut
Open

mHC optimization for expansion rate 4 (flag controlled)#3890
dandragona wants to merge 1 commit into
AI-Hypercomputer:mainfrom
dandragona:dandragona/mhc_k4_shortcut

Conversation

@dandragona
Copy link
Copy Markdown
Collaborator

@dandragona dandragona commented May 12, 2026

Description

When expansion rate is equal to four we don't need to do sinkhorn as we can generate the doubly-stochastic matrix by taking a random convex combination of the permutation matrices (there are 4! = 24 of them when using expansion rate 4).

Based on https://arxiv.org/pdf/2601.05732.

  • Flag-controlled for now, but once we establish a DeepSeek v4 baseline we can use it by default if it shows a performance improvement.
  • Can potentially support this for expansion rates higher than 4 too before N! becomes too large (maybe it will increase training stability?).
  • Did some refactoring on the mhc_test.py code.
  • pylint refactors on some other files.

Tests

Just tested with unit tests, adding the gemini-review now.

Waiting to performance test it once we get baseline DeepSeek v4 numbers.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions github-actions Bot mentioned this pull request May 13, 2026
4 tasks
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch 5 times, most recently from 51c1161 to 1799cc1 Compare May 13, 2026 19:57
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch 8 times, most recently from fa7f997 to b3a8bb4 Compare May 14, 2026 18:31
Copy link
Copy Markdown
Collaborator

@shuningjin shuningjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing mHC-lite! The core logic looks solid.

  • Your implementation aligns with Eqs (2) and (3), and the parameter shapes are correct.
  • Generalize k: The shortcut is currently hardcoded for k=4. While the $k!$ permutation scaling limits large $k$ in practice, the underlying math applies to any $k$. Let's parameterize it for flexibility.
  • Cleanup: I left a few minor inline comments. Please also revert unrelated formatting changes.

Also thanks for adding the unit tests! As a follow-up, you can run an end-to-end test to verify training stability and performance.

Comment thread src/maxtext/configs/types.py Outdated
Comment thread src/maxtext/configs/base.yml Outdated
Comment thread tests/unit/grain_data_processing_test.py Outdated
Comment thread tests/unit/mhc_test.py Outdated
Comment thread tests/unit/mhc_test.py Outdated
[None, get_test_config_path()],
**extra_args,
run_name="test_mhc",
skip_jax_distributed_system=True,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not skip, instead mark @pytest.mark.tpu_only for relavent tests

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove skip_jax_distributed_system=True?

Comment thread src/maxtext/layers/mhc.py Outdated
Comment thread src/maxtext/layers/mhc.py Outdated
Comment thread tests/unit/mhc_test.py Outdated
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch from b3a8bb4 to 8e83338 Compare May 22, 2026 16:35
@dandragona dandragona requested a review from darisoy as a code owner May 22, 2026 16:35
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch from 8e83338 to f8bcc1a Compare May 22, 2026 16:46
Copy link
Copy Markdown
Collaborator

@shuningjin shuningjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Minor comments.

Comment thread src/maxtext/layers/mhc.py Outdated
Comment thread .gitignore Outdated
Comment thread tests/unit/mhc_test.py Outdated
Comment thread tests/unit/mhc_test.py Outdated
[None, get_test_config_path()],
**extra_args,
run_name="test_mhc",
skip_jax_distributed_system=True,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove skip_jax_distributed_system=True?

Comment thread tests/unit/mhc_test.py Outdated
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch from f8bcc1a to 944741b Compare May 26, 2026 20:57
Comment thread src/maxtext/layers/mhc.py
Copy link
Copy Markdown
Collaborator

@parambole parambole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch 2 times, most recently from 4eb4596 to a4a169e Compare June 1, 2026 15:26
…tions and add enable_mhc_k4_shortcut feature gate
@dandragona dandragona force-pushed the dandragona/mhc_k4_shortcut branch from a4a169e to 2c88ab1 Compare June 1, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants