Snehalv dsv4 conv utility by snehalv2002 · Pull Request #4078 · AI-Hypercomputer/maxtext

snehalv2002 · 2026-06-05T15:34:40Z

Description

Standalone and conversion util support for DeepSeek V4 Scanned.
FIXES: b/509930555

Tests

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

…near) Implement architectural core primitives required for DeepSeek-V4 integration into MaxText:
- DeepSeekV4RMSNorm & DeepSeekV4UnweightedRMSNorm: RMS normalization layers utilizing float32 variance pooling. Includes unweighted scale-free variants that avoid allocating or synchronizing trainable weight parameters.
- DeepSeekGroupedLinear: Block-diagonal grouped linear projection layer supporting parallel group projection via einsum broadcasting ([B, S, hc_mult, D] -> [B, S, D]).
- DeepSeekV4RotaryEmbedding: Interleaved partial rotary positional embedding pairing consecutive even/odd channels.
- Unit test suite (deepseek_v4_vs_reference_test.py) validating numerical parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
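For readers unfamiliar with these primitives, here is a minimal JAX sketch of RMS normalization with a float32 variance (with and without a learned scale) and of an interleaved partial rotary embedding. It is not the code from this PR: the function names, the `eps` and `base` defaults, and the choice to rotate the first `rotary_dim` channels are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def rms_norm(x, weight=None, eps=1e-6):
  """RMS-normalize the last axis, computing the variance in float32.

  Passing weight=None gives the unweighted (scale-free) variant, which needs
  no trainable parameter. eps is an assumed default, not taken from the PR.
  """
  x32 = x.astype(jnp.float32)
  variance = jnp.mean(jnp.square(x32), axis=-1, keepdims=True)
  y = x32 * jax.lax.rsqrt(variance + eps)
  if weight is not None:
    y = y * weight.astype(jnp.float32)
  return y.astype(x.dtype)  # cast back to the input dtype (e.g. bfloat16)


def interleaved_partial_rope(x, positions, rotary_dim, base=10000.0):
  """Rotate rotary_dim channels, pairing channels (0, 1), (2, 3), ...

  x: [batch, seq, heads, head_dim]; positions: [seq].
  Assumption: the rotated slice is the first rotary_dim channels.
  """
  x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
  exponents = jnp.arange(0, rotary_dim, 2, dtype=jnp.float32) / rotary_dim
  inv_freq = 1.0 / (base**exponents)  # [rotary_dim / 2]
  angles = positions.astype(jnp.float32)[:, None] * inv_freq[None, :]
  cos = jnp.cos(angles)[None, :, None, :]  # broadcast over batch and heads
  sin = jnp.sin(angles)[None, :, None, :]
  even, odd = x_rot[..., 0::2], x_rot[..., 1::2]
  rot_even = even * cos - odd * sin
  rot_odd = even * sin + odd * cos
  # Stack on a new last axis and flatten so the even/odd outputs interleave.
  rotated = jnp.stack([rot_even, rot_odd], axis=-1).reshape(x_rot.shape)
  return jnp.concatenate([rotated.astype(x.dtype), x_pass], axis=-1)
```

Interleaved pairing differs from the "rotate half" layout, which pairs channel i with channel i + rotary_dim / 2, so weights converted between the two layouts need a channel permutation.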

…outer, RoutedMoE) Implement Mixture of Experts routing gates and execution layers for DeepSeek-V4 integration into MaxText:
- HashRouter: Token routing mechanism utilizing MD5 hash projections for deterministic expert assignment.
- TopKRouter: Gated top-k router implementing sigmoid scaling and score normalization.
- RoutedMoE & RoutedAndSharedMoE: Execution layers supporting layer_idx routing and FP32 expert summation parity.
- Parity verification: Extended unit test suite (deepseek_v4_vs_reference_test.py) validating routing parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
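The two routing styles named above can be sketched roughly as follows. This is an illustrative guess at the general technique, not the PR's implementation: the salted `token_id:salt` hashing scheme, the function names, and the `scale` parameter are all hypothetical.

```python
import hashlib

import jax
import jax.numpy as jnp


def md5_expert_ids(token_id, num_experts, top_k):
  """Deterministically map a token id to top_k distinct experts via MD5.

  Hypothetical scheme: hash "token_id:salt", reduce the digest modulo
  num_experts, and bump the salt until top_k distinct experts are found.
  """
  if top_k > num_experts:
    raise ValueError("top_k must not exceed num_experts")
  chosen = []
  salt = 0
  while len(chosen) < top_k:
    digest = hashlib.md5(f"{token_id}:{salt}".encode()).digest()
    expert = int.from_bytes(digest[:8], "big") % num_experts
    if expert not in chosen:
      chosen.append(expert)
    salt += 1
  return chosen


def sigmoid_top_k_route(hidden, gate_kernel, top_k, scale=1.0):
  """Score experts with a sigmoid, keep the top_k, and renormalize.

  hidden: [tokens, d_model]; gate_kernel: [d_model, num_experts].
  Returns (weights, expert_indices), both shaped [tokens, top_k].
  """
  logits = jnp.dot(hidden.astype(jnp.float32), gate_kernel.astype(jnp.float32))
  scores = jax.nn.sigmoid(logits)
  top_scores, top_idx = jax.lax.top_k(scores, top_k)
  # Normalize the selected scores so each token's weights sum to one.
  weights = top_scores / (jnp.sum(top_scores, axis=-1, keepdims=True) + 1e-20)
  return weights * scale, top_idx
```

Because the hash route depends only on the token id, it can be precomputed into a static lookup table of shape [vocab_size, top_k] and gathered at run time, which avoids host-side hashing inside a compiled step.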

…ghtningIndexer) Implement compressed attention mechanisms and indexer modules for DeepSeek-V4 integration into MaxText:
- CSACompressor & HCACompressor: Long-range attention compressors supporting causal block bias and YaRN frequency scaling decoupling.
- LightningIndexer: Memory-efficient indexer module implementing sentinel masking and dynamic RoPE scaling.
- Configuration: Register attention compression hyperparameters (compress_ratios, index_head_dim, sliding_window) in types.py and base.yml.
- Parity verification: Extended unit test suite (deepseek_v4_vs_reference_test.py) validating attention compression parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
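The parity criterion quoted in these commit messages (atol=1e-5, rtol=1e-5) can be expressed with NumPy's standard comparison, which accepts an element when |actual - reference| <= atol + rtol * |reference|. The helpers below are a hypothetical sketch of such a check, not the contents of deepseek_v4_vs_reference_test.py.

```python
import numpy as np

ATOL = 1e-5
RTOL = 1e-5


def assert_parity(actual, reference, atol=ATOL, rtol=RTOL):
  """Raise AssertionError unless actual matches reference within tolerance.

  Both inputs are converted to float32 NumPy arrays first. In a real test the
  reference would typically come from a PyTorch tensor converted to NumPy.
  """
  actual = np.asarray(actual, dtype=np.float32)
  reference = np.asarray(reference, dtype=np.float32)
  np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol)


def worst_tolerance_ratio(actual, reference, atol=ATOL, rtol=RTOL):
  """Return max(error / allowed error); values above 1.0 indicate a failure."""
  actual = np.asarray(actual, dtype=np.float32)
  reference = np.asarray(reference, dtype=np.float32)
  allowed = atol + rtol * np.abs(reference)
  return float(np.max(np.abs(actual - reference) / allowed))
```

Reporting the worst ratio alongside the pass/fail result makes it easier to see how much headroom a layer has before a dtype or kernel change pushes it over the threshold.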


parambole and others added 22 commits May 14, 2026 17:45

feat: implement DeepSeek-V4 model integration, decoders, and configuration stack

9c5c564

feat(core): synchronize production-grade normalization bounds and linear projection layers from mainline

4e5725c

feat(moe): integrate float32 static routing lookup tables and custom variable tags to satisfy XLA compile rules

0e3615a

feat(attention): update CSA/HCA compressors and Lightning Indexer query expansion parameters for Trillium parity

327a68d

Merge deepseek_v4_core_primitives into dsv4-moe-routing-primitives

4d0324b

Merge dsv4-moe-routing-primitives into deepseek_v4_compressed_attention

e8a7a6f

Merge deepseek_v4_compressed_attention into dsv4_integration and incorporate DeepSeek-V4 top-k hash routers

e45d4ff

test(dsv4): disable sa_block_kv hardware grid padding to secure unmasked reference numerical parity

559fe54

ci(github): exclude deepseek_v4_vs_reference_test.py across GitHub runner test suites

f5b3ebf

Merge deepseek_v4_core_primitives into dsv4-moe-routing-primitives

47788dc

Merge dsv4-moe-routing-primitives into deepseek_v4_compressed_attention

adb7774

Merge deepseek_v4_compressed_attention into dsv4_integration

3ba20fa

ci(pytest): ignore deepseek_v4_vs_reference_test.py parity evaluation across automated runner suites

54a97af

Merge deepseek_v4_core_primitives into dsv4-moe-routing-primitives

bd8a4d6

Merge dsv4-moe-routing-primitives into deepseek_v4_compressed_attention

9d79b99

Merge deepseek_v4_compressed_attention into dsv4_integration

da39b01

fix(attention_compressed): use min kwarg in jnp.clip

5f0701c

fix zero loss issue

01d6efd

standalone and checkpoint util for deepseek v4

b48a1f5

snehalv2002 wants to merge 22 commits into main from snehalv-dsv4-conv-utility


2 participants
