@@ -44,8 +44,8 @@ MaxText is an open-source, high-performance LLM framework written in Python/JAX.

 ### Qwen3

- - **Variants**: Dense (0.6B–32B); MoE (30B-A3B, 235B-A22B, 480B Coder)
- - **Notes**: **QK-Norm**, GQA, SwiGLU, RMSNorm, RoPE.
+ - **Variants**: Dense (0.6B–32B); MoE (30B-A3B, 235B-A22B, 480B Coder), MoE w/ Hybrid Attention (Next-80B-a3b)
+ - **Notes**: **QK-Norm**, GQA, SwiGLU, RMSNorm, RoPE, GatedDeltaNet.

 ### GPT-OSS

@@ -80,12 +80,12 @@ The following summarizes observed runtime efficiency and scaling behaviors of Ma

 - **Model Implementation Guides & Source Code:**

- - **Llama**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/llama2/run_llama2.md) | [Llama2 and Llama3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/llama2.py) | [Llama4 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/llama4.py)
- - **Gemma**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/gemma/Run_Gemma.md) | [Gemma Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/gemma.py) | [Gemma2 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/gemma2.py) | [Gemma3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/gemma3.py)
- - **Mixtral**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/mixtral/Run_Mixtral.md) | [Mixtral Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/mixtral.py) | [Mistral Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/mistral.py)
- - **DeepSeek**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) | [DeepSeek Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/deepseek.py)
- - **Qwen3**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/qwen/moe/run_qwen_moe.md) | [Qwen3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/qwen3.py)
- - **GPT-OSS**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/gpt_oss/run_gpt_oss.md) | [GPT-OSS Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/layers/gpt_oss.py)
+ - **Llama**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/llama2/run_llama2.md) | [Llama2 and Llama3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/llama2.py) | [Llama4 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/llama4.py)
+ - **Gemma**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/gemma/Run_Gemma.md) | [Gemma Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/gemma.py) | [Gemma2 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/gemma2.py) | [Gemma3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/gemma3.py)
+ - **Mixtral**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/mixtral/Run_Mixtral.md) | [Mixtral Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/mixtral.py) | [Mistral Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/mistral.py)
+ - **DeepSeek**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) | [DeepSeek Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/deepseek.py)
+ - **Qwen3**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/qwen/moe/run_qwen_moe.md) | [Qwen3-Next Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/qwen/next/run_qwen3_next.md) | [Qwen3 Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/qwen3.py) | [Qwen3-Next Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/qwen3.py)
+ - **GPT-OSS**: [Guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/gpt_oss/run_gpt_oss.md) | [GPT-OSS Source](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/models/gpt_oss.py)

 - **Technical Explanations:**
