Commit 8a17c3d

Merge pull request #3642 from AI-Hypercomputer:save-gepa-notebook

PiperOrigin-RevId: 903601095
Parents: 25f7ba0 + 633eca2

6 files changed: 544 additions, 6 deletions

.github/workflows/run_jupyter_notebooks.yml (1 addition, 1 deletion)

```diff
@@ -105,7 +105,7 @@ jobs:
 
           for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/{sft,rl}*.ipynb; do
             filename=$(basename "$notebook")
-            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" ]]; then
+            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" || "$filename" == "maxtext_with_gepa.ipynb" ]]; then
               echo "Skipping $filename"
               continue
             fi
```
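The skip pattern above chains `||` tests inside one `[[ ... ]]`. As the exclusion list grows, a bash array scales better than chained conditions. A minimal sketch of that variant, using an illustrative temp directory and notebook names (not the repository's actual layout):

```shell
#!/usr/bin/env bash
# Sketch of the notebook skip-list pattern, with the exclusions held in an
# array instead of chained || tests. Directory and filenames are illustrative.
set -euo pipefail

NOTEBOOKS_DIR="$(mktemp -d)"
touch "$NOTEBOOKS_DIR/sft_demo.ipynb" \
      "$NOTEBOOKS_DIR/sft_llama3_demo_gpu.ipynb" \
      "$NOTEBOOKS_DIR/rl_demo.ipynb"

# Exclusions from the diff above, as an array:
SKIP=("sft_llama3_demo_gpu.ipynb" "maxtext_with_gepa.ipynb")

for notebook in "$NOTEBOOKS_DIR"/{sft,rl}*.ipynb; do
  filename=$(basename "$notebook")
  skipped=false
  for s in "${SKIP[@]}"; do
    if [[ "$filename" == "$s" ]]; then
      skipped=true
      break
    fi
  done
  if $skipped; then
    echo "Skipping $filename"
    continue
  fi
  echo "Running $filename"
done
```

Note that `{sft,rl}*.ipynb` relies on bash brace expansion, which expands before globbing, so the loop sees all `sft*` matches followed by all `rl*` matches.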

README.md (1 addition)

```diff
@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[April 18, 2026\] Added a new notebook [maxtext_with_gepa.ipynb](https://github.com/AI-Hypercomputer/maxtext/blob/3c7d8d27864fc12cccac07786f02bd0e5262c982/src/maxtext/examples/maxtext_with_gepa.ipynb) for optimizing AIME prompts using the GEPA framework with MaxText.
 * \[April 14, 2026\] Legacy `MaxText.*` post-training shims have been removed. Please refer to [src/MaxText/README.md](https://github.com/AI-Hypercomputer/maxtext/blob/0536605a8ca116087ed93178433a67e905be566c/src/MaxText/README.md) for details on the new command locations and how to migrate.
 * \[April 13, 2026\] Kimi-K2 is now supported, along with the MuonClip optimizer. Try the [kimi-k2-1t](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/src/maxtext/configs/models/kimi-k2-1t.yml) config and check the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/tests/end_to_end/tpu/kimi/Run_Kimi.md).
 * \[April 10, 2026\] [DeepSeek-V3.2](https://arxiv.org/pdf/2512.02556) is now supported, featuring DeepSeek Sparse Attention for long context. Try it out with the [deepseek3.2-671b](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/src/maxtext/configs/models/deepseek3.2-671b.yml) config. See the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) for more details.
```

docs/conf.py (3 additions)

```diff
@@ -165,6 +165,9 @@
     r"https://github\.com/jax-ml/jax/commits/.*",
     # Ignore Hugging Face settings links which require login
     r"https://huggingface\.co/settings/tokens",
+    # Ignore GitHub PRs and blobs that trigger rate limiting
+    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
+    r"https://github\.com/google/maxtext/blob/.*",
 ]
 
 
```
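These patterns feed Sphinx's linkcheck builder, which skips any URI matched from the start (`re.match` semantics) by an entry in `linkcheck_ignore`; the closing `]` in the hunk suggests that is the list being extended here. A minimal sketch of how the matching behaves — `is_ignored` is an illustrative helper, not Sphinx API:

```python
import re

# The two ignore patterns added in the diff above.
linkcheck_ignore = [
    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
    r"https://github\.com/google/maxtext/blob/.*",
]

def is_ignored(uri: str) -> bool:
    """Mirror linkcheck's behavior: skip URIs matching any ignore pattern."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)

print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/pull/3642"))  # True
print(is_ignored("https://maxtext.readthedocs.io/en/latest/"))              # False
```

Because the patterns end in `.*` and are anchored only at the start, every PR and blob URL under those prefixes is excluded from the link check, which avoids the GitHub rate limiting mentioned in the comment.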

docs/development/contribute_docs.md (1 addition, 1 deletion)

```diff
@@ -24,7 +24,7 @@ in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/ty
 
 If you are writing documentation for MaxText, you may want to preview the
 documentation site locally to ensure things work as expected before a deployment
-to [Read The Docs](https://readthedocs.org/).
+to [Read The Docs](https://about.readthedocs.com/?ref=app.readthedocs.org).
 
 First, make sure you
 [install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source)
```

src/maxtext/checkpoint_conversion/utils/hf_model_configs.py (4 deletions)

```diff
@@ -593,7 +593,6 @@
     tie_word_embeddings=False,
     torch_dtype="bfloat16",
     use_cache=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -630,7 +629,6 @@
     torch_dtype="bfloat16",
     transformers_version="4.51.0",
     use_cache=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -668,7 +666,6 @@
     transformers_version="4.51.0",
     use_cache=True,
     use_qk_norm=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -1076,7 +1073,6 @@ def __init__(self, **kwargs):
     "torch_dtype": "bfloat16",
     "transformers_version": "4.57.0.dev0",
     "use_cache": True,
-    "use_sliding_window": False,
     "vocab_size": 151936,
 }
 qwen3_next_80b_a3b_config = transformers.Qwen3NextConfig(**qwen3_next_80b_a3b_dict)
```
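The four hunks above drop an explicit `use_sliding_window=False` from Qwen config constructors. Assuming the corresponding `transformers` config classes already default this flag to `False` (as `Qwen2Config` does at the time of writing), the removal leaves the resulting configs unchanged. A self-contained stand-in (not the real library class) illustrates why dropping a kwarg is a no-op only when it matches the default:

```python
# DemoConfig is a hypothetical stand-in for a transformers config class,
# defined here only to show the kwarg-vs-default equivalence.
class DemoConfig:
    def __init__(self, vocab_size=151936, use_cache=True, use_sliding_window=False, **kwargs):
        self.vocab_size = vocab_size
        self.use_cache = use_cache
        self.use_sliding_window = use_sliding_window
        # Store any extra keyword arguments as attributes, mimicking the
        # permissive behavior of HF config constructors.
        for key, value in kwargs.items():
            setattr(self, key, value)

# Passing the flag explicitly vs. relying on the default yields the same config.
explicit = DemoConfig(use_cache=True, use_sliding_window=False)
implicit = DemoConfig(use_cache=True)

assert explicit.use_sliding_window == implicit.use_sliding_window == False
```

If a future library version ever changed the default, the implicit form would silently follow the new default, which is presumably the intent of deferring to the library here.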
