Commit 8a17c3d

Merge pull request #3642 from AI-Hypercomputer:save-gepa-notebook

PiperOrigin-RevId: 903601095
Parents: 25f7ba0 + 633eca2

6 files changed: 544 additions, 6 deletions

.github/workflows/run_jupyter_notebooks.yml (1 addition, 1 deletion)

```diff
@@ -105,7 +105,7 @@ jobs:
 
           for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/{sft,rl}*.ipynb; do
             filename=$(basename "$notebook")
-            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" ]]; then
+            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" || "$filename" == "maxtext_with_gepa.ipynb" ]]; then
               echo "Skipping $filename"
               continue
             fi
```
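The skip pattern above chains `||` tests inside one `[[ ... ]]`. As the exclusion list grows, a bash array scales better than chained conditions. A minimal sketch of that variant, using an illustrative temp directory and notebook names (not the repository's actual layout):

```shell
#!/usr/bin/env bash
# Sketch of the notebook skip-list pattern, with the exclusions held in an
# array instead of chained || tests. Directory and filenames are illustrative.
set -euo pipefail

NOTEBOOKS_DIR="$(mktemp -d)"
touch "$NOTEBOOKS_DIR/sft_demo.ipynb" \
      "$NOTEBOOKS_DIR/sft_llama3_demo_gpu.ipynb" \
      "$NOTEBOOKS_DIR/rl_demo.ipynb"

# Exclusions from the diff above, as an array:
SKIP=("sft_llama3_demo_gpu.ipynb" "maxtext_with_gepa.ipynb")

for notebook in "$NOTEBOOKS_DIR"/{sft,rl}*.ipynb; do
  filename=$(basename "$notebook")
  skipped=false
  for s in "${SKIP[@]}"; do
    if [[ "$filename" == "$s" ]]; then
      skipped=true
      break
    fi
  done
  if $skipped; then
    echo "Skipping $filename"
    continue
  fi
  echo "Running $filename"
done
```

Note that `{sft,rl}*.ipynb` relies on bash brace expansion, which expands before globbing, so the loop sees all `sft*` matches followed by all `rl*` matches.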

README.md (1 addition)

```diff
@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[April 18, 2026\] Added a new notebook [maxtext_with_gepa.ipynb](https://github.com/AI-Hypercomputer/maxtext/blob/3c7d8d27864fc12cccac07786f02bd0e5262c982/src/maxtext/examples/maxtext_with_gepa.ipynb) for optimizing AIME prompts using the GEPA framework with MaxText.
 * \[April 14, 2026\] Legacy `MaxText.*` post-training shims have been removed. Please refer to [src/MaxText/README.md](https://github.com/AI-Hypercomputer/maxtext/blob/0536605a8ca116087ed93178433a67e905be566c/src/MaxText/README.md) for details on the new command locations and how to migrate.
 * \[April 13, 2026\] Kimi-K2 is now supported, along with the MuonClip optimizer. Try the [kimi-k2-1t](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/src/maxtext/configs/models/kimi-k2-1t.yml) config and check the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/tests/end_to_end/tpu/kimi/Run_Kimi.md).
 * \[April 10, 2026\] [DeepSeek-V3.2](https://arxiv.org/pdf/2512.02556) is now supported, featuring DeepSeek Sparse Attention for long context. Try it out with the [deepseek3.2-671b](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/src/maxtext/configs/models/deepseek3.2-671b.yml) config. See the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) for more details.
```

docs/conf.py (3 additions)

```diff
@@ -165,6 +165,9 @@
     r"https://github\.com/jax-ml/jax/commits/.*",
     # Ignore Hugging Face settings links which require login
     r"https://huggingface\.co/settings/tokens",
+    # Ignore GitHub PRs and blobs that trigger rate limiting
+    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
+    r"https://github\.com/google/maxtext/blob/.*",
 ]
 
 
```
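These patterns feed Sphinx's linkcheck builder, which skips any URI matched from the start (`re.match` semantics) by an entry in `linkcheck_ignore`; the closing `]` in the hunk suggests that is the list being extended here. A minimal sketch of how the matching behaves — `is_ignored` is an illustrative helper, not Sphinx API:

```python
import re

# The two ignore patterns added in the diff above.
linkcheck_ignore = [
    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
    r"https://github\.com/google/maxtext/blob/.*",
]

def is_ignored(uri: str) -> bool:
    """Mirror linkcheck's behavior: skip URIs matching any ignore pattern."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)

print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/pull/3642"))  # True
print(is_ignored("https://maxtext.readthedocs.io/en/latest/"))              # False
```

Because the patterns end in `.*` and are anchored only at the start, every PR and blob URL under those prefixes is excluded from the link check, which avoids the GitHub rate limiting mentioned in the comment.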

docs/development/contribute_docs.md (1 addition, 1 deletion)

```diff
@@ -24,7 +24,7 @@ in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/ty
 
 If you are writing documentation for MaxText, you may want to preview the
 documentation site locally to ensure things work as expected before a deployment
-to [Read The Docs](https://readthedocs.org/).
+to [Read The Docs](https://about.readthedocs.com/?ref=app.readthedocs.org).
 
 First, make sure you
 [install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source)
```

src/maxtext/checkpoint_conversion/utils/hf_model_configs.py (4 deletions)

```diff
@@ -593,7 +593,6 @@
     tie_word_embeddings=False,
     torch_dtype="bfloat16",
     use_cache=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -630,7 +629,6 @@
     torch_dtype="bfloat16",
     transformers_version="4.51.0",
     use_cache=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -668,7 +666,6 @@
     transformers_version="4.51.0",
     use_cache=True,
     use_qk_norm=True,
-    use_sliding_window=False,
     vocab_size=151936,
 )
 
@@ -1076,7 +1073,6 @@ def __init__(self, **kwargs):
     "torch_dtype": "bfloat16",
     "transformers_version": "4.57.0.dev0",
     "use_cache": True,
-    "use_sliding_window": False,
     "vocab_size": 151936,
 }
 qwen3_next_80b_a3b_config = transformers.Qwen3NextConfig(**qwen3_next_80b_a3b_dict)
```
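The four hunks above drop an explicit `use_sliding_window=False` from Qwen config constructors. Assuming the corresponding `transformers` config classes already default this flag to `False` (as `Qwen2Config` does at the time of writing), the removal leaves the resulting configs unchanged. A self-contained stand-in (not the real library class) illustrates why dropping a kwarg is a no-op only when it matches the default:

```python
# DemoConfig is a hypothetical stand-in for a transformers config class,
# defined here only to show the kwarg-vs-default equivalence.
class DemoConfig:
    def __init__(self, vocab_size=151936, use_cache=True, use_sliding_window=False, **kwargs):
        self.vocab_size = vocab_size
        self.use_cache = use_cache
        self.use_sliding_window = use_sliding_window
        # Store any extra keyword arguments as attributes, mimicking the
        # permissive behavior of HF config constructors.
        for key, value in kwargs.items():
            setattr(self, key, value)

# Passing the flag explicitly vs. relying on the default yields the same config.
explicit = DemoConfig(use_cache=True, use_sliding_window=False)
implicit = DemoConfig(use_cache=True)

assert explicit.use_sliding_window == implicit.use_sliding_window == False
```

If a future library version ever changed the default, the implicit form would silently follow the new default, which is presumably the intent of deferring to the library here.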
