Commit 415f50d

Merge branch 'main' of github.com:AI-Hypercomputer/maxtext into shuningjin-fix

2 parents: 9d9b9d9 + 8a17c3d

77 files changed: 5968 additions & 1979 deletions


.github/workflows/run_jupyter_notebooks.yml (3 additions & 1 deletion)

@@ -81,6 +81,8 @@ jobs:
           PYTHONPATH: "${{ github.workspace }}/src"
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
           MAXTEXT_INSTALLED: ${{ inputs.maxtext_installed }}
+          # TODO: Fix evaluation in sft_qwen3_demo.ipynb and remove this env variable
+          RUN_EVALUATION: "False"
         run: |
           if [ "${MAXTEXT_INSTALLED}" == "true" ]; then
             # Move to the directory where code is baked into the image. See the Dockerfile.
@@ -103,7 +105,7 @@ jobs:
 
           for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/{sft,rl}*.ipynb; do
             filename=$(basename "$notebook")
-            if [[ "$filename" == "sft_qwen3_demo.ipynb" || "$filename" == "sft_llama3_demo_gpu.ipynb" ]]; then
+            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" || "$filename" == "maxtext_with_gepa.ipynb" ]]; then
               echo "Skipping $filename"
               continue
             fi
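The amended skip logic can be exercised outside CI. A minimal sketch, assuming a scratch directory with made-up notebook files in place of the real `$MAXTEXT_NOTEBOOKS_ROOT`, and a simplified `*.ipynb` glob standing in for the workflow's `{sft,rl}*.ipynb` pattern:

```shell
# Standalone sketch of the workflow's updated skip-list loop.
# Directory and notebook names are illustrative stand-ins.
MAXTEXT_NOTEBOOKS_ROOT=$(mktemp -d)
touch "$MAXTEXT_NOTEBOOKS_ROOT/maxtext_with_gepa.ipynb" \
      "$MAXTEXT_NOTEBOOKS_ROOT/sft_llama3_demo_gpu.ipynb" \
      "$MAXTEXT_NOTEBOOKS_ROOT/sft_qwen3_demo.ipynb"

ran=""
skipped=""
for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/*.ipynb; do
  filename=$(basename "$notebook")
  # Same skip list as the updated workflow step
  if [ "$filename" = "sft_llama3_demo_gpu.ipynb" ] || [ "$filename" = "maxtext_with_gepa.ipynb" ]; then
    echo "Skipping $filename"
    skipped="$skipped $filename"
    continue
  fi
  echo "Would run $filename"
  ran="$ran $filename"
done
```

Note that `sft_qwen3_demo.ipynb` is no longer skipped here; per the TODO above, its evaluation step is disabled via `RUN_EVALUATION: "False"` instead.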

README.md (1 addition & 0 deletions)

@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[April 18, 2026\] Added a new notebook [maxtext_with_gepa.ipynb](https://github.com/AI-Hypercomputer/maxtext/blob/3c7d8d27864fc12cccac07786f02bd0e5262c982/src/maxtext/examples/maxtext_with_gepa.ipynb) for optimizing AIME prompts using the GEPA framework with MaxText.
 * \[April 14, 2026\] Legacy `MaxText.*` post-training shims have been removed. Please refer to [src/MaxText/README.md](https://github.com/AI-Hypercomputer/maxtext/blob/0536605a8ca116087ed93178433a67e905be566c/src/MaxText/README.md) for details on the new command locations and how to migrate.
 * \[April 13, 2026\] Kimi-K2 is now supported, along with the MuonClip optimizer. Try the [kimi-k2-1t](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/src/maxtext/configs/models/kimi-k2-1t.yml) config and check the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/tests/end_to_end/tpu/kimi/Run_Kimi.md).
 * \[April 10, 2026\] [DeepSeek-V3.2](https://arxiv.org/pdf/2512.02556) is now supported, featuring DeepSeek Sparse Attention for long context. Try it out with the [deepseek3.2-671b](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/src/maxtext/configs/models/deepseek3.2-671b.yml) config. See the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) for more details.
(filename hidden in the diff view) (61 additions & 114 deletions)

@@ -1,81 +1,44 @@
 /**
  * Handles inline editable commands in documentation.
- * Replaces placeholders in code blocks with inline input fields.
+ * Replaces placeholders in code blocks with inline editable spans.
+ * Using contenteditable spans avoids the "newline on copy" issue caused by inputs.
  */
 document.addEventListener('DOMContentLoaded', () => {
   const codeBlocks = document.querySelectorAll('div.highlight-sh pre, div.highlight-bash pre, div.highlight-default pre');
 
-  codeBlocks.forEach(block => {
+  const placeholders = [
+    "<BATCH_SIZE_PER_DEVICE>",
+    "<CHIPS_PER_VM>",
+    "<CKPT_PATH>",
+    "<CLUSTER_NAME>",
+    "<DATA_COLUMNS>",
+    "<DATASET_NAME>",
+    "<DATASET_PATH>",
+    "<GCS_BUCKET>",
+    "<HF_CKPT_PATH>",
+    "<HF_MODEL>",
+    "<HF_TOKEN>",
+    "<IMAGE_NAME>",
+    "<LAZY_LOAD>",
+    "<MODEL_NAME>",
+    "<NUM_SLICES>",
+    "<POD_NAME>",
+    "<PROJECT_ID>",
+    "<RUN_NAME>",
+    "<STEPS>",
+    "<TPU_TYPE>",
+    "<TRAIN_SPLIT>",
+    "<VENV_NAME>",
+    "<ZONE>"
+  ];
 
+  codeBlocks.forEach(block => {
     const originalHTML = block.innerHTML;
-
-    const placeholders = [
-      "<batch size per device>",
-      "<bucket>",
-      "<cluster name>",
-      "<data columns to train on>",
-      "<Data Columns to Train on>",
-      "<data split for train>",
-      "<Data Split for Train>",
-      "<dataset path>",
-      "<Docker Image Name>",
-      "<Fine-Tuning Steps>",
-      "<Flag to lazy load>",
-      "<Flag to use ocdbt>",
-      "<Flag to use zarr3>",
-      "<folder>",
-      "<gcs path for MaxText checkpoint>",
-      "<GCS Path for Output/Logs>",
-      "<GCS for dataset>",
-      "<GCP project ID>",
-      "<GCP zone>",
-      "<gke version>",
-      "<GKE Cluster Zone>",
-      "<Google Cloud Project ID>",
-      "<Hugging Face Access Token>",
-      "<Hugging Face access token>",
-      "<Hugging Face Dataset Name>",
-      "<Hugging Face dataset name>",
-      "<Hugging Face Model>",
-      "<Hugging Face Model to be converted to MaxText>",
-      "<MaxText Model>",
-      "<MaxText model name>",
-      "<Model Name>",
-      "<model name>",
-      "<Model Tokenizer>",
-      "<name for this run>",
-      "<Name for this run>",
-      "<Name of GKE Cluster>",
-      "<Name of Workload>",
-      "<number of fine-tuning steps to run>",
-      "<number of slices>",
-      "<output directory to store Hugging Face checkpoint>",
-      "<output directory to store MaxText checkpoint>",
-      "<output directory to store run logs>",
-      "<path to Hugging Face checkpoint>",
-      "<path/to/gcr.io>",
-      "<project id>",
-      "<project ID>",
-      "<project>",
-      "<ramdisk size>",
-      "<steps>",
-      "<the number of chips per VM>",
-      "<Tokenizer>",
-      "<tokenizer path>",
-      "<TPU Type>",
-      "<virtual env name>",
-      "<your virtual env name>",
-      "<your zone>",
-      "<YOUR WORKLOAD NAME>",
-      "<zone>",
-      "<zone name>"
-    ];
-
     let newHTML = originalHTML;
 
     placeholders.forEach(placeholder => {
-      // 1. create robust regex for this placeholder
-      // escape chars
+      // 1. Create robust regex for this placeholder
+      // Escape chars
      const escapeRegex = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
 
      const htmlEscapedKey = placeholder
@@ -86,67 +49,51 @@ document.addEventListener('DOMContentLoaded', () => {
      let pattern = '';
      for (let i = 0; i < htmlEscapedKey.length; i++) {
        const char = htmlEscapedKey[i];
-        pattern += escapeRegex(char) + '(?:<[^>]+>)*';
+        // FIX: Avoid matching across our inserted spans by ignoring tags with contenteditable
+        pattern += escapeRegex(char) + '(?:<(?!span[^>]*contenteditable)[^>]+>)*';
      }
 
      const regex = new RegExp(pattern, 'g');
 
-      // Replace with an input element
-      // We use the original placeholder text as placeholder for the input
-      const inputHTML = `<input class="inline-input" placeholder="${placeholder}" style="width: ${placeholder.length + 2}ch;" />`;
+      // Replace with a contenteditable span
+      // The styling mimics an input field but remains strictly inline
+      const spanHTML = `<span class="inline-input" contenteditable="true" spellcheck="false" data-placeholder="${placeholder}" style="border-bottom: 1px dashed #888; background: rgba(128, 128, 128, 0.15); padding: 0 4px; border-radius: 2px; outline: none;">${htmlEscapedKey}</span>`;
 
-      newHTML = newHTML.replace(regex, inputHTML);
+      newHTML = newHTML.replace(regex, spanHTML);
    });
 
    if (newHTML !== originalHTML) {
      block.innerHTML = newHTML;
    }
  });
 
-  // Add event listeners to newly created inputs to auto-resize
-  document.querySelectorAll('.inline-input').forEach(input => {
-    input.addEventListener('input', function () {
-      this.style.width = Math.max(this.value.length, this.placeholder.length) + 2 + 'ch';
+  // Bind behavioral events to the newly created editable spans
+  document.querySelectorAll('.inline-input').forEach(span => {
+
+    // Auto-select the text when clicked, so the user can immediately type over the placeholder
+    span.addEventListener('focus', function () {
+      if (this.textContent === this.getAttribute('data-placeholder')) {
+        const range = document.createRange();
+        range.selectNodeContents(this);
+        const sel = window.getSelection();
+        sel.removeAllRanges();
+        sel.addRange(range);
+      }
    });
-  });
-
-  /**
-   * Intercept copy button clicks to include user input values.
-   * Runs in capture phase to precede sphinx-copybutton's listener.
-   */
-  document.addEventListener('click', (event) => {
-    // Check if the clicked element is a copy button or inside one
-    const button = event.target.closest('.copybtn');
-    if (!button) return;
-
-    // Find the associated code block
-    // Sphinx-copybutton places the button inside .highlight usually
-    const highlightDiv = button.closest('.highlight');
-    if (!highlightDiv) return;
 
-    const inputs = highlightDiv.querySelectorAll('input.inline-input');
-    if (inputs.length === 0) return;
-
-    const swaps = [];
-    inputs.forEach(input => {
-      // Create a temporary span with the input's current value
-      const span = document.createElement('span');
-      // If value is empty, fallback to placeholder to match original text behavior
-      const val = input.value;
-      span.textContent = val ? val : input.placeholder;
-
-      // Mimic input appearance slightly if needed, but plain text is what we want copied
-      span.style.color = val ? 'inherit' : 'gray';
-
-      input.replaceWith(span);
-      swaps.push({ input, span });
+    // If the user deletes everything and clicks away, restore the original placeholder
+    span.addEventListener('blur', function () {
+      if (this.textContent.trim() === '') {
+        this.textContent = this.getAttribute('data-placeholder');
+      }
    });
 
-    // Revert immediately after the current event loop
-    setTimeout(() => {
-      swaps.forEach(({ input, span }) => {
-        span.replaceWith(input);
-      });
-    }, 0);
-  }, true);
+    // Prevent 'Enter' from creating a messy multiline command block
+    span.addEventListener('keydown', function (e) {
+      if (e.key === 'Enter') {
+        e.preventDefault();
+        this.blur(); // Drop focus instead of breaking to a new line
+      }
+    });
+  });
 });
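The regex construction in this file can be checked outside the browser. The sketch below (run with Node; the HTML fragments are invented for illustration) rebuilds both the old and the new per-character patterns and shows why the negative lookahead matters: the old `(?:<[^>]+>)*` gap would match a placeholder whose characters straddle an already-inserted contenteditable span, while the new gap refuses it.

```javascript
// Rebuild the two variants of the placeholder regex from the diff above.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function buildPlaceholderRegex(placeholder, skipEditableSpans) {
  // innerHTML serializes < and > as entities, mirroring htmlEscapedKey
  const htmlEscapedKey = placeholder.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  // Between any two characters, allow a run of HTML tags (syntax
  // highlighting splits text across spans) ...
  const gap = skipEditableSpans
    ? '(?:<(?!span[^>]*contenteditable)[^>]+>)*' // ... but never across an inserted editable span
    : '(?:<[^>]+>)*';
  let pattern = '';
  for (const char of htmlEscapedKey) {
    pattern += escapeRegex(char) + gap;
  }
  // The real code adds the 'g' flag for replace(); omitted here so that
  // repeated .test() calls do not carry lastIndex state between strings.
  return new RegExp(pattern);
}

// Highlighted code: the placeholder is split across ordinary spans.
const highlighted =
  '<span class="o">&lt;</span><span class="n">VENV_NAME</span><span class="o">&gt;</span>';
// Pathological case: part of the text sits inside an inserted editable span.
const straddling =
  '&lt;VENV<span class="inline-input" contenteditable="true">_NAME&gt;</span>';

const oldRe = buildPlaceholderRegex('<VENV_NAME>', false);
const newRe = buildPlaceholderRegex('<VENV_NAME>', true);

console.log(newRe.test(highlighted));                       // still matches ordinary highlighting spans
console.log(oldRe.test(straddling), newRe.test(straddling)); // old matches across the span, new does not
```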

docs/build_maxtext.md (3 additions & 3 deletions)

@@ -57,7 +57,7 @@ pip install uv
 # curl -LsSf https://astral.sh/uv/install.sh | sh
 
 # Create virtual environment
-export VENV_NAME=<your virtual env name> # e.g., docker_venv
+export VENV_NAME=<VENV_NAME> # e.g., docker_venv
 uv venv --python 3.12 --seed ${VENV_NAME?}
 source ${VENV_NAME?}/bin/activate
 
@@ -98,7 +98,7 @@ before proceeding with the installation.
 
 ```bash
 # Create virtual environment
-export VENV_NAME=<your virtual env name> # e.g., docker_venv
+export VENV_NAME=<VENV_NAME> # e.g., docker_venv
 uv venv --python 3.12 --seed ${VENV_NAME?}
 source ${VENV_NAME?}/bin/activate
 
@@ -155,7 +155,7 @@ build_maxtext_docker_image WORKFLOW=post-training
 
 ```bash
 # Make sure to set `CLOUD_IMAGE_NAME` with your desired image name.
-export CLOUD_IMAGE_NAME=<Docker Image Name>
+export CLOUD_IMAGE_NAME=<IMAGE_NAME>
 upload_maxtext_docker_image CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME?}
 ```
 
docs/conf.py (3 additions & 0 deletions)

@@ -165,6 +165,9 @@
     r"https://github\.com/jax-ml/jax/commits/.*",
     # Ignore Hugging Face settings links which require login
     r"https://huggingface\.co/settings/tokens",
+    # Ignore GitHub PRs and blobs that trigger rate limiting
+    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
+    r"https://github\.com/google/maxtext/blob/.*",
 ]
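Sphinx's linkcheck builder compares each `linkcheck_ignore` entry against the full URI with `re.match`, i.e. anchored at the start of the string. A quick sanity check of the two patterns added here; the URLs below are made-up examples, not links from the repo:

```python
import re

# The two patterns added to linkcheck_ignore in this commit
linkcheck_ignore = [
    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
    r"https://github\.com/google/maxtext/blob/.*",
]

def is_ignored(uri: str) -> bool:
    """True if any ignore pattern matches the URI from its start."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)

print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/pull/2048"))  # True: PR links skipped
print(is_ignored("https://github.com/google/maxtext/blob/main/README.md"))  # True: blob links skipped
print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/issues/1"))   # False: still checked
```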

docs/development/contribute_docs.md (1 addition & 1 deletion)

@@ -24,7 +24,7 @@ in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/ty
 
 If you are writing documentation for MaxText, you may want to preview the
 documentation site locally to ensure things work as expected before a deployment
-to [Read The Docs](https://readthedocs.org/).
+to [Read The Docs](https://about.readthedocs.com/?ref=app.readthedocs.org).
 
 First, make sure you
 [install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source)

docs/guides/checkpointing_solutions/convert_checkpoint.md (9 additions & 9 deletions)

@@ -40,10 +40,10 @@ Use the `to_maxtext.py` script to convert a Hugging Face model checkpoint into a
 python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu
 
 # Setup environment variables
-export MODEL=<Hugging Face Model to be converted to MaxText> # e.g. 'llama3.1-8b-Instruct'
-export BASE_OUTPUT_DIRECTORY=<output directory to store MaxText checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<HF_MODEL> # e.g. 'llama3.1-8b-Instruct'
+export BASE_OUTPUT_DIRECTORY=<CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 export USE_PATHWAYS=0 # Set to 1 for Pathways, 0 for McJAX
-export LAZY_LOAD_TENSORS=<Flag to lazy load> # Set to True to save RAM
+export LAZY_LOAD_TENSORS=<LAZY_LOAD> # Set to True to save RAM
 ```
 
 ### Run Conversion
@@ -93,9 +93,9 @@ Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugg
 python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu
 
 # Setup environment variables
-export MODEL=<MaxText model name> # e.g. 'qwen3-4b'
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
-export BASE_OUTPUT_DIRECTORY=<output directory to store Hugging Face checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<MODEL_NAME> # e.g. 'qwen3-4b'
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export BASE_OUTPUT_DIRECTORY=<HF_CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 ```
 
 ### Run Conversion
@@ -134,9 +134,9 @@ To ensure the conversion was successful, you can use the [test script](https://g
 
 ```bash
 # Setup environment variables
-export MODEL=<MaxText model name> # e.g. 'qwen3-4b'
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
-export HF_CKPT_PATH=<path to Hugging Face checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<MODEL_NAME> # e.g. 'qwen3-4b'
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export HF_CKPT_PATH=<HF_CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 ```
 
 ### Run Correctness Test

docs/install_maxtext.md (4 additions & 4 deletions)

@@ -33,8 +33,8 @@ This is the easiest way to get started with the latest stable version.
 1. **Create a virtual environment:**
 
    ```bash
-   uv venv --python 3.12 --seed <virtual env name>
-   source <virtual env name>/bin/activate
+   uv venv --python 3.12 --seed <VENV_NAME>
+   source <VENV_NAME>/bin/activate
    ```
 
 2. **Install MaxText and its dependencies.**
@@ -131,8 +131,8 @@ before proceeding with the installation.
 2. Create virtual environment:
 
    ```bash
-   uv venv --python 3.12 --seed <virtual env name>
-   source <virtual env name>/bin/activate
+   uv venv --python 3.12 --seed <VENV_NAME>
+   source <VENV_NAME>/bin/activate
    ```
 
 3. Install dependencies in editable mode. Choose a single installation option

docs/tutorials/posttraining/full_finetuning.md (7 additions & 7 deletions)

@@ -41,19 +41,19 @@ placeholders with your actual values.
 # -- Model configuration --
 # The MaxText model name. See `src/maxtext/configs/types.py` for `ModelName` for a
 # full list of supported models.
-export MODEL=<MaxText Model> # e.g., 'llama3.1-8b-Instruct'
+export MODEL=<MODEL_NAME> # e.g., 'llama3.1-8b-Instruct'
 
 # -- MaxText configuration --
 # Use a GCS bucket you own to store logs and checkpoints. Ideally in the same
 # region as your TPUs to minimize latency and costs.
 # You can list your buckets and their locations in the
 # [Cloud Console](https://console.cloud.google.com/storage/browser).
-export BASE_OUTPUT_DIRECTORY=<gcs bucket path> # e.g., gs://my-bucket/maxtext-runs
+export BASE_OUTPUT_DIRECTORY=<GCS_BUCKET> # e.g., gs://my-bucket/maxtext-runs
 
 # An arbitrary string to identify this specific run.
 # We recommend to include the model, user, and timestamp.
 # Note: Kubernetes requires workload names to be valid DNS labels (lowercase, no underscores or periods).
-export RUN_NAME=<Name for this run>
+export RUN_NAME=<RUN_NAME>
 ```
 
 ## Hugging Face checkpoint to Maxtext checkpoint
@@ -65,15 +65,15 @@ This section explains how to prepare your model checkpoint for use with MaxText.
 If you already have a MaxText-compatible model checkpoint, simply set the following environment variable and move on to the next section.
 
 ```sh
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
 ```
 
 ### Option 2: Converting a Hugging Face checkpoint
 
 Refer the steps in [Hugging Face to MaxText](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/checkpointing_solutions/convert_checkpoint.html#hugging-face-to-maxtext) to convert a hugging face checkpoint to MaxText. Make sure you have correct checkpoint files converted and saved. Similar as Option 1, you can set the following environment and move on.
 
 ```bash
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # gs://my-bucket/my-checkpoint-directory/0/items
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # gs://my-bucket/my-checkpoint-directory/0/items
 ```
 
 ## Dataset
@@ -90,8 +90,8 @@ Run these steps once per project prior to any local development or cluster exper
 MaxText assumes these GCS buckets are created in the same project and that it has permissions to read and write from them.
 
 ```sh
-export PROJECT_ID=<Google Cloud Project ID>
-export DATASET_GCS_BUCKET=<GCS for dataset> # e.g., gs://my-bucket/my-dataset
+export PROJECT_ID=<PROJECT_ID>
+export DATASET_GCS_BUCKET=<DATASET_PATH> # e.g., gs://my-bucket/my-dataset
 
 bash tools/data_generation/download_dataset.sh ${PROJECT_ID?} ${DATASET_GCS_BUCKET?}
 ```

0 commit comments