Commit 415f50d

Merge branch 'main' of github.com:AI-Hypercomputer/maxtext into shuningjin-fix

2 parents: 9d9b9d9 + 8a17c3d

77 files changed: 5968 additions & 1979 deletions


.github/workflows/run_jupyter_notebooks.yml (3 additions & 1 deletion)

@@ -81,6 +81,8 @@ jobs:
           PYTHONPATH: "${{ github.workspace }}/src"
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
           MAXTEXT_INSTALLED: ${{ inputs.maxtext_installed }}
+          # TODO: Fix evaluation in sft_qwen3_demo.ipynb and remove this env variable
+          RUN_EVALUATION: "False"
         run: |
           if [ "${MAXTEXT_INSTALLED}" == "true" ]; then
             # Move to the directory where code is baked into the image. See the Dockerfile.
@@ -103,7 +105,7 @@ jobs:
 
           for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/{sft,rl}*.ipynb; do
             filename=$(basename "$notebook")
-            if [[ "$filename" == "sft_qwen3_demo.ipynb" || "$filename" == "sft_llama3_demo_gpu.ipynb" ]]; then
+            if [[ "$filename" == "sft_llama3_demo_gpu.ipynb" || "$filename" == "maxtext_with_gepa.ipynb" ]]; then
               echo "Skipping $filename"
               continue
             fi
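The amended skip logic can be exercised outside CI. A minimal sketch, assuming a scratch directory with made-up notebook files in place of the real `$MAXTEXT_NOTEBOOKS_ROOT`, and a simplified `*.ipynb` glob standing in for the workflow's `{sft,rl}*.ipynb` pattern:

```shell
# Standalone sketch of the workflow's updated skip-list loop.
# Directory and notebook names are illustrative stand-ins.
MAXTEXT_NOTEBOOKS_ROOT=$(mktemp -d)
touch "$MAXTEXT_NOTEBOOKS_ROOT/maxtext_with_gepa.ipynb" \
      "$MAXTEXT_NOTEBOOKS_ROOT/sft_llama3_demo_gpu.ipynb" \
      "$MAXTEXT_NOTEBOOKS_ROOT/sft_qwen3_demo.ipynb"

ran=""
skipped=""
for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/*.ipynb; do
  filename=$(basename "$notebook")
  # Same skip list as the updated workflow step
  if [ "$filename" = "sft_llama3_demo_gpu.ipynb" ] || [ "$filename" = "maxtext_with_gepa.ipynb" ]; then
    echo "Skipping $filename"
    skipped="$skipped $filename"
    continue
  fi
  echo "Would run $filename"
  ran="$ran $filename"
done
```

Note that `sft_qwen3_demo.ipynb` is no longer skipped here; per the TODO above, its evaluation step is disabled via `RUN_EVALUATION: "False"` instead.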

README.md (1 addition & 0 deletions)

@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[April 18, 2026\] Added a new notebook [maxtext_with_gepa.ipynb](https://github.com/AI-Hypercomputer/maxtext/blob/3c7d8d27864fc12cccac07786f02bd0e5262c982/src/maxtext/examples/maxtext_with_gepa.ipynb) for optimizing AIME prompts using the GEPA framework with MaxText.
 * \[April 14, 2026\] Legacy `MaxText.*` post-training shims have been removed. Please refer to [src/MaxText/README.md](https://github.com/AI-Hypercomputer/maxtext/blob/0536605a8ca116087ed93178433a67e905be566c/src/MaxText/README.md) for details on the new command locations and how to migrate.
 * \[April 13, 2026\] Kimi-K2 is now supported, along with the MuonClip optimizer. Try the [kimi-k2-1t](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/src/maxtext/configs/models/kimi-k2-1t.yml) config and check the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/tests/end_to_end/tpu/kimi/Run_Kimi.md).
 * \[April 10, 2026\] [DeepSeek-V3.2](https://arxiv.org/pdf/2512.02556) is now supported, featuring DeepSeek Sparse Attention for long context. Try it out with the [deepseek3.2-671b](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/src/maxtext/configs/models/deepseek3.2-671b.yml) config. See the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) for more details.
(filename hidden in the diff view) (61 additions & 114 deletions)

@@ -1,81 +1,44 @@
 /**
  * Handles inline editable commands in documentation.
- * Replaces placeholders in code blocks with inline input fields.
+ * Replaces placeholders in code blocks with inline editable spans.
+ * Using contenteditable spans avoids the "newline on copy" issue caused by inputs.
  */
 document.addEventListener('DOMContentLoaded', () => {
   const codeBlocks = document.querySelectorAll('div.highlight-sh pre, div.highlight-bash pre, div.highlight-default pre');
 
-  codeBlocks.forEach(block => {
+  const placeholders = [
+    "<BATCH_SIZE_PER_DEVICE>",
+    "<CHIPS_PER_VM>",
+    "<CKPT_PATH>",
+    "<CLUSTER_NAME>",
+    "<DATA_COLUMNS>",
+    "<DATASET_NAME>",
+    "<DATASET_PATH>",
+    "<GCS_BUCKET>",
+    "<HF_CKPT_PATH>",
+    "<HF_MODEL>",
+    "<HF_TOKEN>",
+    "<IMAGE_NAME>",
+    "<LAZY_LOAD>",
+    "<MODEL_NAME>",
+    "<NUM_SLICES>",
+    "<POD_NAME>",
+    "<PROJECT_ID>",
+    "<RUN_NAME>",
+    "<STEPS>",
+    "<TPU_TYPE>",
+    "<TRAIN_SPLIT>",
+    "<VENV_NAME>",
+    "<ZONE>"
+  ];
 
+  codeBlocks.forEach(block => {
     const originalHTML = block.innerHTML;
-
-    const placeholders = [
-      "<batch size per device>",
-      "<bucket>",
-      "<cluster name>",
-      "<data columns to train on>",
-      "<Data Columns to Train on>",
-      "<data split for train>",
-      "<Data Split for Train>",
-      "<dataset path>",
-      "<Docker Image Name>",
-      "<Fine-Tuning Steps>",
-      "<Flag to lazy load>",
-      "<Flag to use ocdbt>",
-      "<Flag to use zarr3>",
-      "<folder>",
-      "<gcs path for MaxText checkpoint>",
-      "<GCS Path for Output/Logs>",
-      "<GCS for dataset>",
-      "<GCP project ID>",
-      "<GCP zone>",
-      "<gke version>",
-      "<GKE Cluster Zone>",
-      "<Google Cloud Project ID>",
-      "<Hugging Face Access Token>",
-      "<Hugging Face access token>",
-      "<Hugging Face Dataset Name>",
-      "<Hugging Face dataset name>",
-      "<Hugging Face Model>",
-      "<Hugging Face Model to be converted to MaxText>",
-      "<MaxText Model>",
-      "<MaxText model name>",
-      "<Model Name>",
-      "<model name>",
-      "<Model Tokenizer>",
-      "<name for this run>",
-      "<Name for this run>",
-      "<Name of GKE Cluster>",
-      "<Name of Workload>",
-      "<number of fine-tuning steps to run>",
-      "<number of slices>",
-      "<output directory to store Hugging Face checkpoint>",
-      "<output directory to store MaxText checkpoint>",
-      "<output directory to store run logs>",
-      "<path to Hugging Face checkpoint>",
-      "<path/to/gcr.io>",
-      "<project id>",
-      "<project ID>",
-      "<project>",
-      "<ramdisk size>",
-      "<steps>",
-      "<the number of chips per VM>",
-      "<Tokenizer>",
-      "<tokenizer path>",
-      "<TPU Type>",
-      "<virtual env name>",
-      "<your virtual env name>",
-      "<your zone>",
-      "<YOUR WORKLOAD NAME>",
-      "<zone>",
-      "<zone name>"
-    ];
-
     let newHTML = originalHTML;
 
     placeholders.forEach(placeholder => {
-      // 1. create robust regex for this placeholder
-      // escape chars
+      // 1. Create robust regex for this placeholder
+      // Escape chars
      const escapeRegex = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
 
      const htmlEscapedKey = placeholder
@@ -86,67 +49,51 @@ document.addEventListener('DOMContentLoaded', () => {
      let pattern = '';
      for (let i = 0; i < htmlEscapedKey.length; i++) {
        const char = htmlEscapedKey[i];
-        pattern += escapeRegex(char) + '(?:<[^>]+>)*';
+        // FIX: Avoid matching across our inserted spans by ignoring tags with contenteditable
+        pattern += escapeRegex(char) + '(?:<(?!span[^>]*contenteditable)[^>]+>)*';
      }
 
      const regex = new RegExp(pattern, 'g');
 
-      // Replace with an input element
-      // We use the original placeholder text as placeholder for the input
-      const inputHTML = `<input class="inline-input" placeholder="${placeholder}" style="width: ${placeholder.length + 2}ch;" />`;
+      // Replace with a contenteditable span
+      // The styling mimics an input field but remains strictly inline
+      const spanHTML = `<span class="inline-input" contenteditable="true" spellcheck="false" data-placeholder="${placeholder}" style="border-bottom: 1px dashed #888; background: rgba(128, 128, 128, 0.15); padding: 0 4px; border-radius: 2px; outline: none;">${htmlEscapedKey}</span>`;
 
-      newHTML = newHTML.replace(regex, inputHTML);
+      newHTML = newHTML.replace(regex, spanHTML);
    });
 
    if (newHTML !== originalHTML) {
      block.innerHTML = newHTML;
    }
  });
 
-  // Add event listeners to newly created inputs to auto-resize
-  document.querySelectorAll('.inline-input').forEach(input => {
-    input.addEventListener('input', function () {
-      this.style.width = Math.max(this.value.length, this.placeholder.length) + 2 + 'ch';
+  // Bind behavioral events to the newly created editable spans
+  document.querySelectorAll('.inline-input').forEach(span => {
+
+    // Auto-select the text when clicked, so the user can immediately type over the placeholder
+    span.addEventListener('focus', function () {
+      if (this.textContent === this.getAttribute('data-placeholder')) {
+        const range = document.createRange();
+        range.selectNodeContents(this);
+        const sel = window.getSelection();
+        sel.removeAllRanges();
+        sel.addRange(range);
+      }
    });
-  });
-
-  /**
-   * Intercept copy button clicks to include user input values.
-   * Runs in capture phase to precede sphinx-copybutton's listener.
-   */
-  document.addEventListener('click', (event) => {
-    // Check if the clicked element is a copy button or inside one
-    const button = event.target.closest('.copybtn');
-    if (!button) return;
-
-    // Find the associated code block
-    // Sphinx-copybutton places the button inside .highlight usually
-    const highlightDiv = button.closest('.highlight');
-    if (!highlightDiv) return;
 
-    const inputs = highlightDiv.querySelectorAll('input.inline-input');
-    if (inputs.length === 0) return;
-
-    const swaps = [];
-    inputs.forEach(input => {
-      // Create a temporary span with the input's current value
-      const span = document.createElement('span');
-      // If value is empty, fallback to placeholder to match original text behavior
-      const val = input.value;
-      span.textContent = val ? val : input.placeholder;
-
-      // Mimic input appearance slightly if needed, but plain text is what we want copied
-      span.style.color = val ? 'inherit' : 'gray';
-
-      input.replaceWith(span);
-      swaps.push({ input, span });
+    // If the user deletes everything and clicks away, restore the original placeholder
+    span.addEventListener('blur', function () {
+      if (this.textContent.trim() === '') {
+        this.textContent = this.getAttribute('data-placeholder');
+      }
    });
 
-    // Revert immediately after the current event loop
-    setTimeout(() => {
-      swaps.forEach(({ input, span }) => {
-        span.replaceWith(input);
-      });
-    }, 0);
-  }, true);
+    // Prevent 'Enter' from creating a messy multiline command block
+    span.addEventListener('keydown', function (e) {
+      if (e.key === 'Enter') {
+        e.preventDefault();
+        this.blur(); // Drop focus instead of breaking to a new line
+      }
+    });
+  });
 });
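The regex construction in this file can be checked outside the browser. The sketch below (run with Node; the HTML fragments are invented for illustration) rebuilds both the old and the new per-character patterns and shows why the negative lookahead matters: the old `(?:<[^>]+>)*` gap would match a placeholder whose characters straddle an already-inserted contenteditable span, while the new gap refuses it.

```javascript
// Rebuild the two variants of the placeholder regex from the diff above.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function buildPlaceholderRegex(placeholder, skipEditableSpans) {
  // innerHTML serializes < and > as entities, mirroring htmlEscapedKey
  const htmlEscapedKey = placeholder.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  // Between any two characters, allow a run of HTML tags (syntax
  // highlighting splits text across spans) ...
  const gap = skipEditableSpans
    ? '(?:<(?!span[^>]*contenteditable)[^>]+>)*' // ... but never across an inserted editable span
    : '(?:<[^>]+>)*';
  let pattern = '';
  for (const char of htmlEscapedKey) {
    pattern += escapeRegex(char) + gap;
  }
  // The real code adds the 'g' flag for replace(); omitted here so that
  // repeated .test() calls do not carry lastIndex state between strings.
  return new RegExp(pattern);
}

// Highlighted code: the placeholder is split across ordinary spans.
const highlighted =
  '<span class="o">&lt;</span><span class="n">VENV_NAME</span><span class="o">&gt;</span>';
// Pathological case: part of the text sits inside an inserted editable span.
const straddling =
  '&lt;VENV<span class="inline-input" contenteditable="true">_NAME&gt;</span>';

const oldRe = buildPlaceholderRegex('<VENV_NAME>', false);
const newRe = buildPlaceholderRegex('<VENV_NAME>', true);

console.log(newRe.test(highlighted));                       // still matches ordinary highlighting spans
console.log(oldRe.test(straddling), newRe.test(straddling)); // old matches across the span, new does not
```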

docs/build_maxtext.md (3 additions & 3 deletions)

@@ -57,7 +57,7 @@ pip install uv
 # curl -LsSf https://astral.sh/uv/install.sh | sh
 
 # Create virtual environment
-export VENV_NAME=<your virtual env name> # e.g., docker_venv
+export VENV_NAME=<VENV_NAME> # e.g., docker_venv
 uv venv --python 3.12 --seed ${VENV_NAME?}
 source ${VENV_NAME?}/bin/activate
 
@@ -98,7 +98,7 @@ before proceeding with the installation.
 
 ```bash
 # Create virtual environment
-export VENV_NAME=<your virtual env name> # e.g., docker_venv
+export VENV_NAME=<VENV_NAME> # e.g., docker_venv
 uv venv --python 3.12 --seed ${VENV_NAME?}
 source ${VENV_NAME?}/bin/activate
 
@@ -155,7 +155,7 @@ build_maxtext_docker_image WORKFLOW=post-training
 
 ```bash
 # Make sure to set `CLOUD_IMAGE_NAME` with your desired image name.
-export CLOUD_IMAGE_NAME=<Docker Image Name>
+export CLOUD_IMAGE_NAME=<IMAGE_NAME>
 upload_maxtext_docker_image CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME?}
 ```
 
docs/conf.py (3 additions & 0 deletions)

@@ -165,6 +165,9 @@
     r"https://github\.com/jax-ml/jax/commits/.*",
     # Ignore Hugging Face settings links which require login
     r"https://huggingface\.co/settings/tokens",
+    # Ignore GitHub PRs and blobs that trigger rate limiting
+    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
+    r"https://github\.com/google/maxtext/blob/.*",
 ]
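Sphinx's linkcheck builder compares each `linkcheck_ignore` entry against the full URI with `re.match`, i.e. anchored at the start of the string. A quick sanity check of the two patterns added here; the URLs below are made-up examples, not links from the repo:

```python
import re

# The two patterns added to linkcheck_ignore in this commit
linkcheck_ignore = [
    r"https://github\.com/AI-Hypercomputer/maxtext/pull/.*",
    r"https://github\.com/google/maxtext/blob/.*",
]

def is_ignored(uri: str) -> bool:
    """True if any ignore pattern matches the URI from its start."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)

print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/pull/2048"))  # True: PR links skipped
print(is_ignored("https://github.com/google/maxtext/blob/main/README.md"))  # True: blob links skipped
print(is_ignored("https://github.com/AI-Hypercomputer/maxtext/issues/1"))   # False: still checked
```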

docs/development/contribute_docs.md (1 addition & 1 deletion)

@@ -24,7 +24,7 @@ in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/ty
 
 If you are writing documentation for MaxText, you may want to preview the
 documentation site locally to ensure things work as expected before a deployment
-to [Read The Docs](https://readthedocs.org/).
+to [Read The Docs](https://about.readthedocs.com/?ref=app.readthedocs.org).
 
 First, make sure you
 [install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source)

docs/guides/checkpointing_solutions/convert_checkpoint.md (9 additions & 9 deletions)

@@ -40,10 +40,10 @@ Use the `to_maxtext.py` script to convert a Hugging Face model checkpoint into a
 python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu
 
 # Setup environment variables
-export MODEL=<Hugging Face Model to be converted to MaxText> # e.g. 'llama3.1-8b-Instruct'
-export BASE_OUTPUT_DIRECTORY=<output directory to store MaxText checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<HF_MODEL> # e.g. 'llama3.1-8b-Instruct'
+export BASE_OUTPUT_DIRECTORY=<CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 export USE_PATHWAYS=0 # Set to 1 for Pathways, 0 for McJAX
-export LAZY_LOAD_TENSORS=<Flag to lazy load> # Set to True to save RAM
+export LAZY_LOAD_TENSORS=<LAZY_LOAD> # Set to True to save RAM
 ```
 
 ### Run Conversion
@@ -93,9 +93,9 @@ Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugg
 python3 -m pip install torch --index-url https://download.pytorch.org/whl/cpu
 
 # Setup environment variables
-export MODEL=<MaxText model name> # e.g. 'qwen3-4b'
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
-export BASE_OUTPUT_DIRECTORY=<output directory to store Hugging Face checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<MODEL_NAME> # e.g. 'qwen3-4b'
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export BASE_OUTPUT_DIRECTORY=<HF_CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 ```
 
 ### Run Conversion
@@ -134,9 +134,9 @@ To ensure the conversion was successful, you can use the [test script](https://g
 
 ```bash
 # Setup environment variables
-export MODEL=<MaxText model name> # e.g. 'qwen3-4b'
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
-export HF_CKPT_PATH=<path to Hugging Face checkpoint> # e.g., gs://my-bucket/my-checkpoint-directory
+export MODEL=<MODEL_NAME> # e.g. 'qwen3-4b'
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export HF_CKPT_PATH=<HF_CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory
 ```
 
 ### Run Correctness Test

docs/install_maxtext.md (4 additions & 4 deletions)

@@ -33,8 +33,8 @@ This is the easiest way to get started with the latest stable version.
 1. **Create a virtual environment:**
 
    ```bash
-   uv venv --python 3.12 --seed <virtual env name>
-   source <virtual env name>/bin/activate
+   uv venv --python 3.12 --seed <VENV_NAME>
+   source <VENV_NAME>/bin/activate
    ```
 
 2. **Install MaxText and its dependencies.**
@@ -131,8 +131,8 @@ before proceeding with the installation.
 2. Create virtual environment:
 
    ```bash
-   uv venv --python 3.12 --seed <virtual env name>
-   source <virtual env name>/bin/activate
+   uv venv --python 3.12 --seed <VENV_NAME>
+   source <VENV_NAME>/bin/activate
    ```
 
 3. Install dependencies in editable mode. Choose a single installation option

docs/tutorials/posttraining/full_finetuning.md (7 additions & 7 deletions)

@@ -41,19 +41,19 @@ placeholders with your actual values.
 # -- Model configuration --
 # The MaxText model name. See `src/maxtext/configs/types.py` for `ModelName` for a
 # full list of supported models.
-export MODEL=<MaxText Model> # e.g., 'llama3.1-8b-Instruct'
+export MODEL=<MODEL_NAME> # e.g., 'llama3.1-8b-Instruct'
 
 # -- MaxText configuration --
 # Use a GCS bucket you own to store logs and checkpoints. Ideally in the same
 # region as your TPUs to minimize latency and costs.
 # You can list your buckets and their locations in the
 # [Cloud Console](https://console.cloud.google.com/storage/browser).
-export BASE_OUTPUT_DIRECTORY=<gcs bucket path> # e.g., gs://my-bucket/maxtext-runs
+export BASE_OUTPUT_DIRECTORY=<GCS_BUCKET> # e.g., gs://my-bucket/maxtext-runs
 
 # An arbitrary string to identify this specific run.
 # We recommend to include the model, user, and timestamp.
 # Note: Kubernetes requires workload names to be valid DNS labels (lowercase, no underscores or periods).
-export RUN_NAME=<Name for this run>
+export RUN_NAME=<RUN_NAME>
 ```
 
 ## Hugging Face checkpoint to Maxtext checkpoint
@@ -65,15 +65,15 @@ This section explains how to prepare your model checkpoint for use with MaxText.
 If you already have a MaxText-compatible model checkpoint, simply set the following environment variable and move on to the next section.
 
 ```sh
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items
 ```
 
 ### Option 2: Converting a Hugging Face checkpoint
 
 Refer the steps in [Hugging Face to MaxText](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/checkpointing_solutions/convert_checkpoint.html#hugging-face-to-maxtext) to convert a hugging face checkpoint to MaxText. Make sure you have correct checkpoint files converted and saved. Similar as Option 1, you can set the following environment and move on.
 
 ```bash
-export MAXTEXT_CKPT_PATH=<gcs path for MaxText checkpoint> # gs://my-bucket/my-checkpoint-directory/0/items
+export MAXTEXT_CKPT_PATH=<CKPT_PATH> # gs://my-bucket/my-checkpoint-directory/0/items
 ```
 
 ## Dataset
@@ -90,8 +90,8 @@ Run these steps once per project prior to any local development or cluster exper
 MaxText assumes these GCS buckets are created in the same project and that it has permissions to read and write from them.
 
 ```sh
-export PROJECT_ID=<Google Cloud Project ID>
-export DATASET_GCS_BUCKET=<GCS for dataset> # e.g., gs://my-bucket/my-dataset
+export PROJECT_ID=<PROJECT_ID>
+export DATASET_GCS_BUCKET=<DATASET_PATH> # e.g., gs://my-bucket/my-dataset
 
 bash tools/data_generation/download_dataset.sh ${PROJECT_ID?} ${DATASET_GCS_BUCKET?}
 ```

0 commit comments