
Add distillation reference guide and online-distillation recipe #3729

Draft
gagika wants to merge 2 commits into main from gagik-distill-doc


@gagika (Collaborator) commented Apr 22, 2026

Description

Adds a new reference guide (docs/guides/distillation.md) covering the online distillation trainer — loss anatomy, α/β/temperature schedules, layer-index selection, monitoring, and troubleshooting — and extends the knowledge-distillation tutorial with an Online Distillation section (Pattern A compression, Pattern B depth-pruning recovery, XPK multi-host, and the offline top-k logits variant). The new guide is linked from docs/guides.md.

Tests

Ran most of the commands.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@github-actions

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@github-actions github-actions Bot left a comment


## 📋 Review Summary

This Pull Request introduces comprehensive documentation for online distillation in MaxText, including a new reference guide and an expanded tutorial. It also adds Gemma 4 model details and configuration updates for MoE load balancing.

## 🔍 General Feedback

  • The new distillation guide is very well-structured, covering architecture, loss anatomy, and practical tuning advice.
  • Most documentation updates are accurate and include helpful examples for both single-host and multi-host (XPK) setups.
  • There is a recurring pattern of backslash escaping for MyST directives in docs/guides.md that appears to be a regression and should be corrected to ensure proper rendering.

Comment thread docs/guides.md
Comment on lines +21 to +24
::::\{grid} 1 2 2 2
:gutter: 2

:::{grid-item-card} ⚡ Optimization
:::\{grid-item-card} ⚡ Optimization

🟠 The backslash escaping \{grid} and \{grid-item-card} is likely a mistake. In MyST markdown, directives are typically formatted as :::{directive}. Adding a backslash before the curly brace will likely cause the documentation builder to render the brace and directive name as literal text instead of parsing it as a directive. This appears to have been applied to all cards in this file.

Suggested change
::::\{grid} 1 2 2 2
:gutter: 2
:::{grid-item-card} ⚡ Optimization
:::\{grid-item-card} ⚡ Optimization
::::{grid} 1 2 2 2
:gutter: 2
:::{grid-item-card} ⚡ Optimization
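
Since the escaping reportedly affects every card in the file, a small script could help locate and undo the remaining occurrences. The following is only a sketch: it assumes each escaped opener has the shape quoted above (three or more colons, a backslash, then `{name}`), and the helper names `find_escaped_directives` and `unescape_directives` are hypothetical, not part of the repository.

```python
import re

# Matches a MyST colon-fence opener whose brace is backslash-escaped,
# e.g. "::::\{grid} 1 2 2 2" or ":::\{grid-item-card} Title".
# Assumption: directive names contain only word characters and hyphens.
ESCAPED_DIRECTIVE = re.compile(r"^(\s*:{3,})\\(\{[\w-]+\})")


def find_escaped_directives(text):
    """Return (line_number, line) pairs for escaped directive openers."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if ESCAPED_DIRECTIVE.match(line)
    ]


def unescape_directives(text):
    """Drop the backslash so each opener parses as a directive again."""
    return "\n".join(
        ESCAPED_DIRECTIVE.sub(r"\1\2", line) for line in text.splitlines()
    )


# Sample taken from the lines quoted in this comment thread.
sample = (
    "::::\\{grid} 1 2 2 2\n"
    ":gutter: 2\n"
    "\n"
    ":::\\{grid-item-card} ⚡ Optimization\n"
)

for number, line in find_escaped_directives(sample):
    print(f"line {number}: {line}")
```

Running the finder over docs/guides.md before and after the fix would show whether any escaped openers remain; lines without the backslash are left untouched.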


codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

