Implementation for soft offline distillation using saved top-k teacher logits #3382
Merged
copybara-service[bot] merged 9 commits into main on Mar 20, 2026
Conversation
entrpn reviewed Mar 11, 2026
vlad-karp requested changes Mar 12, 2026
vlad-karp (Collaborator) left a comment:
LGTM overall, but this needs a new unit test for this specific path.
Author (Collaborator) replied:
Added unit tests to make sure that in offline mode only the student model is loaded, while in online mode both the student and teacher models are loaded. Below are the commands I used to run each of the unit tests:
Test offline distillation:
Test online distillation:
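The exact test commands are not captured above. As a rough illustration only, a minimal pytest-style sketch of the model-loading checks described in this comment might look like the following; the helper `build_distillation_models` and its `offline` flag are hypothetical stand-ins, not the APIs added in this PR:

```python
# Hypothetical sketch (not the PR's actual test code) of the checks described
# above: offline distillation should load only the student model, while online
# distillation loads both student and teacher.
def build_distillation_models(offline: bool):
  """Return (student, teacher); the teacher is skipped entirely in offline mode."""
  student = object()                       # placeholder for the real student model
  teacher = None if offline else object()  # placeholder for the real teacher model
  return student, teacher


def test_offline_distillation_loads_only_student():
  student, teacher = build_distillation_models(offline=True)
  assert student is not None
  assert teacher is None, "offline distillation should not load the teacher model"


def test_online_distillation_loads_student_and_teacher():
  student, teacher = build_distillation_models(offline=False)
  assert student is not None
  assert teacher is not None, "online distillation requires the teacher model"
```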
vlad-karp approved these changes Mar 13, 2026
entrpn approved these changes Mar 19, 2026
…a_dir to know when to run offline vs online distillation
…ing the correct models
Description
This PR introduces an end-to-end offline distillation training pipeline. Previously, the distillation loop executed in an "online" mode, which required both the frozen Teacher model and the learning Student model to be loaded and run simultaneously during training. With this change, the trainer can load pre-computed top-K Teacher logits from .array_record files, bypassing the teacher forward pass during the training loop.
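As a rough illustration of the technique described above (not the PR's actual code), a soft distillation loss over saved top-K teacher logits can be computed in JAX roughly as follows; the function name, array shapes, and temperature parameter are all assumptions:

```python
# Minimal JAX sketch of a soft distillation loss computed from pre-saved
# top-k teacher logits. The teacher arrays are assumed to come from an
# offline pipeline (e.g. read out of .array_record files); no teacher
# forward pass is needed during training. All names here are hypothetical.
import jax
import jax.numpy as jnp


def soft_distillation_loss(student_logits, teacher_topk_logits, teacher_topk_indices, temperature=1.0):
  """Cross-entropy between the teacher's top-k distribution and the student.

  student_logits:       [batch, seq, vocab] logits from the student forward pass.
  teacher_topk_logits:  [batch, seq, k] saved teacher logits for its top-k tokens.
  teacher_topk_indices: [batch, seq, k] vocab indices of those top-k tokens.
  """
  # Teacher probabilities renormalized over its top-k slots only.
  teacher_probs = jax.nn.softmax(teacher_topk_logits / temperature, axis=-1)

  # Student log-probabilities over the full vocabulary, gathered at the
  # teacher's top-k indices so both distributions share the same support.
  student_log_probs = jax.nn.log_softmax(student_logits / temperature, axis=-1)
  student_topk_log_probs = jnp.take_along_axis(student_log_probs, teacher_topk_indices, axis=-1)

  # Cross-entropy term of the KL divergence; the teacher-entropy term is
  # constant with respect to the student and can be dropped.
  per_token_loss = -jnp.sum(teacher_probs * student_topk_log_probs, axis=-1)
  return jnp.mean(per_token_loss)
```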
Tests
Tested this code change by running the following command (using the YAML file for offline distillation):
python3 src/maxtext/trainers/post_train/distillation/train_distill.py src/maxtext/configs/post_train/distillation.yml steps=100 tokenizer_path="/mnt/ajkv/disks/codebase/maxtext/src/maxtext/assets/tokenizers/tokenizer_llama3.tiktoken"
Truncated output showing the successful run: https://paste.googleplex.com/4879271282737152
Verified that training completed successfully and the distillation run finished.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.