
Workflow to save top-k teacher logits in GCS to use in distillation #3193

Merged
copybara-service[bot] merged 2 commits into main from
ajkv-teacher-top-k-distillation
Mar 19, 2026

Conversation

Collaborator

@ajkv-google ajkv-google commented Feb 19, 2026

Description

This PR introduces a standalone script (save_top_k_teacher_logits.py) that runs a teacher model, extracts the top-K logits and their corresponding vocabulary indices, and saves them to a GCS bucket for offline distillation. Distillation currently runs "online": the teacher and student models are both loaded into memory, and the teacher's probability distribution is computed on the fly while the student trains, which can slow training. With this change, only the top-K teacher outputs (rather than the full probability distribution) are saved to a GCS bucket, so the student can later be trained from the stored values without spending extra memory and compute on teacher logits at every step.
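As a rough illustration of the extraction step described above, the sketch below picks the K largest logits and their vocabulary indices along the last axis of a teacher output. It is not the code from save_top_k_teacher_logits.py: it uses NumPy rather than the script's own stack, and the function name `top_k_logits`, the shapes, and the value of `k` are assumptions made for the example.

```python
import numpy as np


def top_k_logits(logits: np.ndarray, k: int):
    """Return (values, indices) of the k largest logits along the vocab axis.

    Hypothetical helper for illustration; not part of the PR's script.
    """
    # argpartition moves the k largest entries into the last k slots (unordered).
    part = np.argpartition(logits, -k, axis=-1)[..., -k:]
    part_vals = np.take_along_axis(logits, part, axis=-1)
    # Sort those k entries in descending order so the stored layout is predictable.
    order = np.argsort(-part_vals, axis=-1)
    indices = np.take_along_axis(part, order, axis=-1)
    values = np.take_along_axis(part_vals, order, axis=-1)
    return values, indices.astype(np.int32)


# Assumed shapes for the example: batch=2, sequence length=3, vocab size=10.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 10)).astype(np.float32)
values, indices = top_k_logits(logits, k=4)
```

Storing `values` and `indices` instead of the full logits shrinks each token's record from one float per vocabulary entry to K floats plus K integer indices, which is the saving the description refers to.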

Tests

Distillation YAML used to run this script: https://paste.googleplex.com/5736358690816000

Ran the following command from the root of the MaxText directory:
python3 src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py src/maxtext/configs/post_train/distillation.yml

Terminal output for the above command (truncated due to size): https://paste.googleplex.com/4609623547052032

Verified that the ArrayRecord file was written to the GCS bucket at the expected path.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
Comment thread src/maxtext/trainers/post_train/distillation/save_top_k_teacher_logits.py Outdated
@ajkv-google ajkv-google force-pushed the ajkv-teacher-top-k-distillation branch 2 times, most recently from 044cbf7 to 7cffe43 Compare February 20, 2026 22:32
@ajkv-google ajkv-google requested a review from gagika February 23, 2026 19:36
Comment thread src/maxtext/configs/types.py
Comment thread src/maxtext/configs/types.py Outdated
@ajkv-google ajkv-google force-pushed the ajkv-teacher-top-k-distillation branch 2 times, most recently from 4de8440 to 7cddd84 Compare February 25, 2026 19:12
@ajkv-google ajkv-google requested a review from gagika February 26, 2026 23:44
Moved optional keys to be cmd arguments

Added script to verify writing of data to gs bucket

added code to work in multihost

fixed spacing and code formatting

saving top-k teacher logits on one host to local with option to store to a gs bucket

updated code formatting

Updated to provide local filepath as cmd arg
@ajkv-google ajkv-google force-pushed the ajkv-teacher-top-k-distillation branch from 6a79f42 to c5702d8 Compare March 16, 2026 22:54
@copybara-service copybara-service Bot merged commit 0efc6ca into main Mar 19, 2026
40 of 42 checks passed
@copybara-service copybara-service Bot deleted the ajkv-teacher-top-k-distillation branch March 19, 2026 22:18

4 participants