New checkpoint saving strategy that keeps every k steps and additionally the n most recent checkpoints. #444

Open
BlueCrescent wants to merge 6 commits into main from checkpoint_strategy_with_ephemeral_inbetween_checkpoints

@BlueCrescent
Member

What does this PR do?

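Adds a new checkpoint saving strategy that keeps a checkpoint every k steps and, additionally, the n most recent checkpoints.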

General Changes

  • Implementation
  • Component

Breaking Changes

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

… additionally the n most recent checkpoints.

Co-authored-by: Copilot <copilot@github.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a new checkpoint retention strategy to keep checkpoints at every k steps while also preserving the m most recent checkpoints, and wires it into the component registry/config/docs.

Changes:

  • Implemented KeepEveryKStepsAndMMostRecentCheckpointingStrategy and its config model.
  • Registered the new strategy as a configurable component.
  • Added unit tests and documented the new component option.
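The retention rule described in the overview can be illustrated with a minimal sketch. This is not the PR's implementation of KeepEveryKStepsAndMMostRecentCheckpointingStrategy; the function names `checkpoints_to_keep` and `checkpoints_to_delete`, their signatures, and the assumption that "every k steps" means step numbers divisible by k are all hypothetical and chosen only for illustration.

```python
from typing import List, Set


def checkpoints_to_keep(saved_steps: List[int], k: int, m: int) -> Set[int]:
    """Hypothetical helper: steps whose checkpoints survive under
    'keep every k steps + keep the m most recent'."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m must be non-negative")
    ordered = sorted(set(saved_steps))
    # Assumption: a checkpoint is permanent when its step is a multiple of k.
    permanent = {step for step in ordered if step % k == 0}
    # The m highest step numbers are kept regardless of divisibility.
    recent = set(ordered[-m:]) if m > 0 else set()
    return permanent | recent


def checkpoints_to_delete(saved_steps: List[int], k: int, m: int) -> List[int]:
    """Hypothetical helper: steps whose checkpoints may be removed."""
    return sorted(set(saved_steps) - checkpoints_to_keep(saved_steps, k, m))


# Checkpoints saved every 100 steps up to step 1000, with k=500 and m=2.
steps = list(range(100, 1100, 100))
kept = checkpoints_to_keep(steps, k=500, m=2)        # {500, 900, 1000}
deleted = checkpoints_to_delete(steps, k=500, m=2)   # [100, 200, 300, 400, 600, 700, 800]
```

In this sketch the in-between checkpoints are ephemeral: step 900 is kept only while it is among the two most recent, whereas steps 500 and 1000 remain permanently.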

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/checkpointing/test_checkpoint_strategies.py | Adds tests and a small simulator for the new checkpoint retention strategy. |
| src/modalities/registry/components.py | Registers the new checkpoint saving strategy component and config. |
| src/modalities/config/config.py | Adds KeepEveryKStepsAndMMostRecentCheckpointingStrategyConfig. |
| src/modalities/checkpointing/checkpoint_saving_strategies.py | Implements the new “keep every k + keep last m” strategy. |
| docs/components/components.md | Documents the new checkpoint strategy component entry. |

Comment thread src/modalities/checkpointing/checkpoint_saving_strategies.py Outdated
Comment thread src/modalities/checkpointing/checkpoint_saving_strategies.py
Comment thread tests/checkpointing/test_checkpoint_strategies.py Outdated
Comment thread docs/components/components.md Outdated
BlueCrescent and others added 5 commits May 8, 2026 16:17
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…tingStrategy does not delete any required checkpoints.

Co-authored-by: Copilot <copilot@github.com>
…k tests.

Co-authored-by: Copilot <copilot@github.com>