Skip to content

Wire max_checkpoints through SFT, DPO, and GRPO paths#1701

Open
TimDettmers wants to merge 2 commits into
mainfrom
timd/wire-max-checkpoints
Open

Wire max_checkpoints through SFT, DPO, and GRPO paths#1701
TimDettmers wants to merge 2 commits into
mainfrom
timd/wire-max-checkpoints

Conversation

@TimDettmers
Copy link
Copy Markdown

@TimDettmers TimDettmers commented May 27, 2026

Summary

  • Passes keep_last_n_checkpoints (default: 3, from CheckpointConfig) to CheckpointerCallback.max_checkpoints across all OLMo-core training paths (SFT, DPO, GRPO)
  • Centralizes the -1 → None (unlimited) convention inside build_checkpointer_callback so call sites pass the value directly
  • GRPOExperimentConfig already inherits keep_last_n_checkpoints from CheckpointConfig — no new config fields needed

Depends on

Files changed

  • olmo_core_utils.pybuild_checkpointer_callback and build_base_callbacks accept and pass through max_checkpoints; -1 mapped to None inside the builder
  • dpo.py — passes args.keep_last_n_checkpoints through
  • olmo_core_finetune.py — passes args.checkpoint.keep_last_n_checkpoints through
  • grpo_olmo_core_actor.py — passes self.grpo_config.keep_last_n_checkpoints through

Test plan

  • Verify SFT training with default keep_last_n_checkpoints=3 trims checkpoints
  • Verify GRPO training with --keep_last_n_checkpoints -1 keeps all checkpoints
  • Verify DPO training passes the parameter correctly
  • Existing tests pass unchanged (new parameter has defaults matching current behavior)

Pass the existing keep_last_n_checkpoints config (default=3) to
CheckpointerCallback's new max_checkpoints parameter across SFT, DPO,
and GRPO training paths. Also adds keep_last_n_checkpoints to
GRPOExperimentConfig (it was missing, unlike the SFT/DPO configs).

Depends on allenai/OLMo-core#timd/add-max-checkpoints which adds the
max_checkpoints parameter to CheckpointerCallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for limiting the number of saved checkpoints via a new keep_last_n_checkpoints configuration option, which is integrated across DPO, GRPO, and SFT training pipelines. The feedback suggests removing a redundant field definition in GRPOExperimentConfig since it is already inherited, and centralizing the mapping logic that converts negative values (representing unlimited checkpoints) to None inside build_checkpointer_callback to simplify and clean up the caller sites.

Comment thread open_instruct/grpo_utils.py Outdated
Comment on lines +206 to +207
keep_last_n_checkpoints: int = 3
"""Maximum number of permanent checkpoints to keep. -1 for unlimited."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

GRPOExperimentConfig inherits from olmo_core_utils.CheckpointConfig (via multiple inheritance), which already defines the keep_last_n_checkpoints field with a default value of 3. Redefining it here is redundant and can be safely removed to keep the configuration clean and avoid duplication.

Comment on lines +205 to 216
checkpointing_steps: int,
ephemeral_save_interval: int | None,
save_async: bool = True,
max_checkpoints: int | None = 3,
) -> CheckpointerCallback:
"""Construct a CheckpointerCallback with shared Open Instruct defaults."""
return CheckpointerCallback(
save_interval=checkpointing_steps, ephemeral_save_interval=ephemeral_save_interval, save_async=save_async
save_interval=checkpointing_steps,
ephemeral_save_interval=ephemeral_save_interval,
save_async=save_async,
max_checkpoints=max_checkpoints,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instead of repeating the ternary operator args.keep_last_n_checkpoints if args.keep_last_n_checkpoints >= 0 else None across multiple caller files (dpo.py, grpo_olmo_core_actor.py, olmo_core_finetune.py), we can centralize this mapping logic inside build_checkpointer_callback. This keeps the callers clean and ensures consistent handling of the -1 (unlimited) convention.

def build_checkpointer_callback(
    checkpointing_steps: int,
    ephemeral_save_interval: int | None,
    save_async: bool = True,
    max_checkpoints: int | None = 3,
) -> CheckpointerCallback:
    """Construct a CheckpointerCallback with shared Open Instruct defaults."""
    return CheckpointerCallback(
        save_interval=checkpointing_steps,
        ephemeral_save_interval=ephemeral_save_interval,
        save_async=save_async,
        max_checkpoints=max_checkpoints if (max_checkpoints is not None and max_checkpoints >= 0) else None,
    )

Comment thread open_instruct/dpo.py Outdated
wandb_project=args.wandb_project,
wandb_entity=args.wandb_entity,
save_async=False,
max_checkpoints=args.keep_last_n_checkpoints if args.keep_last_n_checkpoints >= 0 else None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With the -1 to None mapping logic centralized inside build_checkpointer_callback, we can simplify this call by passing args.keep_last_n_checkpoints directly.

Suggested change
max_checkpoints=args.keep_last_n_checkpoints if args.keep_last_n_checkpoints >= 0 else None,
max_checkpoints=args.keep_last_n_checkpoints,

Comment thread open_instruct/grpo_olmo_core_actor.py Outdated
checkpointing_steps=self.grpo_config.checkpoint_state_freq, ephemeral_save_interval=None
checkpointing_steps=self.grpo_config.checkpoint_state_freq,
ephemeral_save_interval=None,
max_checkpoints=self.grpo_config.keep_last_n_checkpoints if self.grpo_config.keep_last_n_checkpoints >= 0 else None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With the -1 to None mapping logic centralized inside build_checkpointer_callback, we can simplify this call by passing self.grpo_config.keep_last_n_checkpoints directly.

Suggested change
max_checkpoints=self.grpo_config.keep_last_n_checkpoints if self.grpo_config.keep_last_n_checkpoints >= 0 else None,
max_checkpoints=self.grpo_config.keep_last_n_checkpoints,

Comment thread open_instruct/olmo_core_finetune.py Outdated
with_tracking=args.logging.with_tracking,
wandb_project=args.logging.wandb_project,
wandb_entity=args.logging.wandb_entity or "ai2-llm",
max_checkpoints=args.checkpoint.keep_last_n_checkpoints if args.checkpoint.keep_last_n_checkpoints >= 0 else None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With the -1 to None mapping logic centralized inside build_checkpointer_callback, we can simplify this call by passing args.checkpoint.keep_last_n_checkpoints directly.

Suggested change
max_checkpoints=args.checkpoint.keep_last_n_checkpoints if args.checkpoint.keep_last_n_checkpoints >= 0 else None,
max_checkpoints=args.checkpoint.keep_last_n_checkpoints,

… field

- Move -1 to None conversion into build_checkpointer_callback
- Remove redundant keep_last_n_checkpoints from GRPOExperimentConfig (inherited from CheckpointConfig)
- Simplify all call sites to pass value directly
- Add CHANGELOG entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mnoukhov
Copy link
Copy Markdown
Contributor

Looks good to me. I guess wait until allenai/OLMo-core#694 has been merged and update olmo-core version in pyproject.toml to include it.

Also consider keeping the name keep_last_n_checkpoints as that feels a bit clearer than max_checkpoints

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants