feat: add stop condition to model customization trainers #5579

Merged
mollyheamazon merged 1 commit into aws:master from mollyheamazon:feat/stop-condition
Feb 26, 2026

Add stopping_condition parameter to model customization trainers

Problem

Customers need to run multi-day training jobs with large datasets (1M+ samples), but there was no way to override the training runtime limit through the trainer APIs. SageMaker Training Jobs support up to 28 days, but the SDK didn't expose this configuration.

Solution

Added stopping_condition parameter to all model customization trainers (SFT, DPO, RLVR, RLAIF) following the ModelTrainer pattern.

Changes

  • Added stopping_condition: Optional[StoppingCondition] = None parameter to:
    • SFTTrainer
    • DPOTrainer
    • RLVRTrainer
    • RLAIFTrainer
  • The parameter is passed through TrainDefaults.get_stopping_condition(), which falls back to a 1-hour limit (3600 seconds) when no stopping condition is specified
  • Added unit tests to existing test files (5 tests total)
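
The defaulting behavior described above can be sketched roughly as follows. This is not the SDK's implementation: `StoppingConditionSketch` and `resolve_stopping_condition` are hypothetical stand-ins for `StoppingCondition` and `TrainDefaults.get_stopping_condition()`, and the only detail taken from this PR is the 1-hour fallback.

```python
from dataclasses import dataclass
from typing import Optional

# 1 hour, the default stated in this PR description.
DEFAULT_MAX_RUNTIME_SECONDS = 3600


@dataclass
class StoppingConditionSketch:
    """Hypothetical stand-in for sagemaker.train.configs.StoppingCondition."""

    max_runtime_in_seconds: Optional[int] = None


def resolve_stopping_condition(
    stopping_condition: Optional[StoppingConditionSketch] = None,
) -> StoppingConditionSketch:
    """Return the caller's stopping condition, or a 1-hour default if none is given.

    Hypothetical helper illustrating the pass-through-with-default pattern.
    """
    if stopping_condition is None:
        return StoppingConditionSketch(
            max_runtime_in_seconds=DEFAULT_MAX_RUNTIME_SECONDS
        )
    return stopping_condition


# Existing callers that pass nothing keep the 1-hour limit.
default_condition = resolve_stopping_condition()

# Callers that opt in get exactly the limit they asked for.
three_days = resolve_stopping_condition(
    StoppingConditionSketch(max_runtime_in_seconds=259200)
)
```

Because the fallback applies only when the argument is `None`, existing call sites keep their current behavior and only callers who pass a value see a different limit.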

Usage

python
from sagemaker.train import SFTTrainer
from sagemaker.train.configs import StoppingCondition

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    model_package_group="my-model-group",
    training_dataset="s3://bucket/data.jsonl",
    stopping_condition=StoppingCondition(
        max_runtime_in_seconds=259200,  # 3 days
    ),
)
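
Since `max_runtime_in_seconds` takes raw seconds, a small conversion helper can make multi-day values less error-prone. The sketch below is illustrative only: `days_to_max_runtime_seconds` is a hypothetical helper, not part of the SDK, and the 28-day ceiling is the limit quoted in the Problem section above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
MAX_TRAINING_DAYS = 28  # upper limit cited in this PR description


def days_to_max_runtime_seconds(days: float) -> int:
    """Convert a runtime in days to seconds, rejecting values outside (0, 28].

    Hypothetical helper; the SDK may validate this differently or not at all.
    """
    if days <= 0 or days > MAX_TRAINING_DAYS:
        raise ValueError(
            f"days must be in (0, {MAX_TRAINING_DAYS}], got {days}"
        )
    return int(days * SECONDS_PER_DAY)


# Three days matches the value used in the usage example.
three_days_seconds = days_to_max_runtime_seconds(3)
```

The result can then be passed as `max_runtime_in_seconds` when constructing the stopping condition.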

Backward Compatibility

✅ Fully backward compatible - defaults to 1 hour if not specified

Testing

  • Added unit tests to test_sft_trainer.py, test_dpo_trainer.py, test_rlvr_trainer.py, test_rlaif_trainer.py
  • All tests passing

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mollyheamazon merged commit d3770cc into aws:master on Feb 26, 2026
13 of 17 checks passed