Commit 764fb2e

test: cap neuron gated model training to 1 epoch in slow test
test_gated_model_training_v2_neuron ran a full Llama-2-7B neuron training on ml.trn1.32xlarge with no epoch cap, taking ~170+ min and timing out the 3h slow-tests build.

Cap training to a single epoch to exercise only the train/deploy/predict flow, matching the existing epochs="1" / max_steps="1" pattern used by the other JumpStart estimator slow tests.

X-AI-Prompt: Add epochs=1 hyperparameter to test_gated_model_training_v2_neuron to fix slow-tests timeout
X-AI-Tool: kiro-cli
1 parent d0cbf41 commit 764fb2e

1 file changed

Lines changed: 3 additions & 0 deletions

tests/integ/sagemaker/jumpstart/estimator/test_jumpstart_estimator.py

@@ -215,6 +215,9 @@ def test_gated_model_training_v2_neuron(setup):
         tags=[{"Key": JUMPSTART_TAG, "Value": os.environ[ENV_VAR_JUMPSTART_SDK_TEST_SUITE_ID]}],
         environment={"accept_eula": "true"},
         max_run=259200, # avoid exceeding resource limits
+        # Canary only verifies the train/deploy flow, so cap training to a
+        # single epoch to keep fit() fast.
+        hyperparameters={"epochs": "1"},
     )

     # uses ml.trn1.32xlarge instance
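
The override in this change is a plain dict of string values passed as `hyperparameters`. As a rough illustration of the same pattern, the sketch below builds such a dict so that the epoch cap always wins over any other overrides. The helper name `cap_hyperparameters` and its `defaults` argument are hypothetical and are not part of this commit or of the SageMaker SDK; the string-valued form simply mirrors `{"epochs": "1"}` from the diff.

```python
from typing import Mapping, Optional


def cap_hyperparameters(
    defaults: Optional[Mapping[str, object]] = None,
    epochs: int = 1,
) -> dict:
    """Return a hyperparameters dict with `epochs` pinned to a small value.

    Hypothetical helper for illustration only. Values are stringified to
    mirror the {"epochs": "1"} form used in the diff.
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    # Copy any other overrides, converting every value to a string.
    merged = {key: str(value) for key, value in (defaults or {}).items()}
    # Apply the cap last so it overrides an "epochs" entry in `defaults`.
    merged["epochs"] = str(epochs)
    return merged


if __name__ == "__main__":
    print(cap_hyperparameters({"epochs": "5", "max_steps": 10}))
```

A test could then pass `hyperparameters=cap_hyperparameters()` to the estimator in place of the literal dict, although for a single key the inline literal used in the commit is arguably clearer.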
