test: cap neuron gated model training to 1 epoch in slow test

lucasjia-aws · lucasjia-aws · commit 764fb2ee5bc4 · 2026-06-12T10:49:45.000-07:00
test_gated_model_training_v2_neuron ran a full Llama-2-7B Neuron
training job on ml.trn1.32xlarge with no epoch cap, taking 170+ min and
timing out the 3h slow-tests build. Cap training to a single epoch so
the test exercises only the train/deploy/predict flow, matching the
existing epochs="1" / max_steps="1" pattern used by the other JumpStart
estimator slow tests.

X-AI-Prompt: Add epochs=1 hyperparameter to test_gated_model_training_v2_neuron to fix slow-tests timeout
X-AI-Tool: kiro-cli
diff --git a/tests/integ/sagemaker/jumpstart/estimator/test_jumpstart_estimator.py b/tests/integ/sagemaker/jumpstart/estimator/test_jumpstart_estimator.py
@@ -215,6 +215,9 @@ def test_gated_model_training_v2_neuron(setup):
         tags=[{"Key": JUMPSTART_TAG, "Value": os.environ[ENV_VAR_JUMPSTART_SDK_TEST_SUITE_ID]}],
         environment={"accept_eula": "true"},
         max_run=259200,  # avoid exceeding resource limits
+        # Canary only verifies the train/deploy flow, so cap training to a
+        # single epoch to keep fit() fast.
+        hyperparameters={"epochs": "1"},
     )
 
     # uses ml.trn1.32xlarge instance
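
The hunk above passes the epoch cap as a string-valued hyperparameter. A minimal sketch of how such a cap could be shared across slow tests, assuming a hypothetical helper named `capped_hyperparameters` (it is not part of the SageMaker SDK or of this test file, and the change itself simply inlines the dict):

```python
from typing import Dict, Optional


def capped_hyperparameters(
    overrides: Optional[Dict[str, str]] = None, epochs: int = 1
) -> Dict[str, str]:
    """Hypothetical helper: return hyperparameters with "epochs" pinned.

    Values are kept as strings, mirroring the {"epochs": "1"} form in the diff.
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    # Copy so the caller's dict is never mutated.
    merged = dict(overrides or {})
    # The cap always wins over any "epochs" value supplied in overrides.
    merged["epochs"] = str(epochs)
    return merged


if __name__ == "__main__":
    # Illustrative call only; the result would be passed as hyperparameters=...
    print(capped_hyperparameters({"max_steps": "1"}))
```

Letting the cap override any caller-supplied `epochs` value is a design choice for this sketch: it keeps a slow test from silently regaining an uncapped run.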
