feat(serve): add SageMaker GenAI inference benchmarking and recommendation

ZealSV · ZealSV · commit 19cba63ce980 · 2026-05-27T12:28:40.000-07:00
Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer
over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and
AIWorkloadConfig resources.

ModelBuilder gains a new entry point and extends two existing verbs:

    # Benchmark a deployed endpoint
    job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
    result = BenchmarkResult.from_job(job)

    # Recommendation flow extends optimize() and deploy()
    mb.optimize(workload=..., performance_target="throughput",
                instance_types=["ml.g6.12xlarge"])
    endpoint = mb.deploy(role=role)                          # top recommendation
    endpoint = mb.deploy(role=role, recommendation_index=2)  # alternative

print(result) and print(mb.recommendations[0]) render their data as
tables.

Public surface added under sagemaker.serve:

* Workload -- typed factory; extras pass through **params, validated
  server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parses the
  AIPerf output.tar.gz from S3.
* Secret -- opt-in helper for tokens >512 chars (Secrets Manager).
* BenchmarkJob, RecommendationJob -- re-exports without the AI prefix.
* FeatureGatedError, WorkloadValidationError -- typed exceptions.

Pin-mode and workload-mode optimize() kwargs are mutually exclusive.
Recommendation deploy uses the ModelPackage path (auto-approves the
package the rec job publishes).

Includes 51 unit tests and 2 slow_test integ tests
(tests/integ/test_ai_inference_recommender_integration.py) verified
end-to-end against real AWS.

Rebased onto upstream to pick up #5860 (preserve falsy values in
sagemaker-core serialize), required so optimize_model=False reaches the
wire.
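
Reviewer note: the diff in this commit adds Workload.sonnet as a classmethod alias that delegates to Workload.synthetic. The sketch below shows that delegation pattern in isolation. It is not the real class: the Workload here is a hypothetical stand-in that keeps only the parameters and secrets fields visible in the diff, and its synthetic() signature (tokenizer, hf_token, **params) is an assumption inferred from the unit tests.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Workload:
    """Hypothetical stand-in for sagemaker.serve's Workload (illustration only)."""

    parameters: Dict[str, Any] = field(default_factory=dict)
    secrets: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def synthetic(
        cls, tokenizer: str, hf_token: Optional[str] = None, **params: Any
    ) -> "Workload":
        # Assumed behaviour: extras pass straight through into parameters.
        parameters = {"tokenizer": tokenizer, **params}
        secrets: Dict[str, str] = {}
        if hf_token is not None:
            secrets["hf_token"] = hf_token
        return cls(parameters=parameters, secrets=secrets)

    @classmethod
    def sonnet(cls, **kwargs: Any) -> "Workload":
        # Delegating through cls (not Workload) keeps subclasses working:
        # a subclass calling sonnet() gets an instance of the subclass.
        return cls.synthetic(**kwargs)


class CustomWorkload(Workload):
    """Hypothetical subclass, used only to show that the alias preserves cls."""


alias_result = Workload.sonnet(tokenizer="t", hf_token="tok")
direct_result = Workload.synthetic(tokenizer="t", hf_token="tok")
subclass_result = CustomWorkload.sonnet(tokenizer="t")
```

Because the alias forwards **kwargs unchanged, any argument validation stays in synthetic() and the two entry points cannot drift apart.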
diff --git a/sagemaker-serve/src/sagemaker/serve/ai_inference_recommender/workload.py b/sagemaker-serve/src/sagemaker/serve/ai_inference_recommender/workload.py
@@ -97,6 +97,15 @@ def synthetic(
             secrets["hf_token"] = hf_token
         return cls(parameters=parameters, secrets=secrets)
 
+    @classmethod
+    def sonnet(cls, **kwargs: Any) -> "Workload":
+        """Alias for :meth:`synthetic`.
+
+        AIPerf seeds synthetic prompts from the Sonnet dataset by default,
+        so ``Workload.sonnet(...)`` is the same as ``Workload.synthetic(...)``.
+        """
+        return cls.synthetic(**kwargs)
+
     @classmethod
     def from_dataset(
         cls,
diff --git a/sagemaker-serve/tests/unit/test_ai_inference_recommender/test_workload.py b/sagemaker-serve/tests/unit/test_ai_inference_recommender/test_workload.py
@@ -82,6 +82,13 @@ def test_hf_token_as_arn_string(self):
         wl = Workload.synthetic(tokenizer="t", hf_token=arn)
         assert wl.secrets == {"hf_token": arn}
 
+    def test_sonnet_is_alias_for_synthetic(self):
+        wl_sonnet = Workload.sonnet(tokenizer="meta-llama/Llama-3.2-1B")
+        wl_synth = Workload.synthetic(tokenizer="meta-llama/Llama-3.2-1B")
+        assert wl_sonnet.parameters == wl_synth.parameters
+        assert wl_sonnet.secrets == wl_synth.secrets
+        assert wl_sonnet.tooling == wl_synth.tooling
+
 
 class TestWorkloadToInline:
     def test_envelope(self):