Commit 19cba63

feat(serve): add SageMaker GenAI inference benchmarking and recommendation
Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and AIWorkloadConfig resources.

ModelBuilder gains a new entry point and extends two existing verbs:

    # Benchmark a deployed endpoint
    job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
    result = BenchmarkResult.from_job(job)

    # Recommendation flow extends optimize() and deploy()
    mb.optimize(workload=..., performance_target="throughput",
                instance_types=["ml.g6.12xlarge"])
    endpoint = mb.deploy(role=role)                          # top recommendation
    endpoint = mb.deploy(role=role, recommendation_index=2)  # alternative

print(result) and print(mb.recommendations[0]) render their data as tables.

Public surface added under sagemaker.serve:

* Workload -- typed factory; extras pass through **params, validated server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parses the AIPerf output.tar.gz from S3.
* Secret -- opt-in helper for tokens >512 chars (Secrets Manager).
* BenchmarkJob, RecommendationJob -- re-exports without the AI prefix.
* FeatureGatedError, WorkloadValidationError -- typed exceptions.

Pin-mode and workload-mode optimize() kwargs are mutually exclusive. Recommendation deploy uses the ModelPackage path (auto-approves the package the rec job publishes).

Includes 51 unit tests and 2 slow_test integ tests (tests/integ/test_ai_inference_recommender_integration.py) verified end-to-end against real AWS.

Rebased onto upstream to pick up #5860 (preserve falsy values in sagemaker-core serialize), required so optimize_model=False reaches the wire.
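The statement that pin-mode and workload-mode optimize() kwargs are mutually exclusive could be enforced along the following lines. This is only a sketch under assumptions: the workload-mode names are taken from the example call in the message above, while `resolve_optimize_mode`, the `instance_type` pin-mode name, and the use of `ValueError` are hypothetical and not taken from the commit.

```python
from typing import Any

# Workload-mode kwarg names as they appear in the commit message example.
WORKLOAD_MODE_KWARGS = {"workload", "performance_target", "instance_types"}
# Hypothetical pin-mode kwarg name; the commit does not list the real ones.
PIN_MODE_KWARGS = {"instance_type"}


def resolve_optimize_mode(**kwargs: Any) -> str:
    """Return "pin", "workload", or "none"; reject calls that mix both groups."""
    # Treat None as "not supplied" so falsy values such as False still count.
    given = {name for name, value in kwargs.items() if value is not None}
    pin = given & PIN_MODE_KWARGS
    workload = given & WORKLOAD_MODE_KWARGS
    if pin and workload:
        raise ValueError(
            f"pin-mode kwargs {sorted(pin)} cannot be combined with "
            f"workload-mode kwargs {sorted(workload)}"
        )
    if pin:
        return "pin"
    if workload:
        return "workload"
    return "none"
```

Checking for `None` rather than truthiness mirrors the rebase note above: a falsy value such as `False` is still a value the caller supplied.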
1 parent 600bf9d commit 19cba63

2 files changed

Lines changed: 16 additions & 0 deletions

sagemaker-serve/src/sagemaker/serve/ai_inference_recommender/workload.py

Lines changed: 9 additions & 0 deletions
@@ -97,6 +97,15 @@ def synthetic(
             secrets["hf_token"] = hf_token
         return cls(parameters=parameters, secrets=secrets)
 
+    @classmethod
+    def sonnet(cls, **kwargs: Any) -> "Workload":
+        """Alias for :meth:`synthetic`.
+
+        AIPerf seeds synthetic prompts from the Sonnet dataset by default,
+        so ``Workload.sonnet(...)`` is the same as ``Workload.synthetic(...)``.
+        """
+        return cls.synthetic(**kwargs)
+
     @classmethod
     def from_dataset(
         cls,
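The alias pattern in this hunk can be tried in isolation with a stand-in class. The sketch below is not the real `sagemaker.serve` `Workload`: the dataclass fields, the simplified `synthetic` signature, and the `TracedWorkload` subclass are assumptions made for illustration, based only on the lines visible in the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Workload:
    """Hypothetical stand-in for the real Workload class."""

    parameters: Dict[str, Any] = field(default_factory=dict)
    secrets: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def synthetic(
        cls, tokenizer: str, hf_token: Optional[str] = None, **params: Any
    ) -> "Workload":
        # Simplified: the real factory accepts more typed arguments.
        parameters = {"tokenizer": tokenizer, **params}
        secrets: Dict[str, str] = {}
        if hf_token is not None:
            secrets["hf_token"] = hf_token
        return cls(parameters=parameters, secrets=secrets)

    @classmethod
    def sonnet(cls, **kwargs: Any) -> "Workload":
        # Forward everything unchanged; cls (not Workload) keeps subclasses intact.
        return cls.synthetic(**kwargs)


class TracedWorkload(Workload):
    """Hypothetical subclass, used only to show that the alias preserves cls."""


alias = Workload.sonnet(tokenizer="t", hf_token="tok")
direct = Workload.synthetic(tokenizer="t", hf_token="tok")
```

Because the alias delegates through `cls.synthetic(**kwargs)`, any later change to the arguments or defaults of `synthetic` applies to `sonnet` automatically, and a subclass calling `sonnet` gets an instance of the subclass back.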

sagemaker-serve/tests/unit/test_ai_inference_recommender/test_workload.py

Lines changed: 7 additions & 0 deletions
@@ -82,6 +82,13 @@ def test_hf_token_as_arn_string(self):
         wl = Workload.synthetic(tokenizer="t", hf_token=arn)
         assert wl.secrets == {"hf_token": arn}
 
+    def test_sonnet_is_alias_for_synthetic(self):
+        wl_sonnet = Workload.sonnet(tokenizer="meta-llama/Llama-3.2-1B")
+        wl_synth = Workload.synthetic(tokenizer="meta-llama/Llama-3.2-1B")
+        assert wl_sonnet.parameters == wl_synth.parameters
+        assert wl_sonnet.secrets == wl_synth.secrets
+        assert wl_sonnet.tooling == wl_synth.tooling
+
 
 
 class TestWorkloadToInline:
     def test_envelope(self):
