Commit d1e34cc

feat(serve): add SageMaker GenAI inference benchmarking and recommendation

Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and AIWorkloadConfig resources. ModelBuilder gains a new entry point and extends two existing verbs:

```python
# Benchmark a deployed endpoint
job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
result = BenchmarkResult.from_job(job)

# Recommendation flow extends optimize() and deploy()
mb.optimize(workload=..., performance_target="throughput", instance_types=["ml.g6.12xlarge"])
endpoint = mb.deploy(role=role)                          # top recommendation
endpoint = mb.deploy(role=role, recommendation_index=2)  # alternative
```

print(result) and print(mb.recommendations[0]) render their data as tables.

Public surface added under sagemaker.serve:

* Workload -- typed factory; extras pass through **params, validated server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parses the AIPerf output.tar.gz from S3.
* Secret -- opt-in helper for tokens >512 chars (Secrets Manager).
* BenchmarkJob, RecommendationJob -- re-exports without the AI prefix.
* FeatureGatedError, WorkloadValidationError -- typed exceptions.

Pin-mode and workload-mode optimize() kwargs are mutually exclusive. Recommendation deploy uses the ModelPackage path (auto-approves the package the rec job publishes).

Includes 51 unit tests and 2 slow_test integ tests (tests/integ/test_ai_inference_recommender_integration.py) verified end-to-end against real AWS.

Rebased onto upstream to pick up #5860 (preserve falsy values in sagemaker-core serialize), required so optimize_model=False reaches the wire.
1 parent fda2565 commit d1e34cc

21 files changed

Lines changed: 3506 additions & 7 deletions
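The commit message says BenchmarkResult parses the AIPerf output.tar.gz from S3. The sketch below illustrates that general technique with only the standard library: open the downloaded archive bytes in memory with tarfile, locate a JSON member, and turn each entry into a metric record. It is a sketch under stated assumptions, not the SDK's implementation: the archive layout (one .json file mapping metric names to a "unit" plus numeric statistics), the function parse_benchmark_archive, and the MetricSketch class are all hypothetical, and real AIPerf output may be laid out differently.

```python
import io
import json
import tarfile
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetricSketch:
    """Hypothetical metric record; not the SDK's BenchmarkMetric."""

    name: str
    unit: str
    stats: Dict[str, float] = field(default_factory=dict)


def parse_benchmark_archive(archive_bytes: bytes) -> Dict[str, MetricSketch]:
    """Parse a gzipped tar archive (e.g. an output.tar.gz fetched from S3).

    Assumed (hypothetical) layout: one JSON file shaped like
    {"<metric>": {"unit": "ms", "avg": 1.0, "p99": 2.0, ...}, ...}.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # Take the first regular file ending in .json; the real member name is unknown.
        member = next(
            (m for m in tar.getmembers() if m.isfile() and m.name.endswith(".json")),
            None,
        )
        if member is None:
            raise ValueError("archive contains no JSON metrics file")
        # extractfile() returns a binary file object; json.load accepts bytes input.
        payload = json.load(tar.extractfile(member))

    metrics = {}
    for name, entry in payload.items():
        # Every key except "unit" is treated as a numeric statistic.
        stats = {key: float(value) for key, value in entry.items() if key != "unit"}
        metrics[name] = MetricSketch(name=name, unit=entry.get("unit", ""), stats=stats)
    return metrics
```

In practice the archive bytes would come from an S3 download, for example boto3's `get_object(...)["Body"].read()`; for a local check they can be built in memory with `tarfile.open(fileobj=..., mode="w:gz")`.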

sagemaker-serve/VERSION

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-1.13.0
+1.13.1
```

sagemaker-serve/src/sagemaker/serve/__init__.py

Lines changed: 24 additions & 1 deletion
```diff
@@ -29,4 +29,27 @@
 from sagemaker.serve.utils.types import ModelServer
 from sagemaker.serve.model_builder import ModelBuilder
 
-__all__ = ["InferenceSpec", "ModelServer", "ModelBuilder"]
+from sagemaker.serve.ai_inference_recommender import (
+    BenchmarkJob,
+    BenchmarkResult,
+    RecommendationJob,
+    Secret,
+    Workload,
+    FeatureGatedError,
+    WorkloadValidationError,
+    start_benchmark,
+)
+
+__all__ = [
+    "InferenceSpec",
+    "ModelServer",
+    "ModelBuilder",
+    "BenchmarkJob",
+    "BenchmarkResult",
+    "RecommendationJob",
+    "Secret",
+    "Workload",
+    "FeatureGatedError",
+    "WorkloadValidationError",
+    "start_benchmark",
+]
```

Lines changed: 47 additions & 0 deletions
```diff
@@ -0,0 +1,47 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+"""SageMaker GenAI inference benchmarking and recommendation."""
+from __future__ import absolute_import
+
+from sagemaker.serve.ai_inference_recommender.exceptions import (
+    FeatureGatedError,
+    WorkloadValidationError,
+)
+from sagemaker.serve.ai_inference_recommender.jobs import (
+    BenchmarkJob,
+    RecommendationJob,
+)
+from sagemaker.serve.ai_inference_recommender.result import (
+    BenchmarkMetric,
+    BenchmarkMetrics,
+    BenchmarkResult,
+)
+from sagemaker.serve.ai_inference_recommender.secrets import Secret
+from sagemaker.serve.ai_inference_recommender.workload import Workload
+from sagemaker.serve.ai_inference_recommender._model_builder_methods import (
+    start_benchmark,
+)
+
+
+__all__ = [
+    "BenchmarkJob",
+    "BenchmarkMetric",
+    "BenchmarkMetrics",
+    "BenchmarkResult",
+    "FeatureGatedError",
+    "RecommendationJob",
+    "Secret",
+    "Workload",
+    "WorkloadValidationError",
+    "start_benchmark",
+]
```

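The commit message lists BenchmarkMetrics and BenchmarkMetric in the public surface added under sagemaker.serve, while the top-level `__all__` in sagemaker/serve/__init__.py re-exports only BenchmarkResult from the result module. The short check below compares the two `__all__` lists, copied verbatim from the diffs in this commit (the 47-line file's imports indicate it is the ai_inference_recommender package `__init__`). The helper name missing_reexports is hypothetical; it simply takes a set difference.

```python
from typing import List, Sequence

# Copied from the __all__ in sagemaker/serve/__init__.py after this commit.
TOP_LEVEL_ALL = [
    "InferenceSpec", "ModelServer", "ModelBuilder",
    "BenchmarkJob", "BenchmarkResult", "RecommendationJob", "Secret",
    "Workload", "FeatureGatedError", "WorkloadValidationError", "start_benchmark",
]

# Copied from the __all__ in the new ai_inference_recommender package file.
SUBPACKAGE_ALL = [
    "BenchmarkJob", "BenchmarkMetric", "BenchmarkMetrics", "BenchmarkResult",
    "FeatureGatedError", "RecommendationJob", "Secret", "Workload",
    "WorkloadValidationError", "start_benchmark",
]


def missing_reexports(parent_all: Sequence[str], child_all: Sequence[str]) -> List[str]:
    """Names the subpackage exports that the parent package does not re-export."""
    return sorted(set(child_all) - set(parent_all))


print(missing_reexports(TOP_LEVEL_ALL, SUBPACKAGE_ALL))
# ['BenchmarkMetric', 'BenchmarkMetrics']
```

As written in this commit, the two metric classes are importable from sagemaker.serve.ai_inference_recommender rather than directly from sagemaker.serve.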
Lines changed: 21 additions & 0 deletions
```diff
@@ -0,0 +1,21 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+"""Constants for the AI inference recommender module."""
+from __future__ import absolute_import
+
+MAX_INSTANCE_TYPES = 3
+
+FEATURE_GATING_RUNBOOK_URL = (
+    "https://docs.aws.amazon.com/sagemaker/latest/dg/"
+    "generative-ai-inference-recommendations.html"
+)
```
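MAX_INSTANCE_TYPES presumably caps the instance_types list passed to optimize() in the recommendation flow; this constants file does not show where the limit is enforced. The minimal sketch below assumes that use and adds a client-side pre-flight check. validate_instance_types and feature_gated_message are hypothetical helpers, not SDK functions. The sketch also shows that the two adjacent string literals in FEATURE_GATING_RUNBOOK_URL are joined at compile time into a single URL.

```python
from typing import List, Sequence

# Values copied from the constants module added in this commit.
MAX_INSTANCE_TYPES = 3
FEATURE_GATING_RUNBOOK_URL = (
    "https://docs.aws.amazon.com/sagemaker/latest/dg/"
    "generative-ai-inference-recommendations.html"
)


def validate_instance_types(instance_types: Sequence[str]) -> List[str]:
    """Hypothetical pre-flight check for optimize(instance_types=[...])."""
    types = list(instance_types)
    # Assumption: the constant is an upper bound on how many types one job may compare.
    if len(types) > MAX_INSTANCE_TYPES:
        raise ValueError(
            f"at most {MAX_INSTANCE_TYPES} instance types are supported, got {len(types)}"
        )
    return types


def feature_gated_message(feature: str) -> str:
    """Hypothetical error text that points users at the runbook URL."""
    return f"{feature} is not enabled for this account; see {FEATURE_GATING_RUNBOOK_URL}"
```

Checking the length before submitting a job surfaces the limit locally instead of after a service round trip; whether the SDK does this or leaves it to server-side validation is not shown in this diff.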
