Commit 19cba63
feat(serve): add SageMaker GenAI inference benchmarking and recommendation
Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer
over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and
AIWorkloadConfig resources.
ModelBuilder gains a new entry point and extends two existing verbs:

    # Benchmark a deployed endpoint
    job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
    result = BenchmarkResult.from_job(job)

    # Recommendation flow extends optimize() and deploy()
    mb.optimize(workload=..., performance_target="throughput",
                instance_types=["ml.g6.12xlarge"])
    endpoint = mb.deploy(role=role)  # top recommendation
    endpoint = mb.deploy(role=role, recommendation_index=2)  # alternative

print(result) and print(mb.recommendations[0]) render their data as
tables.
Public surface added under sagemaker.serve:
* Workload -- typed factory; extras pass through **params, validated
  server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parses the
  AIPerf output.tar.gz from S3.
* Secret -- opt-in helper for tokens >512 chars (Secrets Manager).
* BenchmarkJob, RecommendationJob -- re-exports without the AI prefix.
* FeatureGatedError, WorkloadValidationError -- typed exceptions.
Pin-mode and workload-mode optimize() kwargs are mutually exclusive.
Recommendation deploy uses the ModelPackage path (auto-approves the
package the rec job publishes).
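One way a mutual-exclusion rule like this could be enforced is sketched below. This is an assumption-laden illustration, not the check in this commit: the workload-mode names are taken from the optimize() example above, while the pin-mode names, the function name, and the use of ValueError are hypothetical.

```python
# Workload-mode kwargs as they appear in the commit message example.
WORKLOAD_MODE_KWARGS = {"workload", "performance_target", "instance_types"}
# Hypothetical pin-mode kwargs; the real names are not listed in the commit message.
PIN_MODE_KWARGS = {"instance_type", "quantization_config"}


def check_optimize_mode(**kwargs) -> str:
    """Return "workload" or "pin", rejecting calls that mix both groups."""
    # Only kwargs the caller actually set (not None) count toward a mode.
    supplied = {name for name, value in kwargs.items() if value is not None}
    pin = supplied & PIN_MODE_KWARGS
    workload = supplied & WORKLOAD_MODE_KWARGS
    if pin and workload:
        raise ValueError(
            "pin-mode kwargs %s cannot be combined with workload-mode kwargs %s"
            % (sorted(pin), sorted(workload))
        )
    # Assumption: a call with no workload-mode kwargs keeps the pin-mode behavior.
    return "workload" if workload else "pin"


mode = check_optimize_mode(workload=object(), performance_target="throughput")
print(mode)
```

Checking `is not None` instead of truthiness means an explicit `False` or `0` still counts as a supplied argument.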
Includes 51 unit tests and 2 slow_test integ tests
(tests/integ/test_ai_inference_recommender_integration.py) verified
end-to-end against real AWS.
Rebased onto upstream to pick up #5860 (preserve falsy values in
sagemaker-core serialize), required so optimize_model=False reaches
the wire.
1 parent 600bf9d commit 19cba63
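The falsy-value problem that the commit message attributes to #5860 can be illustrated with a small sketch. This is not the sagemaker-core serializer; both function names and the `instance_count` and `description` fields are hypothetical. It only shows how a truthiness filter drops an explicit `False`, while an `is not None` filter keeps it.

```python
def serialize_dropping_falsy(fields: dict) -> dict:
    """Buggy pattern: a truthiness test also discards False, 0 and empty strings."""
    return {key: value for key, value in fields.items() if value}


def serialize_preserving_falsy(fields: dict) -> dict:
    """Safer pattern: only unset (None) fields are omitted from the payload."""
    return {key: value for key, value in fields.items() if value is not None}


# optimize_model=False comes from the commit message; the other fields are made up.
request = {"optimize_model": False, "instance_count": 0, "description": None}

dropped = serialize_dropping_falsy(request)
kept = serialize_preserving_falsy(request)
print(dropped)
print(kept)
```

With the first filter the payload comes out empty, so a service would fall back to its default for `optimize_model`. The second filter sends the explicit `False`.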
2 files changed
Lines changed: 16 additions & 0 deletions
File tree
- sagemaker-serve
- src/sagemaker/serve/ai_inference_recommender
- tests/unit/test_ai_inference_recommender
Lines changed: 9 additions & 0 deletions (new lines 100-108)
Lines changed: 7 additions & 0 deletions (new lines 85-91)