Commit d1e34cc
feat(serve): add SageMaker GenAI inference benchmarking and recommendation

Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer
over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and
AIWorkloadConfig resources.

ModelBuilder gains a new entry point (start_benchmark) and extends two
existing verbs (optimize and deploy):

    from sagemaker.serve import BenchmarkResult, Workload

    # Benchmark a deployed endpoint
    job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
    result = BenchmarkResult.from_job(job)

    # Recommendation flow extends optimize() and deploy()
    mb.optimize(workload=..., performance_target="throughput",
                instance_types=["ml.g6.12xlarge"])
    endpoint = mb.deploy(role=role)  # deploys the top recommendation
    endpoint = mb.deploy(role=role, recommendation_index=2)  # deploys an alternative

print(result) and print(mb.recommendations[0]) render the benchmark
metrics and the recommendation as tables.

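Rendering a result as a table can be done with a plain __str__ method. The sketch below assumes a result that holds per-metric p50/p99 summaries; MetricRow and ResultTable are hypothetical stand-ins, not the BenchmarkResult classes this commit adds.

```python
from dataclasses import dataclass, field


@dataclass
class MetricRow:
    # Hypothetical per-metric summary; field names are illustrative only.
    name: str
    unit: str
    p50: float
    p99: float


@dataclass
class ResultTable:
    # Hypothetical container standing in for a benchmark result.
    rows: list[MetricRow] = field(default_factory=list)

    def __str__(self) -> str:
        headers = ["metric", "unit", "p50", "p99"]
        body = [[r.name, r.unit, f"{r.p50:.2f}", f"{r.p99:.2f}"] for r in self.rows]
        # Size each column to its widest cell so the columns line up.
        widths = [max(len(row[i]) for row in [headers, *body]) for i in range(len(headers))]

        def fmt(cells: list[str]) -> str:
            return " | ".join(cell.ljust(w) for cell, w in zip(cells, widths))

        lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
        lines.extend(fmt(row) for row in body)
        return "\n".join(lines)


table = ResultTable([
    MetricRow("ttft", "ms", 120.5, 310.0),
    MetricRow("throughput", "tok/s", 950.0, 1020.25),
])
print(table)
```

Keeping the formatting in __str__ means print() gives a readable view while the underlying fields stay available for programmatic use.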
Public surface added under sagemaker.serve:
* Workload -- typed factory; extra keyword arguments pass through
  **params and are validated server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parse the
  AIPerf output.tar.gz archive from S3.
* Secret -- opt-in helper for tokens longer than 512 characters, backed
  by AWS Secrets Manager.
* BenchmarkJob, RecommendationJob -- re-exports of AIBenchmarkJob and
  AIRecommendationJob without the AI prefix.
* FeatureGatedError, WorkloadValidationError -- typed exceptions.

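Reading metrics out of a results archive needs only the standard library. This sketch assumes the output.tar.gz bytes have already been downloaded from S3 and that the metrics sit in a JSON member; load_metrics_from_archive and its member_suffix parameter are hypothetical, and the real AIPerf archive layout may differ.

```python
import io
import json
import tarfile


def load_metrics_from_archive(data: bytes, member_suffix: str = ".json") -> dict:
    """Return the parsed JSON of the first regular file ending in member_suffix.

    Hypothetical helper: `data` is assumed to be the raw bytes of an
    output.tar.gz that was already fetched from S3.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                fileobj = tar.extractfile(member)
                if fileobj is None:
                    continue
                # json.load accepts a binary file object (bytes input).
                return json.load(fileobj)
    raise FileNotFoundError(f"no member ending in {member_suffix!r} in archive")
```

Reading members through extractfile() keeps everything in memory and never writes archive paths to disk, which avoids the path-traversal concerns of tarfile.extractall().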
Pin-mode and workload-mode optimize() kwargs are mutually exclusive.
Deploying a recommendation uses the ModelPackage path and auto-approves
the model package that the recommendation job publishes.

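A mutual-exclusion check between two kwarg groups might look like the following. It is a minimal sketch: check_exclusive_modes is a hypothetical helper, and because the commit does not list the pin-mode kwargs, the name pinned_config in the usage is made up for illustration.

```python
from typing import Any, Iterable


def check_exclusive_modes(
    kwargs: dict[str, Any],
    mode_a: Iterable[str],
    mode_b: Iterable[str],
) -> None:
    """Raise ValueError if kwargs sets keys from two mutually exclusive modes.

    A kwarg counts as set when it is present and not None, so an explicit
    False or 0 still selects its mode.
    """
    used_a = sorted(k for k in mode_a if kwargs.get(k) is not None)
    used_b = sorted(k for k in mode_b if kwargs.get(k) is not None)
    if used_a and used_b:
        raise ValueError(f"cannot combine {used_a} with {used_b}; choose one mode")


WORKLOAD_MODE = ("workload", "performance_target")
PIN_MODE = ("pinned_config",)  # hypothetical pin-mode kwarg name

# Passes: only workload-mode kwargs are set.
check_exclusive_modes({"workload": "w", "performance_target": "throughput"},
                      WORKLOAD_MODE, PIN_MODE)
```

Failing fast before any job is created gives the caller a clear local error instead of a confusing server-side rejection.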
Includes 51 unit tests and 2 slow_test integration tests
(tests/integ/test_ai_inference_recommender_integration.py), verified
end-to-end against real AWS.

Rebased onto upstream to pick up #5860 (preserve falsy values in
sagemaker-core serialize), which is required so that optimize_model=False
reaches the wire.

Parent: fda2565
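The falsy-value problem that #5860 addresses fits in a few lines: a truthiness filter drops False, 0 and empty strings along with None, while an explicit "is not None" test keeps them. Both functions are hypothetical illustrations, not sagemaker-core's serialize code.

```python
from typing import Any


def serialize_truthy(params: dict[str, Any]) -> dict[str, Any]:
    # Buggy pattern: `if value` discards False, 0, "" and [] as well as None.
    return {key: value for key, value in params.items() if value}


def serialize_non_none(params: dict[str, Any]) -> dict[str, Any]:
    # Fixed pattern: omit only fields that were never set (None).
    return {key: value for key, value in params.items() if value is not None}


request = {"optimize_model": False, "tags": None}
print(serialize_truthy(request))    # {}
print(serialize_non_none(request))  # {'optimize_model': False}
```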

21 files changed, 3506 additions, 7 deletions
File tree
- sagemaker-serve
  - src/sagemaker/serve
    - ai_inference_recommender
  - tests
    - integ
    - unit/test_ai_inference_recommender