Inference
=========

SageMaker Python SDK V3 transforms model deployment and inference with the unified **ModelBuilder** class, replacing the complex framework-specific model classes from V2. This modern approach provides a consistent interface for all inference scenarios while maintaining the flexibility and performance you need.

Key Benefits of V3 Inference
----------------------------

* **Unified Interface**: A single ``ModelBuilder`` class replaces multiple framework-specific model classes
* **Simplified Deployment**: Object-oriented API with intelligent defaults for endpoint configuration
* **Enhanced Performance**: Optimized inference pipelines with automatic scaling and load balancing
* **Multi-Modal Support**: Deploy models for real-time, batch, and serverless inference scenarios

Quick Start Example
-------------------

Here's how inference has evolved from V2 to V3:

**SageMaker Python SDK V2:**

.. code-block:: python

   from sagemaker.model import Model
   from sagemaker.predictor import Predictor

   model = Model(
       image_uri="my-inference-image",
       model_data="s3://my-bucket/model.tar.gz",
       role="arn:aws:iam::123456789012:role/SageMakerRole",
   )
   predictor = model.deploy(
       initial_instance_count=1,
       instance_type="ml.m5.xlarge",
   )
   result = predictor.predict(data)

**SageMaker Python SDK V3:**

.. code-block:: python

   from sagemaker.serve import ModelBuilder

   model_builder = ModelBuilder(
       model="my-model",
       model_path="s3://my-bucket/model.tar.gz",
   )
   endpoint = model_builder.build()
   result = endpoint.invoke(data)

ModelBuilder Overview
---------------------

The ``ModelBuilder`` class is the cornerstone of SageMaker Python SDK V3 inference, providing a unified interface for all deployment scenarios. This single class replaces the web of framework-specific model classes from V2, offering:

**Unified Deployment Interface**
   One class handles PyTorch, TensorFlow, Scikit-learn, XGBoost, HuggingFace, and custom containers

**Intelligent Optimization**
   Automatically optimizes the model serving configuration based on your model's characteristics

**Flexible Deployment Options**
   Support for real-time endpoints, batch transform, and serverless inference

**Seamless Integration**
   Works with SageMaker features such as auto-scaling, multi-model endpoints, and A/B testing

.. code-block:: python

   from sagemaker.serve import ModelBuilder
   from sagemaker.serve.configs import EndpointConfig

   # Create a model builder with intelligent defaults
   model_builder = ModelBuilder(
       model="your-model",
       model_path="s3://your-bucket/model-artifacts",
       role="your-sagemaker-role",
   )

   # Configure endpoint settings
   endpoint_config = EndpointConfig(
       instance_type="ml.m5.xlarge",
       initial_instance_count=1,
       auto_scaling_enabled=True,
   )

   # Deploy the model
   endpoint = model_builder.build(endpoint_config=endpoint_config)

   # Make predictions
   response = endpoint.invoke({"inputs": "your-input-data"})

Inference Capabilities
----------------------

Model Optimization Support
~~~~~~~~~~~~~~~~~~~~~~~~~~

V3 introduces model optimization capabilities for enhanced performance:

* **SageMaker Neo** - Optimize models for specific hardware targets
* **TensorRT Integration** - Accelerate deep learning inference on NVIDIA GPUs
* **ONNX Runtime** - Cross-platform model optimization and acceleration
* **Quantization Support** - Reduce model size and improve inference speed

**Quick Optimization Example:**

.. code-block:: python

   from sagemaker.serve import ModelBuilder
   from sagemaker.serve.configs import OptimizationConfig

   model_builder = ModelBuilder(
       model="huggingface-bert-base",
       optimization_config=OptimizationConfig(
           target_device="ml_inf1",
           optimization_level="O2",
           quantization_enabled=True,
       ),
   )

   optimized_endpoint = model_builder.build()

Key Inference Features
~~~~~~~~~~~~~~~~~~~~~~

* **Multi-Model Endpoints** - Host multiple models on a single endpoint, with automatic model loading and unloading for cost optimization
* **Auto-Scaling Integration** - Automatically scale endpoint capacity based on traffic patterns, with configurable scaling policies
* **A/B Testing Support** - Deploy multiple model variants with traffic splitting for safe model updates and performance comparison
* **Batch Transform Jobs** - Process large datasets efficiently with automatic data partitioning and parallel processing
* **Serverless Inference** - Pay-per-request pricing with automatic scaling from zero to handle variable workloads

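Traffic splitting for A/B testing can be reasoned about with a small sketch. Each request is routed to a production variant in proportion to its traffic weight; the helper below is an illustrative stand-in written in plain Python, not SDK code:

```python
import random

def route_request(variant_weights, rng):
    """Pick a production variant in proportion to its traffic weight.

    ``variant_weights`` maps variant name -> weight; the weights need
    not sum to 1, only their ratios matter.
    """
    names = list(variant_weights)
    weights = [variant_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Send 90% of traffic to the current model, 10% to the candidate.
rng = random.Random(0)  # seeded for reproducibility
counts = {"variant-a": 0, "variant-b": 0}
for _ in range(10_000):
    chosen = route_request({"variant-a": 0.9, "variant-b": 0.1}, rng)
    counts[chosen] += 1
# counts["variant-b"] ends up with roughly 10% of the requests
```

Once a candidate variant proves itself, its weight can be raised gradually instead of switching all traffic at once.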
Supported Inference Scenarios
-----------------------------

Deployment Types
~~~~~~~~~~~~~~~~

* **Real-Time Endpoints** - Low-latency inference for interactive applications
* **Batch Transform** - High-throughput processing for large datasets
* **Serverless Inference** - Cost-effective inference for variable workloads
* **Multi-Model Endpoints** - Host multiple models on shared infrastructure

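To see why serverless inference suits variable workloads, compare the two billing models. The rates below are placeholders chosen for illustration, not actual AWS prices:

```python
def realtime_monthly_cost(hourly_rate, instance_count=1, hours=730):
    """A real-time endpoint bills for every hour its instances run."""
    return hourly_rate * instance_count * hours

def serverless_monthly_cost(cost_per_request, requests):
    """Serverless inference bills per request (by duration and memory in practice)."""
    return cost_per_request * requests

# Placeholder rates for illustration only.
always_on = realtime_monthly_cost(hourly_rate=0.23)
sporadic = serverless_monthly_cost(0.0002, requests=50_000)
steady = serverless_monthly_cost(0.0002, requests=5_000_000)
```

With these numbers the always-on endpoint costs more than the sporadic serverless workload but less than the steady high-volume one, which is the usual break-even intuition: serverless wins when traffic is low or bursty, dedicated instances win at sustained volume.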
Framework Support
~~~~~~~~~~~~~~~~~

* **PyTorch** - Deep learning models with dynamic computation graphs
* **TensorFlow** - Production-ready machine learning models at scale
* **Scikit-learn** - Classical machine learning algorithms
* **XGBoost** - Gradient boosting models for structured data
* **HuggingFace** - Pre-trained transformer models for NLP tasks
* **Custom Containers** - Bring your own inference logic and dependencies

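Under the hood, a unified builder typically dispatches on the framework to choose a serving stack, falling back to a user-supplied container for the bring-your-own case. A minimal sketch of that idea (the mapping and names below are illustrative; the real container resolution happens inside the SDK):

```python
# Hypothetical framework -> serving-stack mapping, for illustration only.
DEFAULT_SERVERS = {
    "pytorch": "torchserve",
    "tensorflow": "tensorflow-serving",
    "sklearn": "scikit-learn-server",
    "xgboost": "xgboost-server",
    "huggingface": "huggingface-inference-toolkit",
}

def resolve_model_server(framework, custom_server=None):
    """Return the serving stack for a framework, honoring a custom override."""
    if custom_server is not None:
        return custom_server  # bring-your-own container path
    try:
        return DEFAULT_SERVERS[framework]
    except KeyError:
        raise ValueError(f"No default server for framework: {framework!r}")
```

The override-first lookup is what lets one class cover both the supported frameworks and fully custom containers.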
Advanced Features
~~~~~~~~~~~~~~~~~

* **Model Monitoring** - Track model performance and data drift in production
* **Endpoint Security** - VPC support, encryption, and IAM-based access control
* **Multi-AZ Deployment** - High availability with automatic failover
* **Custom Inference Logic** - Implement preprocessing, postprocessing, and custom prediction logic

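Custom inference logic boils down to wrapping the model call with preprocessing and postprocessing hooks. A self-contained sketch of that shape (the class and method names here are illustrative, not the SDK's actual interface):

```python
class CustomInferenceSpec:
    """Illustrative pre/post-processing wrapper around a model callable."""

    def __init__(self, model_fn):
        self.model_fn = model_fn

    def preprocess(self, payload):
        # e.g. validate and coerce the incoming request into features
        return [float(x) for x in payload["inputs"]]

    def postprocess(self, prediction):
        # e.g. wrap the raw score into a response document
        return {"prediction": prediction}

    def invoke(self, payload):
        features = self.preprocess(payload)
        return self.postprocess(self.model_fn(features))

# A toy "model" that averages its inputs stands in for a real predictor.
spec = CustomInferenceSpec(model_fn=lambda xs: sum(xs) / len(xs))
response = spec.invoke({"inputs": ["1", "2", "3"]})
```

The same three-hook structure applies whether the model is a toy lambda or a deployed container: only ``model_fn`` changes.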
Migration from V2
-----------------

If you're migrating from V2, the key changes are:

* Replace framework-specific model classes (``PyTorchModel``, ``TensorFlowModel``, etc.) with ``ModelBuilder``
* Use structured configuration objects instead of parameter dictionaries
* Use the new ``invoke()`` method instead of ``predict()`` for a more consistent API
* Take advantage of built-in optimization and auto-scaling features

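As a concrete illustration of the first two changes, the V2 arguments from the quick-start example map onto the V3 parameter names used in this guide roughly as follows. This is a sketch: the V3 names follow this document's examples, not a definitive API surface:

```python
def v2_to_v3_kwargs(v2_model_args, v2_deploy_args):
    """Translate V2 Model(...)/deploy(...) arguments into the ModelBuilder
    and EndpointConfig parameter names used in this guide (illustrative)."""
    builder_kwargs = {
        "model_path": v2_model_args["model_data"],
        "role": v2_model_args["role"],
    }
    endpoint_kwargs = {
        "instance_type": v2_deploy_args["instance_type"],
        "initial_instance_count": v2_deploy_args.get("initial_instance_count", 1),
    }
    return builder_kwargs, endpoint_kwargs

builder_kwargs, endpoint_kwargs = v2_to_v3_kwargs(
    {"model_data": "s3://my-bucket/model.tar.gz",
     "role": "arn:aws:iam::123456789012:role/SageMakerRole"},
    {"instance_type": "ml.m5.xlarge", "initial_instance_count": 1},
)
```

Deployment-time settings (instance type, count, scaling) move out of the ``deploy()`` call and into a configuration object, while artifact location and role stay with the model definition.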
Inference Examples
------------------

Explore comprehensive inference examples that demonstrate V3 capabilities:

.. toctree::
   :maxdepth: 1

   ../v3-examples/inference-examples/inference-spec-example
   ../v3-examples/inference-examples/jumpstart-example
   ../v3-examples/inference-examples/optimize-example
   ../v3-examples/inference-examples/train-inference-e2e-example
   ../v3-examples/inference-examples/jumpstart-e2e-training-example
   ../v3-examples/inference-examples/local-mode-example
   ../v3-examples/inference-examples/huggingface-example
   ../v3-examples/inference-examples/in-process-mode-example