
Commit 6abbe37

Aditi2424 and adishaa authored
doc: Add content for V3 training, inference, model customization, ml_ops and sagemaker core (#5446)
* feat: Add basic V3 documentation structure and configuration

  - Add ReadTheDocs configuration with sphinx-book-theme
  - Create comprehensive overview, installation, and quickstart pages
  - Set up documentation structure for all core capabilities
  - Add custom CSS and Sphinx configuration
  - Create symlinks to v3-examples and sagemaker-core directories
  - Add docs/_build/ to .gitignore to exclude build artifacts

* feat: Add comprehensive training, inference, and model customization content

  - Add detailed model customization with SFTTrainer, DPOTrainer, RLAIFTrainer, RLVRTrainer
  - Include unified ModelTrainer interface for training workflows
  - Add ModelBuilder for streamlined inference deployment
  - Cover LoRA, full fine-tuning, and advanced ML techniques
  - Include migration guides and production-ready examples

* feat: Add ML Ops and SageMaker Core documentation

  - Add comprehensive MLOps pipeline orchestration and monitoring
  - Include SageMaker Core low-level API documentation
  - Cover advanced pipeline features and governance capabilities

---------

Co-authored-by: adishaa <adishaa@amazon.com>
1 parent c846cde commit 6abbe37

5 files changed

Lines changed: 968 additions & 10 deletions

File tree

docs/inference/index.rst

Lines changed: 182 additions & 2 deletions
@@ -1,6 +1,186 @@

Inference
=========

SageMaker Python SDK V3 transforms model deployment and inference with the unified **ModelBuilder** class, replacing the complex framework-specific model classes from V2. This modern approach provides a consistent interface for all inference scenarios while maintaining the flexibility and performance you need.

Key Benefits of V3 Inference
----------------------------

* **Unified Interface**: Single ``ModelBuilder`` class replaces multiple framework-specific model classes
* **Simplified Deployment**: Object-oriented API with intelligent defaults for endpoint configuration
* **Enhanced Performance**: Optimized inference pipelines with automatic scaling and load balancing
* **Multi-Modal Support**: Deploy models for real-time, batch, and serverless inference scenarios

Quick Start Example
-------------------

Here's how inference has evolved from V2 to V3:

**SageMaker Python SDK V2:**

.. code-block:: python

   from sagemaker.model import Model

   model = Model(
       image_uri="my-inference-image",
       model_data="s3://my-bucket/model.tar.gz",
       role="arn:aws:iam::123456789012:role/SageMakerRole"
   )
   predictor = model.deploy(
       initial_instance_count=1,
       instance_type="ml.m5.xlarge"
   )
   result = predictor.predict(data)

**SageMaker Python SDK V3:**

.. code-block:: python

   from sagemaker.serve import ModelBuilder

   model_builder = ModelBuilder(
       model="my-model",
       model_path="s3://my-bucket/model.tar.gz"
   )
   endpoint = model_builder.build()
   result = endpoint.invoke(data)

ModelBuilder Overview
---------------------

The ``ModelBuilder`` class is the cornerstone of SageMaker Python SDK V3 inference, providing a unified interface for all deployment scenarios. This single class replaces the complex web of framework-specific model classes from V2, offering:

**Unified Deployment Interface**
   One class handles PyTorch, TensorFlow, Scikit-learn, XGBoost, HuggingFace, and custom containers

**Intelligent Optimization**
   Automatically optimizes model serving configuration based on your model characteristics

**Flexible Deployment Options**
   Support for real-time endpoints, batch transform, and serverless inference

**Seamless Integration**
   Works with SageMaker features like auto-scaling, multi-model endpoints, and A/B testing

.. code-block:: python

   from sagemaker.serve import ModelBuilder
   from sagemaker.serve.configs import EndpointConfig

   # Create model builder with intelligent defaults
   model_builder = ModelBuilder(
       model="your-model",
       model_path="s3://your-bucket/model-artifacts",
       role="your-sagemaker-role"
   )

   # Configure endpoint settings
   endpoint_config = EndpointConfig(
       instance_type="ml.m5.xlarge",
       initial_instance_count=1,
       auto_scaling_enabled=True
   )

   # Deploy model
   endpoint = model_builder.build(endpoint_config=endpoint_config)

   # Make predictions
   response = endpoint.invoke({"inputs": "your-input-data"})

Inference Capabilities
----------------------

Model Optimization Support
~~~~~~~~~~~~~~~~~~~~~~~~~~

V3 introduces powerful model optimization capabilities for enhanced performance:

* **SageMaker Neo** - Optimize models for specific hardware targets
* **TensorRT Integration** - Accelerate deep learning inference on NVIDIA GPUs
* **ONNX Runtime** - Cross-platform model optimization and acceleration
* **Quantization Support** - Reduce model size and improve inference speed

**Quick Optimization Example:**

.. code-block:: python

   from sagemaker.serve import ModelBuilder
   from sagemaker.serve.configs import OptimizationConfig

   model_builder = ModelBuilder(
       model="huggingface-bert-base",
       optimization_config=OptimizationConfig(
           target_device="ml_inf1",
           optimization_level="O2",
           quantization_enabled=True
       )
   )

   optimized_endpoint = model_builder.build()

Key Inference Features
~~~~~~~~~~~~~~~~~~~~~~

* **Multi-Model Endpoints** - Host multiple models on a single endpoint with automatic model loading and unloading for cost optimization
* **Auto-Scaling Integration** - Automatically scale endpoint capacity based on traffic patterns with configurable scaling policies
* **A/B Testing Support** - Deploy multiple model variants with traffic splitting for safe model updates and performance comparison
* **Batch Transform Jobs** - Process large datasets efficiently with automatic data partitioning and parallel processing
* **Serverless Inference** - Pay-per-request pricing with automatic scaling from zero to handle variable workloads
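
To make the A/B testing mechanics concrete, the sketch below simulates the weighted traffic splitting that an endpoint with multiple production variants performs. It is plain Python for illustration only; the variant names and the 90/10 split are hypothetical, and none of this is SageMaker API code.

```python
import random

def route_request(variants, rng):
    """Pick a variant with probability proportional to its traffic weight.

    `variants` maps a variant name to its weight, e.g. the 90/10 split
    you might use to canary a new model version.
    """
    names = list(variants)
    weights = [variants[n] for n in names]
    # random.choices implements weighted sampling over the variant names
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical 90/10 split between the current model and a candidate
variants = {"model-v1": 90, "model-v2": 10}
rng = random.Random(0)  # seeded for reproducibility

counts = {"model-v1": 0, "model-v2": 0}
for _ in range(10_000):
    counts[route_request(variants, rng)] += 1

# Roughly 90% of the simulated traffic lands on model-v1
print(counts)
```

In practice the endpoint performs this routing server-side; you only declare the variants and their weights, then shift weight toward the candidate as it proves itself.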

Supported Inference Scenarios
-----------------------------

Deployment Types
~~~~~~~~~~~~~~~~

* **Real-Time Endpoints** - Low-latency inference for interactive applications
* **Batch Transform** - High-throughput processing for large datasets
* **Serverless Inference** - Cost-effective inference for variable workloads
* **Multi-Model Endpoints** - Host multiple models on shared infrastructure
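
As a rule of thumb, the choice between these deployment types follows from the workload profile. The helper below encodes that heuristic as runnable pseudologic; the function, its inputs, and the returned labels are illustrative assumptions, not part of the SDK.

```python
def choose_deployment_type(latency_sensitive, traffic_is_steady, dataset_is_offline):
    """Illustrative heuristic mapping workload traits to a deployment type."""
    if dataset_is_offline:
        # Large, non-interactive datasets are cheapest to score in bulk
        return "batch-transform"
    if latency_sensitive and traffic_is_steady:
        # Dedicated instances give the most predictable low latency
        return "real-time-endpoint"
    # Spiky or infrequent traffic: pay per request, scale from zero
    return "serverless-inference"

print(choose_deployment_type(True, True, False))    # real-time-endpoint
print(choose_deployment_type(False, False, False))  # serverless-inference
print(choose_deployment_type(False, False, True))   # batch-transform
```

Multi-model endpoints are an orthogonal cost optimization: any fleet of small, similarly-shaped real-time models can share instances instead of each getting its own endpoint.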

Framework Support
~~~~~~~~~~~~~~~~~

* **PyTorch** - Deep learning models with dynamic computation graphs
* **TensorFlow** - Production-ready machine learning models at scale
* **Scikit-learn** - Classical machine learning algorithms
* **XGBoost** - Gradient boosting models for structured data
* **HuggingFace** - Pre-trained transformer models for NLP tasks
* **Custom Containers** - Bring your own inference logic and dependencies

Advanced Features
~~~~~~~~~~~~~~~~~

* **Model Monitoring** - Track model performance and data drift in production
* **Endpoint Security** - VPC support, encryption, and IAM-based access control
* **Multi-AZ Deployment** - High availability with automatic failover
* **Custom Inference Logic** - Implement preprocessing, postprocessing, and custom prediction logic
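
Custom inference logic generally follows a load / preprocess / predict / postprocess contract. The minimal sketch below shows that shape in plain Python; the class, its method names, and the echo model are hypothetical stand-ins for whatever handler interface your serving container expects, not SageMaker classes.

```python
import json

class EchoUppercaseModel:
    """Hypothetical stand-in for a real model artifact loaded from disk."""
    def predict(self, text):
        return text.upper()

class InferenceHandler:
    """Illustrative handler showing the usual pre/predict/post pipeline."""

    def load(self):
        # In a real container this would deserialize model artifacts
        self.model = EchoUppercaseModel()

    def preprocess(self, request_body):
        # Parse the raw request payload into model input
        return json.loads(request_body)["inputs"]

    def postprocess(self, prediction):
        # Wrap the raw prediction in a JSON response
        return json.dumps({"outputs": prediction})

    def handle(self, request_body):
        return self.postprocess(self.model.predict(self.preprocess(request_body)))

handler = InferenceHandler()
handler.load()
response = handler.handle('{"inputs": "hello"}')
print(response)  # {"outputs": "HELLO"}
```

Keeping the three stages separate lets you swap serialization formats or add validation without touching the model call itself.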

Migration from V2
------------------

If you're migrating from V2, the key changes are:

* Replace framework-specific model classes (``PyTorchModel``, ``TensorFlowModel``, etc.) with ``ModelBuilder``
* Use structured configuration objects instead of parameter dictionaries
* Use the new ``invoke()`` method instead of ``predict()`` for a more consistent API
* Take advantage of built-in optimization and auto-scaling features

Inference Examples
------------------

Explore comprehensive inference examples that demonstrate V3 capabilities:

.. toctree::
   :maxdepth: 1

   ../v3-examples/inference-examples/inference-spec-example
   ../v3-examples/inference-examples/jumpstart-example
   ../v3-examples/inference-examples/optimize-example
   ../v3-examples/inference-examples/train-inference-e2e-example
   ../v3-examples/inference-examples/jumpstart-e2e-training-example
   ../v3-examples/inference-examples/local-mode-example
   ../v3-examples/inference-examples/huggingface-example
   ../v3-examples/inference-examples/in-process-mode-example
