ModelBuilder.deploy() should expose DataCacheConfig and other CreateInferenceComponent API parameters #5750

@manuwaik

Description

Describe the feature you'd like

Expose additional CreateInferenceComponent API parameters through ModelBuilder.deploy() when deploying Inference Components. Currently, ModelBuilder._deploy_core_endpoint() builds a minimal InferenceComponentSpecification and hardcodes several values. The following API-supported configuration is not surfaced:

  • Specification.DataCacheConfig.EnableCaching — Cache model artifacts and container images on instances for faster auto-scaling cold starts
  • Specification.BaseInferenceComponentName — Adapter component deployment (e.g. LoRA adapters attached to a base model)
  • Specification.Container (Image, ArtifactUrl, Environment) — Custom container images, artifact URLs, and environment variables at the IC level
  • VariantName (top-level) — Currently hardcoded to "AllTraffic", not configurable for multi-variant endpoints
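
For orientation, here is a rough sketch of where these four fields sit in a boto3-style CreateInferenceComponent request. Nothing in it calls AWS. The resource names and the S3 URL are made-up placeholders, and splitting the fields across a base component and an adapter component is an assumption about typical usage; the API reference has the exact validation rules.

```python
# Illustrative request payloads only; nothing here calls AWS.
# All resource names and the S3 URL are hypothetical placeholders.

base_component_request = {
    "InferenceComponentName": "my-ic",
    "EndpointName": "my-endpoint",
    "VariantName": "AllTraffic",  # top-level field, outside Specification
    "Specification": {
        "ModelName": "my-model",
        "ComputeResourceRequirements": {
            "MinMemoryRequiredInMb": 8192,
            "NumberOfAcceleratorDevicesRequired": 1,
            "NumberOfCpuCoresRequired": 2,
        },
        # The field this issue is mainly about.
        "DataCacheConfig": {"EnableCaching": True},
    },
    "RuntimeConfig": {"CopyCount": 1},
}

# Assumed shape for an adapter component that attaches to the base component.
adapter_component_request = {
    "InferenceComponentName": "my-adapter-ic",
    "EndpointName": "my-endpoint",
    "Specification": {
        "BaseInferenceComponentName": "my-ic",
        "Container": {"ArtifactUrl": "s3://my-bucket/adapters/adapter.tar.gz"},
    },
}
```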

How would this feature be used? Please describe.

Our immediate need is DataCacheConfig.EnableCaching. When deploying large model artifacts (700 MB+ Triton ensembles) on auto-scaling endpoints, every new instance must download all artifacts again during scale-out. Enabling caching should avoid that repeated download and shorten cold starts.

Ideally these would be optional parameters on deploy() or on ResourceRequirements:

builder.deploy(
    endpoint_name="my-endpoint",
    inference_component_name="my-ic",
    instance_type="ml.g5.2xlarge",
    initial_instance_count=1,
    inference_config=ResourceRequirements(
        requests={"memory": 8192, "num_accelerators": 1, "num_cpus": 2, "copies": 1}
    ),
    data_cache_config={"enable_caching": True},  # new
)

Describe alternatives you've considered

The current workaround is to use ModelBuilder.build() for model creation, then call EndpointConfig.create(), Endpoint.create(), and InferenceComponent.create() directly via sagemaker-core, which supports the full InferenceComponentSpecification:

from sagemaker.core.resources import EndpointConfig, Endpoint, InferenceComponent
from sagemaker.core.shapes import (
    ProductionVariant,
    InferenceComponentSpecification,
    InferenceComponentComputeResourceRequirements,
    InferenceComponentRuntimeConfig,
    InferenceComponentDataCacheConfig,
)

# sm_model_name, epc_name, endpoint_name, ic_name, and role are placeholders defined elsewhere.
builder.build(model_name=sm_model_name)

EndpointConfig.create(
    endpoint_config_name=epc_name,
    production_variants=[
        ProductionVariant(
            variant_name="AllTraffic",
            instance_type="ml.g5.2xlarge",
            initial_instance_count=1,
        )
    ],
    execution_role_arn=role,
)

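endpoint = Endpoint.create(
    endpoint_name=endpoint_name,
    endpoint_config_name=epc_name,
)
# Assumption: the endpoint must be InService before an inference component can be created on it.
endpoint.wait_for_status("InService")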

InferenceComponent.create(
    inference_component_name=ic_name,
    endpoint_name=endpoint_name,
    variant_name="AllTraffic",
    specification=InferenceComponentSpecification(
        model_name=sm_model_name,
        compute_resource_requirements=InferenceComponentComputeResourceRequirements(
            min_memory_required_in_mb=8192,
            number_of_accelerator_devices_required=1,
            number_of_cpu_cores_required=2,
        ),
        data_cache_config=InferenceComponentDataCacheConfig(enable_caching=True),
    ),
    runtime_config=InferenceComponentRuntimeConfig(copy_count=1),
)
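
Part of the friction in the workaround above is that it restates by hand the values the ResourceRequirements in the proposed deploy() call already carries. The sketch below spells out that correspondence with plain dicts. split_resource_requests is a hypothetical helper, not an SDK function, and the key mapping is inferred only from the two snippets in this issue.

```python
def split_resource_requests(requests):
    """Map a ResourceRequirements-style `requests` dict to low-level IC fields.

    Returns (compute_kwargs, runtime_kwargs). Keys missing from `requests`
    are simply omitted from the result.
    """
    # High-level key -> InferenceComponentComputeResourceRequirements field.
    compute_keys = {
        "memory": "min_memory_required_in_mb",
        "num_accelerators": "number_of_accelerator_devices_required",
        "num_cpus": "number_of_cpu_cores_required",
    }
    compute = {
        low: requests[high] for high, low in compute_keys.items() if high in requests
    }
    # "copies" belongs to InferenceComponentRuntimeConfig, not the specification.
    runtime = {"copy_count": requests["copies"]} if "copies" in requests else {}
    return compute, runtime


compute, runtime = split_resource_requests(
    {"memory": 8192, "num_accelerators": 1, "num_cpus": 2, "copies": 1}
)
```

The two returned dicts could then be unpacked into InferenceComponentComputeResourceRequirements(**compute) and InferenceComponentRuntimeConfig(**runtime) in the workaround.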

This works but defeats the purpose of ModelBuilder.deploy() as a high-level abstraction. Customers shouldn't need to drop down to sagemaker-core or boto3 for commonly used API parameters.

Additional context

  • API reference: CreateInferenceComponent
  • The shape classes (InferenceComponentDataCacheConfig, InferenceComponentContainerSpecification, etc.) already exist in sagemaker.core.shapes — they just need to be wired into ModelBuilder._deploy_core_endpoint().
  • SageMaker SDK version: 3.x
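
To illustrate how small the wiring could be, here is a minimal sketch that uses plain dicts as stand-ins for the shape classes. extend_ic_specification and its parameter names are hypothetical; the real _deploy_core_endpoint() internals may look quite different. The point is only that each new parameter can default to None and be left out of the specification, so existing deploy() calls keep their current behavior.

```python
from typing import Any, Dict, Optional


def extend_ic_specification(
    minimal_spec: Dict[str, Any],
    *,
    data_cache_config: Optional[Dict[str, bool]] = None,
    base_inference_component_name: Optional[str] = None,
    container: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of `minimal_spec` with any optional fields that were provided.

    Fields left as None are omitted, so the default output equals the input.
    Cross-field validation is deliberately left to the service.
    """
    spec = dict(minimal_spec)  # shallow copy; the caller's dict is not mutated
    optional_fields = {
        "data_cache_config": data_cache_config,
        "base_inference_component_name": base_inference_component_name,
        "container": container,
    }
    for name, value in optional_fields.items():
        if value is not None:
            spec[name] = value
    return spec


minimal = {"model_name": "my-model"}  # placeholder for what ModelBuilder builds today
unchanged = extend_ic_specification(minimal)
cached = extend_ic_specification(minimal, data_cache_config={"enable_caching": True})
```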
