Problem Statement
When using BedrockModel with an Application Inference Profile (AIP) ARN and cache_points enabled, caching is silently disabled because the SDK cannot determine the underlying model from the AIP ARN.
```python
from strands.models import BedrockModel

model = BedrockModel(
    model_id="arn:aws:bedrock:eu-west-1:123456789012:application-inference-profile/my-profile-id",
    cache_points=["system", "tools"],
)
```

```
WARNING strands.models.bedrock:bedrock.py:449
model_id=arn:aws:bedrock:eu-west-1:123456789012:application-inference-profile/my-profile-id
  | cache_config is enabled but this model does not support caching
```
The same configuration works correctly with a cross-region inference profile because the model identifier is embedded in the ARN:
```python
model = BedrockModel(
    model_id="arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-haiku-4-5-20251001-v1:0",
    cache_points=["system", "tools"],
)
# Works: caching is active
```
Root Cause
The caching strategy implemented in PR #1438 (as part of #1432) requires identifying the model family before enabling caching. The model detection logic parses the model_id string to extract the model identifier. This works for base model IDs and cross-region inference profiles where the model name is visible in the ARN, but fails for Application Inference Profiles where the ARN contains only a custom profile identifier (e.g., application-inference-profile/my-profile-id).
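To illustrate why string parsing is a dead end for AIPs, here is a sketch of the general approach (not the SDK's actual detection code; `extract_model_name` is an illustrative name):

```python
def extract_model_name(model_id: str) -> str | None:
    """Best-effort model-name extraction from a Bedrock model_id (illustrative)."""
    if not model_id.startswith("arn:"):
        # Plain base model ID, e.g. "anthropic.claude-haiku-4-5-20251001-v1:0"
        return model_id
    resource_type, _, resource_id = model_id.split(":", 5)[5].partition("/")
    if resource_type in ("foundation-model", "inference-profile"):
        # Cross-region profile: the model name is embedded in the resource ID,
        # e.g. "us.anthropic.claude-haiku-4-5-20251001-v1:0"
        return resource_id
    # application-inference-profile/<opaque-id>: nothing to extract
    return None
```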
Proposed Solution
When the model_id is an Application Inference Profile ARN and cannot be resolved through string parsing, the SDK should call the Bedrock GetInferenceProfile API to resolve the underlying model:
```python
import boto3

def _resolve_model_from_aip(aip_arn: str, region: str) -> str:
    """Resolve the underlying model from an Application Inference Profile ARN."""
    bedrock = boto3.client("bedrock", region_name=region)
    response = bedrock.get_inference_profile(inferenceProfileIdentifier=aip_arn)
    models = response.get("models", [])
    if models:
        # The first entry is enough to identify the underlying model family.
        return models[0].get("modelArn", "")
    return ""
```
The resolved model ARN would be used solely for the caching capability check. The original AIP ARN would continue to be passed as the modelId in Converse API calls to preserve cost attribution.
API reference: [GetInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetInferenceProfile.html)
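A minimal sketch of how this could be wired in, assuming a hypothetical capability-check helper (`_model_supports_caching` and `model_id_for_capability_check` below are illustrative names, not existing SDK symbols):

```python
AIP_MARKER = ":application-inference-profile/"

def model_id_for_capability_check(model_id: str, region: str) -> str:
    """Return the identifier used only for the caching capability check."""
    if AIP_MARKER in model_id:
        resolved = _resolve_model_from_aip(model_id, region)
        if resolved:
            return resolved  # foundation-model ARN with the model name visible
    return model_id

# Capability detection sees the resolved model...
caching_supported = _model_supports_caching(  # hypothetical helper
    model_id_for_capability_check(model_id, region)
)
# ...while Converse keeps the original AIP ARN as modelId, preserving
# cost attribution and per-profile quotas.
```

Since the AIP-to-model mapping is stable, the lookup could be done once per BedrockModel instance and cached, avoiding an extra API call per request.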
Use Case
Why This Matters
Application Inference Profiles are used for granular cost attribution, per-team IAM access control, and per-profile quota management. They are essential for organizations that need to track inference costs by team, project, or customer. The current behavior forces users to choose between:
- AIP (cost tracking, access control) — but no prompt caching
- Cross-region inference profile (prompt caching works) — but no granular cost tracking
I have verified that the Amazon Bedrock Converse API fully supports prompt caching with AIP ARNs when cachePoint blocks are included directly in the request. The limitation is solely in the SDK's client-side model detection logic, which cannot extract the model family from an AIP ARN and therefore defaults to disabling caching.
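For reference, a minimal sketch of that verification using boto3 directly (the ARN is the placeholder from above, and the system text must exceed the model's minimum cacheable prefix length for a cache write to occur):

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")
system_prompt = "You are a helpful assistant. " * 200  # pad past the caching minimum

response = runtime.converse(
    # The AIP ARN is passed straight through as modelId.
    modelId="arn:aws:bedrock:eu-west-1:123456789012:application-inference-profile/my-profile-id",
    system=[
        {"text": system_prompt},
        {"cachePoint": {"type": "default"}},  # cache everything before this block
    ],
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
# Non-zero cacheWriteInputTokens (first call) or cacheReadInputTokens
# (subsequent calls) in the usage block confirm the cache point was applied.
print(response["usage"])
```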
Alternative Solutions
No response
Additional Context
No response