Feature Request: Container Apps: Validate registry DNS resolution at deployment time and surface actionable errors #1710

@ractando

Is your feature request related to a problem? Please describe.

When a Container App has multiple entries in the registries array and one of them fails DNS resolution (e.g., a missing private DNS zone record), the deployment still succeeds (HTTP 200/201), but the pod enters CrashLoopBackOff / Pending:NotReady at runtime and no clear error is surfaced to the customer.

The actual failure ("no such host" during the registry token exchange) is only visible deep in platform telemetry (Kusto), not in the ARM response, the provisioning error, or the az containerapp show output.

This commonly happens when:

  • An ACR has dedicated data endpoints enabled, creating two FQDNs (login server + data endpoint)
  • A Private Endpoint is configured but the private DNS zone only has A records for one of the two FQDNs
  • The registries array carries a stale entry that was valid previously but no longer resolves
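To illustrate the dedicated data endpoint case, here is a minimal sketch of checking both FQDNs from a machine inside the VNet. It assumes the documented ACR naming pattern (`<registry>.azurecr.io` for the login server and `<registry>.<region>.data.azurecr.io` for the regional data endpoint). The helper names `acr_fqdns`, `resolves`, and `check_fqdns` are hypothetical, and `myregistry` / `eastus` are placeholders.

```python
import socket
from typing import Callable, Dict, List


def acr_fqdns(registry_name: str, region: str) -> List[str]:
    """Return the login server and the regional data endpoint for an ACR.

    Assumes dedicated data endpoints are enabled; a geo-replicated
    registry would have one data endpoint per replica region.
    """
    return [
        f"{registry_name}.azurecr.io",
        f"{registry_name}.{region}.data.azurecr.io",
    ]


def resolves(host: str) -> bool:
    """True if the host resolves from the machine running this script."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def check_fqdns(
    hosts: List[str], resolver: Callable[[str], bool] = resolves
) -> Dict[str, bool]:
    """Map each host to whether it resolved, using the given resolver."""
    return {host: resolver(host) for host in hosts}


# Example (run from inside the VNet; names are placeholders):
#   for host, ok in check_fqdns(acr_fqdns("myregistry", "eastus")).items():
#       print(host, "resolves" if ok else "no such host")
```

If the login server resolves but the data endpoint does not, the private DNS zone is most likely missing the A record for the second FQDN.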

Describe the solution you'd like

  1. Deployment-time validation: When a PUT/PATCH to a container app includes a registries array, the control plane should attempt DNS resolution for each server entry. If resolution fails, return a clear error in the ARM response (e.g., "Registry server 'foo.azurecr.io' could not be resolved. Verify DNS configuration and private endpoint setup."), rather than accepting the deployment and failing silently at pod scheduling.

  2. Provisioning error surfacing: If DNS validation at deployment time is not feasible (e.g., DNS is only resolvable from the VNet), surface the image pull failure reason in properties.provisioningState or a new properties.latestRevisionError field on az containerapp show, so customers don't need platform telemetry access to diagnose.

  3. Warning for unreferenced registries: If a registry in the array is not referenced by any container's image field, surface a non-blocking warning suggesting the entry may be stale.
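The checks in items 1 and 3 could look roughly like the sketch below. This is not how the control plane is implemented; it is only an illustration of the requested behavior, run against a container app payload shaped like the ARM resource (`properties.configuration.registries[].server` and `properties.template.containers[].image`). `validate_registries`, `ValidationResult`, `image_registry`, and `dns_resolves` are hypothetical names, and the resolver is injectable because resolution may only work from inside the VNet.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, List


def dns_resolves(host: str) -> bool:
    """Default resolver: True if the host resolves from this machine."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)    # would block the deployment
    warnings: List[str] = field(default_factory=list)  # non-blocking


def image_registry(image: str) -> str:
    """Registry host of an image reference (simplified: text before the first '/')."""
    return image.split("/", 1)[0].lower()


def validate_registries(
    app: dict, resolver: Callable[[str], bool] = dns_resolves
) -> ValidationResult:
    props = app.get("properties", {})
    registries = props.get("configuration", {}).get("registries", [])
    template = props.get("template", {})
    containers = template.get("containers", []) + template.get("initContainers", [])
    used = {image_registry(c["image"]) for c in containers if c.get("image")}

    result = ValidationResult()
    for entry in registries:
        server = entry.get("server", "")
        if not server:
            continue
        # Item 1: fail fast when a registry server cannot be resolved.
        if not resolver(server):
            result.errors.append(
                f"Registry server '{server}' could not be resolved. "
                "Verify DNS configuration and private endpoint setup."
            )
        # Item 3: flag entries that no container image references.
        if server.lower() not in used:
            result.warnings.append(
                f"Registry server '{server}' is not referenced by any "
                "container image; the entry may be stale."
            )
    return result
```

Customers can run the same logic today against the output of az containerapp show (parsed as JSON) from a host inside the VNet, which at least narrows down which registries entry is stale before redeploying.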

Describe alternatives you've considered

  • Customers manually validating DNS from within the VNet before deploying — error-prone and not always feasible
  • Relying on az containerapp logs — these show the app crash but not the upstream token exchange / DNS failure
  • Removing unused registries manually — customers often don't know which entry is stale vs. active

Additional context

  • The lack of deployment-time feedback led to multiple hours of troubleshooting with Microsoft to identify the root cause.
  • Related: ACR Private Endpoint creation does not warn customers that dedicated data endpoint FQDNs also need DNS records.

Component: Microsoft.App/containerApps — Registry configuration & image pull validation

Metadata

    Labels

    Backlog: Issue has been validated and logged in our backlog for future work
    Networking: Related to ACA networking
    Provisioning: Related to deployment issues, revision provisioning, etc.
    enhancement: New feature or request
