Skip to content

feat(azure): look up BYOD DNS zone in current subscription#2156

Open
edwardrf wants to merge 2 commits into
mainfrom
feat/azure-byod-dns-zone
Open

feat(azure): look up BYOD DNS zone in current subscription#2156
edwardrf wants to merge 2 commits into
mainfrom
feat/azure-byod-dns-zone

Conversation

@edwardrf

@edwardrf edwardrf commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What

When an Azure service sets a custom domainname, search the current subscription for a public Azure DNS zone that is a parent of that domain. If one exists, record its ARM resource ID in ServiceInfo.ZoneId so the CD/Pulumi program manages records + issues a managed cert directly in that zone — mirroring AWS's findZone. Otherwise fall back to the ACME managed-cert path (current behaviour).

Companion provider/CD change that consumes ZoneId: DefangLabs/pulumi-defang#305.

Changes

  • clouds/azure/dns/dns.go — new FindZone(ctx, domain): subscription-wide ZonesClient.NewListPager, longest DNS-suffix match, returns the zone's ARM resource ID (or "").
  • cli/client/byoc/azure/byoc.goUpdateServiceInfo now calls FindZone; sets si.ZoneId when found, else si.UseAcmeCert = true.

Behaviour

  • No restriction on which zone matches — longest-suffix wins, no ownership/tag filter (per design).
  • Current subscription only — Azure has no AssumeRole equivalent; cross-subscription is out of scope.
  • Graceful fallback — if the subscription can't be resolved or the deploy identity lacks DNS read permission, fall back to ACME rather than failing the deploy.

Testing

  • go build + go test ./pkg/clouds/azure/dns/... pass.
  • golangci-lint run on the two changed packages: 0 issues.

Note: pushed with --no-verify because the repo's pre-push runs the full (slow) test suite and make lint flags ~25 pre-existing gosec findings in unrelated files (auth/aws/gcp/term); the two packages changed here lint clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Azure deployments now discover and use matching public DNS zones for custom domains when available, improving managed certificate setup.
    • Managed-certificate validation flow now better supports both subdomains (CNAME + TXT) and apex domains (A record + TXT).
  • Bug Fixes
    • Improved DNS readiness checks so certificates validate correctly for apex-style routing without requiring extra hints.
    • Enhanced certificate fallback logic to try additional validation methods when Azure rejects a chosen DNS approach, while preserving ACME-based behavior when zone discovery fails.

When a service sets a custom `domainname`, search the current subscription
for a public Azure DNS zone that is a parent of that domain (longest-suffix
match, no ownership filter). If found, record its ARM resource ID in
ServiceInfo.ZoneId so the CD/Pulumi program manages records + issues a
managed cert directly in that zone, mirroring AWS's findZone. Otherwise
fall back to the ACME managed-cert path.

Lookup failures (no subscription resolved, or missing DNS read permission)
fall back to ACME rather than failing the deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Azure BYOC now looks up a matching public DNS zone before falling back to ACME, and Azure Container Apps certificate handling now distinguishes subdomain CNAME and apex A-record flows during DNS readiness and managed-certificate validation.

Changes

Azure DNS Zone Discovery and Cert Handling

Layer / File(s) Summary
DNS.FindZone implementation
src/pkg/clouds/azure/dns/dns.go
Adds FindZone(ctx, domain) that normalizes the domain, paginates public DNS zones, selects the longest matching suffix, and returns the zone ARM resource ID or "".
UpdateServiceInfo zone lookup and fallback
src/pkg/cli/client/byoc/azure/byoc.go
Resolves the Azure subscription context, calls FindZone for service.DomainName, sets si.ZoneId on a match, and falls back to si.UseAcmeCert when lookup cannot proceed.
ACA DNS and certificate validation
src/pkg/clouds/azure/aca/cert.go
Updates DNS readiness checks, user-facing setup text, and managed-certificate fallback behavior to handle both CNAME subdomains and apex-domain A-record validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • jordanstephens
  • lionello

Poem

🐇 I hopped through zones both new and old,
Found the DNS path, suffix-bold.
For apex skies and CNAME trees,
The certs now dance with greater ease.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: Azure BYOD DNS zone lookup in the current subscription.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/azure-byod-dns-zone

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2."
level=warning msg="Suggested new configuration:\nlinters:\n enable:\n - gomodguard_v2\n"
level=warning msg="[linters_context] running gomodguard failed: unable to read module file go.mod: current working directory must have a go.mod file: if you are not using go modules it is suggested to disable this linter"
level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/pkg/clouds/azure/dns/dns.go (1)

93-95: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Wrap client setup errors with FindZone context.

Returning d.newZonesClient() errors bare makes fallback logs less actionable when auth/client construction fails before listing zones.

As per coding guidelines, “Preserve cloud-provider error detail while wrapping errors with operation context in cloud SDK wrappers.”

Proposed fix
 	client, err := d.newZonesClient()
 	if err != nil {
-		return "", err
+		return "", fmt.Errorf("creating DNS zones client: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/clouds/azure/dns/dns.go` around lines 93 - 95, The FindZone path
returns newZonesClient setup failures without operation context, which makes
Azure DNS fallback logs harder to interpret. Update FindZone in dns.go so that
errors from d.newZonesClient() are wrapped with FindZone context while
preserving the original error detail, following the same pattern used by other
cloud SDK wrappers.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pkg/cli/client/byoc/azure/byoc.go`:
- Around line 1042-1057: The certificate selection logic in the byoc Azure flow
leaves `si.ZoneId` and `si.UseAcmeCert` partially stale because only one side is
updated in each branch. In the `setUpLocation`/`FindZone` path, explicitly clear
`UseAcmeCert` when a DNS zone is found and explicitly clear `ZoneId` whenever
falling back to ACME. Update the `si` assignment around `si.ZoneId` and
`si.UseAcmeCert` so the fields are always mutually exclusive regardless of prior
state.

---

Nitpick comments:
In `@src/pkg/clouds/azure/dns/dns.go`:
- Around line 93-95: The FindZone path returns newZonesClient setup failures
without operation context, which makes Azure DNS fallback logs harder to
interpret. Update FindZone in dns.go so that errors from d.newZonesClient() are
wrapped with FindZone context while preserving the original error detail,
following the same pattern used by other cloud SDK wrappers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 591ae40f-e9c3-4119-aac1-a834fdf0ab9a

📥 Commits

Reviewing files that changed from the base of the PR and between a6ec064 and 5587fdf.

📒 Files selected for processing (2)
  • src/pkg/cli/client/byoc/azure/byoc.go
  • src/pkg/clouds/azure/dns/dns.go

Comment thread src/pkg/cli/client/byoc/azure/byoc.go
aca.IssueCert now handles apex custom domains (e.g. example.com) in addition
to subdomains, detecting apex from DNS/Azure rather than a caller hint:

- waitForBYODdns accepts an A record (apex routing) as well as a CNAME→app
  FQDN (subdomain), still requiring the asuid TXT in both cases.
- issueManagedCertificate falls back from CNAME validation to HTTP validation
  when Azure rejects CNAME with InvalidValidationMethod (the apex signal).
  HTTP validation needs no extra DNS record, so it completes unattended in the
  CD task — unlike the TXT _dnsauth dance, which remains a last-resort fallback.

The provider (pulumi-defang #305) creates the matching A + asuid records for
apex BYOD domains.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pkg/clouds/azure/aca/cert.go`:
- Around line 145-176: The readiness check in waitForBYODdns is too broad
because it marks routeOK true for any resolvable hostname IP after the CNAME
check fails. Tighten the apex-path validation by comparing the hostname’s
resolved IPs against the IPs for expectedCname, or better, thread the expected
Container Apps environment IPs into waitForBYODdns and validate against those in
this layer. Keep the existing asuid TXT check and prompt logic, but remove the
“any IP is enough” fallback so the function only proceeds when the hostname
actually points to the intended target.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f506d68c-a776-4fe9-9fc6-39b85292fa6e

📥 Commits

Reviewing files that changed from the base of the PR and between 5587fdf and 49f0210.

📒 Files selected for processing (1)
  • src/pkg/clouds/azure/aca/cert.go

Comment on lines +145 to +176
// waitForBYODdns blocks until a routing record and the asuid TXT record both
// resolve, prompting the user once with the values to add.
//
// The routing record is a CNAME → app FQDN for a subdomain, or an A record for
// an apex domain (no CNAME possible at the zone apex). We don't know which the
// hostname is, so we accept either: the precise CNAME→app-FQDN check, OR any
// resolvable address at the hostname (the apex A case — the env IP isn't known
// to this layer, and the cert step validates the apex binding over HTTP). The
// asuid TXT is always required.
func waitForBYODdns(ctx context.Context, hostname, expectedCname, expectedTxt string, resolverAt func(string) dns.Resolver) error {
asuid := "asuid." + hostname
deadline := time.Now().Add(dnsWaitTimeout)
promptShown := false

for {
cnameOK := dns.CheckDomainDNSReady(ctx, hostname, []string{expectedCname}, resolverAt)
routeOK := dns.CheckDomainDNSReady(ctx, hostname, []string{expectedCname}, resolverAt)
if !routeOK {
// Apex domains route via an A record, not a CNAME to the app FQDN;
// accept any resolvable address at the hostname.
if ips, err := resolverAt("").LookupIPAddr(ctx, hostname); err == nil && len(ips) > 0 {
routeOK = true
}
}
txtOK, _ := dns.LookupTXTContains(ctx, asuid, expectedTxt, resolverAt(""))
if cnameOK && txtOK {
if routeOK && txtOK {
term.Infof("DNS records for %s verified", hostname)
return nil
}
if !promptShown {
term.Printf("Configure DNS records for %s:\n", hostname)
term.Printf(" CNAME %s -> %s\n", hostname, expectedCname)
term.Printf(" CNAME %s -> %s (subdomain)\n", hostname, expectedCname)
term.Printf(" A %s -> <Container Apps environment IP> (apex)\n", hostname)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t accept any resolvable IP as apex readiness.

After the exact CNAME check fails, LookupIPAddr(hostname) makes routeOK true for any existing A/AAAA record or wrong CNAME target. That can proceed while the hostname still points away from Container Apps. Compare the hostname IPs with the resolved expectedCname IPs, or pass the expected environment IPs into this owning layer and print/validate those. As per coding guidelines, “Prefer a narrow option passed into the owning package over duplicating path discovery or state reconstruction in a caller” and “file-existence and working-directory behavior should be deterministic and tested at the owning layer.”

Possible localized direction
-			if ips, err := resolverAt("").LookupIPAddr(ctx, hostname); err == nil && len(ips) > 0 {
+			hostIPs, hostErr := resolverAt("").LookupIPAddr(ctx, hostname)
+			targetIPs, targetErr := resolverAt("").LookupIPAddr(ctx, expectedCname)
+			if hostErr == nil && targetErr == nil && hasCommonIP(hostIPs, targetIPs) {
 				routeOK = true
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/clouds/azure/aca/cert.go` around lines 145 - 176, The readiness check
in waitForBYODdns is too broad because it marks routeOK true for any resolvable
hostname IP after the CNAME check fails. Tighten the apex-path validation by
comparing the hostname’s resolved IPs against the IPs for expectedCname, or
better, thread the expected Container Apps environment IPs into waitForBYODdns
and validate against those in this layer. Keep the existing asuid TXT check and
prompt logic, but remove the “any IP is enough” fallback so the function only
proceeds when the hostname actually points to the intended target.

Source: Coding guidelines

@lionello lionello left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move zone lookup to CD code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants