DNS delegation set conflict check fails when transparent DNS proxy is active (VPN, ISP interception) #1955

@raphaeltm

Bug Description

defang up for BYOC AWS fails with:

Error: failed to create a usable delegation set without conflicting NS records after multiple attempts

This happens when the user's machine has a transparent DNS proxy active (VPN, ISP DNS interception, corporate firewall, macOS network extensions like Cloudflare WARP, etc.).

Root Cause

The NS conflict check in nameServersHasConflict (src/pkg/cli/client/byoc/aws/domain.go:131-146) sends direct DNS queries from the user's machine to each Route 53 nameserver on UDP port 53:

// domain.go:131-146
func nameServersHasConflict(ctx context.Context, nameServers []string, domains []string, resolverAt func(string) dns.Resolver) (bool, error) {
    for _, nsServer := range nameServers {
        resolver := resolverAt(nsServer)
        for _, domain := range domains {
            if records, err := resolver.LookupNS(ctx, domain); err != nil {
                return false, err
            } else if len(records) > 0 {
                return true, nil
            }
        }
    }
    return false, nil
}

The underlying DNS query (src/pkg/dns/resolver.go:115-119) uses miekg/dns to send a raw UDP packet:

func (r DirectResolver) query(ctx context.Context, domain string, qtype uint16) (*dns.Msg, error) {
    req := &dns.Msg{}
    req.SetQuestion(dns.Fqdn(domain), qtype)
    return dns.ExchangeContext(ctx, req, r.NSServer+":53")
}

When a transparent DNS proxy intercepts UDP port 53, all queries are answered by the proxy's recursive resolver instead of the intended Route 53 nameserver. The recursive resolver correctly resolves defang.app NS and returns valid NS records for every query, regardless of which nameserver was targeted.

The LookupNS function (resolver.go:179-201) then parses both the Answer and Authority sections:

func (r DirectResolver) LookupNS(ctx context.Context, domain string) ([]*net.NS, error) {
    // ...
    for _, rr := range res.Ns {       // Authority section
        if ns, ok := rr.(*dns.NS); ok { ... }
    }
    for _, rr := range res.Answer {    // Answer section
        if ns, ok := rr.(*dns.NS); ok { ... }
    }
}

A recursive resolver response will have NS records in the Answer section for defang.app, so len(records) > 0 holds for every query in nameServersHasConflict and every nameserver appears "conflicting."

Evidence

Debug output shows ALL nameservers (11 unique across existing + 5 new delegation sets) returning records for defang.app. That is statistically near-impossible (~3.2e-13 probability per the code comment) unless something external is answering the queries:

Name server "ns-881.awsdns-46.net" has conflicting records for domain "defang.app": [...]
Name server "ns-1434.awsdns-51.org" has conflicting records for domain "defang.app": [...]
Name server "ns-1325.awsdns-37.org" has conflicting records for domain "defang.app": [...]
... (11 unique nameservers, ALL conflicting on defang.app, NONE on the project domain)

The user had a VPN active, which was intercepting DNS traffic.

Impact

Any user with a transparent DNS proxy (VPN, ISP interception, corporate network) cannot deploy to BYOC AWS. The error message gives no indication that DNS interception is the cause.

Proposed Solution

1. Detect DNS interception (CLI-side)

Before running the conflict check, verify that direct DNS queries actually reach the intended nameservers. For example:

  • Check the AA (Authoritative Answer) flag: A genuine Route 53 nameserver answering for a zone it hosts sets res.Authoritative = true. A response served by a recursive resolver/proxy typically has AA=0. Reject non-authoritative responses in the conflict check.
  • Query a known nameserver for a domain it doesn't serve: If it returns records, DNS is being intercepted. Warn the user.
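
The two checks above can be sketched as follows. This is a minimal, stdlib-only illustration, not the CLI's actual code: it reads the AA bit straight from the raw 12-byte DNS header (the same bit miekg/dns exposes as res.Authoritative), and it models the "domain it doesn't serve" probe through a hypothetical probeFunc hook. The names headerFlags, isAuthoritativeResponse, probeFunc and looksIntercepted are assumptions made for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// DNS header flag bits as laid out in RFC 1035, section 4.1.1.
const (
	flagQR = 1 << 15 // message is a response
	flagAA = 1 << 10 // authoritative answer
)

var errShortMessage = errors.New("dns message shorter than the 12-byte header")

// headerFlags returns the 16-bit flags field of a raw DNS message.
func headerFlags(raw []byte) (uint16, error) {
	if len(raw) < 12 {
		return 0, errShortMessage
	}
	// Bytes 0-1 hold the message ID, bytes 2-3 hold the flags.
	return binary.BigEndian.Uint16(raw[2:4]), nil
}

// isAuthoritativeResponse reports whether raw is a response with the AA bit set.
// A conflict check could ignore any response for which this returns false.
func isAuthoritativeResponse(raw []byte) (bool, error) {
	flags, err := headerFlags(raw)
	if err != nil {
		return false, err
	}
	return flags&flagQR != 0 && flags&flagAA != 0, nil
}

// probeFunc is a hypothetical hook that queries server for the NS records of
// domain and returns how many records came back.
type probeFunc func(server, domain string) (nsRecords int, err error)

// looksIntercepted asks server about canaryDomain, a domain the caller knows
// that server does not host. Any records in the reply suggest that something
// other than the targeted nameserver answered the query.
func looksIntercepted(probe probeFunc, server, canaryDomain string) (bool, error) {
	n, err := probe(server, canaryDomain)
	if err != nil {
		return false, err
	}
	return n > 0, nil
}
```

Injecting the probe as a function keeps the detection logic testable without network access; in the CLI it would presumably wrap the existing DirectResolver.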

2. Cross-validate CLI vs cloud resolution

Run the NS conflict check from both the CLI (user's machine) and the CD task (running in the user's cloud). If there's a delta between what the CLI sees and what the cloud sees, flag it:

  • CLI sees conflicts on all nameservers + cloud sees no conflicts → transparent DNS proxy detected
  • Both see conflicts → genuine Route 53 issue

This could be done by having the CLI report its DNS findings and comparing against a cloud-side check (e.g., during the CD task's Pulumi run or via a lightweight pre-flight Lambda/Fargate task).
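
As a rough sketch of the comparison step, assuming both sides report a per-nameserver conflict map (nameserver name to "has conflict"), the decision could look like the code below. The verdict type, its values and crossValidate are hypothetical names for illustration. Only the first two cases come from the list above; treating everything else as inconclusive is an assumption.

```go
package main

// verdict is a hypothetical classification of the CLI-vs-cloud comparison.
type verdict string

const (
	suspectedInterception verdict = "transparent DNS proxy suspected"
	genuineConflict       verdict = "genuine Route 53 conflict"
	inconclusive          verdict = "inconclusive"
)

// countConflicts returns how many nameservers reported a conflict.
func countConflicts(results map[string]bool) int {
	n := 0
	for _, conflict := range results {
		if conflict {
			n++
		}
	}
	return n
}

// crossValidate compares the conflict results seen from the user's machine
// (cli) with those seen from the CD task (cloud).
func crossValidate(cli, cloud map[string]bool) verdict {
	cliConflicts := countConflicts(cli)
	cloudConflicts := countConflicts(cloud)
	switch {
	case len(cli) > 0 && cliConflicts == len(cli) && cloudConflicts == 0:
		// CLI sees conflicts on all nameservers, cloud sees none.
		return suspectedInterception
	case cliConflicts > 0 && cloudConflicts > 0:
		// Both vantage points see conflicts.
		return genuineConflict
	default:
		return inconclusive
	}
}
```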

3. Improve the error message

At minimum, when ALL attempts fail (which is statistically near-impossible under normal conditions), the error message should suggest:

Error: failed to create a usable delegation set without conflicting NS records after multiple attempts.
This can happen if your network intercepts DNS traffic (VPN, corporate proxy, ISP DNS interception).
Try disabling your VPN or connecting to a direct internet connection.
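
One possible way to attach the hint, sketched under the assumption that the caller tracks how many nameservers it checked and how many of them reported a conflict. The names errNoUsableDelegationSet, interceptionHint and delegationSetError are hypothetical and do not exist in the codebase as far as this issue shows.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoUsableDelegationSet mirrors the message the CLI prints today.
var errNoUsableDelegationSet = errors.New("failed to create a usable delegation set without conflicting NS records after multiple attempts")

// interceptionHint is the extra guidance proposed in this issue.
const interceptionHint = "This can happen if your network intercepts DNS traffic (VPN, corporate proxy, ISP DNS interception).\n" +
	"Try disabling your VPN or connecting to a direct internet connection."

// delegationSetError returns the base error, adding the hint when every
// checked nameserver reported a conflict, which is the pattern a transparent
// DNS proxy produces.
func delegationSetError(checked, conflicting int) error {
	if checked > 0 && conflicting == checked {
		// %w keeps the sentinel matchable with errors.Is.
		return fmt.Errorf("%w.\n%s", errNoUsableDelegationSet, interceptionHint)
	}
	return errNoUsableDelegationSet
}
```

Wrapping with %w means existing callers that match on the sentinel keep working while the user sees the longer message.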

4. Consider moving the check server-side

The most robust fix would be to move the NS conflict validation entirely to the CD task (which runs in the user's cloud with clean networking), rather than performing it from the user's local machine where DNS may be unreliable.

Reproduction

  1. Enable a VPN that intercepts DNS (Cloudflare WARP, corporate VPN, etc.)
  2. Run defang -s <stack> up targeting BYOC AWS
  3. Observe the delegation set creation fails on every attempt

Files

  • src/pkg/cli/client/byoc/aws/domain.go: nameServersHasConflict, createUsableDelegationSet, findUsableDelegationSet
  • src/pkg/dns/resolver.go: DirectResolver.query, DirectResolver.LookupNS
  • src/pkg/cli/composeUp.go: orchestration of PrepareDomainDelegation
