Add DNS SRV discovery via postgres+srv:// URI scheme #2538

Open: x4m wants to merge 1 commit into jackc:master from x4m:dnssrv

x4m (Contributor) commented Apr 21, 2026

Add DNS SRV discovery via postgres+srv:// URI scheme

Problem

When running a replicated PostgreSQL cluster, clients need to know the address of every node. Today the only way to express this in a connection string is an explicit comma-separated host list:

host=pg1.example.com,pg2.example.com,pg3.example.com port=5432 target_session_attrs=read-write

This creates operational coupling: every time a node is added, removed, or replaced, the connection string in every application must be updated and the application restarted. Managed database providers and HA tooling (Patroni, pg_auto_failover, etc.) cannot change the cluster topology without coordinating with application teams.

Solution

DNS SRV records (RFC 2782) were designed exactly for this: a single name that maps to an ordered, weighted list of (host, port) endpoints. MongoDB adopted the mongodb+srv:// URI scheme for the same reason.

This PR adds a postgres+srv:// URI scheme (and a srvhost= keyword parameter) that tells pgconn to resolve _postgresql._tcp.{cluster} at connect time and use the returned targets in priority/weight order as the list of hosts to attempt:

// Instead of hard-coding every node:
conn, err := pgconn.Connect(ctx,
    "postgres+srv://user:pass@cluster.example.com/mydb?target_session_attrs=read-write")

DNS at the provider's side:

_postgresql._tcp.cluster.example.com.  SRV  0  1  5432  pg1.example.com.
_postgresql._tcp.cluster.example.com.  SRV  0  1  5432  pg2.example.com.
_postgresql._tcp.cluster.example.com.  SRV  0  1  5433  pg3.example.com.

Adding or removing a node now requires only a DNS record change — no application config or restart.

API

Two new fields on Config:

// SRVHost is the cluster name for SRV discovery. Set automatically by
// ParseConfig when using the postgres+srv:// scheme or srvhost= keyword.
SRVHost string

// LookupSRVFunc resolves SRV records. Defaults to net.DefaultResolver.LookupSRV.
// Override in tests to return mock records without a real DNS server.
LookupSRVFunc func(ctx context.Context, service, proto, name string) (string, []*net.SRV, error)

Both connection string forms are supported:

# URI
postgres+srv://user:pass@cluster.example.com/mydb?sslmode=require

# Keyword/value
srvhost=cluster.example.com user=bob dbname=mydb sslmode=require

SRVHost is mutually exclusive with host= / comma-separated hosts in the URI.
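
The mutual-exclusion rule for the keyword/value form can be sketched as follows. This is only an illustration under simplifying assumptions: `parseKeywordValue`, `srvHostFromSettings`, and `errSRVAndHost` are hypothetical names, and the toy parser ignores the quoting and escaping that a real keyword/value parser must handle.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errSRVAndHost is a hypothetical error value used only in this sketch.
var errSRVAndHost = errors.New("srvhost cannot be combined with host")

// parseKeywordValue splits a simple "key=value key=value" connection string.
// It is a toy parser: quoted values and escapes are deliberately not handled.
func parseKeywordValue(connString string) (map[string]string, error) {
	settings := make(map[string]string)
	for _, field := range strings.Fields(connString) {
		key, value, ok := strings.Cut(field, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid keyword/value pair %q", field)
		}
		settings[key] = value
	}
	return settings, nil
}

// srvHostFromSettings returns the srvhost value, or an error when both
// srvhost and host are present.
func srvHostFromSettings(settings map[string]string) (string, error) {
	srvHost := settings["srvhost"]
	if srvHost != "" && settings["host"] != "" {
		return "", errSRVAndHost
	}
	return srvHost, nil
}

func main() {
	inputs := []string{
		"srvhost=cluster.example.com user=bob dbname=mydb sslmode=require",
		"srvhost=cluster.example.com host=pg1.example.com user=bob",
	}
	for _, in := range inputs {
		settings, err := parseKeywordValue(in)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		srvHost, err := srvHostFromSettings(settings)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("srvhost:", srvHost)
	}
}
```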

TLS

Each SRV-resolved target gets its own TLS configuration with the correct ServerName for that specific host, so sslmode=verify-full works correctly against individual server certificates — not just the cluster name.

Testing without a DNS server

LookupSRVFunc can be replaced in tests to return any records without standing up a DNS server:

config, _ := pgconn.ParseConfig("postgres+srv://user@cluster.test/db?sslmode=disable")
config.LookupSRVFunc = func(ctx context.Context, service, proto, name string) (string, []*net.SRV, error) {
    return name, []*net.SRV{
        {Target: "127.0.0.1", Port: 5432, Priority: 0, Weight: 1},
        {Target: "127.0.0.1", Port: 5433, Priority: 0, Weight: 1},
    }, nil
}
conn, err := pgconn.ConnectConfig(ctx, config)

Test plan

| Test | What it covers | Needs Postgres |
| --- | --- | --- |
| TestParseConfigSRVScheme | postgres+srv:// and postgresql+srv:// set SRVHost | No |
| TestParseConfigSRVKeyword | srvhost= keyword sets SRVHost | No |
| TestParseConfigSRVAndHostMutuallyExclusive | error when both srvhost and host given | No |
| TestConnectSRVMocked | end-to-end connect via mocked LookupSRVFunc | Yes (PGX_TEST_TCP_CONN_STRING) |
| TestConnectSRVMockedMultipleTargets | falls through dead first target to live second | Yes |
| TestConnectSRVAllTargetsDead | returns error when all targets unreachable | No |
| TestConnectSRVLookupFailure | DNS error wrapped and propagated | No |
| TestResolveSRVLive | real internet DNS (_postgresql._tcp.mmatvei.ru), verifies RFC 2782 priority ordering end-to-end | No |
| TestConnectSRVLive | full connect against real SRV-discovered server | Yes (PGX_TEST_SRV_CONN_STRING) |

TestResolveSRVLive runs against four real public SRV records and passes with no PostgreSQL server. If the system resolver has a stale negative cache, set PGX_TEST_SRV_DNS_SERVER=<nameserver-ip> to query a specific server.

Prior art and related discussion

When a connection string uses the postgres+srv:// or postgresql+srv://
URI scheme, or the srvhost= keyword parameter, pgconn resolves
_postgresql._tcp.{cluster} at connect time via DNS SRV (RFC 2782) and
uses the returned targets in priority/weight order as the list of hosts
to try. This allows pointing clients at a single DNS name that describes
an entire HA cluster; topology changes (failovers, node additions/
removals) require only a DNS record update with no application restart.

Design:

  Config.SRVHost       — cluster name for SRV lookup; set by ParseConfig
                         when the +srv URI scheme or srvhost= is used.
  Config.LookupSRVFunc — pluggable resolver (default: net.DefaultResolver.
                         LookupSRV); replace in tests to mock DNS without
                         running a real nameserver.

Each SRV-resolved target gets its own TLS configuration with the correct
SNI ServerName (captured via a closure over the TLS settings from
ParseConfig), so sslmode=verify-full works against individual server
certificates.  SRVHost is mutually exclusive with specifying hosts
directly in the connection string.

The feature is exercised against real public DNS records
(_postgresql._tcp.mmatvei.ru, four SRV entries at priorities 96-100)
without requiring a live PostgreSQL server, proving correct RFC 2782
priority ordering end-to-end.

New public API:
  pgconn.LookupSRVFunc type
  pgconn.Config.SRVHost field
  pgconn.Config.LookupSRVFunc field

Test coverage:
  TestParseConfigSRVScheme          — URI scheme sets SRVHost
  TestParseConfigSRVKeyword         — srvhost= keyword sets SRVHost
  TestParseConfigSRVAndHostMutuallyExclusive — error when both given
  TestConnectSRVMocked              — end-to-end via mocked LookupSRVFunc
  TestConnectSRVMockedMultipleTargets — fallthrough on dead first target
  TestConnectSRVAllTargetsDead      — error when all targets unreachable
  TestConnectSRVLookupFailure       — DNS error propagated correctly
  TestResolveSRVLive                — real internet DNS, no Postgres needed
                                     (set PGX_TEST_SRV_DNS_SERVER=<ip> to
                                     bypass a stale recursive resolver)

Made-with: Cursor
x4m (Contributor, Author) commented Apr 22, 2026

Well, technically, pgx already solves the problem at hand by iterating over all IPs obtained from DNS A records, so this is more of a design prototype patch.

jackc (Owner) commented Apr 25, 2026

I try to match libpq whenever possible. If this hasn't been merged there yet (it appears it's still pending review for PG 20), I'd be reluctant to add it here if it is possible that the final PG implementation may differ.

On a concrete note, changing the scheme to postgres+srv may have unintended consequences. Recent versions of Go have tightened up the URL parsing function. Our usage would now have some incompatibilities, except that, in light of Go's strong backwards compatibility guarantees, they special-cased the postgresql scheme and preserved the old behavior for us. If we use a different scheme, we will get slightly different parsing behavior. I don't know what implications that would have.

x4m (Contributor, Author) commented Apr 26, 2026

Thanks, Jack! That's a very important heads-up about the scheme. I chose "postgres+srv" simply because I like it, but that might not be enough... I'll raise this question in the pgsql-hackers thread about libpq. Can you give me a few pointers to the Go folks' discussion about this?

jackc (Owner) commented May 8, 2026

See #2404 and golang/go#75859 for more info on how the postgresql scheme is grandfathered into less strict behavior.

x4m (Contributor, Author) commented May 12, 2026

The Go regression was specifically about multi-host host:port,host:port in the authority, not commas in general. A single-name SRV URL likely parses fine. Your point still stands though: only postgres/postgresql are grandfathered in net/url, so any new scheme is a compatibility bet unless we stick to postgresql://…?srvhost= or get upstream to bless another scheme.
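
To make the two options concrete, here is a small sketch that extracts the cluster name either from a single-name +srv URL or from a srvhost query parameter on the existing postgresql scheme. `srvHostFromURL` is a hypothetical helper, and the sketch only shows that both shapes go through `url.Parse`; it does not exercise the stricter multi-host parsing discussed above, which may vary across Go versions.

```go
package main

import (
	"fmt"
	"net/url"
)

// srvHostFromURL is a hypothetical helper (not part of pgconn). It returns
// the cluster name for SRV discovery from either URL shape, or "" when a
// plain postgres/postgresql URL carries no srvhost parameter.
func srvHostFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "postgres+srv", "postgresql+srv":
		// New scheme: the single host in the authority is the cluster name.
		return u.Hostname(), nil
	case "postgres", "postgresql":
		// Existing scheme: the cluster name travels as a query parameter.
		return u.Query().Get("srvhost"), nil
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	urls := []string{
		"postgres+srv://user:pass@cluster.example.com/mydb?target_session_attrs=read-write",
		"postgresql:///mydb?srvhost=cluster.example.com",
	}
	for _, raw := range urls {
		host, err := srvHostFromURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("SRV cluster name:", host)
	}
}
```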
