Add DNS SRV discovery via postgres+srv:// URI scheme #2538
When a connection string uses the postgres+srv:// or postgresql+srv://
URI scheme, or the srvhost= keyword parameter, pgconn resolves
_postgresql._tcp.{cluster} at connect time via DNS SRV (RFC 2782) and
uses the returned targets in priority/weight order as the list of hosts
to try. This allows pointing clients at a single DNS name that describes
an entire HA cluster; topology changes (failovers, node additions/
removals) require only a DNS record update with no application restart.
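As a rough illustration of the two opt-in forms (not the actual ParseConfig code from this PR), the sketch below detects the +srv scheme or the srvhost= parameter using only net/url. The function name srvHostFromURI, the error value, the example host names, and the exact error conditions are assumptions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// errSRVAndHost is a hypothetical error for mixing SRV discovery with explicit hosts.
var errSRVAndHost = errors.New("srvhost cannot be combined with explicit hosts")

// srvHostFromURI returns the cluster name to use for the SRV lookup, or ""
// when the connection URI does not opt into SRV discovery.
func srvHostFromURI(connString string) (string, error) {
	u, err := url.Parse(connString)
	if err != nil {
		return "", err
	}
	q := u.Query()
	switch u.Scheme {
	case "postgres+srv", "postgresql+srv":
		// With the +srv scheme the URI host is treated as the cluster name.
		if q.Get("host") != "" {
			return "", errSRVAndHost
		}
		return u.Hostname(), nil
	case "postgres", "postgresql":
		srv := q.Get("srvhost")
		// Assumed rule: srvhost= may not appear next to a URI host or host=.
		if srv != "" && (u.Host != "" || q.Get("host") != "") {
			return "", errSRVAndHost
		}
		return srv, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	for _, s := range []string{
		"postgres+srv://app@cluster.example.com/mydb",
		"postgresql:///mydb?srvhost=cluster.example.com",
		"postgres://pg1.example.com/mydb?srvhost=cluster.example.com",
	} {
		host, err := srvHostFromURI(s)
		fmt.Printf("%s -> host=%q err=%v\n", s, host, err)
	}
}
```

The third URI is rejected in this sketch because it names both an explicit host and an SRV cluster.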
Design:
Config.SRVHost: cluster name for SRV lookup; set by ParseConfig when the +srv URI scheme or srvhost= is used.
Config.LookupSRVFunc: pluggable resolver (default: net.DefaultResolver.LookupSRV); replace in tests to mock DNS without running a real nameserver.
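To make the pluggable-resolver idea concrete, here is a minimal sketch of how an injected lookup function can fall back to the standard resolver and be mocked in tests. The lowercase names (lookupSRVFunc, config, mockLookup) are hypothetical stand-ins and not the PR's types; the only assumption about the real type is that it matches the signature of (*net.Resolver).LookupSRV.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// lookupSRVFunc has the same signature as (*net.Resolver).LookupSRV.
type lookupSRVFunc func(ctx context.Context, service, proto, name string) (cname string, addrs []*net.SRV, err error)

// config is a hypothetical stand-in for the two new Config fields.
type config struct {
	SRVHost       string
	LookupSRVFunc lookupSRVFunc
}

// lookup uses the injected resolver when set, else the standard library one.
func (c *config) lookup(ctx context.Context) ([]*net.SRV, error) {
	fn := c.LookupSRVFunc
	if fn == nil {
		fn = net.DefaultResolver.LookupSRV
	}
	_, addrs, err := fn(ctx, "postgresql", "tcp", c.SRVHost)
	return addrs, err
}

// mockLookup returns a resolver that always answers with the given values.
func mockLookup(records []*net.SRV, err error) lookupSRVFunc {
	return func(ctx context.Context, service, proto, name string) (string, []*net.SRV, error) {
		return "", records, err
	}
}

// demoMock resolves a made-up cluster through the mock, with no DNS traffic.
func demoMock() string {
	c := &config{
		SRVHost:       "cluster.example.com",
		LookupSRVFunc: mockLookup([]*net.SRV{{Target: "pg1.example.com.", Port: 5432, Priority: 10}}, nil),
	}
	addrs, err := c.lookup(context.Background())
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%s:%d", addrs[0].Target, addrs[0].Port)
}

// demoFailure reports whether a resolver error reaches the caller unchanged.
func demoFailure() bool {
	wantErr := errors.New("no such host")
	c := &config{SRVHost: "cluster.example.com", LookupSRVFunc: mockLookup(nil, wantErr)}
	_, err := c.lookup(context.Background())
	return errors.Is(err, wantErr)
}

func main() {
	fmt.Println(demoMock())
	fmt.Println(demoFailure())
}
```

Injecting the function rather than a resolver object keeps the mock to a single closure, which is presumably why the field is typed as a function.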
Each SRV-resolved target gets its own TLS configuration with the correct
SNI ServerName (captured via a closure over the TLS settings from
ParseConfig), so sslmode=verify-full works against individual server
certificates. SRVHost is mutually exclusive with specifying hosts
directly in the connection string.
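The per-target TLS behaviour can be sketched as follows. This version copies a base configuration with tls.Config.Clone and sets ServerName per target; the commit describes a closure over the parsed TLS settings, so treat this as an equivalent illustration rather than the actual implementation. The helper name tlsConfigFor and the host names are hypothetical.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// tlsConfigFor returns a copy of base whose ServerName matches one SRV target,
// so certificate verification is done against that server's own name.
func tlsConfigFor(base *tls.Config, target string) *tls.Config {
	cfg := base.Clone()
	// SRV targets are usually fully qualified with a trailing dot; strip it.
	cfg.ServerName = strings.TrimSuffix(target, ".")
	return cfg
}

// demoServerNames builds one config per target and shows the base is untouched.
func demoServerNames() string {
	base := &tls.Config{MinVersion: tls.VersionTLS12}
	var names []string
	for _, target := range []string{"pg1.example.com.", "pg2.example.com."} {
		names = append(names, tlsConfigFor(base, target).ServerName)
	}
	return strings.Join(names, ",") + "|base=" + base.ServerName
}

func main() {
	fmt.Println(demoServerNames())
}
```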
The feature is exercised against real public DNS records
(_postgresql._tcp.mmatvei.ru, four SRV entries at priorities 96-100)
without requiring a live PostgreSQL server, proving correct RFC 2782
priority ordering end-to-end.
New public API:
pgconn.LookupSRVFunc type
pgconn.Config.SRVHost field
pgconn.Config.LookupSRVFunc field
Test coverage:
TestParseConfigSRVScheme — URI scheme sets SRVHost
TestParseConfigSRVKeyword — srvhost= keyword sets SRVHost
TestParseConfigSRVAndHostMutuallyExclusive — error when both given
TestConnectSRVMocked — end-to-end via mocked LookupSRVFunc
TestConnectSRVMockedMultipleTargets — fallthrough on dead first target
TestConnectSRVAllTargetsDead — error when all targets unreachable
TestConnectSRVLookupFailure — DNS error propagated correctly
TestResolveSRVLive — real internet DNS, no Postgres needed
(set PGX_TEST_SRV_DNS_SERVER=<ip> to
bypass a stale recursive resolver)
Made-with: Cursor
|
Well, technically, pgx already solves the problem at hand by iterating over all IPs obtained from DNS A records. So it's more of a design prototype patch. |
|
I try to match libpq whenever possible. If this hasn't been merged there yet (it appears it's still pending review for PG 20), I'd be reluctant to add it here if it is possible that the final PG implementation may differ. On a concrete note, changing the scheme to |
|
Thanks, Jack! That's a very important heads-up about the scheme. I chose "postgres+srv" because I just like it, but that might not be enough... I'll raise this question in the pgsql-hackers thread about libpq. Can you give me a few pointers to the Go folks' discussion about this? |
|
See #2404 and golang/go#75859 for more info on how the |
|
The Go regression was specifically about multi-host host:port,host:port in the authority, not commas in general. A single-name SRV URL likely parses fine. Your point still stands though: only postgres/postgresql are grandfathered in net/url, so any new scheme is a compatibility bet unless we stick to postgresql://…?srvhost= or get upstream to bless another scheme. |
Add DNS SRV discovery via postgres+srv:// URI scheme
Problem
When running a replicated PostgreSQL cluster, clients need to know the address of every node. Today the only way to express this in a connection string is an explicit comma-separated host list:
This creates operational coupling: every time a node is added, removed, or replaced, the connection string in every application must be updated and the application restarted. Managed database providers and HA tooling (Patroni, pg_auto_failover, etc.) cannot change the cluster topology without coordinating with application teams.
Solution
DNS SRV records (RFC 2782) were designed exactly for this: a single name that maps to an ordered, weighted list of (host, port) endpoints. MongoDB adopted the mongodb+srv:// URI scheme for the same reason.
This PR adds a postgres+srv:// URI scheme (and a srvhost= keyword parameter) that tells pgconn to resolve _postgresql._tcp.{cluster} at connect time and use the returned targets in priority/weight order as the list of hosts to attempt:
DNS at the provider's side:
Adding or removing a node now requires only a DNS record change — no application config or restart.
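For readers who want to see how SRV answers become a host list, here is a small sketch of the ordering step. It assumes the records come from a resolver with the semantics of Go's net.LookupSRV, which documents its results as sorted by priority and randomized by weight within a priority; the stable sort below is only a safeguard for mocked resolvers. The function name orderTargets and the host names are made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"strings"
)

// orderTargets turns SRV records into "host:port" strings, lowest priority
// value first. Records with equal priority keep the order the resolver gave.
func orderTargets(records []*net.SRV) []string {
	// Sort a copy so the caller's slice is left as it was.
	sorted := append([]*net.SRV(nil), records...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	targets := make([]string, 0, len(sorted))
	for _, r := range sorted {
		host := strings.TrimSuffix(r.Target, ".") // drop the FQDN trailing dot
		targets = append(targets, net.JoinHostPort(host, strconv.Itoa(int(r.Port))))
	}
	return targets
}

// demoOrder feeds three hypothetical records through orderTargets.
func demoOrder() string {
	records := []*net.SRV{
		{Target: "pg2.example.com.", Port: 5432, Priority: 20, Weight: 50},
		{Target: "pg1.example.com.", Port: 5432, Priority: 10, Weight: 100},
		{Target: "pg3.example.com.", Port: 6432, Priority: 20, Weight: 50},
	}
	return strings.Join(orderTargets(records), ",")
}

func main() {
	fmt.Println(demoOrder())
}
```

The resulting list plays the same role as a comma-separated host list, except that it is produced at connect time instead of being written into the connection string.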
API
Two new fields on Config:
Both connection string forms are supported:
SRVHost is mutually exclusive with host= / comma-separated hosts in the URI.
TLS
Each SRV-resolved target gets its own TLS configuration with the correct ServerName for that specific host, so sslmode=verify-full works correctly against individual server certificates, not just the cluster name.
Testing without a DNS server
LookupSRVFunc can be replaced in tests to return any records without standing up a DNS server:
Test plan
TestParseConfigSRVScheme: postgres+srv:// and postgresql+srv:// set SRVHost
TestParseConfigSRVKeyword: srvhost= keyword sets SRVHost
TestParseConfigSRVAndHostMutuallyExclusive: error when both srvhost and host are given
TestConnectSRVMocked: end-to-end via mocked LookupSRVFunc (uses PGX_TEST_TCP_CONN_STRING)
TestConnectSRVMockedMultipleTargets: fallthrough on dead first target
TestConnectSRVAllTargetsDead: error when all targets are unreachable
TestConnectSRVLookupFailure: DNS error propagated correctly
TestResolveSRVLive: real DNS (_postgresql._tcp.mmatvei.ru), verifies RFC 2782 priority ordering end-to-end
TestConnectSRVLive: … (uses PGX_TEST_SRV_CONN_STRING)
TestResolveSRVLive runs against four real public SRV records and passes with no PostgreSQL server. If the system resolver has a stale negative cache, set PGX_TEST_SRV_DNS_SERVER=<nameserver-ip> to query a specific server.
Prior art and related discussion