Add CancelAndDrainContextWatcherHandler #2534
CancelRequest reads pid and secretKey, which are written during the connection handshake in connectOne. When a context watcher handler calls CancelRequest from a goroutine (as CancelRequestContextWatcherHandler does), these reads can race with the handshake writes. The 100ms delay means that this is rarely hit in practice, but it's not impossible. Fix by making pid an atomic.Uint32. The secretKey write is ordered before the atomic pid store, so a goroutine that observes a non-zero pid via Load is guaranteed to see the secretKey. CancelRequest now returns early when pid is zero (handshake incomplete, nothing to cancel).
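The happens-before argument above can be sketched standalone (the `conn`, `handshake`, and `cancelSnapshot` names here are illustrative, not pgx's actual layout): the plain `secretKey` write is sequenced before the atomic `pid` store, so a goroutine that observes a non-zero pid via `Load` is guaranteed to also see the key.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// conn mimics the relevant PgConn fields (names hypothetical):
// secretKey is a plain field, pid is atomic.
type conn struct {
	pid       atomic.Uint32
	secretKey uint32
}

// handshake plays the role of connectOne: the ordinary secretKey write is
// sequenced before the atomic pid store that publishes it.
func (c *conn) handshake(pid, key uint32) {
	c.secretKey = key // ordinary write...
	c.pid.Store(pid)  // ...published by the atomic store
}

// cancelSnapshot plays the role of CancelRequest's read side: a non-zero pid
// observed via Load guarantees the secretKey write is visible.
func (c *conn) cancelSnapshot() (pid, key uint32, ok bool) {
	p := c.pid.Load()
	if p == 0 {
		return 0, 0, false // handshake incomplete, nothing to cancel
	}
	return p, c.secretKey, true
}

func main() {
	var c conn
	_, _, ok := c.cancelSnapshot()
	fmt.Println(ok) // false before the handshake
	c.handshake(4242, 99)
	pid, key, ok := c.cancelSnapshot()
	fmt.Println(pid, key, ok)
}
```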
```go
	// in-flight results via SELECT 1 polling. Defaults to 5s.
	DrainTimeout time.Duration

	cancelFinishedChan chan struct{}
```
Why not just use the context.Context created in HandleCancel? (e.g. shutdownCtx)
I think you're suggesting something like this:

```go
type CancelAndDrainContextWatcherHandler struct {
	Conn          *PgConn
	DeadlineDelay time.Duration
	DrainTimeout  time.Duration

	doneFn  context.CancelFunc // replaces cancelFinishedChan
	doneCtx context.Context    // replaces cancelFinishedChan
	stopFn  context.CancelFunc
}

func (h *CancelAndDrainContextWatcherHandler) HandleCancel(_ context.Context) {
	h.doneCtx, h.doneFn = context.WithCancel(context.Background())
	cancelCtx, stop := context.WithCancel(context.Background())
	h.stopFn = stop

	deadline := time.Now().Add(h.deadlineDelay())
	h.Conn.conn.SetDeadline(deadline)

	go func() {
		defer h.doneFn()
		reqCtx, cancel := context.WithDeadline(cancelCtx, deadline)
		defer cancel()
		h.Conn.CancelRequest(reqCtx)
	}()
}

func (h *CancelAndDrainContextWatcherHandler) HandleUnwatchAfterCancel() {
	if h.stopFn != nil {
		h.stopFn()
	}
	if h.doneCtx != nil {
		<-h.doneCtx.Done()
	}
	// ... rest unchanged
}
```

This would work fine but is more allocation and diverges from the pattern in the other handler. I don't personally see much reason to prefer it, but I'm not against making that change.
Similar to what I said below, this should all be handled in pgxpool or puddle (somehow), and not done at the PgConn layer.
```go
func (h *CancelAndDrainContextWatcherHandler) drainTimeout() time.Duration {
	if h.DrainTimeout == 0 {
		return 5 * time.Second
```
```go
const defaultDrainTimeout = 5 * time.Second
return defaultDrainTimeout
```
this is how I would do it if I owned this library, but I'm matching the existing style - there are no duration constants anywhere else.
```go
// HandleCancel is called when the context is cancelled. It sets a net.Conn deadline
// as a fallback and sends a PostgreSQL cancel request in a goroutine.
func (h *CancelAndDrainContextWatcherHandler) HandleCancel(_ context.Context) {
```
why is HandleCancel()'s arg being ignored and not passed to context.WithCancel()? Is this a different lifetime scope? It looks like we intend to inherit from the passed in context.
the ctx passed here is already cancelled, this is the same pattern used by the existing handler:
Lines 2885 to 2887 in a5680bc
```go
		reqCtx, cancel := context.WithDeadline(cancelCtx, deadline)
		defer cancel()
		h.Conn.CancelRequest(reqCtx)
	}()
```
this pattern matches the pre-existing handler, there's only 1 goroutine here, and no meaningful error to return
```go
	h.stopFn()
}
if h.cancelFinishedChan != nil {
	<-h.cancelFinishedChan
```
```go
defer close(doneCh) // doneCh here is h.cancelFinishedChan
reqCtx, cancel := context.WithDeadline(cancelCtx, deadline) // cancelCtx is cancelled by stopFn
```

So if stopFn gets called, this channel is closed, and if that doesn't happen for some reason, it's still closed by the deadline.
```diff
 type PgConn struct {
 	conn net.Conn
-	pid uint32 // backend pid
+	pid atomic.Uint32 // backend pid; atomic because CancelRequest reads it from a separate goroutine
```
Style pref: embed something like:

```go
mu struct {
	sync.RWMutex
	pid       uint32
	secretKey []byte
}
```

If you want to perform dirty reads of `mu.pid`, use `pid atomic.Int64`, but then it means you can only perform updates to `pid` while holding `mu.Lock()`.

Recently I went down the rabbit hole of refactoring opensearch-go's connection pooling (it uses net/http under the hood for transport, however, so it gets "free" connection life cycle management from Go's stdlib - the analog would be that a "Connection" in opensearch-go is a pgxpool and an actual HTTP request is a PgConn). The life of an individual PgConn is the responsibility of puddle, not PgConn itself (i.e. puddle should be able to see if a PgConn is in use or not and change the life cycle bits of the PgConn). Every pending query for a PgConn increments a counter, and every completed or errored-out request decrements the counter. Whenever the count is 0, the connection is idle. A connection can have a lifecycle bit for draining to prevent new requests from being enqueued on the PgConn and to tell the pooler to find a new connection. Something like:
Food for thought since there's been discussion for a while now about what would be involved in making a smart client for pgx, similar to what has been done in opensearch-go to give PostgreSQL applications read scale-out and connection load balancing without needing a truly clustered database.
- I do personally like the `obj.mu.stuffProtectedByMu` style, but I don't see it used elsewhere in pgx and was trying to stick to conventions here. Also for this case, the atomic alone is sufficient since secretKey is protected by the happens-before on the pid store.
- Client-side query counting is interesting but I'm not sure about its utility for this particular case, since cancels happen on separate connections and race server-side. Interesting in general, though I think it would require breaking some of the existing pgxpool layering (i.e. drains happen before the conn is returned to the puddle layer).
i.e. drains happen before the conn is returned to the puddle layer
imo, PgConn needs to be stupid: it needs to know if it's idle and able to accept a query, idle and shutting down, or active, that's it. Anything beyond that belongs in pgxpool or puddle, which is why I'm actually not a fan of polling in a loop like what's happening here. If pgxpool starts to shut down, it needs to wait until all outstanding active PgConn finish and return. As for shutting down and graceful lifecycle management of pool resources, if pgxpool's shutdown times out, the pgxpool can proactively cancel the contexts used by various PgConn. $0.02
Part of the issue is that, by design in the pg wire protocol, an individual pgconn doesn't know that it's been cancelled on the client side at the time of cancellation. CancelRequest is sent on a separate TCP connection with no synchronization to the query in progress (protocol docs). When HandleUnwatchAfterCancel runs, we don't know whether the cancel signal has reached the server yet. If it arrives after ReadyForQuery, the server sets QueryCancelPending and the next query on the connection gets a spurious 57014.
But cancellation APIs are tied to the PgConn being cancelled in pgx: BuildContextWatcherHandler is on ConnConfig and ctxwatch lives in the pgconn package. If cancellation were handled at the pool level this would be a different conversation, but splitting "send the cancel" (pgconn) from "drain the cancel" (pgxpool) would be worse than either coherent design.
I did try doing this at the pgxpool level (spicedb commit). It required a sync.Map keyed by weak.Pointer[pgconn.PgConn] → weak.Pointer[cancelHandler] (both weak to avoid a GC cycle), plus a separate ready atomic.Int32 flag gated by an AfterConnect hook to guard the same pid/secretKey race that this PR fixes. That's a lot of machinery to work around the absence of the right hooks. Moving it into the cancel handler removes all of it.
Also, IMO the fact that this library already has the CancelHandler abstraction and an existing implementation (that has a race) signals to me that this is the right layer to put it in, at least for how pgx is structured right now. I could see an argument that more lifecycle hooks need to be available so that external management is possible, but that seemed like a bigger change than fitting into the existing CancelHandler box.
Thanks for digging into this. The problem is real and the drain approach is the right direction. This morning I prototyped an alternative that avoids the polling loop, and I think it addresses the same scenarios with less machinery. Before getting into the implementation, I laid out the message-level protocol flows for each case to prove to myself this would work.
All of these reference the PostgreSQL cancel protocol from src/include/libpq/pqcomm.h: the cancel request is always sent on a separate TCP connection (conn B) to the postmaster, which dispatches the SIGINT to the target backend on conn A.
Scenario 1: Cancel arrives before query completes
The 57014 is consumed by the original query. Connection is clean afterward. The drain sends ; anyway (harmless: one extra round-trip).
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
| | |--CancelReq------>|
| | | |--SIGINT-->|
| | [interrupted] |<--close----------|
|<--ErrorResponse(57014)| | |
|<--ReadyForQuery-------| | |
| | | |
| [HandleUnwatchAfterCancel] | |
| [cancelState: sent -> idle] | |
|---Query(;)----------->| [drain] | |
|<--EmptyQueryResponse--| | |
|<--ReadyForQuery-------| | |
| | | |
v connection clean v v v
Scenario 2a: Cancel arrives at idle backend (no-op, the safe case)
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
|<--CommandComplete-----| | |
|<--ReadyForQuery-------| | |
| | [idle] | |
| | DoingCommandRead=T |--CancelReq------>|
| | | |--SIGINT-->|
| | QueryCancelPending |<--close----------|
| | CHECK_FOR_INTERRUPTS| |
| | [flag cleared, noop]| |
| | | |
|---Query(next sql)---->| | |
| | DoingCommandRead=F | |
| | [executing] | |
|<--CommandComplete-----| | |
|<--ReadyForQuery-------| | |
v OK: cancel was a no-op, next query clean v v
Scenario 2b: Cancel arrives during next query (the race)
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
|<--CommandComplete-----| |--CancelReq------>|
|<--ReadyForQuery-------| | |
| | [idle] | |
|---Query(next sql)---->| | |
| | CHECK_FOR_INTERRUPTS| |
| | [no cancel pending] | |
| | DoingCommandRead=F | |--SIGINT-->|
| | [executing next] |<--close----------|
| | QueryCancelPending | |
| | ProcessInterrupts | |
|<--ErrorResponse(57014)| | |
|<--ReadyForQuery-------| | |
| | | |
v WRONG: 57014 hit the next query v v
Scenario 3: Single-; drain absorbs the cancel
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
|<--CommandComplete-----| |--CancelReq------>|
|<--ReadyForQuery-------| | |
| | [idle] | |
| [HandleUnwatchAfterCancel] | |
|---Query(;)----------->| [drain] | |
| | CHECK_FOR_INTERRUPTS| |
| | [no cancel pending] | |
| | DoingCommandRead=F | |--SIGINT-->|
| | [executing ;] |<--close----------|
| | QueryCancelPending | |
| | ProcessInterrupts | |
|<--ErrorResponse(57014)| [drain absorbed it] | |
|<--ReadyForQuery-------| | |
| | | |
|---Query(next sql)---->| [clean] | |
| | DoingCommandRead=F | |
| | [executing] | |
|<--CommandComplete-----| | |
|<--ReadyForQuery-------| | |
v OK: drain absorbed the 57014, next query clean v
Scenario 3 (alternate): Cancel already cleared, drain is clean
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
|<--CommandComplete-----| |--CancelReq------>|
|<--ReadyForQuery-------| | |--SIGINT-->|
| | [idle] |<--close----------|
| | QueryCancelPending | |
| | CHECK_FOR_INTERRUPTS| |
| | [flag cleared, noop]| |
| | | |
| [HandleUnwatchAfterCancel] | |
|---Query(;)----------->| [drain] | |
|<--EmptyQueryResponse--| | |
|<--ReadyForQuery-------| | |
| | | |
|---Query(next sql)---->| [clean] | |
|<--CommandComplete-----| | |
|<--ReadyForQuery-------| | |
v OK: cancel was no-op, drain confirmed clean v v
One ; is sufficient because PostgreSQL sets QueryCancelPending at most once per cancel signal. After the 57014 is raised and sent, the flag is cleared. There is no mechanism for a second 57014 from the same cancel.
Scenario 4: CancelRequest fails (dial error, max conns, network issue)
The cancel never reaches the server. No 57014 is possible. CancelRequest transitions cancelState back to idle, so HandleUnwatchAfterCancel skips the drain entirely.
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
| [net.Conn deadline] | |--dial------> X (refused)
| | | |
| | | [CancelRequest returns err]
| | | [state -> idle]
| | | |
|<--I/O timeout---------| | |
| (or query completes) | | |
| | | |
| [HandleUnwatchAfterCancel] | |
| [cancelState == idle -> skip drain] | |
| | | |
v no drain needed v v v
Scenario 5: Duplicate cancel (mutex prevents double-send)
A second CancelRequest call races with an in-flight cancel. The mutex ensures only one cancel request is ever sent. Concurrent callers block on the in-flight done context until the cancel completes, then return nil.
Client (conn A) Server Backend Client (conn B) Postmaster
|---Query(sql)--------->| | |
| | [executing] | |
| [ctx cancelled] | | |
| [CancelRequest #1: idle -> inFlight] | |
| | |--CancelReq------>|
| | | |
| [CancelRequest #2: inFlight -> blocks] | |
| [CancelRequest #1 completes -> sent] | |
| [CancelRequest #2 unblocks -> nil] | |
| | | |
v only one cancel sent v v v
Alternative approach: mutex-guarded state machine + single drain
In #2537 I prototyped an alternative that replaces the polling loop with a state machine on PgConn. The key differences are:
- **No loop.** `HandleUnwatchAfterCancel` sends exactly one `;`. PostgreSQL processes at most one cancel per backend (`QueryCancelPending` is a single flag, not a queue), so one round-trip is sufficient to absorb it.
- **Mutex-guarded cancel state on `CancelRequest`.** A three-state machine (idle/inFlight/sent) embedded on `PgConn` tracks the cancel lifecycle. The coordination lives in `CancelRequest` itself (not in `HandleCancel`) because `CancelRequest` is a public method callable from anywhere (the context watcher, `asyncClose`, or direct user code). Concurrent callers that hit `inFlight` block on a done context; callers that hit `sent` return nil immediately. If `sendCancelRequest` fails, the state transitions back to `idle` so the drain is skipped entirely (scenario 4).
- **Atomic `backendKeyData`.** Instead of making just `pid` atomic (with `secretKey` relying on an implicit happens-before from the atomic store of `pid`), both are grouped in a `backendKeyData` struct behind an `atomic.Pointer`. This makes the synchronization explicit and the nil check ("handshake not complete") natural.
- **Context inheritance.** `HandleCancel(ctx)` wraps the caller's context with `context.WithoutCancel(ctx)` so values (trace IDs, etc.) propagate into the cancel request without inheriting the already-fired cancellation.
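A standalone sketch of the grouped-atomic idea (struct layout and method names assumed here, not copied from #2537's actual diff): both handshake results are published in one `atomic.Pointer` store, and a nil load means "handshake not complete".

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// backendKeyData groups the handshake results so they are published together.
// Field names follow the discussion; the real pgx layout may differ.
type backendKeyData struct {
	pid       uint32
	secretKey uint32
}

type pgConn struct {
	keyData atomic.Pointer[backendKeyData]
}

// finishHandshake publishes both fields in a single atomic store.
func (c *pgConn) finishHandshake(pid, key uint32) {
	c.keyData.Store(&backendKeyData{pid: pid, secretKey: key})
}

// cancelKey returns the key data, or nil while the handshake is incomplete,
// making the "nothing to cancel yet" check a simple nil test.
func (c *pgConn) cancelKey() *backendKeyData {
	return c.keyData.Load()
}

func main() {
	var c pgConn
	fmt.Println(c.cancelKey() == nil) // true: handshake not complete
	c.finishHandshake(4242, 99)
	kd := c.cancelKey()
	fmt.Println(kd.pid, kd.secretKey) // 4242 99
}
```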
Unrelated, but I stumbled across the magic numbers 80877102 and 80877103, and couldn't resist extracting them into cancelRequestCode and negotiateSSLCode, referencing CancelRequestPacket in src/include/libpq/pqcomm.h. The cancel packet field offsets are also named constants.
The state machine has three states and three transitions:
- `CancelRequest` (idle -> inFlight): we own the cancel, proceed to `sendCancelRequest`
- `CancelRequest` (send fails): inFlight -> idle -- no 57014 possible, skip drain (scenario 4)
- `CancelRequest` (send succeeds): inFlight -> sent -- drain needed
- `HandleUnwatchAfterCancel` (sent): sent -> idle, drain once (scenarios 1, 2, 3)
- `HandleUnwatchAfterCancel` (idle): skip drain (scenario 4, or cancel never fired)
- `CancelRequest` (concurrent, inFlight): block on done context (scenario 5)
- `CancelRequest` (concurrent, sent): return nil immediately (scenario 5)
In #2537, the handler implements the same ctxwatch.Handler interface, and the test suite covers all five scenarios above including a concurrent CancelRequest stress test that passes with -race.
Aside: ctxwatch.Handler interface design
While working through this I noticed that HandleCancel returns nothing:
```go
type Handler interface {
	HandleCancel(canceledCtx context.Context)
	HandleUnwatchAfterCancel()
}
```

I think this is an interface design bug. Cancellation in pgx involves a network operation: dialing a new TCP connection to the postmaster and writing the cancel packet. That can fail (dial refused, max connections, timeout), and the caller has no way to know. It's the same class of mistake as ignoring the error from close(2): usually fine, but when it matters (and with cancellation it always matters), you've lost the signal.
As implemented, when the cancel send fails, `CancelRequest` transitions the state back to idle and the drain is skipped, which I believe is the correct behavior, but the context watcher has no idea the cancel failed. If `HandleCancel` returned an error, the watcher could surface it through the tracer or take a different code path (e.g., retry or close the connection immediately instead of hoping for the best).
This would be a breaking interface change, so it's not something for this PR unless there's an appetite for doing that. As it stands, the void return silently swallows failures in a network-critical path.
I think you're right that only one ; is required after reviewing the pg spec, so I removed the loop here. If the connection gets the 57014 then the next query should succeed.
I don't personally think that the juice is worth the squeeze on the state machine added to pgconn - the cancel send failing seems rare and in case it does happen, an extra ; send won't hurt anything. Weren't you also against modifying pgconn earlier in our discussion?
I'll leave this up, I'm happy if either this or your alternative merge, hopefully we can get some maintainer eyes on this soon?
Yeah, I can't tell you how much I despise looping like this with external resources, no jitter, etc. It works, but it's just not right, and a layering violation. I tried a few iterations of this with an atomic bool on PgConn to indicate a cancellation had been sent, but didn't like how that turned out (I forget exactly what it was, but I think it was a cancel during initial handshake that I couldn't fix, and blocking real-time event cancellation). Instead of adding a naked chan to select on, I added a context so that all lifecycle operations of a connection could be idiomatically edge-triggered with no specious sleeps masking races. It ended up being 56B per PgConn, which ... I think is fine in the greater scheme of things.
Re: "weren't you against modifying pgconn", 100% valid point. I pushed back on the earlier suggestions (query counters, lifecycle bits) because those are pool-level concerns that belong in puddle/pgxpool. The cancelMu is different: it's narrowly scoped to the one thing PgConn uniquely knows: whether a cancel request was sent on this connection. No other layer has that information.
The cancel key is generated once per connection via pg_strong_random() (postgres.c line 4360), is unique per connection, and is never shared with another process. That means the client has complete local authority over the cancel lifecycle and can safely gate it.
The other reason the mutex matters: every CancelRequest forces PostgreSQL to fork a new backend process. The postmaster accepts the cancel connection (postmaster.c line 1727 -> BackendStartup -> fork_process()), the child walks the ProcSignal shared memory array under a spinlock, does a timingsafe_bcmp on the cancel key (procsignal.c lines 764-765), sends kill(-backendPID, SIGINT) (procsignal.c line 781), then exits. The server-side cost of a fork(2) + shared memory init + kill(2) + process teardown per cancel request is really high, especially since these can stack when the RTT between the client and server is high. The mutex coalesces concurrent cancel attempts into exactly one server-side fork (i.e. server-side resources are more valuable to me than the client-side 56B).
With reliable cancellation like this, request hedging across multiple PG replicas becomes a possibility with pgx: send the same query to N read-only cluster backends, take the first result, cancel the rest with exactly one server-side fork per cancelled connection, drain the 57014 deterministically, and reuse the connection (and the reason why I pulled on this thread to begin with).
the one thing PgConn uniquely knows: whether a cancel request was sent on this connection. No other layer has that information.
...
That means the client has complete local authority over the cancel lifecycle and can safely gate it.
I think this is not true in general. A layer in between (i.e. say you wanted to write a proxy like pgbouncer using pgx) may hold the network connection in a different process than the one the cancellation signal originates from.
I tend to think keeping it out makes more sense just with how the pg wire protocol works (separate cancel signal on new connection).
But like I said, I'm okay with either approach merging.
A layer in between (i.e. say you wanted to write a proxy like pgbouncer using pgx) may hold the network connection in a different process than the cancellation signal originates from.
Even in a proxy, the cancel has to go through the cancel key that belongs to a specific backend connection. pgbouncer follows the same pattern: it receives the client's cancel on a new connection, looks up the PID/key for the backend connection, and opens another new connection (including TLS) to the real server to forward it. The cancel key is per-connection.
In pgx, the cancel key lives on PgConn. Any code that wants to cancel - whether it's the context watcher, asyncClose, user code, or a proxy layer - must call CancelRequest on the PgConn that owns the key. The mutex gates that single call site.
Cancel requests cannot be sent on an existing connection. This is a wire protocol constraint, not a design choice. There are exactly two client-side code paths in the PostgreSQL source that emit CANCEL_REQUEST_CODE, and both open a new TCP connection:
1. `PQcancel()` (fe-cancel.c:577): the old, signal-safe API. Opens a raw socket via `socket()` + `connect()`, sends the cancel packet, waits for EOF, closes. The comment at line 573: "We need to open a temporary connection to the postmaster."
2. `PQcancelCreate()` / `PQcancelPoll()` (fe-connect.c:3722): the new, non-blocking API. Creates a fresh `PGconn` with `cancelRequest = true` and goes through the normal connection establishment state machine (including TLS). Once the connection is up, it calls `PQsendCancelRequest()` instead of sending a startup packet, then waits for the server to close the connection.
On the server side, backend_startup.c:567 is the only place that checks for CANCEL_REQUEST_CODE, inside ProcessStartupPacket, which runs once per new connection. There is no cancel message type in the regular wire protocol: an established backend expects normal frontend messages (Query, Parse, Bind, Execute, etc.). Every cancel forces the server to fork a new backend process, walk shared memory under a spinlock, do a timingsafe_bcmp on the cancel key, and kill() the target (procsignal.c:747-781).
That TLS and fork cost is why the mutex matters: it coalesces concurrent cancel attempts into one server-side fork instead of N.
The other option I pondered was a central "cancellation resource" in pgxpool, something like a MaxConcurrentCancels int tunable to limit the blast radius of DoS'ing PG with connection setup fork+TLS+kill cycles. Right now pgxpool doesn't have that kind of cancel-aware coordination, and PgConn is standalone: it's not tightly coupled to a pool. Since PgConn already knows its own cancel secret and is the only thing that can act on it, I pushed the state machine down into PgConn in my prototype. If pgxpool ever grows cancel-aware resource management, the per-connection mutex still composes cleanly with a pool-level semaphore.
Random aside: using pg_cancel_backend() would be a cheaper alternative and worth considering wiring in at the pgxpool layer.
pg_cancel_backend(pid) (signalfuncs.c:134-160) sends SIGINT to the target backend on an existing connection (no new TCP connection, no server-side fork, no cancel key) using an SQL function. Internally, it calls BackendPidGetProc(pid) to find the PGPROC in shared memory, does a role-based permission check, and kill(-pid, SIGINT) to cancel the running query. The backend stays alive, the running query gets a 57014, and the connection returns to ReadyForQuery.
At the pgxpool level, if there is an idle connection in the pool, we could run SELECT pg_cancel_backend($1) instead of opening a new connection with the cancel protocol. No TCP handshake, no TLS negotiation, no fork() + procsignal walk + kill() + process teardown on the server. Just a regular query on an existing connection delivering the same signal.
Caveats:
- Requires the connecting role to have `pg_signal_backend` privileges (or be the same role as the target, or be a superuser). The cancel protocol's cancel key authorization is per-connection and doesn't have this restriction.
- Only works when there's an idle connection available in the pool. If every connection is busy, you're back to the cancel protocol or spinning up a new connection and adding it to the pool.
- Doesn't work for standalone `PgConn` usage without a pool.
For the common case of pgxpool with connections available, pg_cancel_backend(pid) on an idle connection would be strictly better than the cancel protocol. Something to consider for a future pool-aware cancel path.
As architectural warts go, I'm not thrilled with how PostgreSQL handles query cancellation and really wish there were a cancel message in the regular wire protocol that would allow for backend reuse. We can cancel via pg_cancel_backend(pid) over a normal connection, so the server already supports the operation inline. A new frontend message type would eliminate the fork-per-cancel overhead entirely. But that's a much larger fish to fry, and I'm also not sure how to handle that cleanly with multiple active portals.
```go
outer:
	for {
		pgConn.frontend.Send(&pgproto3.Query{String: "SELECT 1"})
```
I think you can just send ; for an even shorter SQL query, fwiw. I'm not sure that polling on this value is right, however. Why not use an atomic.Int64 as a gauge for the number of queries in queue for a connection and when it hits 0, then close the connection? I haven't looked at pgx recently, but shouldn't the lifecycle of a connection be handled by puddle or some other connection pooler that owns the connection object?
I'll test out ;, that seems like it should work to me.
Why not use an atomic.Int64 as a gauge for the number of queries in queue for a connection and when it hits 0, then close the connection?
That's not really the problem this is addressing. It doesn't matter if there are 0 active queries on the connection - once we've sent the Cancel (which is async and on a different connection) we don't know if the original query finished before the cancel was processed by the server. If it did, a subsequent query on the same connection may get a spurious 57014. The only way to know (as far as I can tell) is to send a query and see what happens.
In theory we could add tracking to pgconn to skip the probe when we already saw a 57014 (meaning the cancel was consumed), but the cost of one extra `;` round trip didn't seem worth the complexity. If it weren't for addressing the (existing) race on pid I wouldn't have touched pgconn.go at all with this PR. I'm not against adding additional tracking but ideally only with some consensus here first.
Switched to ; and will continue to test with that.
Add a context watcher handler that deterministically drains stale cancel signals instead of relying on a fixed 100ms sleep. When a context is cancelled, HandleCancel sends the cancel request immediately. After the query returns, HandleUnwatchAfterCancel polls SELECT 1 to absorb any in-flight SQLSTATE 57014 before releasing the connection. The drain runs inside Unwatch via HandleUnwatchAfterCancel, so every pgconn operation that watches a context (Exec, ExecParams, Prepare, Deallocate, CopyTo, CopyFrom, Pipeline, WaitForNotification) gets drain-after-cancel automatically with no per-call-site wiring.
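The drain decision described above can be sketched independently of the wire protocol (`drainAfterCancel`, `queryFunc`, and `pgError` are illustrative stubs, not pgconn's real types): send one probe query and absorb a spurious 57014 if the cancel landed late, while letting any other error surface.

```go
package main

import (
	"errors"
	"fmt"
)

// queryFunc stands in for running one query on the connection and returning
// its error, if any. The real handler speaks pgproto3; this stub keeps the
// sketch self-contained.
type queryFunc func(sql string) error

// pgError mimics a server error carrying an SQLSTATE code.
type pgError struct{ code string }

func (e *pgError) Error() string { return "SQLSTATE " + e.code }

// drainAfterCancel is a sketch of the HandleUnwatchAfterCancel drain step:
// run one probe query and swallow a spurious 57014 so the connection is
// clean before it is released. Any other error is a real problem.
func drainAfterCancel(run queryFunc) error {
	err := run(";")
	var pgErr *pgError
	if errors.As(err, &pgErr) && pgErr.code == "57014" {
		return nil // stale cancel absorbed; connection is safe to reuse
	}
	return err
}

func main() {
	// Case 1: the cancel landed late and the drain absorbs the 57014.
	absorbed := drainAfterCancel(func(string) error { return &pgError{code: "57014"} })
	fmt.Println(absorbed == nil) // true

	// Case 2: the cancel was a no-op; the drain is a clean round-trip.
	clean := drainAfterCancel(func(string) error { return nil })
	fmt.Println(clean == nil) // true
}
```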
A few thoughts:

- I don't understand how a race on `pid` could occur.
- When using a pool,
- I know this is constrained by the PG cancel protocol and the pgx architecture, but the cancel handler actually sending a query and waiting to receive the results seems a bit icky.
- How would this work with pipeline mode?
- If this can be made to work perfectly reliably, would it make sense to have this implementation replace the existing `CancelRequestContextWatcherHandler`?
It is triggerable in the current codebase. During `connectOne` the handshake writes `pid` and `secretKey`, and a context watcher handler that calls `CancelRequest` from a goroutine (as `CancelRequestContextWatcherHandler` does) can read them concurrently. In #2537 this is fixed by grouping both fields into a `backendKeyData` struct behind an `atomic.Pointer`.
Agreed.
I share the instinct that a cancel handler sending a query is a layering violation. The problem is that the cancel protocol leaves no alternative: the cancel request is asynchronous, sent on a separate TCP connection, with no acknowledgement on the original connection. After the cancel is sent, the only way to learn whether the signal was consumed is to run a query on the original connection and observe the result.
The cancel protocol has no concept of cancelling an individual pipelined statement. That said, the drain works correctly in this context. Two outcomes: either the drain absorbs a late 57014, or it comes back clean (EmptyQueryResponse) because the cancel was a no-op. Either way the connection is safe to reuse afterward.
One edge case to note: if
I agree it should replace the existing handler if it works reliably. The sleep-based handler is strictly worse: slower (100ms minimum vs one round-trip) and unreliable (the 100ms is a guess that fails under load or if someone is stuck using CockroachDB). The only argument for keeping both would be if someone wanted to avoid the extra round-trip entirely, but anyone who cares about cancel correctness needs the drain. One implementation difference worth flagging: this handler ignores the context passed to HandleCancel (matching the existing handler), while #2537 inherits the caller's context values via context.WithoutCancel.
I have a few more thoughts. First, the way you describe the PostgreSQL server's handling of cancel signals didn't match my understanding; had it been my understanding, pgx's cancel handling probably would have been designed differently. However, as far as I can tell, what you describe isn't actually right. In particular, scenario 2 is wrong. I made a test specifically to check.

```go
package pgconn_test

// Investigation test for PR #2534 (https://github.com/jackc/pgx/pull/2534).
//
// PR #2534's scenario 2 claims that a cancel signal arriving at an *idle*
// PostgreSQL backend (after ReadyForQuery) sets QueryCancelPending, which the
// *next* query then consumes as a spurious SQLSTATE 57014.
//
// This file empirically tests that claim.
//
// Run:
//
//	go test -run TestCancelRequestToIdleBackend -v ./pgconn/
//
// Requires PGX_TEST_DATABASE to be set.

import (
	"context"
	"errors"
	"os"
	"testing"
	"time"

	"github.com/jackc/pgx/v5/pgconn"
	"github.com/stretchr/testify/require"
)

// TestCancelRequestToIdleBackend tests scenario 2 from PR #2534. For a range
// of delays it:
//
//  1. completes a quick query so the backend is idle (ReadyForQuery received),
//  2. sends a cancel signal,
//  3. sleeps the test's delay so the SIGINT has time to be delivered,
//  4. runs another query and counts whether it returns SQLSTATE 57014.
//
// If the PR's claim were true, we'd expect the next query to return 57014
// (with sufficient delay for the SIGINT to be delivered).
func TestCancelRequestToIdleBackend(t *testing.T) {
	connString := os.Getenv("PGX_TEST_DATABASE")
	if connString == "" {
		t.Skip("PGX_TEST_DATABASE not set")
	}

	delays := []time.Duration{
		0,
		10 * time.Millisecond,
		100 * time.Millisecond,
		500 * time.Millisecond,
	}

	for _, delay := range delays {
		t.Run(delay.String(), func(t *testing.T) {
			bleeds := runIdleCancelTrials(t, connString, delay)
			t.Logf("delay=%s: %d/10 trials had next query return 57014", delay, bleeds)
		})
	}
}

func runIdleCancelTrials(t *testing.T, connString string, delay time.Duration) int {
	const trials = 10
	bleeds := 0

	for trial := 0; trial < trials; trial++ {
		pgConn, err := pgconn.Connect(context.Background(), connString)
		require.NoError(t, err)

		_, err = pgConn.Exec(context.Background(), "select 1").ReadAll()
		require.NoError(t, err)

		// Backend is now idle (ReadyForQuery received). Send a cancel.
		err = pgConn.CancelRequest(context.Background())
		require.NoError(t, err)

		// Give the SIGINT time to be delivered to the (still-idle) backend.
		time.Sleep(delay)

		_, err = pgConn.Exec(context.Background(), "select 2").ReadAll()
		var pgErr *pgconn.PgError
		if errors.As(err, &pgErr) && pgErr.Code == "57014" {
			bleeds++
		}

		pgConn.Close(context.Background())
	}

	return bleeds
}
```

Running it yields:

If the cancel arrives after the original query is complete, there is no 57014 on the next query. This means that it is possible to send a cancel request that never produces any response on the original connection. That is, it still is possible for a cancel request to hit the wrong query under adverse timing conditions.

Lastly, the code has a strong agent smell to it. That's okay; I use agents every day, too (even to help write the test above). But it's only acceptable if it's disclosed; see our contributing guidelines. The comments on this thread also sound an awful lot like agent-speak. I've spent probably an hour today on this issue and I hope I'm not wasting my time talking to someone else's agent.
You're right, the description for scenario 2 is wrong. I was sloppy describing the mechanism.

What actually happens:

```c
if (!DoingCommandRead)
{
    LockErrorCleanup();
    ereport(ERROR,
            (errcode(ERRCODE_QUERY_CANCELED),
             errmsg("canceling statement due to user request")));
}
```

And the main loop clears stale cancels before processing the next command (postgres.c:4808-4818):

```c
/*
 * Query cancel is supposed to be a no-op when there is no query in
 * progress, so if a query cancel arrived while we were idle, just
 * reset QueryCancelPending.
 */
CHECK_FOR_INTERRUPTS();
DoingCommandRead = false;
```

The sequence is: the SIGINT arrives while the backend is idle and sets QueryCancelPending; when the next command is read, CHECK_FOR_INTERRUPTS() runs while DoingCommandRead is still true, so the stale flag is reset rather than raising an error, and only then is DoingCommandRead cleared and the query executed.

That said, this clearing behavior has a design assumption that is not guaranteed by the protocol, namely that the query and cancel messages arrive in the order in which they were initiated. The cancel and query travel on separate TCP connections and are not guaranteed to arrive in order.
Exactly. If the original query completes normally before the cancel is delivered, the client receives no indication that the cancel went unconsumed.

The client cannot reuse the connection while a cancel is in flight. The drain addresses this: the throwaway round-trip in `HandleUnwatchAfterCancel` absorbs any stale 57014 before the connection is released.

I have also updated my earlier comment with corrected diagrams.
Yes, I use Claude Code for drafting and code generation (I was willing to dedicate one cup of coffee to this issue today and didn't want to cherry-pick and reword, and I was at Google Next this week too, so I was being even hastier in my responses). The investigation is mine (I traced the protocol flows, read the PostgreSQL source, ran the tests), but the output goes through an agent, and I should have disclosed that upfront. If there's interest in pursuing #2537, I'll update its description accordingly. Apologies for not doing so from the start; I didn't notice the contribution guidelines change.
Thanks for taking a look at this! Responses inline.
A couple of reasons I avoided this:
But I'm not against it; I'd just want to make sure I understand what you're looking for before working on it.
It was pre-existing; I only noticed it because the Go race detector flagged it when I made the changes in this PR.
Fair enough, though to me the ickiness seems to come from the pg protocol itself.
Is there something specific that would need to happen differently? I added a test for pipeline mode and it passed as I expected, so I didn't give it much extra thought.
I think arguably that's the case, but I didn't want to assume I knew all pgx users' use cases. Since the interface was there, it seemed fine to have both options, but I'd defer to you. I've built and tested https://github.com/authzed/spicedb with this fork and haven't seen any issues, but it doesn't exercise every pg/crdb feature by any means.
The code is agent-assisted. My read of the contribution guidelines was that I could submit without mentioning it as long as I understand and stand by the code as if it were mine (which I do, and have had to defend in the comments above; I mostly used it to check that I was sticking close to conventions in the codebase, and for the tests). As for the GitHub discussion: all of my comments are just me, no agents.
With regards to the race on `pid`/`secretKey`:

I am generally in favor of this idea, but I still have a few concerns.
I'm not sure about this assertion. It still seems to me that there is a potential race condition, or at least a dependency on non-guaranteed behavior. As I understand the intention, the idea is:
I don't think the behavior we are hoping for is documented by PostgreSQL. We may be able to determine that behavior from the PostgreSQL source code, but it makes me nervous to rely on undocumented behavior.
I was worried about the case where a pipeline could have sent multiple queries.
Just thinking a little more: I wonder if the drain path could be invoked only as needed? The only case where it is really needed is if the original query completed before the cancel request was processed. That might not fit the current context-watcher handler pattern, though.
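A minimal sketch of that decision, assuming the handler could somehow learn the cancelled query's SQLSTATE (`needsDrain` and its parameter are invented for illustration, not pgx API):

```go
package main

import "fmt"

// needsDrain decides whether a drain round-trip is needed after a cancel.
// If the cancelled query itself returned SQLSTATE 57014, the backend
// consumed the cancel and nothing stale can bleed into the next query.
// A query that completed normally may have left the cancel still in
// flight, so only that case needs the drain.
func needsDrain(sqlstate string) bool {
	return sqlstate != "57014"
}

func main() {
	fmt.Println(needsDrain("57014")) // false: the cancel was consumed by the query
	fmt.Println(needsDrain(""))      // true: query finished first; cancel may still arrive
}
```

The hard part, as noted, is plumbing the query's outcome into the handler, since `HandleUnwatchAfterCancel` currently has no view of the result that the watched operation returned.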
The pre-existing `CancelRequestContextWatcherHandler` sleeps 100ms after sending a cancel request to avoid stale 57014 errors bleeding into the next query. We have seen this cause issues when we tried to use it in production: we'd see non-negligible rates of errors where the error was actually from the previous query on the connection, not the "current" one.

This PR adds a new `CancelAndDrainContextWatcherHandler`, which replaces the sleep with a deterministic drain: after the cancelled query returns, `HandleUnwatchAfterCancel` sends `;` to absorb any in-flight 57014 before releasing the connection. Because the drain runs inside `Unwatch()`, every pgconn operation that watches a context gets it automatically. I am still looking to validate this in more real-world scenarios, but I think it's ready for some early feedback.

Usage:
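(The usage snippet is elided above; the following is a hedged sketch of how the wiring would presumably look, assuming the handler and field names proposed in this PR, `CancelAndDrainContextWatcherHandler`, `DeadlineDelay`, and `DrainTimeout`, land as described. It is not runnable against released pgx.)

```go
package main

import (
	"context"
	"os"
	"time"

	"github.com/jackc/pgx/v5/pgconn"
	"github.com/jackc/pgx/v5/pgconn/ctxwatch"
)

func main() {
	config, err := pgconn.ParseConfig(os.Getenv("PGX_TEST_DATABASE"))
	if err != nil {
		panic(err)
	}

	// Install the drain-based handler for every context this connection watches.
	config.BuildContextWatcherHandler = func(conn *pgconn.PgConn) ctxwatch.Handler {
		return &pgconn.CancelAndDrainContextWatcherHandler{
			Conn:          conn,
			DeadlineDelay: 100 * time.Millisecond, // grace period before the socket deadline fires
			DrainTimeout:  5 * time.Second,        // cap on the `;` drain round-trip
		}
	}

	pgConn, err := pgconn.ConnectConfig(context.Background(), config)
	if err != nil {
		panic(err)
	}
	defer pgConn.Close(context.Background())
}
```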
This PR also fixes a pre-existing data race in `CancelRequest`: `pid` and `secretKey` are written during the connection handshake but read from a concurrent goroutine by any handler that calls `CancelRequest` in `HandleCancel`. The fix makes `pid` an `atomic.Uint32` and skips the cancel request when the handshake hasn't completed yet. I couldn't find a better way to avoid this race, and while I was wary of adding a new atomic to pgconn, it shouldn't be used in any hot path.
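The publication pattern can be shown in a self-contained sketch (this is illustrative, not the pgx code; the `conn` type and values are invented): the plain `secretKey` write is ordered before the atomic `pid.Store`, so any goroutine that observes a non-zero pid via `Load` is guaranteed to also see the secret key.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type conn struct {
	pid       atomic.Uint32 // zero until the handshake completes
	secretKey uint32        // plain field; published by the pid store
}

func (c *conn) handshake() {
	c.secretKey = 42  // ordinary write, ordered before the atomic store
	c.pid.Store(1234) // release: makes secretKey visible to readers of pid
}

// cancelRequest returns early when pid is zero (handshake incomplete,
// nothing to cancel), mirroring the behavior described above.
func (c *conn) cancelRequest() (pid, key uint32, ok bool) {
	pid = c.pid.Load() // acquire
	if pid == 0 {
		return 0, 0, false
	}
	return pid, c.secretKey, true
}

func main() {
	var c conn
	if _, _, ok := c.cancelRequest(); !ok {
		fmt.Println("handshake incomplete, skipping cancel")
	}
	c.handshake()
	pid, key, ok := c.cancelRequest()
	fmt.Println(pid, key, ok)
}
```

Go's memory model gives atomic operations acquire/release semantics, which is what makes the plain `secretKey` read safe once a non-zero pid has been observed.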