Add CancelAndDrainContextWatcherHandler#2534

Open
ecordell wants to merge 2 commits into jackc:master from ecordell:select1cancel

Conversation

@ecordell (Contributor) commented Apr 15, 2026

The pre-existing CancelRequestContextWatcherHandler sleeps 100ms after sending a cancel request to avoid stale 57014 errors bleeding into the next query. We saw this cause issues when we tried it in production: non-negligible rates of errors that actually came from the previous query on the connection, not the "current" one.

This PR adds a new CancelAndDrainContextWatcherHandler which replaces the sleep with a deterministic drain: after the cancelled query returns, HandleUnwatchAfterCancel sends `;` to absorb any in-flight 57014 before releasing the connection. Because the drain runs inside Unwatch(), every pgconn operation that watches a context gets it automatically. I am still looking to validate this in more real-world scenarios, but I think it's ready for some early feedback.

Usage:

config.BuildContextWatcherHandler = func(pgConn *pgconn.PgConn) ctxwatch.Handler {
    return &pgconn.CancelAndDrainContextWatcherHandler{
        Conn:          pgConn,
        DeadlineDelay: time.Second,     // net.Conn deadline to unblock stuck reads (default 1s)
        DrainTimeout:  5 * time.Second, // max time for sending `;` drain (default 5s)
    }
}

This PR also fixes a pre-existing data race in CancelRequest: pid and secretKey are written during the connection handshake but read from a concurrent goroutine by any handler that calls CancelRequest in HandleCancel. The fix makes pid an atomic.Uint32 and skips the cancel request when the handshake hasn't completed yet. I was wary of adding a new atomic to pgconn but couldn't find a better way to avoid this race; it shouldn't be touched on any hot path anyway.

CancelRequest reads pid and secretKey, which are written during the
connection handshake in connectOne. When a context watcher handler calls
CancelRequest from a goroutine (as CancelRequestContextWatcherHandler
does), these reads can race with the handshake writes. The 100ms delay
means that this is rarely hit in practice, but it's not impossible.

Fix by making pid an atomic.Uint32. The secretKey write is ordered before
the atomic pid store, so a goroutine that observes a non-zero pid via
Load is guaranteed to see the secretKey. CancelRequest now returns early
when pid is zero (handshake incomplete, nothing to cancel).
	// in-flight results via SELECT 1 polling. Defaults to 5s.
	DrainTimeout time.Duration

	cancelFinishedChan chan struct{}
Contributor

Why not just use the context.Context created in HandleCancel? (e.g. shutdownCtx)

Contributor Author

I think you're suggesting something like this:

type CancelAndDrainContextWatcherHandler struct {
	Conn          *PgConn
	DeadlineDelay time.Duration
	DrainTimeout  time.Duration

	doneFn context.CancelFunc   // replaces cancelFinishedChan
	doneCtx context.Context     // replaces cancelFinishedChan
	stopFn context.CancelFunc
}

func (h *CancelAndDrainContextWatcherHandler) HandleCancel(_ context.Context) {
	h.doneCtx, h.doneFn = context.WithCancel(context.Background())
	cancelCtx, stop := context.WithCancel(context.Background())
	h.stopFn = stop

	deadline := time.Now().Add(h.deadlineDelay())
	h.Conn.conn.SetDeadline(deadline)

	go func() {
		defer h.doneFn()
		reqCtx, cancel := context.WithDeadline(cancelCtx, deadline)
		defer cancel()
		h.Conn.CancelRequest(reqCtx)
	}()
}

func (h *CancelAndDrainContextWatcherHandler) HandleUnwatchAfterCancel() {
	if h.stopFn != nil {
		h.stopFn()
	}
	if h.doneCtx != nil {
		<-h.doneCtx.Done()
	}
	// ... rest unchanged
}

This would work fine, but it allocates more and diverges from the pattern in the other handler. I don't personally see much reason to prefer it, though I'm not against making the change.

Contributor

Similar to what I said below, this should all be handled in pgxpool or puddle (somehow), and not done at the PgConn layer.


func (h *CancelAndDrainContextWatcherHandler) drainTimeout() time.Duration {
	if h.DrainTimeout == 0 {
		return 5 * time.Second
Contributor

const defaultDrainTimeout = 5 * time.Second
return defaultDrainTimeout

Contributor Author

This is how I would do it if I owned this library, but I'm matching the existing style: there are no duration constants anywhere else.


// HandleCancel is called when the context is cancelled. It sets a net.Conn deadline
// as a fallback and sends a PostgreSQL cancel request in a goroutine.
func (h *CancelAndDrainContextWatcherHandler) HandleCancel(_ context.Context) {
Contributor

Why is HandleCancel()'s arg being ignored and not passed to context.WithCancel()? Is this a different lifetime scope? It looks like we intend to inherit from the passed-in context.

Contributor Author

The ctx passed here is already cancelled; this is the same pattern used by the existing handler:

pgx/pgconn/pgconn.go

Lines 2885 to 2887 in a5680bc

func (h *DeadlineContextWatcherHandler) HandleCancel(ctx context.Context) {
	h.Conn.SetDeadline(time.Now().Add(h.DeadlineDelay))
}

	reqCtx, cancel := context.WithDeadline(cancelCtx, deadline)
	defer cancel()
	h.Conn.CancelRequest(reqCtx)
}()
Contributor

Can this be replaced with an errgroup.Group?

Contributor Author

This pattern matches the pre-existing handler; there's only one goroutine here and no meaningful error to return.

	h.stopFn()
}
if h.cancelFinishedChan != nil {
	<-h.cancelFinishedChan
Contributor

This can block indefinitely

Contributor Author

look at https://github.com/jackc/pgx/pull/2534/changes#diff-361e75a80ec6958cfb843b7e092b63e5bbf6c6b3a20734e1a43f1f722a35c063R57-R58:

defer close(doneCh) // doneCh here is h.cancelFinishedChan
reqCtx, cancel := context.WithDeadline(cancelCtx, deadline) // cancelCtx is cancelled by stopFn

so if stopFn gets called, this channel is closed, and if that doesn't happen for some reason, it's still closed by the deadline

Comment thread pgconn/pgconn.go

type PgConn struct {
	conn net.Conn
-	pid  uint32        // backend pid
+	pid  atomic.Uint32 // backend pid; atomic because CancelRequest reads it from a separate goroutine
Contributor

Style pref: embed something like:

	mu struct {
		sync.RWMutex
		pid       uint32
		secretKey []byte
	}

If you want to perform "dirty reads" of mu.pid, use an atomic (pid atomic.Uint32), but then updates to pid may only happen while holding mu.Lock().

Recently I went down the rabbit hole of refactoring opensearch-go's connection pooling (it uses net/http under the hood for transport, however, so it gets "free" connection lifecycle management from Go's stdlib; the analog would be that a "Connection" in opensearch-go is a pgxpool and an actual HTTP request is a PgConn). The life of an individual PgConn is the responsibility of puddle, not PgConn itself (i.e. puddle should be able to see whether a PgConn is in use and change the lifecycle bits of the PgConn). Every pending query for a PgConn increments a counter, and every completed or errored-out request decrements it. Whenever the count is 0, the connection is idle. A connection can have a lifecycle bit for draining, to prevent new requests from being enqueued on the PgConn and to tell the pooler to find a new connection. Something like:

https://github.com/opensearch-project/opensearch-go/blob/83bc5821b8b9dfbb41fbf325ec1d9c8ee24b093f/opensearchtransport/connection_lifecycle.go#L183-L235

Food for thought since there's been discussion for a while now about what would be involved in making a smart client for pgx, similar to what has been done in opensearch-go to give PostgreSQL applications read scale-out and connection load balancing without needing a truly clustered database.

Contributor Author

  1. I do personally like the obj.mu.stuffProtectedByMu style, but I don't see it used elsewhere in pgx and was trying to stick to conventions here. Also for this case, the atomic alone is sufficient since secretKey is protected by the happens-before on the pid store.
  2. Client-side query counting is interesting, but I'm not sure about its utility for this particular case, since cancels happen on separate connections and the race is server-side. It's interesting in general, though I think it would require breaking some of the existing pgxpool layering (i.e. drains happen before the conn is returned to the puddle layer).

Contributor

i.e. drains happen before the conn is returned to the puddle layer

imo, PgConn needs to be stupid: it needs to know if it's idle and able to accept a query, idle and shutting down, or active, that's it. Anything beyond that belongs in pgxpool or puddle, which is why I'm actually not a fan of polling in a loop like what's happening here. If pgxpool starts to shut down, it needs to wait until all outstanding active PgConn finish and return. As for shutting down and graceful lifecycle management of pool resources, if pgxpool's shutdown times out, the pgxpool can proactively cancel the contexts used by various PgConn. $0.02

Contributor Author

Part of the issue is that, by design in the pg wire protocol, an individual pgconn doesn't know that it's been cancelled on the client side at the time of cancellation. CancelRequest is sent on a separate TCP connection with no synchronization to the query in progress (protocol docs). When HandleUnwatchAfterCancel runs, we don't know whether the cancel signal has reached the server yet. If it arrives after ReadyForQuery, the server sets QueryCancelPending and the next query on the connection gets a spurious 57014.

But cancellation APIs are tied to the PgConn being cancelled in pgx: BuildContextWatcherHandler is on ConnConfig and ctxwatch lives in the pgconn package. If cancellation were handled at the pool level this would be a different conversation, but splitting "send the cancel" (pgconn) from "drain the cancel" (pgxpool) would be worse than either coherent design.

I did try doing this at the pgxpool level (spicedb commit). It required a sync.Map from weak.Pointer[pgconn.PgConn] to weak.Pointer[cancelHandler] (both weak to avoid a GC cycle), plus a separate ready atomic.Int32 flag set from an AfterConnect hook to guard the same pid/secretKey race that this PR fixes. That's a lot of machinery to work around the absence of the right hooks. Moving it into the cancel handler removes all of it.

Also, IMO the fact that this library already has the CancelHandler abstraction and an existing implementation (with a race) signals that this is the right layer for it, at least as pgx is structured today. I could see an argument that more lifecycle hooks should be available so external management is possible, but that seemed like a bigger change than fitting into the existing CancelHandler box.

@sean- (Contributor) commented Apr 17, 2026

Thanks for digging into this. The problem is real and the drain approach is the right direction. This morning I prototyped an alternative that avoids the polling loop, and I think it addresses the same scenarios with less machinery. Before getting into the implementation, I laid out the message-level protocol flows for each case to prove to myself this would work.

All of these reference the PostgreSQL cancel protocol from src/include/libpq/pqcomm.h: the cancel request is always sent on a separate TCP connection (conn B) to the postmaster, which dispatches the SIGINT to the target backend on conn A.

Scenario 1: Cancel arrives before query completes

The 57014 is consumed by the original query. Connection is clean afterward. The drain sends ; anyway (harmless: one extra round-trip).

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |                       |                     |--CancelReq------>|
    |                       |                     |                  |--SIGINT-->|
    |                       | [interrupted]       |<--close----------|
    |<--ErrorResponse(57014)|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       |                     |                  |
    |  [HandleUnwatchAfterCancel]                 |                  |
    |  [cancelState: sent -> idle]                |                  |
    |---Query(;)----------->| [drain]             |                  |
    |<--EmptyQueryResponse--|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       |                     |                  |
    v connection clean      v                     v                  v

Scenario 2a: Cancel arrives at idle backend (no-op, the safe case)

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |<--CommandComplete-----|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       | [idle]              |                  |
    |                       | DoingCommandRead=T  |--CancelReq------>|
    |                       |                     |                  |--SIGINT-->|
    |                       | QueryCancelPending  |<--close----------|
    |                       | CHECK_FOR_INTERRUPTS|                  |
    |                       | [flag cleared, noop]|                  |
    |                       |                     |                  |
    |---Query(next sql)---->|                     |                  |
    |                       | DoingCommandRead=F  |                  |
    |                       | [executing]         |                  |
    |<--CommandComplete-----|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    v OK: cancel was a no-op, next query clean    v                  v

Scenario 2b: Cancel arrives during next query (the race)

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |<--CommandComplete-----|                     |--CancelReq------>|
    |<--ReadyForQuery-------|                     |                  |
    |                       | [idle]              |                  |
    |---Query(next sql)---->|                     |                  |
    |                       | CHECK_FOR_INTERRUPTS|                  |
    |                       | [no cancel pending] |                  |
    |                       | DoingCommandRead=F  |                  |--SIGINT-->|
    |                       | [executing next]    |<--close----------|
    |                       | QueryCancelPending  |                  |
    |                       | ProcessInterrupts   |                  |
    |<--ErrorResponse(57014)|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       |                     |                  |
    v WRONG: 57014 hit the next query             v                  v

Scenario 3: Single-; drain absorbs the cancel

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |<--CommandComplete-----|                     |--CancelReq------>|
    |<--ReadyForQuery-------|                     |                  |
    |                       | [idle]              |                  |
    |  [HandleUnwatchAfterCancel]                 |                  |
    |---Query(;)----------->| [drain]             |                  |
    |                       | CHECK_FOR_INTERRUPTS|                  |
    |                       | [no cancel pending] |                  |
    |                       | DoingCommandRead=F  |                  |--SIGINT-->|
    |                       | [executing ;]       |<--close----------|
    |                       | QueryCancelPending  |                  |
    |                       | ProcessInterrupts   |                  |
    |<--ErrorResponse(57014)| [drain absorbed it] |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       |                     |                  |
    |---Query(next sql)---->| [clean]             |                  |
    |                       | DoingCommandRead=F  |                  |
    |                       | [executing]         |                  |
    |<--CommandComplete-----|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    v OK: drain absorbed the 57014, next query clean                 v

Scenario 3 (alternate): Cancel already cleared, drain is clean

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |<--CommandComplete-----|                     |--CancelReq------>|
    |<--ReadyForQuery-------|                     |                  |--SIGINT-->|
    |                       | [idle]              |<--close----------|
    |                       | QueryCancelPending  |                  |
    |                       | CHECK_FOR_INTERRUPTS|                  |
    |                       | [flag cleared, noop]|                  |
    |                       |                     |                  |
    |  [HandleUnwatchAfterCancel]                 |                  |
    |---Query(;)----------->| [drain]             |                  |
    |<--EmptyQueryResponse--|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    |                       |                     |                  |
    |---Query(next sql)---->| [clean]             |                  |
    |<--CommandComplete-----|                     |                  |
    |<--ReadyForQuery-------|                     |                  |
    v OK: cancel was no-op, drain confirmed clean v                  v

One ; is sufficient because PostgreSQL sets QueryCancelPending at most once per cancel signal. After the 57014 is raised and sent, the flag is cleared. There is no mechanism for a second 57014 from the same cancel.

Scenario 4: CancelRequest fails (dial error, max conns, network issue)

The cancel never reaches the server. No 57014 is possible. CancelRequest transitions cancelState back to idle, so HandleUnwatchAfterCancel skips the drain entirely.

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |  [net.Conn deadline]  |                     |--dial------> X (refused)
    |                       |                     |                  |
    |                       |                     |  [CancelRequest returns err]
    |                       |                     |  [state -> idle]
    |                       |                     |                  |
    |<--I/O timeout---------|                     |                  |
    |  (or query completes) |                     |                  |
    |                       |                     |                  |
    |  [HandleUnwatchAfterCancel]                 |                  |
    |  [cancelState == idle -> skip drain]        |                  |
    |                       |                     |                  |
    v no drain needed       v                     v                  v

Scenario 5: Duplicate cancel (mutex prevents double-send)

A second CancelRequest call races with an in-flight cancel. The mutex ensures only one cancel request is ever sent. Concurrent callers block on the in-flight done context until the cancel completes, then return nil.

Client (conn A)            Server Backend        Client (conn B)   Postmaster
    |---Query(sql)--------->|                     |                  |
    |                       | [executing]         |                  |
    |  [ctx cancelled]      |                     |                  |
    |  [CancelRequest #1: idle -> inFlight]       |                  |
    |                       |                     |--CancelReq------>|
    |                       |                     |                  |
    |  [CancelRequest #2: inFlight -> blocks]     |                  |
    |  [CancelRequest #1 completes -> sent]       |                  |
    |  [CancelRequest #2 unblocks -> nil]         |                  |
    |                       |                     |                  |
    v only one cancel sent  v                     v                  v

Alternative approach: mutex-guarded state machine + single drain

In #2537 I prototyped an alternative that replaces the polling loop with a state machine on PgConn. The key differences are:

  1. No loop. HandleUnwatchAfterCancel sends exactly one ;. PostgreSQL processes at most one cancel per backend (QueryCancelPending is a single flag, not a queue), so one round-trip is sufficient to absorb it.

  2. Mutex-guarded cancel state on CancelRequest. A three-state machine (idle / inFlight / sent) embedded on PgConn tracks cancel lifecycle. The coordination lives in CancelRequest itself (not in HandleCancel) because CancelRequest is a public method callable from anywhere (the context watcher, asyncClose, or direct user code). Concurrent callers that hit inFlight block on a done context; callers that hit sent return nil immediately. If sendCancelRequest fails, the state transitions back to idle so the drain is skipped entirely (scenario 4).

  3. Atomic backendKeyData. Instead of making just pid atomic (with secretKey relying on an implicit happens-before from the atomic store of pid), both are grouped in a backendKeyData struct behind atomic.Pointer. This makes the synchronization explicit and the nil check ("handshake not complete") natural.

  4. Context inheritance. HandleCancel(ctx) wraps the caller's context with context.WithoutCancel(ctx) so values (trace IDs, etc.) propagate into the cancel request without inheriting the already-fired cancellation.

Unrelated, but I stumbled across the magic numbers 80877102 and 80877103, and couldn't resist extracting them into cancelRequestCode and negotiateSSLCode, referencing CancelRequestPacket in src/include/libpq/pqcomm.h. The cancel packet field offsets are also named constants.

The state machine has three states (idle, inFlight, sent) and the following transitions:

CancelRequest (idle -> inFlight):       we own the cancel, proceed to sendCancelRequest
CancelRequest (send fails):             inFlight -> idle -- no 57014 possible, skip drain (scenario 4)
CancelRequest (send succeeds):          inFlight -> sent -- drain needed
HandleUnwatchAfterCancel (sent):        sent -> idle, drain once (scenarios 1, 2, 3)
HandleUnwatchAfterCancel (idle):        skip drain (scenario 4, or cancel never fired)
CancelRequest (concurrent, inFlight):   block on done context (scenario 5)
CancelRequest (concurrent, sent):       return nil immediately (scenario 5)

In #2537, the handler implements the same ctxwatch.Handler interface, and the test suite covers all five scenarios above including a concurrent CancelRequest stress test that passes with -race.

Aside: ctxwatch.Handler interface design

While working through this I noticed that HandleCancel returns nothing:

type Handler interface {
    HandleCancel(canceledCtx context.Context)
    HandleUnwatchAfterCancel()
}

I think this is an interface design bug. Cancellation in pgx involves a network operation: dialing a new TCP connection to the postmaster and writing the cancel packet. That can fail (dial refused, max connections, timeout), and the caller has no way to know. It's the same class of mistake as ignoring the error from close(2): usually fine, but when it matters (and with cancellation it always matters), you've lost the signal.

Today, when the cancel fails, CancelRequest transitions the state back to idle and the drain is skipped, which I believe is the correct behavior; but the context watcher has no idea the cancel failed. If HandleCancel returned an error, the watcher could surface it through the tracer or take a different code path (e.g., retry, or close the connection immediately instead of hoping for the best).

This would be a breaking interface change, so it's not something for this PR unless there's an appetite for it. As it stands, the void return silently swallows failures in a network-critical path.

@ecordell (Contributor Author) commented Apr 21, 2026

After reviewing the pg spec I think you're right that only one `;` is required, so I removed the loop here. If the connection gets the 57014 in the drain, the next query should succeed.

I don't personally think that the juice is worth the squeeze on the state machine added to pgconn - the cancel send failing seems rare and in case it does happen, an extra ; send won't hurt anything. Weren't you also against modifying pgconn earlier in our discussion?

I'll leave this up, I'm happy if either this or your alternative merge, hopefully we can get some maintainer eyes on this soon?

Contributor

Yeah, I can't tell you how much I despise looping like this against external resources, with no jitter, etc. It works, but it's just not right, and it's a layering violation.

I tried a few iterations of this with an atomic bool on PgConn to indicate a cancellation had been sent, but didn't like how that turned out (I forget exactly what it was, but I think it was a cancel during the initial handshake that I couldn't fix, plus blocking real-time event cancellation). Instead of adding a naked chan to select on, I added a context so that all lifecycle operations of a connection could be idiomatically edge-triggered, with no specious sleeps masking races. It ended up being 56B per PgConn, which I think is fine in the greater scheme of things.

Re: "weren't you against modifying pgconn", 100% valid point. I pushed back on the earlier suggestions (query counters, lifecycle bits) because those are pool-level concerns that belong in puddle/pgxpool. The cancelMu is different: it's narrowly scoped to the one thing PgConn uniquely knows: whether a cancel request was sent on this connection. No other layer has that information.

The cancel key is generated once per connection via pg_strong_random() (postgres.c line 4360), is unique per connection, and is never shared with another process. That means the client has complete local authority over the cancel lifecycle and can safely gate it.

The other reason the mutex matters: every CancelRequest forces PostgreSQL to fork a new backend process. The postmaster accepts the cancel connection (postmaster.c line 1727 -> BackendStartup -> fork_process()), the child walks the ProcSignal shared memory array under a spinlock, does a timingsafe_bcmp on the cancel key (procsignal.c lines 764-765), sends kill(-backendPID, SIGINT) (procsignal.c line 781), then exits. The per-cancel cost to the server (a fork(2), shared memory init, kill(2), and process teardown) is really high, and these can stack when the RTT between the client and server is high. The mutex coalesces concurrent cancel attempts into exactly one server-side fork (i.e. server resources are more valuable to me than the client-side 56B).

With reliable cancellation like this, request hedging across multiple PG replicas becomes a possibility with pgx: send the same query to N read-only cluster backends, take the first result, cancel the rest with exactly one server-side fork per cancelled connection, drain the 57014 deterministically, and reuse the connection (and the reason why I pulled on this thread to begin with).

Contributor Author

the one thing PgConn uniquely knows: whether a cancel request was sent on this connection. No other layer has that information.
...
That means the client has complete local authority over the cancel lifecycle and can safely gate it.

I think this is not true in general. A layer in between (e.g. say you wanted to write a proxy like pgbouncer using pgx) may hold the network connection in a different process than the one the cancellation signal originates from.

I tend to think keeping it out makes more sense just with how the pg wire protocol works (separate cancel signal on new connection).

But like I said, I'm okay with either approach merging.

Contributor

A layer in between (i.e. say you wanted to write a proxy like pgbouncer using pgx) may hold the network connection in a different process than the cancellation signal originates from.

Even in a proxy, the cancel has to go through the cancel key that belongs to a specific backend connection. pgbouncer follows the same pattern: it receives the client's cancel on a new connection, looks up the PID/key for the backend connection, and opens another new connection (including TLS) to the real server to forward it. The cancel key is per-connection.

In pgx, the cancel key lives on PgConn. Any code that wants to cancel - whether it's the context watcher, asyncClose, user code, or a proxy layer - must call CancelRequest on the PgConn that owns the key. The mutex gates that single call site.

Cancel requests cannot be sent on an existing connection. This is a wire protocol constraint, not a design choice. There are exactly two client-side code paths in the PostgreSQL source that emit CANCEL_REQUEST_CODE, and both open a new TCP connection:

  1. PQcancel() (fe-cancel.c:577): the old, signal-safe API. Opens a raw socket via socket() + connect(), sends the cancel packet, waits for EOF, closes. The comment at line 573: "We need to open a temporary connection to the postmaster."

  2. PQcancelCreate()/PQcancelPoll() (fe-connect.c:3722): the new, non-blocking API. Creates a fresh PGconn with cancelRequest = true and goes through the normal connection establishment state machine (including TLS). Once the connection is up, it calls PQsendCancelRequest() instead of sending a startup packet, then waits for the server to close the connection.

On the server side, backend_startup.c:567 is the only place that checks for CANCEL_REQUEST_CODE, inside ProcessStartupPacket, which runs once per new connection. There is no cancel message type in the regular wire protocol: an established backend expects normal frontend messages (Query, Parse, Bind, Execute, etc.). Every cancel forces the server to fork a new backend process, walk shared memory under a spinlock, do a timingsafe_bcmp on the cancel key, and kill() the target (procsignal.c:747-781).

That TLS and fork cost is why the mutex matters: it coalesces concurrent cancel attempts into one server-side fork instead of N.

The other option I pondered was a central "cancellation resource" in pgxpool, something like a MaxConcurrentCancels int tunable to limit the blast radius of DoS'ing PG with connection setup fork+TLS+kill cycles. Right now pgxpool doesn't have that kind of cancel-aware coordination, and PgConn is standalone: it's not tightly coupled to a pool. Since PgConn already knows its own cancel secret and is the only thing that can act on it, I pushed the state machine down into PgConn in my prototype. If pgxpool ever grows cancel-aware resource management, the per-connection mutex still composes cleanly with a pool-level semaphore.
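As a sketch of what such a pool-level limit could look like (hypothetical: pgxpool has no MaxConcurrentCancels today, and `cancelLimiter` is an invented name), a buffered channel works as the semaphore and composes with each connection's own mutex:

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// cancelLimiter caps concurrent cancel cycles across a whole pool; each
// slot in the buffered channel is one in-flight cancel.
type cancelLimiter struct {
	slots chan struct{}
}

func newCancelLimiter(maxConcurrent int) *cancelLimiter {
	return &cancelLimiter{slots: make(chan struct{}, maxConcurrent)}
}

// Do runs one cancel attempt, blocking while maxConcurrent are in flight.
func (l *cancelLimiter) Do(cancel func() error) error {
	l.slots <- struct{}{}        // acquire a slot
	defer func() { <-l.slots }() // release it
	return cancel()
}

// demoLimiter runs 5 cancel attempts through a limit of 2 and reports the
// high-water mark of concurrency plus how many attempts completed.
func demoLimiter() (maxSeen, ran int64) {
	l := newCancelLimiter(2)
	var cur int64
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Do(func() error {
				n := atomic.AddInt64(&cur, 1)
				for { // record the high-water mark
					m := atomic.LoadInt64(&maxSeen)
					if n <= m || atomic.CompareAndSwapInt64(&maxSeen, m, n) {
						break
					}
				}
				time.Sleep(20 * time.Millisecond) // simulated cancel latency
				atomic.AddInt64(&cur, -1)
				atomic.AddInt64(&ran, 1)
				return nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&maxSeen), atomic.LoadInt64(&ran)
}
```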

Random aside: using pg_cancel_backend() would be a cheaper alternative and worth considering wiring in at the pgxpool layer.

pg_cancel_backend(pid) (signalfuncs.c:134-160) sends SIGINT to the target backend on an existing connection (no new TCP connection, no server-side fork, no cancel key) using an SQL function. Internally, it calls BackendPidGetProc(pid) to find the PGPROC in shared memory, does a role-based permission check, and kill(-pid, SIGINT) to cancel the running query. The backend stays alive, the running query gets a 57014, and the connection returns to ReadyForQuery.

At the pgxpool level, if there is an idle connection in the pool, we could run SELECT pg_cancel_backend($1) instead of opening a new connection with the cancel protocol. No TCP handshake, no TLS negotiation, no fork() + procsignal walk + kill() + process teardown on the server. Just a regular query on an existing connection delivering the same signal.

Caveats:

  • Requires the connecting role to have pg_signal_backend privileges (or be the same role as the target, or be a superuser). The cancel protocol's cancel key authorization is per-connection and doesn't have this restriction.
  • Only works when there's an idle connection available in the pool. If every connection is busy, you're back to the cancel protocol or spinning up a new connection and adding it to the pool.
  • Doesn't work for standalone PgConn usage without a pool.

For the common case of pgxpool with connections available, pg_cancel_backend(pid) on an idle connection would be strictly better than the cancel protocol. Something to consider for a future pool-aware cancel path.
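A sketch of that pool-aware path, with the pool reduced to a hypothetical `execer` interface so the example is self-contained (in real code this would be a pgxpool acquire followed by Exec on an idle connection):

```go
package main

import "fmt"

// execer abstracts "run a statement on an idle connection"; in real code
// this would be a pgxpool acquire followed by Exec.
type execer interface {
	Exec(sql string, args ...any) error
}

// cancelViaBackend cancels the query on the backend with the given PID by
// running pg_cancel_backend on a separate idle connection: no new TCP
// connection, no TLS handshake, no server-side fork.
func cancelViaBackend(db execer, pid uint32) error {
	return db.Exec("select pg_cancel_backend($1)", pid)
}

// fakeConn records statements so the sketch is testable without a server.
type fakeConn struct{ log []string }

func (f *fakeConn) Exec(sql string, args ...any) error {
	f.log = append(f.log, fmt.Sprintf("%s %v", sql, args))
	return nil
}
```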

As architectural warts go, I'm not thrilled with how PostgreSQL handles query cancellation and really wish there were a cancel message in the regular wire protocol that would allow for backend reuse. We can cancel via pg_cancel_backend(pid) over a normal connection, so the server already supports the operation inline. A new frontend message type would eliminate the fork-per-cancel overhead entirely. But that's a much larger fish to fry, and I'm also not sure how to handle that cleanly with multiple active portals.

Comment thread pgconn/cancel_and_drain.go Outdated

outer:
for {
pgConn.frontend.Send(&pgproto3.Query{String: "SELECT 1"})
Contributor

I think you can just send ; for an even shorter SQL query, fwiw. I'm not sure that polling on this value is right, however. Why not use an atomic.Int64 as a gauge for the number of queries in queue for a connection and when it hits 0, then close the connection? I haven't looked at pgx recently, but shouldn't the lifecycle of a connection be handled by puddle or some other connection pooler that owns the connection object?

Contributor Author

I'll test out ;, that seems like it should work to me.

Why not use an atomic.Int64 as a gauge for the number of queries in queue for a connection and when it hits 0, then close the connection?

That's not really the problem this is addressing. It doesn't matter if there are 0 active queries on the connection - once we've sent the Cancel (which is async and on a different connection) we don't know if the original query finished before the cancel was processed by the server. If it did, a subsequent query on the same connection may get a spurious 57014. The only way to know (as far as I can tell) is to send a query and see what happens.

In theory we could add tracking to pgconn to skip the probe when we already saw a 57014 (meaning the cancel was consumed), but the cost of one extra ";" round trip didn't seem worth the complexity. If it weren't for addressing the (existing) race on pid I wouldn't have touched pgconn.go at all with this PR. I'm not against adding additional tracking but ideally only with some consensus here first.
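The probe-and-absorb logic can be sketched against a toy message model (these are stand-ins, not the pgproto3 types):

```go
package main

import "fmt"

// msg is a stand-in for the server messages seen during the ";" drain.
type msg struct {
	kind string // "EmptyQueryResponse", "ErrorResponse", "ReadyForQuery"
	code string // SQLSTATE, set for ErrorResponse
}

// drain consumes one ";" round-trip worth of messages. A 57014 here means
// the drain query absorbed the stale cancel; any other error is unexpected.
func drain(stream []msg) (absorbed bool, err error) {
	for _, m := range stream {
		switch m.kind {
		case "ErrorResponse":
			if m.code == "57014" {
				absorbed = true // stale cancel consumed by the sacrificial query
			} else {
				return absorbed, fmt.Errorf("unexpected error %s during drain", m.code)
			}
		case "ReadyForQuery":
			return absorbed, nil // connection is clean for reuse
		}
	}
	return absorbed, fmt.Errorf("stream ended before ReadyForQuery")
}
```

Either outcome leaves the connection in a known state: a clean EmptyQueryResponse or an absorbed 57014, followed by ReadyForQuery.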

Contributor Author

Switched to ; and will continue to test with that.

Add a context watcher handler that deterministically drains stale cancel
signals instead of relying on a fixed 100ms sleep. When a context is
cancelled, HandleCancel sends the cancel request immediately. After the
query returns, HandleUnwatchAfterCancel polls SELECT 1 to absorb any
in-flight SQLSTATE 57014 before releasing the connection.

The drain runs inside Unwatch via HandleUnwatchAfterCancel, so every
pgconn operation that watches a context (Exec, ExecParams, Prepare,
Deallocate, CopyTo, CopyFrom, Pipeline, WaitForNotification) gets
drain-after-cancel automatically with no per-call-site wiring.
@jackc
Owner

jackc commented Apr 25, 2026

A few thoughts:

I don't understand how a race on pid and secretKey could be triggered in the current code base. This is something pre-existing?

When using a pool, pg_cancel_backend(pid) is definitely the way to go. I've had that in the back of my mind for years, but never got around to it.

I know this is constrained by the PG cancel protocol and the pgx architecture, but the cancel handler actually sending a query and waiting to receive the results seems a bit icky.

How would this work with pipeline mode?

If this can be made to work perfectly reliably, would it make sense to have this implementation replace the CancelRequestContextWatcherHandler rather than introduce a new handler? It seems like it would be strictly superior if it works.

@sean-
Contributor

sean- commented Apr 25, 2026

I don't understand how a race on pid and secretKey could be triggered in the current code base. This is something pre-existing?

It is triggerable in the current codebase. During connect(), the real context watcher starts watching at pgconn.go:401-402, but pid and secretKey are not written until the handshake loop receives BackendKeyData (line ~449). If the connect context cancels in that window, the watcher goroutine fires HandleCancel, which calls CancelRequest from a separate goroutine, reading pid and secretKey while the handshake goroutine may still be writing them. The window is narrow (requires cancellation during the handshake), but it is a real data race in Go's memory model, not just theoretical (observed in #2537's checks). In practice the cancel packet would have PID 0 and an empty key, which PostgreSQL silently ignores, so the consequence is a silent no-op rather than corruption.

In #2537 this is fixed by grouping both fields into a backendKeyData struct behind atomic.Pointer. CancelRequest loads the pointer, gets nil if the handshake has not completed, and returns early. No split-read risk.
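The shape of that fix, sketched with hypothetical names (the actual change is in #2537):

```go
package main

import "sync/atomic"

// backendKeyData groups pid and secretKey so a reader either sees both
// (handshake complete) or nil (handshake incomplete), never a torn read
// of one field but not the other.
type backendKeyData struct {
	pid       uint32
	secretKey uint32
}

type conn struct {
	keyData atomic.Pointer[backendKeyData]
}

// finishHandshake is called once BackendKeyData arrives from the server.
func (c *conn) finishHandshake(pid, key uint32) {
	c.keyData.Store(&backendKeyData{pid: pid, secretKey: key})
}

// cancelRequest reports ok=false when the handshake hasn't completed, so
// the caller can skip the cancel entirely instead of racing the writer.
func (c *conn) cancelRequest() (pid, key uint32, ok bool) {
	kd := c.keyData.Load()
	if kd == nil {
		return 0, 0, false
	}
	return kd.pid, kd.secretKey, true
}
```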

When using a pool, pg_cancel_backend(pid) is definitely the way to go. I've had that in the back of my mind for years, but never got around to it.

Agreed. pg_cancel_backend sends SIGINT via shared memory on an existing connection: no new TCP connection, no TLS, no server-side fork. Strictly better than the cancel protocol when an idle pool connection is available. The tradeoffs are role-based permission requirements (pg_signal_backend or same role), needing an idle connection, and not working for standalone PgConn usage. I think it's worth pursuing as a pool-level optimization separate from the per-connection cancel handler.

I know this is constrained by the PG cancel protocol and the pgx architecture, but the cancel handler actually sending a query and waiting to receive the results seems a bit icky.

I share the instinct that a cancel handler sending a query is a layering violation. The problem is that the cancel protocol leaves no alternative: the cancel request is asynchronous, sent on a separate TCP connection, with no acknowledgement on the original connection. After ReadyForQuery, there is no way to know whether QueryCancelPending is set on the server without probing. The existing 100ms sleep is also a probe, just a probabilistic one that is routinely insufficient (especially against CockroachDB, where cancel signal propagation to distributed execution nodes takes longer than a local SIGINT).

The ; is the cheapest possible probe: EmptyQueryResponse + ReadyForQuery, one round-trip. If we accept that the connection must be probed before reuse, the question is whether the probe is deterministic (;) or probabilistic (sleep).

How would this work with pipeline mode?

The cancel protocol has no concept of cancelling an individual pipelined statement. SIGINT targets the backend process: it interrupts whatever is currently executing, and the server discards all remaining pipeline messages until the next Sync (protocol docs). A cancel during pipeline mode is all-or-nothing, killing the entire pipeline from the current statement onward. There is no way to cancel statement N while preserving statements N+1 through M.

That said, the drain works correctly in this context. Two outcomes:

  1. 57014 arrives within the net.Conn deadline: Pipeline.Close() consumes all expected ReadyForQuery messages via the loop at lines 2954-2972. The pipeline state machine (ExtractFrontRequestType) correctly skips entries while pgErr is set, mirroring the server's "discard until Sync" behavior. After Close() finishes, Unwatch() fires at line 2975, HandleUnwatchAfterCancel sends the drain ;, and the connection is clean for reuse.

  2. Deadline expires before the server responds: Pipeline.receiveMessage() gets an I/O timeout (line 2916-2919), calls asyncClose(), and the connection is destroyed. This is the correct outcome: the wire protocol state is indeterminate, so the connection must be closed.

Pipeline.Flush() also calls Unwatch() on error (line 2600), but only after asyncClose(), so drainOnce skips the drain via the IsClosed() check.

One edge case to note: if ResultReader.receiveMessage() hits an I/O error during pipeline execution, it calls Unwatch() (line 1912) before asyncClose() (line 1915). The drain could fire on a connection with unread pipeline messages. But asyncClose() follows immediately in the same goroutine, so the connection is destroyed regardless.

If this can be made to work perfectly reliably, would it make sense to have this implementation replace the CancelRequestContextWatcherHandler rather than introduce a new handler? It seems like it would be strictly superior if it works.

I agree it should replace the existing handler if it works reliably. The sleep-based handler is strictly worse: slower (100ms minimum vs one round-trip) and unreliable (the 100ms is a guess that fails under load or if someone is stuck using CockroachDB). The only argument for keeping both would be if someone wanted to avoid the extra round-trip entirely, but anyone who cares about cancel correctness needs the drain.

One implementation difference worth flagging: CancelRequest is callable from the context watcher, asyncClose, and direct user code, so concurrent calls are possible. Each cancel causes the server to set QueryCancelPending, producing exactly one 57014. If two cancel requests are sent, two 57014s could arrive, and a single ; drain cannot reconcile that. The prototype in #2537 makes CancelRequest reentrant: a mutex-guarded state machine blocks concurrent callers until the in-flight cancel completes, then returns nil. At most one cancel packet hits the wire per cancel cycle, which is what makes the single-; drain sufficient.

@jackc
Owner

jackc commented Apr 25, 2026

I have a few more thoughts.

First, the way you describe the PostgreSQL server's handling of cancel signals didn't match my understanding. pgx's cancel handling probably would have been designed differently had that been my understanding. However, as far as I can tell, what you describe isn't actually right. In particular, scenario 2 is wrong. I made a test specifically to check.

package pgconn_test

// Investigation test for PR #2534 (https://github.com/jackc/pgx/pull/2534).
//
// PR #2534's scenario 2 claims that a cancel signal arriving at an *idle*
// PostgreSQL backend (after ReadyForQuery) sets QueryCancelPending, which the
// *next* query then consumes as a spurious SQLSTATE 57014.
//
// This file empirically verifies that claim.
//
// Run:
//   go test -run TestCancelRequestToIdleBackend -v ./pgconn/
//
// Requires PGX_TEST_DATABASE to be set.

import (
	"context"
	"errors"
	"os"
	"testing"
	"time"

	"github.com/jackc/pgx/v5/pgconn"
	"github.com/stretchr/testify/require"
)

// TestCancelRequestToIdleBackend tests scenario 2 from PR #2534. For a range
// of delays it:
//
//  1. completes a quick query so the backend is idle (ReadyForQuery received),
//  2. sends a cancel signal,
//  3. sleeps the test's delay so the SIGINT has time to be delivered,
//  4. runs another query and counts whether it returns SQLSTATE 57014.
//
// If the PR's claim were true, we'd expect the next query to return 57014
// (with sufficient delay for the SIGINT to be delivered).
func TestCancelRequestToIdleBackend(t *testing.T) {
	connString := os.Getenv("PGX_TEST_DATABASE")
	if connString == "" {
		t.Skip("PGX_TEST_DATABASE not set")
	}

	delays := []time.Duration{
		0,
		10 * time.Millisecond,
		100 * time.Millisecond,
		500 * time.Millisecond,
	}

	for _, delay := range delays {
		t.Run(delay.String(), func(t *testing.T) {
			bleeds := runIdleCancelTrials(t, connString, delay)
			t.Logf("delay=%s: %d/10 trials had next query return 57014", delay, bleeds)
		})
	}
}

func runIdleCancelTrials(t *testing.T, connString string, delay time.Duration) int {
	const trials = 10
	bleeds := 0
	for trial := 0; trial < trials; trial++ {
		pgConn, err := pgconn.Connect(context.Background(), connString)
		require.NoError(t, err)

		_, err = pgConn.Exec(context.Background(), "select 1").ReadAll()
		require.NoError(t, err)

		// Backend is now idle (ReadyForQuery received). Send a cancel.
		err = pgConn.CancelRequest(context.Background())
		require.NoError(t, err)

		// Give the SIGINT time to be delivered to the (still-idle) backend.
		time.Sleep(delay)

		_, err = pgConn.Exec(context.Background(), "select 2").ReadAll()
		var pgErr *pgconn.PgError
		if errors.As(err, &pgErr) && pgErr.Code == "57014" {
			bleeds++
		}

		pgConn.Close(context.Background())
	}
	return bleeds
}

Running it yields:

vscode ➜ /workspaces/pgx-a/pgconn (master) $ go test -run TestCancelRequestToIdleBackend -v            
=== RUN   TestCancelRequestToIdleBackend
=== RUN   TestCancelRequestToIdleBackend/0s
    cancel_idle_investigation_test.go:62: delay=0s: 0/10 trials had next query return 57014
=== RUN   TestCancelRequestToIdleBackend/10ms
    cancel_idle_investigation_test.go:62: delay=10ms: 0/10 trials had next query return 57014
=== RUN   TestCancelRequestToIdleBackend/100ms
    cancel_idle_investigation_test.go:62: delay=100ms: 0/10 trials had next query return 57014
=== RUN   TestCancelRequestToIdleBackend/500ms
    cancel_idle_investigation_test.go:62: delay=500ms: 0/10 trials had next query return 57014
--- PASS: TestCancelRequestToIdleBackend (6.32s)
    --- PASS: TestCancelRequestToIdleBackend/0s (0.05s)
    --- PASS: TestCancelRequestToIdleBackend/10ms (0.15s)
    --- PASS: TestCancelRequestToIdleBackend/100ms (1.07s)
    --- PASS: TestCancelRequestToIdleBackend/500ms (5.06s)
PASS
ok      github.com/jackc/pgx/v5/pgconn  6.321s

If the cancel arrives after the original query is complete there is no 57014 error.

This means that it would be possible for a cancel request to be sent that never gives any response to the original backend. That is, it still is possible for a cancel request to hit the wrong query under adverse timing conditions.

Lastly, the code has a strong agent smell to it. That's okay, I use agents every day, too (even to help write the test above). But it's only acceptable if it's disclosed, see our contributing guidelines. The comments on this thread also sound an awful lot like agent-speak as well. I've spent probably an hour today on this issue and I hope I'm not wasting my time talking to someone else's agent.

@sean-
Contributor

sean- commented Apr 25, 2026

First, the way you describe the PostgreSQL server's handling of cancel signals didn't match my understanding. pgx's cancel handling probably would have been designed differently had that been my understanding. However, as far as I can tell, what you describe isn't actually right. In particular, scenario 2 is wrong. I made a test specifically to check.

You're right, the description for scenario 2 is wrong. I was sloppy describing the mechanism. I read StatementCancelHandler (postgres.c:3064), saw it sets QueryCancelPending unconditionally, and stopped there; I didn't assume a guard would exist further down, and didn't trace through ProcessInterrupts or the main loop's clearing path. The concern (cancel hitting the wrong query) is real, but the mechanism I described is not how PostgreSQL handles it.

What actually happens: ProcessInterrupts() gates the 57014 on DoingCommandRead (postgres.c:3535):

if (!DoingCommandRead)
{
    LockErrorCleanup();
    ereport(ERROR,
            (errcode(ERRCODE_QUERY_CANCELED),
             errmsg("canceling statement due to user request")));
}

And the main loop clears stale cancels before processing the next command (postgres.c:4808-4818):

/*
 * Query cancel is supposed to be a no-op when there is no query in
 * progress, so if a query cancel arrived while we were idle, just
 * reset QueryCancelPending.
 */
CHECK_FOR_INTERRUPTS();
DoingCommandRead = false;

The sequence is: DoingCommandRead = true (line 4782), ReadCommand blocks, command arrives, CHECK_FOR_INTERRUPTS clears any pending cancel (line 4817, DoingCommandRead still true), then DoingCommandRead = false (line 4818). A cancel delivered to an idle backend is a no-op. Your test confirms this.

That said, this clearing behavior has a design assumption that is not guaranteed by the protocol, namely that the query and cancel messages arrive in the order in which they were initiated. The cancel and query travel on separate TCP connections and are not guaranteed to arrive in order. CHECK_FOR_INTERRUPTS at line 4817 runs after ReadCommand returns but before DoingCommandRead = false. If the SIGINT is delivered before the query arrives, or between CHECK_FOR_INTERRUPTS and DoingCommandRead = false, the cancel is silently discarded. For local PostgreSQL, this is nearly impossible to trigger because the cancel path (new TCP connection, optional TLS, postmaster fork, shared memory walk, kill()) is almost always slower than a query arriving on an established connection. But the protocol offers no ordering guarantee, and for systems like CockroachDB where cancel propagation goes through distributed coordination, the timing is less predictable.

If the cancel arrives after the original query is complete there is no 57014 error.

This means that it would be possible for a cancel request to be sent that never gives any response to the original backend. That is, it still is possible for a cancel request to hit the wrong query under adverse timing conditions.

Exactly. CancelRequest is fire-and-forget: the client opens a separate TCP connection, sends the cancel packet, and closes. There is no acknowledgment on either connection. The only indication the cancel worked is a 57014 on the original query connection.

If the original query completes normally before the cancel is delivered, the client receives ReadyForQuery and the connection looks clean. But the cancel may still be in transit through the postmaster (procsignal.c:781). If the client issues a new query and the SIGINT arrives after the backend starts executing it (DoingCommandRead = false), ProcessInterrupts raises 57014 on the wrong query.

The client cannot reuse the connection while a cancel is in flight. The drain addresses this: the ; occupies the connection as a sacrificial query until the cancel is resolved. If the SIGINT arrives during ; execution, it absorbs the 57014. If the cancel was already cleared while the backend was idle, the ; gets a clean EmptyQueryResponse. Either way, the connection is safe for the real next query. The existing 100ms sleep is the same idea, just probabilistic, and there is nothing that guarantees 100ms is sufficient (as observed when using CockroachDB).

I have also updated my earlier comment with corrected diagrams.

Lastly, the code has a strong agent smell to it. That's okay, I use agents every day, too (even to help write the test above). But it's only acceptable if it's disclosed, see our contributing guidelines. The comments on this thread also sound an awful lot like agent-speak as well. I've spent probably an hour today on this issue and I hope I'm not wasting my time talking to someone else's agent.

Yes, I use Claude Code for drafting and code generation (I was willing to dedicate one cup of coffee to this issue today and didn't want to cherry-pick and reword, and was at Google Next this week, too, so was being even more hasty in my responses). But, the investigation is mine (I traced the protocol flows, read the PostgreSQL source, ran the tests), but the output goes through an agent, and I should have disclosed that upfront. If there's interest in pursuing #2537, I'll update its description accordingly. Apologies for not doing so from the start, I didn't notice the contribution guidelines change.

@ecordell
Contributor Author

ecordell commented Apr 26, 2026

Thanks for taking a look at this! Responses inline.

When using a pool, pg_cancel_backend(pid) is definitely the way to go. I've had that in the back of my mind for years, but never got around to it.

A couple of reasons I avoided this:

  1. I tried to keep close to the existing cancellation logic thinking that would make this change easier to accept (since I didn't make an issue to discuss first)
  2. We use pgx heavily against both pg and crdb, and crdb doesn't support pg_cancel_backend. We'd probably want something that can be specialized for each?

But I'm not against it, I'd just want to make sure to understand what you're looking for before working on it.

I don't understand how a race on pid and secretKey could be triggered in the current code base. This is something pre-existing?

It was pre-existing and I only noticed it because the go race detector did when I made the changes in this PR.

I know this is constrained by the PG cancel protocol and the pgx architecture, but the cancel handler actually sending a query and waiting to receive the results seems a bit icky.

Fair enough, though to me the ickiness seems to come from the pg protocol itself.

How would this work with pipeline mode?

Is there something specific that would need to happen differently? I added a test for pipeline mode and it passed as I expected so I didn't give it much extra thought.

If this can be made to work perfectly reliably, would it make sense to have this implementation replace the CancelRequestContextWatcherHandler rather than introduce a new handler? It seems like it would be strictly superior if it works.

I think arguably that's the case, but I didn't want to assume I knew all pgx user's use-cases. Since the interface was there it seemed fine to have both options but I'd defer to you.

I've built and tested https://github.com/authzed/spicedb with this fork and haven't seen any issues, but it doesn't exercise every pg/crdb feature by any means.

Lastly, the code has a strong agent smell to it. That's okay, I use agents every day, too (even to help write the test above). But it's only acceptable if it's disclosed, see our contributing guidelines. The comments on this thread also sound an awful lot like agent-speak as well. I've spent probably an hour today on this issue and I hope I'm not wasting my time talking to someone else's agent.

The code is agent assisted. My read of the contribution guidelines was that I could submit without mentioning it as long as I understand and stand by the code as if it were mine (which I do, and have had to defend in the comments above; I mostly used it to check that I was sticking close to conventions in the codebase and for the tests).

As for the github discussion: All of my comments are just me, no agents.

@jackc
Owner

jackc commented May 9, 2026

With regards to the race on pid and secretKey, the underlying issue was that user-provided context cancellation handling shouldn't be used while the connection is being established. It doesn't make sense to call CancelQuery during connection establishment. I've corrected this in d2ed749. With that out of the way it shouldn't be possible for there to be a race.

I am generally in favor of this idea, but I still have a few concerns.

If the cancel arrives after the original query is complete there is no 57014 error.

This means that it would be possible for a cancel request to be sent that never gives any response to the original backend. That is, it still is possible for a cancel request to hit the wrong query under adverse timing conditions.

Exactly. CancelRequest is fire-and-forget: the client opens a separate TCP connection, sends the cancel packet, and closes. There is no acknowledgment on either connection. The only indication the cancel worked is a 57014 on the original query connection.

If the original query completes normally before the cancel is delivered, the client receives ReadyForQuery and the connection looks clean. But the cancel may still be in transit through the postmaster (procsignal.c:781). If the client issues a new query and the SIGINT arrives after the backend starts executing it (DoingCommandRead = false), ProcessInterrupts raises 57014 on the wrong query.

The client cannot reuse the connection while a cancel is in flight. The drain addresses this: the ; occupies the connection as a sacrificial query until the cancel is resolved. If the SIGINT arrives during ; execution, it absorbs the 57014. If the cancel was already cleared while the backend was idle, the ; gets a clean EmptyQueryResponse. Either way, the connection is safe for the real next query. The existing 100ms sleep is the same idea, just probabilistic, and there is nothing that guarantees 100ms is sufficient (as observed when using CockroachDB).

I'm not sure about this assertion. It still seems to me that there is a potential race condition, or at least a dependency on non-guaranteed behavior. As I understand it, the intention is:

  1. Original query has been sent
  2. CancelRequest called
  3. ; empty query sent
  4. Messages read until ReadyForQuery

The CancelRequest should either cancel the original query or the empty query. But do we actually know that is the case? Are we actually guaranteed that by the time CancelRequest has completed the signal has been received by the original process? Is it possible for it to be received after the ; query? If so, it's possible the connection has already sent another query and that one gets canceled.

I don't think the behavior we are hoping for is documented by PostgreSQL. We may be able to determine that behavior by PostgreSQL source code, but it makes me nervous to rely on undocumented behavior.

How would this work with pipeline mode?

Is there something specific that would need to happen differently? I added a test for pipeline mode and it passed as I expected so I didn't give it much extra thought.

I was worried about when a pipeline could have sent multiple Sync messages. That is, it could be expecting multiple ReadyForQuery messages which would confuse the drain cancel code. But on further review, pipeline errors always close the connection. So the only downside would be it would be a little slower.

@jackc
Owner

jackc commented May 9, 2026

Just thinking a little more. I wonder if the drain path could only be called as needed? The only case it is really needed is if the original query completed before the cancel request was processed. Might not fit in the current context canceled handler pattern though.
