T-17092: Fix PgBouncer rejecting cluster-agent statement_timeout startup parameter #126

Merged
paweljw merged 1 commit into main from pawel/t-17092-cluster-agent-pgbouncer-statement-timeout on Mar 30, 2026

@paweljw paweljw commented Mar 28, 2026

  • Build coroot-cluster-agent from source with patches (same pattern as node-agent) instead of pulling a pre-built Docker image
  • Patch removes statement_timeout from PostgreSQL DSN parameters — lib/pq sends these as startup parameters during the connection handshake, and PgBouncer (since v1.20.0) rejects unrecognized ones
  • Upgrades cluster-agent from v1.2.4 to v1.6.1
  • Query timeouts are still enforced client-side via Go's context.WithTimeout
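The two ideas in the list above (dropping the parameter from the DSN and relying on a client-side deadline) can be sketched as follows. This is a minimal illustration, not the contents of the actual patch: it assumes a URL-style DSN, and the helper names `stripStartupParam`, `runWithTimeout`, and `simulateQuery` are hypothetical. `simulateQuery` stands in for a real database call such as `db.QueryContext` so the sketch needs no database.

```go
package main

import (
	"context"
	"net/url"
	"time"
)

// stripStartupParam removes one query parameter (for example
// "statement_timeout") from a URL-style PostgreSQL DSN.
// Hypothetical helper: the real patch edits the agent's DSN construction.
func stripStartupParam(dsn, name string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Del(name)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// runWithTimeout enforces the deadline on the client side: the query
// function receives a context that is cancelled once timeout elapses.
func runWithTimeout(parent context.Context, timeout time.Duration, query func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return query(ctx)
}

// simulateQuery pretends to run a query that takes queryMs milliseconds
// under a timeoutMs millisecond deadline. It returns a non-nil error
// (context.DeadlineExceeded) when the deadline fires first.
func simulateQuery(timeoutMs, queryMs int) error {
	timeout := time.Duration(timeoutMs) * time.Millisecond
	work := time.Duration(queryMs) * time.Millisecond
	return runWithTimeout(context.Background(), timeout, func(ctx context.Context) error {
		select {
		case <-time.After(work):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
}
```

With a real connection, the closure passed to `runWithTimeout` would call something like `db.QueryContext(ctx, ...)`, so the driver cancels the query when the context expires rather than relying on a server-side `statement_timeout`.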

Context

Customers using PgBouncer as a connection pooler see metrics collection fail because coroot-cluster-agent hardcodes statement_timeout=<ms> in the PostgreSQL DSN. PgBouncer rejects it as an unrecognized startup parameter (pgbouncer/pgbouncer#907).

The issue is not fixed upstream — statement_timeout is still present in the latest cluster-agent release (v1.6.1).

Changes

  • ebpf/Dockerfile: Replace pre-built ghcr.io/coroot/coroot-cluster-agent:1.2.4 image with a from-source build stage (mirrors the existing node-agent pattern)
  • ebpf/patches/cluster-agent/001-remove-statement-timeout-from-dsn.patch: Removes statement_timeout from DSN query params and the now-unused strconv import

…ut from DSN

coroot-cluster-agent hardcodes statement_timeout=<ms> as a PostgreSQL DSN
parameter. lib/pq sends DSN parameters as startup parameters during the
connection handshake. PgBouncer (since v1.20.0) rejects unrecognized
startup parameters, breaking metrics collection for customers using
PgBouncer as a connection pooler.

The fix removes statement_timeout from the DSN. Query timeouts are still
enforced client-side via Go's context.WithTimeout in the collector's
Collect() method.

To apply this patch, we now build cluster-agent from source (like we
already do for node-agent) instead of pulling a pre-built image.
Also upgrades cluster-agent from v1.2.4 to v1.6.1.

Adds a CI workflow (cluster-agent-patch-ci.yml) to verify patches apply
cleanly and pass upstream tests, mirroring the node-agent-patch-ci pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@paweljw paweljw force-pushed the pawel/t-17092-cluster-agent-pgbouncer-statement-timeout branch from 91e47be to cc18c2e Compare March 28, 2026 13:32

paweljw commented Mar 28, 2026

I’m 50/50 on whether we need to fix this on our side. A workaround exists (see the linked pgbouncer issue), and this also seems to have been resolved in pgbouncer 1.20.1+, i.e. since mid-2023.

Thoughts, @curusarn? The fix is involved, i.e. another “soft fork”.

@paweljw paweljw requested a review from curusarn March 28, 2026 13:41

@curusarn curusarn left a comment

Thank you! I'm okay with the patch + build from source. I care more about fewer issues for customers. 👍

@paweljw paweljw merged commit 16da227 into main Mar 30, 2026
6 checks passed
@paweljw paweljw deleted the pawel/t-17092-cluster-agent-pgbouncer-statement-timeout branch March 30, 2026 10:14