PgBatchConnection: preparedStatementsMap_ grows unbounded; PostgreSQL backend memory exhaustion under long-lived connections

### Summary

When `LIBPQ_BATCH_MODE` is enabled (the default when libpq supports pipeline mode), `PgBatchConnection.cc` caches one named prepared statement per unique SQL text in `preparedStatementsMap_` for the lifetime of every connection, with **no eviction, no size limit, and no DEALLOCATE**. Workloads that legitimately produce a high number of distinct SQL texts (for example, multi-row `INSERT ... VALUES (..),(..),..` statements where the row count varies) cause the per-connection map and the corresponding server-side `CachedPlan` / `CachedPlanSource` / `CachedPlanQuery` contexts to grow without bound. Each PostgreSQL backend's resident memory eventually reaches several hundred MB and can exhaust system memory under long-lived pooled connections.

### Environment

- drogon: confirmed on `1.9.11`, `1.9.12`, `1.9.13`, and current `master` — the relevant code path in `PgBatchConnection.cc` is unchanged across all of them
- libpq: pipeline mode supported (PostgreSQL 14+)
- Build flag: `LIBPQ_BATCH_MODE=ON` (default), so `PgBatchConnection.cc` is compiled in place of `PgConnection.cc`
- PostgreSQL: 17.x (also expected to reproduce on 14+)
- Connection lifetime: long (hours to days, common with `DbClient` connection pools)

### Reproduction

A minimal reproducer (note: `preparedStatementsMap_` is per physical connection, so `connNum=1` keeps the observation trivially simple):

1. Configure `DbClient` with `connNum=1`, with `LIBPQ_BATCH_MODE=ON`. (With `connNum>1` the SQL stream gets sharded across backends; you can still observe the leak but you have to aggregate per-backend stats.)
2. Submit a stream of unique SQL texts on that single connection. The simplest pattern is a multi-row `INSERT` whose `VALUES (..),(..),..` row count varies between calls — every distinct row count produces a distinct SQL string.
3. Watch the PostgreSQL server-side memory of the backend process (e.g., `pg_log_backend_memory_contexts(pid)` for PostgreSQL 14+) grow over time, with hundreds to thousands of `CachedPlan` / `CachedPlanSource` / `CachedPlanQuery` contexts accumulating, each labelled with the user SQL text.
4. The client-side `preparedStatementsMap_` grows in lockstep — `preparedStatementsID_` keeps incrementing and the map size matches the number of distinct SQL texts ever submitted on that connection.
5. The growth stops only when the connection is closed.

#### Minimal C++ reproducer sketch

```cpp
#include <drogon/orm/DbClient.h>
#include <chrono>
#include <thread>

int main() {
    auto db = drogon::orm::DbClient::newPgClient(
        "host=127.0.0.1 port=5432 dbname=test user=postgres password=postgres",
        /*connNum=*/1);

    // SELECT-from-VALUES form: no rows written / no WAL / no table dependency —
    // only the SQL-text variation matters. Every distinct row count yields a
    // distinct SQL text, which becomes a permanent entry in preparedStatementsMap_.
    constexpr int N_distinct = 2000;
    for (int iter = 0; iter < N_distinct; ++iter) {
        int rows = 1 + iter;  // distinct row count -> distinct SQL text
        std::string sql = "SELECT 1 FROM (VALUES ";
        for (int r = 0; r < rows; ++r) {
            if (r) sql += ",";
            sql += "(0)";
        }
        sql += ") AS t(c)";
        db->execSqlSync(sql);
    }

    // Hold the connection open. From psql:
    //   SELECT pg_log_backend_memory_contexts(<pid_of_this_backend>);
    // Then look in the server log: hundreds of CachedPlan / CachedPlanSource /
    // CachedPlanQuery contexts will be present, one per unique SQL text.
    std::this_thread::sleep_for(std::chrono::hours(1));
    return 0;
}
```

### Root Cause

In `orm_lib/src/postgresql_impl/PgBatchConnection.cc`:

- `sendBatchedSql()` looks up `cmd->sql_` in `preparedStatementsMap_`. On miss, it generates a new statement name via `newStmtName()` (a monotonic counter `++preparedStatementsID_` formatted as a decimal string) and calls `PQsendPrepare(...)`.
- The map is recorded once `PQsendPrepare` succeeds and is otherwise never modified during the connection's lifetime. There is no eviction policy, no maximum size, no LRU, no `DEALLOCATE` issued to the server, and no shrinking on memory pressure.
- The simple-protocol path (`PgConnection.cc`, used when `LIBPQ_BATCH_MODE=OFF`) bypasses the cache only for `paraNum == 0` (`PQsendQuery`); its `paraNum > 0` branch has the same unbounded `preparedStatementsMap_` shape.

(Verified still present on current `master` as of 2026-06-16: the map is populated but no eviction / DEALLOCATE / `PQclosePrepared` path exists.)

### Why it matters

Some workloads legitimately vary the SQL text: multi-row INSERT batching where batch size depends on a runtime queue snapshot, dynamically composed optional `WHERE` filters, etc. In each case the unbounded server-side `CachedPlan` accumulation eventually translates to backend RSS exhaustion; connection pools with long idle timeouts amplify it because only closing the connection frees the memory. This matches general PostgreSQL guidance that a client-side prepared-statement cache must be bounded and must `DEALLOCATE` evicted entries server-side (Rails, JDBC, etc. ship bounded caches).

### Proposed Fix

1. Bound the **whole prepared-statement cache** (map entry + the backing SQL string + per-entry metadata removed atomically — an LRU on the map alone leaves the string container growing).
2. Default cap (e.g. 256), configurable via a `DbClient` parameter; `0` = unlimited to preserve current behaviour.
3. On eviction, issue a safe server-side close for the statement:
   - libpq ≥ PG17: `PQsendClosePrepared` (async `Close`, fits pipeline mode).
   - libpq < PG17: compile-time gate (`#if PG_VERSION_NUM >= 170000`); fallback uses an extended-protocol `DEALLOCATE "<name>"` (note: `newStmtName()` yields a numeric-leading name, so it must be double-quoted) via `PQsendQueryParams`, or exit/re-enter pipeline mode around a synchronous deallocate.
   - Evict only at a safe pipeline point where the name is not referenced by an in-flight `EXECUTE` (queue the close after the next `PQpipelineSync` results drain).
   - Maintenance commands emit their own `PGresult`; tag the in-flight command (e.g. an `isMaintenance_` flag on `SqlCmd`) so the dispatcher routes maintenance results to an internal path instead of user callbacks, preserving FIFO ordering.
4. Apply the same fix to the `paraNum > 0` branch of `PgConnection.cc`.
5. Document the new option.

### Workarounds in the meantime

- Build with `-DLIBPQ_BATCH_MODE=OFF` and use only `paraNum == 0` SQL paths (simple query path does not populate the cache).
- Set a server-side `idle_session_timeout` so backends recycle periodically (containment, not a fix).

### Pull Request

I am preparing a PR that implements the bounded-cache + safe deallocate fix and will link it to this issue. Open to maintainer input on: default cap size (256 is a starting guess); whether to expose a stats/metrics surface (evictions counter, current size) on `DbClient`; and whether the extended-protocol `Close` is preferred over `DEALLOCATE` SQL for the eviction path in pipeline mode.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

PgBatchConnection: preparedStatementsMap_ grows unbounded; PostgreSQL backend memory exhaustion under long-lived connections #2533

Summary

Environment

Reproduction

Minimal C++ reproducer sketch

Root Cause

Why it matters

Proposed Fix

Workarounds in the meantime

Pull Request

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

PgBatchConnection: preparedStatementsMap_ grows unbounded; PostgreSQL backend memory exhaustion under long-lived connections #2533

Description

Summary

Environment

Reproduction

Minimal C++ reproducer sketch

Root Cause

Why it matters

Proposed Fix

Workarounds in the meantime

Pull Request

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions