
Commit 9a28fc2

Nik Samokhvalov authored and committed
docs: three-latencies explanation (producer / subscriber / end-to-end)
Name the three distinct latencies in any Postgres queue and explain why
PgQue's batch-ticker model makes #1 and #2 sub-ms while bounding #3 by
the tick cadence (not by load). Addresses recurring confusion about the
apparent contradiction between sub-ms consumer-path latency and the ~1 s
end-to-end delivery bound.

- README.md: brief paragraph + 3-bullet list, new "Three latencies"
  subsection between "Latency trade-off" and "Comparison".
- docs/pgq-concepts.md: detailed version with per-latency physics,
  tick-frequency trade-off table, comparison to pgmq's poll-on-demand
  model, when-to-pick guidance, provenance link.

Uses actual pgque.sql column names: queue_ticker_max_lag (3s),
queue_ticker_idle_period (1min idle-decelerator), queue_ticker_max_count
(500). The 1-second cadence comes from the pg_cron schedule set by
pgque.start(), not from queue_ticker_idle_period.
1 parent a159a74 commit 9a28fc2

2 files changed

Lines changed: 94 additions & 0 deletions


README.md

Lines changed: 9 additions & 0 deletions
@@ -16,6 +16,7 @@

- [Why PgQue](#why-pgque)
- [Latency trade-off](#latency-trade-off)
- [Three latencies](#three-latencies)
- [Comparison](#comparison)
- [Installation](#installation)
- [Roles and grants](#roles-and-grants)
@@ -65,6 +66,14 @@ Ways to reduce delivery latency: tune tick frequency and queue thresholds; use `

If your top priority is single-digit-millisecond dispatch, PgQue is the wrong tool. If your priority is **stability under load without bloat**, that is where PgQue fits.

## Three latencies

"Queue latency" is three numbers, not one. PgQue makes #1 and #2 sub-ms and bounds #3 by whatever tick cadence you configure:

1. **Producer latency**: `send` / `insert_event`. Sub-ms.
2. **Subscriber latency**: `next_batch` + `get_batch_events`. Sub-ms.
3. **End-to-end delivery**: `send` → consumer visibility. ≈ tick period. **Tunable, not floored.** The default `pg_cron` schedule of 1 s gives ~500 ms on average; sub-ms e2e is achievable with aggressive ticking (staggered `pg_cron` jobs or an in-tick `pg_sleep` loop; see the [concept doc](docs/pgq-concepts.md#three-latencies)). Trade-off: more ticks mean more `tick`/`subscription` metadata churn ([#61](https://github.com/NikolayS/pgque/issues/61)). Under sustained load the ticker keeps firing at its configured rate: batch size absorbs the load, and e2e does not inflate.

## Comparison

| Feature | PgQue | PgQ | PGMQ | River | Que | pg-boss |

docs/pgq-concepts.md

Lines changed: 85 additions & 0 deletions
@@ -66,3 +66,88 @@ below; the function auto-prefixes `queue_` internally.
> produce huge batches consumers can't handle.

— Kreen & Pihlak, PgCon 2009

## Three latencies

"Queue latency" is three numbers, not one. Conflating them confuses
design discussion: each reflects a different bottleneck, and PgQue's
trade-offs only make sense once they are separated.

| # | Name | What it is | PgQue | Bottleneck |
|---|---|---|---|---|
| 1 | Producer | `send` / `insert_event` → durable | sub-ms (high-µs range; ~86k ev/s for PL/pgSQL single-INSERT in a preliminary benchmark) | WAL flush, triggers |
| 2 | Subscriber | `next_batch` + `get_batch_events` returning an already-built batch | sub-ms (snapshot SELECT, no SKIP LOCKED scan; ~2.4M ev/s consumer read) | how "next work" is located |
| 3 | End-to-end | `send` → consumer visibility | ≈ tick period + consumer poll interval | ticker cadence (tunable) |

#3 is the one application behavior depends on (SLAs, retries, perceived
staleness). Fast #1 and #2 do not imply a fast #3: both calls can
complete in microseconds while #3 is still seconds, because #3 is
governed by the tick cadence, not by the cost of either call.

### End-to-end is tunable, not floored

**The default 1-second tick is a `pg_cron` schedule, not a design floor.**
PgQue's e2e is bounded by whatever tick cadence you configure. Sub-ms
e2e is achievable with more aggressive ticking:

- **Staggered `pg_cron` jobs.** Schedule N jobs at `1 second` each, with
  each job offset from the previous one by `1/N` s via a shared
  coordinating lock, for an effective tick period of ~10 ms (N=100) or
  ~1 ms (N=1000).
- **In-tick sleep loop.** A single cron callout that loops 100 times
  with `pg_sleep(0.01)` inside one invocation: same effective cadence
  (~10 ms), fewer scheduler wakeups.
- **Native sub-second cron.** A future `pg_cron` may support sub-second
  schedules directly, removing the need for either workaround.
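
The arithmetic behind the staggered-jobs option can be sketched in Python. This is a toy model, not PgQue or `pg_cron` code: `staggered_tick_times` and `effective_period` are hypothetical helper names, and the model assumes every job fires exactly on schedule, with no jitter and no lock contention.

```python
from fractions import Fraction


def staggered_tick_times(n_jobs, base_period=Fraction(1), horizon=Fraction(2)):
    """Sorted tick times (seconds) for n_jobs jobs that each fire every
    base_period seconds, with job k offset by k * base_period / n_jobs."""
    times = []
    for k in range(n_jobs):
        t = k * base_period / n_jobs  # exact rational offset for job k
        while t < horizon:
            times.append(t)
            t += base_period
    return sorted(times)


def effective_period(times):
    """Largest gap between consecutive ticks, i.e. the worst-case tick period."""
    return max(b - a for a, b in zip(times, times[1:]))


for n in (1, 4, 100, 1000):
    period = effective_period(staggered_tick_times(n))
    print(f"N={n:>4}: effective tick period = {float(period) * 1000:g} ms")
```

In a real deployment the offsets would drift with scheduler jitter, so the worst-case gap would be somewhat larger than `1/N` s.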

Trade-off at very high tick rates: every tick UPDATEs `pgque.tick` and
`pgque.subscription`, so more ticks mean more dead tuples on those
metadata tables whenever the xmin horizon is held back. The event tables
stay bloat-free (TRUNCATE rotation); the metadata-table bloat is a
separate story and is tracked at
[#61](https://github.com/NikolayS/pgque/issues/61) with a rotation-based
fix in PR [#62](https://github.com/NikolayS/pgque/pull/62).

Rough guidance:

| Effective tick period | Average e2e | Notes |
|---|---|---|
| `1 second` (default `pg_cron` schedule) | ~500 ms | pgqd-compatible, minimal metadata churn |
| `250 ms` | ~125 ms | 4× metadata writes, still cheap |
| `10 ms` staggered | ~5 ms | needs coordinated jobs or in-tick sleep |
| `1 ms` staggered | sub-ms | kHz-range; metadata bloat dominates without #62 |
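
The "Average e2e" column follows from assuming producers send at times spread uniformly across the tick period, so an event waits half a period on average and just under a full period at worst. A minimal Python sketch of that assumption follows. The helper `e2e_wait_stats` is hypothetical; it ignores producer and subscriber latency and assumes the consumer picks up each batch as soon as the tick creates it.

```python
def e2e_wait_stats(tick_period_s, samples=10_000):
    """Return (mean_wait_s, worst_wait_s) until the next tick.

    Arrival offsets are swept evenly across one tick period, which stands in
    for producers sending at uniformly distributed times.
    """
    waits = []
    for i in range(samples):
        arrival_offset = tick_period_s * (i + 0.5) / samples
        waits.append(tick_period_s - arrival_offset)  # time left until the tick
    return sum(waits) / samples, max(waits)


for period_s in (1.0, 0.25, 0.01, 0.001):
    mean_s, worst_s = e2e_wait_stats(period_s)
    print(f"tick {period_s * 1000:g} ms: mean wait {mean_s * 1000:.3g} ms, "
          f"worst {worst_s * 1000:.3g} ms")
```

A consumer that polls on its own interval adds that interval's wait on top of these numbers.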

Per-queue thresholds (`queue_ticker_max_lag` default `3 seconds`,
`queue_ticker_max_count` default 500, `queue_ticker_idle_period` default
`1 minute` idle-decelerator) are set through `pgque.set_queue_config()`.
None of them sets the 1-second cadence; that comes from the `pg_cron`
schedule created by `pgque.start()`.

### Load behavior: PgQue vs. pgmq-style polling

The key property of the tick model: **e2e does not grow with load.** The
ticker fires at its configured rate regardless of backlog, so under
pressure it is batch size that grows (up to `queue_ticker_max_count`),
not e2e.
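
A small simulation illustrates the claim. It is a sketch under simplifying assumptions, not PgQue code: `simulate_ticks` is a hypothetical helper, arrivals are evenly spaced, each event joins the first tick at or after its arrival, the `queue_ticker_max_count` cap is not modeled, and the consumer is assumed to keep up with each batch.

```python
import math


def simulate_ticks(rate_per_s, tick_period_s, duration_s):
    """Batch evenly spaced arrivals at the first tick at or after each one.

    Returns (largest_batch, worst_wait_s).
    """
    batch_sizes = {}
    worst_wait_s = 0.0
    for i in range(int(rate_per_s * duration_s)):
        arrival_s = i / rate_per_s
        tick_index = math.ceil(arrival_s / tick_period_s)
        batch_sizes[tick_index] = batch_sizes.get(tick_index, 0) + 1
        worst_wait_s = max(worst_wait_s, tick_index * tick_period_s - arrival_s)
    return max(batch_sizes.values()), worst_wait_s


for rate in (100, 10_000):
    largest, worst = simulate_ticks(rate, tick_period_s=1.0, duration_s=5.0)
    print(f"{rate} ev/s: largest batch {largest}, worst wait {worst:.4f} s")
```

Raising the arrival rate a hundredfold makes the largest batch a hundred times bigger, while the worst wait stays below one tick period.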

pgmq uses a different model: `pgmq.read()` returns messages immediately,
with no ticker. So pgmq e2e ≈ consumer poll interval: sub-ms when
actively polling, up to the poll interval otherwise. Drain rate is
`batch_size / poll_interval`; if producers outrun that, queue depth grows
and e2e grows without bound until consumers scale out. Separately,
pgmq's SKIP LOCKED + visibility-timeout design UPDATEs on claim and
DELETEs on ack, accumulating dead tuples that autovacuum cannot reclaim
under MVCC pressure (long-running transactions, idle-in-transaction
sessions, a lagging logical replication slot, or a physical standby with
`hot_standby_feedback=on`). That is the bloat failure mode
[PgQue avoids by construction](../README.md#why-pgque).
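
The drain-rate argument for a polling consumer can be made concrete with a fluid model. This is an illustrative sketch only: `poll_backlog` and `poll_queueing_delay_s` are hypothetical names, nothing here calls pgmq, and the model assumes one consumer, constant rates, and FIFO delivery.

```python
def poll_backlog(produce_rate, batch_size, poll_interval_s, elapsed_s):
    """Queue depth after elapsed_s with one consumer polling at a fixed interval."""
    drain_rate = batch_size / poll_interval_s  # messages per second
    return max(0.0, (produce_rate - drain_rate) * elapsed_s)


def poll_queueing_delay_s(produce_rate, batch_size, poll_interval_s, elapsed_s):
    """Time a message enqueued at elapsed_s waits behind the existing backlog."""
    drain_rate = batch_size / poll_interval_s
    backlog = poll_backlog(produce_rate, batch_size, poll_interval_s, elapsed_s)
    return backlog / drain_rate


# Drain rate here is 500 / 0.5 s = 1000 messages per second.
for rate in (800, 1500):
    depth = poll_backlog(rate, batch_size=500, poll_interval_s=0.5, elapsed_s=60)
    delay = poll_queueing_delay_s(rate, batch_size=500, poll_interval_s=0.5, elapsed_s=60)
    print(f"{rate} msg/s for 60 s: backlog {depth:.0f}, added delay {delay:.1f} s")
```

Below the drain rate the backlog stays empty and e2e is bounded by the poll interval; above it, both backlog and delay grow linearly with elapsed time.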

### When to pick which

Pick PgQue if you want batching efficiency and bloat immunity and can
configure a tick cadence that meets your SLA (the default 1 s or a faster
one). Pick pgmq if you need always-hot single-digit-ms dispatch, MVCC
pressure is low in your system, and the SQS-shaped API fits.

### Provenance

Framing developed in public discussion, 2026-04-18:
[HN thread](https://news.ycombinator.com/threads?id=samokhvalov) and
Hannu Krosing's LinkedIn notes (ex-Skype, original PgQ context). The
split is implicit in the 2009 Kreen & Pihlak design but worth naming
explicitly: the apparent contradiction between sub-ms consumer path
and ~1 s end-to-end recurs every time someone benchmarks PgQ without
separating the three numbers.
