
docs: partition-keys spec + brief, idempotency design rationale#295

Draft
NikolayS wants to merge 14 commits into main from claude/pgque-idempotent-partitions-d7emjj


@NikolayS

Owner

What

Design artifacts from working through Fabrizio's (@fenos, Supabase Storage) request on #293, plus the publishable brief.

Three commits:

  1. blueprints/idempotency/DESIGN.md: the decision rationale. pgque is an ordered log, not a job queue, so "free once processed" (pg-boss singletonKey) can't be a producer feature: in a log, "processed" is a per-consumer fact the producer can't see. Producer idempotency is therefore a TTL window (the SQS/NATS model), append-only and GC'd by rotation. Includes a primary-source prior-art survey: logs use TTL windows; job queues use state-based free-once-processed via per-row UPDATE (the bloat pgque avoids); pgmq has neither.

  2. blueprints/partition-keys/SPEC.md: SamoSpec-format spec for the consumer-side partition-key feature (Fabrizio's "partitions first"). Kafka-partition model: order within a key, parallelism across keys, one worker per key. This is achieved by hash(key) % N routing over N independent slot subscriptions (each filtering via get_batch_cursor extra_where), with no per-event locks or mutable state, and no edits to batch_event_sql (the engine stays sacred). Covers the head-of-line-on-failure decision (pause vs skip), test plan (red/green TDD), and sprint plan.

  3. web/public/briefs/partition-keys.html — self-contained, on-brand HTML brief. Once merged to main, the Pages workflow serves it at https://pgque.dev/briefs/partition-keys.html.
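The hash(key) % N rule from item 2 can be sketched in a few lines of TypeScript, the repro's language. This is an illustration under assumptions, not the spec's code. FNV-1a stands in for whichever stable hash the spec settles on, since the commit history flags `hashtext` stability. `fnv1a32`, `slotOf`, and `route` are hypothetical names. The batch arrives in ev_id order and each key maps to exactly one slot. As a result, every slot sees its keys' events in ev_id order, while different keys spread across slots.

```typescript
// Hypothetical sketch: route an ev_id-ordered batch into N slots by hash(key) % N.
type Ev = { evId: number; key: string };

// Stand-in stable 32-bit hash (FNV-1a over UTF-16 code units); an assumption, not pgque's choice.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // unsigned, so the plain % below is never negative
}

function slotOf(key: string, n: number): number {
  return fnv1a32(key) % n;
}

// Split a batch (already sorted by ev_id) into N per-slot streams.
// Pushing in batch order preserves ev_id order inside each slot, hence within each key.
function route(batch: Ev[], n: number): Ev[][] {
  const slots: Ev[][] = Array.from({ length: n }, () => []);
  for (const ev of batch) slots[slotOf(ev.key, n)].push(ev);
  return slots;
}

const batch: Ev[] = [
  { evId: 1, key: "tenant-a" },
  { evId: 2, key: "tenant-b" },
  { evId: 3, key: "tenant-a" },
  { evId: 4, key: "tenant-c" },
  { evId: 5, key: "tenant-a" },
];
const slots = route(batch, 4);
slots.forEach((s, i) => console.log(i, s.map((e) => `${e.key}#${e.evId}`).join(" ")));
```

Because N is baked into every key's slot, changing N remaps keys. This is why the spec treats N as fixed.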

Status / notes

  • Draft. This is design + docs only — no SQL/engine changes, no client changes.
  • The earlier contributor-guide commit (52f2450) rides along on the branch.
  • The spec was drafted in SamoSpec format as a single-pass lead draft; the live multi-model review loop was not run.

Verification

Docs/HTML only. The brief is a static file in web/public/ (Astro copies it verbatim into web/dist); its HTML tags balance and it renders standalone. No code paths touched.

🤖 Generated with Claude Code



claude added 7 commits June 18, 2026 00:20
Design guidance for external PRs adding pg-boss-style idempotency
keys (refs #293) and per-partition serialization. Maps both features
onto PgQue's existing layering: sidecar tables (rotation-safe), send
wrappers that reduce to insert_event(), consumer-side gating instead
of engine changes, maint-cycle expiry to bound bloat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Records why pgque's producer idempotency is a TTL window (log model, not
job queue) and why free-once-processed belongs on the consumer side.
Includes prior-art survey (SQS/NATS/Rabbit vs pg-boss/Oban/River/
Graphile/Hatchet, pgmq gap) and the producer GC fork. Internal blueprint;
basis for the reply to #293. Not yet pushed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
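The producer-side TTL window this commit records can be sketched as follows. The sketch makes a few assumptions: an in-memory `Map` stands in for the dedup store, and an array stands in for the append-only queue table. `DedupWindow`, `sendIdem`, and `gc` are hypothetical names, not pgque API. The window never consults consumer progress, which is the point of the rationale: in a log, "processed" is a per-consumer fact the producer cannot see.

```typescript
// Hypothetical sketch: producer idempotency as a TTL window over an append-only log.
// In-memory stand-ins only; not pgque's API or schema.
type LogEvent = { evId: number; key: string; payload: string };

class DedupWindow {
  private readonly ttlMs: number;
  private readonly claims = new Map<string, number>(); // idem key -> window expiry (ms)
  readonly log: LogEvent[] = [];                         // append-only: rows are never updated
  private nextId = 1;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Claim the key and append in one step; returns the new ev_id, or null if suppressed.
  sendIdem(key: string, payload: string, now: number): number | null {
    const expiresAt = this.claims.get(key);
    if (expiresAt !== undefined && expiresAt > now) return null; // duplicate inside the window
    this.claims.set(key, now + this.ttlMs);
    const evId = this.nextId++;
    this.log.push({ evId, key, payload });
    return evId;
  }

  // Drop expired claims in bulk; stands in for rotation/maintenance GC.
  gc(now: number): number {
    let dropped = 0;
    for (const [key, expiresAt] of this.claims) {
      if (expiresAt <= now) {
        this.claims.delete(key);
        dropped++;
      }
    }
    return dropped;
  }
}

const w = new DedupWindow(60000);
console.log(w.sendIdem("tenant-7:migrate-v2", "run", 0));     // 1
console.log(w.sendIdem("tenant-7:migrate-v2", "run", 1000));  // null: duplicate inside the window
console.log(w.sendIdem("tenant-7:migrate-v2", "run", 61000)); // 2: window expired, accepted again
```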
SamoSpec-format spec (blueprints/partition-keys/SPEC.md) for consumer-side
ordered, parallel consumption by partition key (Kafka-partition model:
order within a key, parallelism across keys, no per-event state). Adds a
self-contained on-brand HTML brief at web/public/briefs/partition-keys.html
(served by Pages at /briefs/partition-keys.html on merge to main).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Re-ground the consumer mechanism after ops/security + QA/testability
review: drop the (impossible) cooperative-consumer overlay for N
independent slot subscriptions filtering via get_batch_cursor
extra_where; restate the guarantee as testable G1/G2/G3; correct the
retry rationale (ev_id preserved, ev_txid changes); derive pause from
existing retry_queue (no new table); fix send-signature collision,
ev_extra1/trigger collision, unstable hashtext, fixed-N invariant,
slot/owner definition. Add decisions.md and refresh the HTML brief.
Remove superseded IDEMPOTENCY_AND_PARTITIONS.md contributor guide.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Relocate IDEMPOTENCY_DESIGN.md -> blueprints/idempotency/DESIGN.md to
match the partition-keys/ slug layout; update the cross-reference in the
partition spec and refresh the note's footer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Round 2 verified the model against the engine: G1 ev_id ordering is
real (order by 1, preserved through get_batch_cursor) and the G2
single-owner lock is the tested #97 guard. Fixes folded in:
- security: receive_partitioned/subscribe_slot are SECURITY DEFINER
  over the admin-only get_batch_cursor; validated integer-only filter
  (corrects the "injection-safe" framing).
- correctness: pause blocked-set moved off the transient retry_queue
  to a durable compact partition_block marker (per failing key, not
  per event); DLQ-unblock predicate made explicit.
- bug: modulo sign-normalized to (h%N+N)%N.
- R7 rotation wedge (+ pause must not hold the batch open); N
  persistence + teardown + DLQ-cascade caveat.
- tests: retry-affinity, security, N-invariant; split G2 block/parallel;
  get_batch_cursor in engine-untouched guard.
Update decisions.md (round-2 scorecard) and refresh the brief.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
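The sign fix matters because a 32-bit hash such as Postgres `hashtext` can be negative. In both SQL and JavaScript, `%` takes the sign of the dividend, so `h % N` alone can fall in -(N-1)..-1 and match no slot. The sketch below is client-side TypeScript for illustration only; in the spec, this logic lives inside the SECURITY DEFINER wrappers. `slotOfHash`, `slotPredicate`, and the `key_hash` column are hypothetical names.

```typescript
// Sign-normalized modulo: maps any signed 32-bit hash into 0..n-1.
// In both SQL and JavaScript, % takes the sign of the dividend, so h % n alone can be negative.
function slotOfHash(h: number, n: number): number {
  return ((h % n) + n) % n;
}

// Build an integer-only slot predicate for an extra_where-style filter.
// Both inputs are validated as integers before interpolation, so no caller text reaches SQL.
// `key_hash` is a hypothetical column/expression holding the signed key hash.
function slotPredicate(slot: number, n: number): string {
  if (!Number.isSafeInteger(n) || n < 1) throw new RangeError(`invalid N: ${n}`);
  if (!Number.isSafeInteger(slot) || slot < 0 || slot >= n) {
    throw new RangeError(`invalid slot ${slot} for N=${n}`);
  }
  return `((key_hash % ${n}) + ${n}) % ${n} = ${slot}`;
}

console.log(-7 % 4);              // -3: no slot in 0..3 would match
console.log(slotOfHash(-7, 4));   // 1
console.log(slotPredicate(1, 4)); // ((key_hash % 4) + 4) % 4 = 1
```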
Round 3 convergence: Phase 1 (skip-default partition consumption)
declared implementation-ready; pause split into Phase 2 with explicit
open items (O1 defer-without-retry-increment primitive; O2 hot-blocked-
key cost). Corrected the SECURITY DEFINER model to the co-ownership
invariant (not pgque_admin) + non-superuser-owner security test. Fixed
DLQ-unblock sub_id<->co_id join; partition_block FK-cascade/index/
revoked-from-roles, created empty in Phase 1; tightened tests
(engine-untouched /4 overload, in-order-after-unblock, marker-clear-via-
DLQ, marker durability, hot-blocked-key). Update decisions.md and brief.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
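The head-of-line choice can be made concrete on one slot's ev_id-ordered stream. This is a simplified model with hypothetical names; retries, DLQ, and unblock are not modeled.

- **Skip** (the Phase 1 default): a failed event is handed to retry, and later events of the same key still run, so that key's order can break.
- **Pause**: the first failure records one block marker for the failing key, mirroring `partition_block`'s one-marker-per-key design. That key's later events are held while other keys keep flowing.

```typescript
// Simplified, hypothetical model of head-of-line handling on one slot's ev_id-ordered stream.
type Ev = { evId: number; key: string };
type Policy = "skip" | "pause";

function consumeSlot(events: Ev[], fails: (ev: Ev) => boolean, policy: Policy) {
  const processed: number[] = [];
  const retried: number[] = [];      // failed events handed to retry
  const held: number[] = [];         // withheld because their key is blocked (pause only)
  const blocked = new Set<string>(); // one marker per failing key, not per event

  for (const ev of events) {
    if (policy === "pause" && blocked.has(ev.key)) {
      held.push(ev.evId);
      continue;
    }
    if (fails(ev)) {
      retried.push(ev.evId);
      if (policy === "pause") blocked.add(ev.key);
      continue;
    }
    processed.push(ev.evId);
  }
  return { processed, retried, held, blocked: Array.from(blocked) };
}

const stream: Ev[] = [
  { evId: 1, key: "a" },
  { evId: 2, key: "b" },
  { evId: 3, key: "a" },
  { evId: 4, key: "b" },
];
const failFirst = (ev: Ev) => ev.evId === 1;
console.log(consumeSlot(stream, failFirst, "skip"));
console.log(consumeSlot(stream, failFirst, "pause"));
```

Under skip, event 3 (key a) runs while event 1 waits in retry, so key a's order breaks. Under pause, event 3 waits behind event 1 while key b keeps flowing.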
Resolve two follow-up design questions on the partition-keys brief:
fixed-N rebalancing and how partitions map to workers.

- Add SPEC §15 + brief §07: claim-based assignment via per-slot
  pg_try_advisory_lock — no leader, no PartitionAssignor, no rebalance
  protocol. The DB arbitrates; scale-up/down is lock acquire/release.
- Separate the two locks: G2 blocking receive lock = correctness backstop;
  advisory slot lock = distribution/liveness only.
- D8 decision row; T-claim assignment-liveness test.
- Correct the over-provisioning framing: fixed N is also the read-amp
  multiplier, so inflating N is not free; online resize breaks G1
  mid-flight. Expand R4.
- Log the Q&A in decisions.md; bump to v0.5 (draft).
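The claim loop behind §15 needs no coordinator. Each worker walks the slots and keeps the ones whose lock it wins, so scaling up or down is just lock acquire and release. In this in-memory sketch, `LockTable.tryLock` mimics the shape of Postgres `pg_try_advisory_lock(bigint)`, which is non-blocking and returns a boolean. `claimSlots` and its `maxSlots` cap are hypothetical. Per the commit, this lock only distributes slots; the G2 receive lock remains the correctness backstop.

```typescript
// In-memory stand-in for Postgres session-level advisory locks (non-blocking try-lock shape).
class LockTable {
  private owners = new Map<number, string>(); // lock key -> owning worker

  tryLock(key: number, worker: string): boolean {
    const owner = this.owners.get(key);
    if (owner !== undefined && owner !== worker) return false; // held by another worker
    this.owners.set(key, worker); // free, or already ours (real advisory locks are re-entrant)
    return true;
  }

  // Ending a session releases all of its advisory locks.
  releaseAll(worker: string): void {
    for (const [key, owner] of this.owners) {
      if (owner === worker) this.owners.delete(key);
    }
  }
}

// Hypothetical claim loop: walk slots 0..n-1 and keep whatever we win, up to maxSlots.
function claimSlots(locks: LockTable, worker: string, n: number, maxSlots: number): number[] {
  const mine: number[] = [];
  for (let slot = 0; slot < n && mine.length < maxSlots; slot++) {
    if (locks.tryLock(slot, worker)) mine.push(slot);
  }
  return mine;
}

const locks = new LockTable();
console.log(claimSlots(locks, "w1", 4, 2)); // slots 0 and 1
console.log(claimSlots(locks, "w2", 4, 2)); // slots 2 and 3
console.log(claimSlots(locks, "w3", 4, 2)); // nothing until someone leaves
locks.releaseAll("w1");                      // w1 exits (scale-down)
console.log(claimSlots(locks, "w3", 4, 2)); // w3 picks up slots 0 and 1
```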
Lead with Tier A (mutual exclusion via cooperative consumers + per-key
advisory lock, near-zero engine code) — it covers the migration workload
and folds in the idempotency ask. Reposition the hash-slot design as
Tier B (ordered per key), the inherent price of strict ordering, where
fixed N / read-amp / slot assignment live. Correct advisory-lock framing:
per-key in Tier A, per-slot distribution in Tier B.
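Tier A's mechanism (cooperative consumers plus a per-key advisory lock) can be sketched with async workers in one process. Everything here is a hypothetical stand-in:

- A `Set` stands in for `pg_try_advisory_lock` on the key.
- A `done` set stands in for whatever marker makes a repeated migration a no-op. The repro's actual collapse mechanism is not described here.
- `runTierA` is an illustrative name.

The lock check and acquire run with no `await` between them, which makes them atomic in single-threaded JavaScript. For real advisory locks, the database provides that atomicity.

```typescript
// Hypothetical sketch of Tier A: cooperative workers share one event stream;
// a per-key lock gives mutual exclusion, a completion marker collapses duplicates.
type Job = { evId: number; tenant: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runTierA(jobs: Job[], workers: number) {
  const queue = [...jobs];
  const locked = new Set<string>(); // stands in for pg_try_advisory_lock on the key
  const done = new Set<string>();   // assumed "work already applied" marker
  const active = new Map<string, number>();
  const runs: string[] = [];
  let maxPerKey = 0;                // G2: should never exceed 1

  const worker = async (): Promise<void> => {
    while (queue.length > 0) {
      const job = queue.shift()!;
      if (done.has(job.tenant)) continue; // duplicate of finished work: drop it
      if (locked.has(job.tenant)) {       // key busy on another worker: retry later
        queue.push(job);
        await sleep(0);
        continue;
      }
      locked.add(job.tenant); // test-and-set with no await in between
      const now = (active.get(job.tenant) ?? 0) + 1;
      active.set(job.tenant, now);
      maxPerKey = Math.max(maxPerKey, now);
      await sleep(1);         // the actual work, e.g. a tenant migration
      runs.push(job.tenant);
      done.add(job.tenant);
      active.set(job.tenant, now - 1);
      locked.delete(job.tenant);
    }
  };

  await Promise.all(Array.from({ length: workers }, () => worker()));
  return { runs, maxPerKey };
}

// 40 events for 10 tenants (4 duplicates each), consumed by 4 workers.
const jobs: Job[] = Array.from({ length: 40 }, (_, i) => ({ evId: i + 1, tenant: `t${i % 10}` }));
runTierA(jobs, 4).then((r) => console.log(r.runs.length, r.maxPerKey));
```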
claude and others added 4 commits June 25, 2026 16:20
Self-contained spike under blueprints/partition-keys/repro/ that installs
pgque on a fresh VM, drives both workloads concurrently, measures
throughput, and checks the guarantees empirically. No engine changes —
both tiers are thin recipes over existing primitives.

- Tier A (mutual exclusion): cooperative consumers + per-key advisory
  lock; verifies G2 (no overlapping runs per key) and idempotency
  collapse (1 run per tenant despite duplicates).
- Tier B (ordered): N hash-routed slot subscriptions via get_batch_cursor
  extra_where; verifies G1 affinity + FIFO + exactly-once, and measures
  read amplification.

Measured (this VM, PG16): Tier A 8k events -> exactly 2000 runs, all
invariants PASS at 4/8/16 workers. Tier B read amplification = N exactly
(4.00/8.00/16.00x) with aggregate throughput ~inverse to N (87k/54k/30k
ev/s) — strict ordering's cost, quantified.
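The N-fold read amplification measured above follows directly from the Tier B mechanism. Each of the N slot subscriptions reads every event in the batch and discards those outside its slot. Rows read therefore equal N times the event count, while each event is delivered exactly once. The following is a back-of-envelope model with hypothetical names (not the repro's code), using the sign-normalized modulo:

```typescript
// Back-of-envelope model (hypothetical names): N slot subscriptions each scan the
// full batch and keep only rows whose slot matches, as with an extra_where filter.
type Ev = { evId: number; keyHash: number };

function readAmplification(batch: Ev[], n: number) {
  let rowsRead = 0;
  let delivered = 0;
  for (let slot = 0; slot < n; slot++) {
    for (const ev of batch) {
      rowsRead++; // every subscription reads every row of the batch
      if (((ev.keyHash % n) + n) % n === slot) delivered++; // but keeps only its own slot
    }
  }
  return { rowsRead, delivered, amp: rowsRead / delivered };
}

// Signed hashes on purpose: normalization still puts each event in exactly one slot.
const batch: Ev[] = Array.from({ length: 8000 }, (_, i) => ({ evId: i + 1, keyHash: i * 7919 - 20000 }));
for (const n of [4, 8, 16]) console.log(n, readAmplification(batch, n).amp); // amp equals n
```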
Updated README.md to reflect changes in the partition-keys reproduction spike, including installation instructions, workload details, and caveats.
Port the reproduction driver from Python to TypeScript on bun + pg, to
match clients/typescript/ and Fabrizio's stack.

Align the repro to the two cases from the thread:
- Case 1 (migrations) now models BOTH guarantees that were conflated:
  L1 producer-side idempotency (TTL dedup window, demo.send_idem) that
  prevents duplicate INSERTs, and L2 consumer-side mutual exclusion
  (advisory lock) that prevents duplicate WORK. Concurrent producers +
  background ticker exercise the real race.
- Case 2 (lifecycle) unchanged: ordered slots, read-amp measured.

Correct the brief's overreach: idempotency is a complementary layer, not
the same requirement as partition keys.

Measured (PG16): dedup OFF -> 12000 attempts/12000 inserted/1000 runs;
dedup ON -> 12000 attempts/1000 inserted/1000 runs. Both: all invariants
PASS. Case 2 read-amp = N exactly.
…ardrail

Act on review (Max): partition-keys is the ordered-slot feature only;
producer TTL dedup becomes a separate send-layer feature.

- New blueprints/idempotency/SPEC.md: producer TTL dedup feature spec with
  the key-scope rule in hard language (key must encode the desired EFFECT,
  not just the entity), atomic claim+append, (queue, idem_key) scoping,
  rotation/maintenance GC, orthogonality to partition keys.
- Repro: add --tier hazard guardrail proving the version-suppression bug —
  entity-only key drops the v2 migration (0 inserted); effect-scoped key
  delivers both waves. Reframe Case 1 as 'producer idempotency + consumer
  mutual exclusion on a plain queue', not partition keys.
- partition-keys SPEC: promote rotation pinning to a first-class risk (R7);
  point §12 at the new feature spec; roadmap 1A/1B/2/3; v0.6.
- Brief: scope to ordered slots, separate-feature framing, key-scope footgun
  callout, rotation-pin note, roadmap pills; v0.7.

Guardrail measured (PG16): tenant key -> v2 0 inserted; tenant:version ->
v2 N inserted. PASS.
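The guardrail's failure mode in miniature: when the dedup key is scoped to the entity, the v2 wave reuses v1's keys and is suppressed inside the window. Scoping the key to the effect (tenant plus version) lets both waves through. This minimal sketch assumes an in-memory set stands in for the `(queue, idem_key)` dedup store and that the TTL has not yet expired. `ScopedDedup` and `runWaves` are hypothetical names.

```typescript
// Hypothetical dedup store keyed by (queue, idem_key); the TTL is assumed long enough
// that both waves fall inside the same window, which is when the footgun bites.
class ScopedDedup {
  private seen = new Set<string>();
  readonly inserted: string[] = [];

  send(queue: string, idemKey: string, payload: string): boolean {
    const scope = JSON.stringify([queue, idemKey]); // unambiguous composite of (queue, idem_key)
    if (this.seen.has(scope)) return false;         // suppressed as a duplicate
    this.seen.add(scope);
    this.inserted.push(payload);
    return true;
  }
}

// Send a v1 wave and then a v2 wave for the same tenants; count inserts per wave.
function runWaves(keyFor: (tenant: string, version: number) => string, tenants: string[]) {
  const dedup = new ScopedDedup();
  const wave = (version: number): number => {
    let inserted = 0;
    for (const t of tenants) {
      if (dedup.send("migrations", keyFor(t, version), `${t}@v${version}`)) inserted++;
    }
    return inserted;
  };
  return { v1: wave(1), v2: wave(2) };
}

const tenants = ["t1", "t2", "t3"];
console.log(runWaves((t) => t, tenants));               // entity-only key: v2 wave fully suppressed
console.log(runWaves((t, v) => `${t}:v${v}`, tenants)); // effect-scoped key: both waves inserted
```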
…s-style

bench.ts compares PgQue (append + rotation) against a pg-boss-style
mutable job table (insert->update->delete) in the same DB.

Measured (PG16): consume throughput pgque ~208k ev/s vs jobq ~40k ev/s
(~5x — the per-message UPDATE+DELETE churn is the tax). Under backlog,
pgque holds ZERO dead tuples and needs ZERO vacuum; the mutable table
accumulates ~2 dead tuples per processed job and, after draining a ~150k
backlog, sits at ~300k dead tuples / ~34 MiB (grew while draining) — the
pg-boss bloat, reproduced.

Adds demo.jobq baseline table. Honest gaps documented: produce is
round-trip-bound, and the full rotation reclaim-to-zero curve is a
follow-up (PgQ rotation is a multi-period state machine).
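The ~2 dead tuples per job follow from MVCC. A mutable job row is written three times: INSERT, UPDATE to claim, and DELETE to finish. The UPDATE and DELETE each leave one dead version behind until it is reclaimed, by VACUUM or page pruning. The toy heap model below uses hypothetical names and no real Postgres. It reproduces the arithmetic (draining 150k jobs leaves ~300k dead tuples) and shows why the append-and-rotate path leaves none.

```typescript
// Toy heap model (hypothetical; no real Postgres): UPDATE and DELETE each leave the
// old row version dead until it is reclaimed (VACUUM or page pruning).
class ToyHeap {
  private live = new Map<number, string>(); // id -> state of the current live version
  dead = 0;

  insert(id: number, state: string): void {
    this.live.set(id, state);
  }
  update(id: number, state: string): void {
    if (!this.live.has(id)) throw new Error(`no live row ${id}`);
    this.dead++; // old version becomes dead, a new version is written
    this.live.set(id, state);
  }
  delete(id: number): void {
    if (!this.live.delete(id)) throw new Error(`no live row ${id}`);
    this.dead++;
  }
  truncate(): void {
    // rotation clears a whole table at once, leaving nothing for VACUUM
    this.live.clear();
    this.dead = 0;
  }
  get liveRows(): number {
    return this.live.size;
  }
}

// pg-boss-style job table: insert -> update (claim) -> delete (complete).
function drainJobTable(jobs: number): ToyHeap {
  const t = new ToyHeap();
  for (let id = 1; id <= jobs; id++) t.insert(id, "created");
  for (let id = 1; id <= jobs; id++) {
    t.update(id, "active");
    t.delete(id);
  }
  return t;
}

// Append-only log: consumers only read; the table is later rotated away whole.
function drainAppendLog(events: number): { deadBeforeRotation: number; liveAfterRotation: number } {
  const t = new ToyHeap();
  for (let id = 1; id <= events; id++) t.insert(id, "event");
  const deadBeforeRotation = t.dead; // reading creates no dead versions
  t.truncate();
  return { deadBeforeRotation, liveAfterRotation: t.liveRows };
}

console.log(drainJobTable(150000).dead); // 300000
console.log(drainAppendLog(150000));     // zero dead tuples before rotation, zero rows after
```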