docs: partition-keys spec + brief, idempotency design rationale #295
Draft
NikolayS wants to merge 14 commits into
Design guidance for external PRs adding pg-boss-style idempotency keys (refs #293) and per-partition serialization. Maps both features onto PgQue's existing layering: sidecar tables (rotation-safe), send wrappers that reduce to insert_event(), consumer-side gating instead of engine changes, maint-cycle expiry to bound bloat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Records why pgque's producer idempotency is a TTL window (log model, not job queue) and why free-once-processed belongs on the consumer side. Includes prior-art survey (SQS/NATS/Rabbit vs pg-boss/Oban/River/Graphile/Hatchet, pgmq gap) and the producer GC fork. Internal blueprint; basis for the reply to #293. Not yet pushed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
SamoSpec-format spec (blueprints/partition-keys/SPEC.md) for consumer-side ordered, parallel consumption by partition key (Kafka-partition model: order within a key, parallelism across keys, no per-event state). Adds a self-contained on-brand HTML brief at web/public/briefs/partition-keys.html (served by Pages at /briefs/partition-keys.html on merge to main). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Re-ground the consumer mechanism after ops/security + QA/testability review: drop the (impossible) cooperative-consumer overlay for N independent slot subscriptions filtering via get_batch_cursor extra_where; restate the guarantee as testable G1/G2/G3; correct the retry rationale (ev_id preserved, ev_txid changes); derive pause from existing retry_queue (no new table); fix send-signature collision, ev_extra1/trigger collision, unstable hashtext, fixed-N invariant, slot/owner definition. Add decisions.md and refresh the HTML brief. Remove superseded IDEMPOTENCY_AND_PARTITIONS.md contributor guide. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Relocate IDEMPOTENCY_DESIGN.md -> blueprints/idempotency/DESIGN.md to match the partition-keys/ slug layout; update the cross-reference in the partition spec and refresh the note's footer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Round 2 verified the model against the engine: G1 ev_id ordering is real (order by 1, preserved through get_batch_cursor) and the G2 single-owner lock is the tested #97 guard. Fixes folded in:
- security: receive_partitioned/subscribe_slot are SECURITY DEFINER over the admin-only get_batch_cursor; validated integer-only filter (corrects the "injection-safe" framing).
- correctness: pause blocked-set moved off the transient retry_queue to a durable compact partition_block marker (per failing key, not per event); DLQ-unblock predicate made explicit.
- bug: modulo sign-normalized to (h%N+N)%N.
- R7 rotation wedge (+ pause must not hold the batch open); N persistence + teardown + DLQ-cascade caveat.
- tests: retry-affinity, security, N-invariant; split G2 block/parallel; get_batch_cursor in engine-untouched guard.
Update decisions.md (round-2 scorecard) and refresh the brief.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
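The sign-normalized modulo from the "bug" item above can be illustrated in isolation. A minimal sketch in TypeScript, assuming the hash arrives as a signed 32-bit integer; `slotFor` is a hypothetical helper name, not a PgQue function.

```typescript
// Map a signed integer hash to a slot index in [0, n).
// `slotFor` is a hypothetical name used only for this illustration.
function slotFor(hash: number, n: number): number {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("n must be a positive integer");
  }
  // The % operator keeps the sign of the dividend, so a negative hash
  // would give a negative remainder. Adding n and reducing once more
  // moves the result into [0, n) without changing non-negative results.
  return ((hash % n) + n) % n;
}

console.log(slotFor(-7, 4)); // plain -7 % 4 would be -3
console.log(slotFor(7, 4));
```

Without the normalization, a key whose hash is negative would map to a slot index that no subscription filters on, so its events would never be delivered.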
Round 3 convergence: Phase 1 (skip-default partition consumption) declared implementation-ready; pause split into Phase 2 with explicit open items (O1 defer-without-retry-increment primitive; O2 hot-blocked-key cost). Corrected the SECURITY DEFINER model to the co-ownership invariant (not pgque_admin) + non-superuser-owner security test. Fixed DLQ-unblock sub_id<->co_id join; partition_block FK-cascade/index/revoked-from-roles, created empty in Phase 1; tightened tests (engine-untouched /4 overload, in-order-after-unblock, marker-clear-via-DLQ, marker durability, hot-blocked-key). Update decisions.md and brief. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WuaYcu1XXsVEpsnLhF1FFu
Resolve two follow-up design questions on the partition-keys brief: fixed-N rebalancing and how partitions map to workers.
- Add SPEC §15 + brief §07: claim-based assignment via per-slot pg_try_advisory_lock: no leader, no PartitionAssignor, no rebalance protocol. The DB arbitrates; scale-up/down is lock acquire/release.
- Separate the two locks: G2 blocking receive lock = correctness backstop; advisory slot lock = distribution/liveness only.
- D8 decision row; T-claim assignment-liveness test.
- Correct the over-provisioning framing: fixed N is also the read-amp multiplier, so inflating N is not free; online resize breaks G1 mid-flight. Expand R4.
- Log the Q&A in decisions.md; bump to v0.5 (draft).
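The claim-based assignment described above can be modeled in memory to show why no leader or rebalance protocol is needed. A minimal sketch in TypeScript, assuming an in-process `SlotLocks` class as a stand-in for PostgreSQL session-level advisory locks; `SlotLocks` and `claimSlots` are hypothetical names, and a real worker would call `pg_try_advisory_lock` over its own database session instead.

```typescript
// In-memory stand-in for per-slot advisory locks (an assumption made for
// illustration; the real arbiter is the database).
class SlotLocks {
  private held = new Map<number, string>(); // slot -> owning worker

  // Non-blocking, like pg_try_advisory_lock: returns true if the worker
  // now holds (or already held) the slot, false if another worker has it.
  tryLock(slot: number, worker: string): boolean {
    const owner = this.held.get(slot);
    if (owner !== undefined) return owner === worker;
    this.held.set(slot, worker);
    return true;
  }

  // Models a session ending: every lock held by the worker is released.
  releaseAll(worker: string): void {
    this.held.forEach((owner, slot) => {
      if (owner === worker) this.held.delete(slot);
    });
  }
}

// A worker walks slots 0..n-1 and keeps whatever it manages to lock,
// up to `max` slots. No coordination with other workers is required.
function claimSlots(locks: SlotLocks, worker: string, n: number, max: number): number[] {
  const mine: number[] = [];
  for (let slot = 0; slot < n && mine.length < max; slot++) {
    if (locks.tryLock(slot, worker)) mine.push(slot);
  }
  return mine;
}

const locks = new SlotLocks();
console.log(claimSlots(locks, "w1", 4, 2)); // w1 takes the first free slots
console.log(claimSlots(locks, "w2", 4, 4)); // w2 takes what is left
locks.releaseAll("w1");                     // w1 goes away
console.log(claimSlots(locks, "w2", 4, 4)); // w2 picks up the freed slots
```

In this model scale-down is just a lock release, and the surviving worker's next claim pass absorbs the orphaned slots. As the commit notes, this lock governs distribution and liveness only; the G2 receive lock remains the correctness backstop.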
Lead with Tier A (mutual exclusion via cooperative consumers + per-key advisory lock, near-zero engine code) — it covers the migration workload and folds in the idempotency ask. Reposition the hash-slot design as Tier B (ordered per key), the inherent price of strict ordering, where fixed N / read-amp / slot assignment live. Correct advisory-lock framing: per-key in Tier A, per-slot distribution in Tier B.
Self-contained spike under blueprints/partition-keys/repro/ that installs pgque on a fresh VM, drives both workloads concurrently, measures throughput, and checks the guarantees empirically. No engine changes: both tiers are thin recipes over existing primitives.
- Tier A (mutual exclusion): cooperative consumers + per-key advisory lock; verifies G2 (no overlapping runs per key) and idempotency collapse (1 run per tenant despite duplicates).
- Tier B (ordered): N hash-routed slot subscriptions via get_batch_cursor extra_where; verifies G1 affinity + FIFO + exactly-once, and measures read amplification.
Measured (this VM, PG16): Tier A 8k events -> exactly 2000 runs, all invariants PASS at 4/8/16 workers. Tier B read amplification = N exactly (4.00/8.00/16.00x) with aggregate throughput ~inverse to N (87k/54k/30k ev/s); strict ordering's cost, quantified.
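The Tier B read-amplification figure follows from the routing model itself: each of the N slot subscriptions scans the whole stream and keeps only its own share. A minimal sketch in TypeScript that counts this without a database, assuming an illustrative FNV-1a string hash in place of whatever stable hash the spec pins down; `fnv1a`, `slotOf`, and `simulate` are hypothetical names, not part of PgQue or the repro driver.

```typescript
// Illustrative 32-bit FNV-1a over UTF-16 code units (an assumption; a
// stand-in for the stable hash the feature would actually use).
function fnv1a(s: string): number {
  let h = 0x811c9dc5 | 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h; // signed 32-bit integer
}

// Sign-normalized routing: the same key always lands in the same slot.
function slotOf(key: string, n: number): number {
  const h = fnv1a(key);
  return ((h % n) + n) % n;
}

interface SimResult {
  scanned: number;   // rows read across all slot subscriptions
  delivered: number; // rows that passed a slot filter
  readAmp: number;   // scanned / delivered
}

// Every slot subscription reads every event and filters on its slot.
function simulate(keys: string[], n: number): SimResult {
  let scanned = 0;
  let delivered = 0;
  for (let slot = 0; slot < n; slot++) {
    for (const key of keys) {
      scanned++;
      if (slotOf(key, n) === slot) delivered++;
    }
  }
  return { scanned, delivered, readAmp: scanned / delivered };
}

const keys = Array.from({ length: 1000 }, (_, i) => `tenant-${i % 50}`);
console.log(simulate(keys, 8));
```

Each event matches exactly one slot, so `delivered` equals the event count and `readAmp` equals N regardless of how evenly the keys spread. The model says nothing about throughput, which the spike measured on a real PG16 instance.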
Updated README.md to reflect changes in the partition-keys reproduction spike, including installation instructions, workload details, and caveats.
Port the reproduction driver from Python to TypeScript on bun + pg, to match clients/typescript/ and Fabrizio's stack. Align the repro to the two cases from the thread:
- Case 1 (migrations) now models BOTH guarantees that were conflated: L1 producer-side idempotency (TTL dedup window, demo.send_idem) that prevents duplicate INSERTs, and L2 consumer-side mutual exclusion (advisory lock) that prevents duplicate WORK. Concurrent producers + background ticker exercise the real race.
- Case 2 (lifecycle) unchanged: ordered slots, read-amp measured.
Correct the brief's overreach: idempotency is a complementary layer, not the same requirement as partition keys.
Measured (PG16): dedup OFF -> 12000 attempts/12000 inserted/1000 runs; dedup ON -> 12000 attempts/1000 inserted/1000 runs. Both: all invariants PASS. Case 2 read-amp = N exactly.
…ardrail
Act on review (Max): partition-keys is the ordered-slot feature only; producer TTL dedup becomes a separate send-layer feature.
- New blueprints/idempotency/SPEC.md: producer TTL dedup feature spec with the key-scope rule in hard language (key must encode the desired EFFECT, not just the entity), atomic claim+append, (queue, idem_key) scoping, rotation/maintenance GC, orthogonality to partition keys.
- Repro: add --tier hazard guardrail proving the version-suppression bug: entity-only key drops the v2 migration (0 inserted); effect-scoped key delivers both waves. Reframe Case 1 as 'producer idempotency + consumer mutual exclusion on a plain queue', not partition keys.
- partition-keys SPEC: promote rotation pinning to a first-class risk (R7); point §12 at the new feature spec; roadmap 1A/1B/2/3; v0.6.
- Brief: scope to ordered slots, separate-feature framing, key-scope footgun callout, rotation-pin note, roadmap pills; v0.7.
Guardrail measured (PG16): tenant key -> v2 0 inserted; tenant:version -> v2 N inserted. PASS.
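The key-scope hazard above is easy to reproduce in a toy model of a TTL dedup window scoped by (queue, idem_key). A minimal sketch in TypeScript, assuming an in-memory map in place of the dedup sidecar table and caller-supplied timestamps; `DedupWindow` and its `send` method are hypothetical names, not the `demo.send_idem` function from the repro.

```typescript
// In-memory model of a producer-side TTL dedup window (an assumption made
// for illustration; the real feature would claim the key and append the
// event atomically inside the database).
class DedupWindow {
  private ttlMs: number;
  private seen = new Map<string, number>(); // (queue, key) -> expiry time in ms

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true if the event would be appended, false if it is
  // suppressed as a duplicate inside the window.
  send(queue: string, idemKey: string, nowMs: number): boolean {
    const scoped = `${queue}\u0000${idemKey}`;
    const expiry = this.seen.get(scoped);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.seen.set(scoped, nowMs + this.ttlMs);
    return true;
  }
}

const w = new DedupWindow(60_000);

// Entity-only key: the v2 migration for the same tenant is dropped.
console.log(w.send("migrations", "tenant-1", 0));     // v1 wave
console.log(w.send("migrations", "tenant-1", 1_000)); // v2 wave, same key

// Effect-scoped key: both waves go through, true duplicates do not.
console.log(w.send("migrations", "tenant-2:v1", 0));
console.log(w.send("migrations", "tenant-2:v2", 1_000));
console.log(w.send("migrations", "tenant-2:v1", 2_000));
```

The window cannot tell a retry from a new request that reuses the same key, which is why the spec requires the key to encode the desired effect and not just the entity.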
…s-style bench.ts compares PgQue (append + rotation) against a pg-boss-style mutable job table (insert->update->delete) in the same DB. Measured (PG16): consume throughput pgque ~208k ev/s vs jobq ~40k ev/s (~5x — the per-message UPDATE+DELETE churn is the tax). Under backlog, pgque holds ZERO dead tuples and needs ZERO vacuum; the mutable table accumulates ~2 dead tuples per processed job and, after draining a ~150k backlog, sits at ~300k dead tuples / ~34 MiB (grew while draining) — the pg-boss bloat, reproduced. Adds demo.jobq baseline table. Honest gaps documented: produce is round-trip-bound, and the full rotation reclaim-to-zero curve is a follow-up (PgQ rotation is a multi-period state machine).
What
Design artifacts from working through Fabrizio's (@fenos, Supabase Storage) request on #293, plus the publishable brief.
Three commits:
- blueprints/IDEMPOTENCY_DESIGN.md: the decision rationale. pgque is an ordered log, not a job queue, so "free once processed" (pg-boss singletonKey) can't be a producer feature: in a log, "processed" is a per-consumer fact the producer can't see. Producer idempotency is therefore a TTL window (SQS/NATS model), append-only, GC'd by rotation. Includes a primary-source prior-art survey (logs do TTL windows; job queues do state-based free-once-processed via per-row UPDATE, the bloat pgque avoids; pgmq has neither).
- blueprints/partition-keys/SPEC.md: SamoSpec-format spec for the consumer-side partition-key feature (Fabrizio's "partitions first"). Kafka-partition model: order within a key, parallelism across keys, one worker per key, achieved by hash(key) % N routing over cooperative consumers, with no per-event locks or mutable state, and no edits to batch_event_sql (the engine stays sacred). Covers the head-of-line-on-failure decision (pause vs skip), test plan (red/green TDD), and sprint plan.
- web/public/briefs/partition-keys.html: self-contained, on-brand HTML brief. Once merged to main, the Pages workflow serves it at https://pgque.dev/briefs/partition-keys.html.

Status / notes
52f2450) rides along on the branch.

Verification
Docs/HTML only. Brief is a static file in web/public/ (Astro copies it verbatim into web/dist); HTML tags balance, renders standalone. No code paths touched.

🤖 Generated with Claude Code