refactor: [OM-387] governance entitlement performance improvement#4635
refactor: [OM-387] governance entitlement performance improvement#4635gergely-kurucz-konghq wants to merge 8 commits into
Conversation
Add GOV_BENCH_KIND (boolean|metered|mixed|all) so the benchmark can exercise the metered GetAccess balance path, not just the boolean short-circuit. Default stays boolean (baselines unchanged). Metered features share one SUM meter; no events ingested (the per-entitlement ClickHouse query fires regardless). Add bench-governance-metered and a bench-governance-trace target scoped to one small cell.
The metered grant owner adapter re-fetched the same immutable entitlement row 6 times per balance calculation (DescribeOwner, GetStartOfMeasurement, GetUsagePeriodStartAt x3, GetResetTimelineInclusive), each issuing ~2 SQL queries. Add an opt-in, context-scoped memo (WithEntitlementCache) installed at GetEntitlementBalance so those methods share a single GetEntitlement fetch. Callers that do not install the cache fall back to a direct fetch, so the balance worker and one-off value reads are unchanged. Cuts ~43% of the Postgres queries per metered entitlement in the governance access path (~23 -> ~13 SQL/ent; verified via traces on a 10x10 metered benchmark).
Replace the serial per-customer GetAccess loop in resolveAccess with a bounded errgroup (maxCustomerConcurrency=5), writing results into pre-sized indexed slices to preserve page sort order. Whole-request error semantics are unchanged. Conservative bound: GetAccess already fans a customer's entitlements out ~10-wide, so peak concurrent DB ops is ~K x 10 against a ~NumCPU-sized pgx pool; a higher value mostly queues on connection acquire. Cuts metered latency ~40% (100x100 ~25s -> ~14s), pool-bound.
snapshotEngineResult took an owner advisory-lock transaction before checking whether any history segment qualified for a snapshot, so every metered balance read paid a begin/lock/commit even when nothing was written. Move the LATEST check and segment scan (both in-memory) ahead of the lock; take it only when a snapshot will be saved. Save-path behavior is unchanged (the lock was already released before saveSnapshot). Verified via traces: on a 10x10 metered governance query LockOwnerForTx/begin_tx/commit drop from 100 to 0.
Rename WithEntitlementCache to withEntitlementCache so the memo can only be installed from within the metered package, at vetted read-only entry points. Document the invariant: cached entries are not invalidated within scope, so installing it around a stack that mutates the entitlement would serve stale reads. Safe today — the sole installer is GetEntitlementBalance, whose call tree performs no entitlement writes.
Replace the fixed maxCustomerConcurrency=5 with max(2, GOMAXPROCS/2) so the bound tracks the machine (and the ~NumCPU pgx pool). Halved rather than full GOMAXPROCS because GetAccess already fans out ~10-wide per customer, so peak DB demand is ~this×10; halving keeps pool headroom for concurrent requests.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe governance benchmark now supports selectable entitlement kinds through ChangesGovernance benchmark entitlement kinds
Estimated code review effort: 3 (Moderate) | ~20 minutes Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR improves
Confidence Score: 5/5All three production changes are semantically correct and behaviour-preserving; the benchmark improvements address the previously flagged anchor-date drift. The advisory-lock refactoring preserves the original save-outside-lock semantics that existed before this PR. The entitlement cache is correctly scoped per GetEntitlementBalance call via a fresh sync.Map, uses sync.Once for safe concurrent initialisation, and is deliberately unexported to prevent installation around write paths. The errgroup parallelism writes results by index into pre-allocated slices (no shared-state mutation), and loop variable capture is safe under Go 1.25. No correctness regressions were identified. No files require special attention. Important Files Changed
|
A fixed 2024 anchor let the balance engine accrue a monthly reset period per elapsed month, drifting benchmark cost over time. Anchor at time.Now() so only the current period is evaluated and runs stay comparable.
chrisgacsal
left a comment
There was a problem hiding this comment.
LGTM! I'd wait with merge until @GAlexIHU also reviewed it.
Governance query performance improvements
What
Speeds up
POST /governance/queryfor metered entitlements (~40% lower latency; 100×100 ≈25s → ≈12.5s) and extends the benchmark to cover the metered path.Why
The endpoint resolves access per customer, and each metered entitlement's
GetAccessdid far more Postgres work than necessary: the same entitlement row was re-fetched ~6× per balance calc, the customer loop ran serially, and every read took an advisory-lock transaction even when no snapshot was written. The existing benchmark only exercised boolean entitlements, so none of this was visible.How
GOV_BENCH_KIND=boolean|metered|mixed|all) plus abench-governance-tracetarget for capturing a single small cell's trace safely.metered/grant_owner_adapter.go): memoize the owner-entitlement lookup within one balance calculation via a context-scoped cache, collapsing ~6GetEntitlementfetches to 1 (~23 → ~13 SQL/entitlement). Installer is unexported and read-only-scoped by design (documented invariant).governance/service/service.go): resolve a page's customers concurrently with a boundederrgroup, limitmax(2, GOMAXPROCS/2)— scales with the box and the ~NumCPU pgx pool, halved becauseGetAccessalready fans out ~10-wide internally. Results written by index to preserve sort order.credit/helper.go): check whether a snapshot segment actually qualifies before taking the owner lock, skipping the lock transaction (and its held connection) on steady-state metered reads.Behavior-preserving and verified via traces. The credit/entitlement changes also benefit other metered balance reads (e.g. the balance worker), not just governance.
Summary by CodeRabbit