refactor: [OM-387] governance entitlement performance improvement 2#4637
Conversation
Generic ctx-scoped, keyed, once-per-key memo (Install + GetOrLoad) with a no-cache fallback for opt-in callers. Non-zero-size context key so distinct memos never collide.
DescribeOwner re-fetched the same customer once per entitlement; GetAccess resolves many entitlements of one customer, so it re-fetched N times. Add a GetAccess-scoped customer memo (syncx.ContextMemo) and route DescribeOwner through it, collapsing GetCustomer to one per customer. Also refactor the existing entitlement memo onto ContextMemo. Verified: metered ~12.2 -> ~10.4 SQL/entitlement; GetCustomer 100->10 at 10x10 (2500->50 at 50x50).
Decouples entitlements seeded per customer from features the query requests, exposing that GetAccess resolves all N regardless of M. GOV_BENCH_OVERCOMPUTE cells + bench-governance-overcompute[-trace].
GetAccess resolved every entitlement of a customer even when the caller (governance's filtered query) needed only a few, computing balances and ClickHouse usage queries for entitlements it discarded — cost scaled with the customer's total entitlements, not the request. Add an optional variadic feature filter (empty = all, unchanged for existing callers); when set, filter entitlements before the balance fan-out. Governance passes its query's feature keys. Verified: 10-of-50 query drops per-customer balances/ClickHouse queries 50->10 (~100x at 10-of-1000).
📝 WalkthroughWalkthroughAdds a generic ChangesSelective GetAccess and memoized owner adapter
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant GovernanceService
participant EntitlementService
participant OwnerAdapter
participant ContextMemo
GovernanceService->>EntitlementService: GetAccess(ctx, ns, customerId, featureKeys...)
EntitlementService->>EntitlementService: filter entitlements by featureKeys
EntitlementService->>ContextMemo: WithCustomerCache(ctx)
loop fan-out per entitlement
EntitlementService->>OwnerAdapter: DescribeOwner(ctx, ...)
OwnerAdapter->>ContextMemo: getCustomer via ownerCustomerMemo.GetOrLoad
ContextMemo-->>OwnerAdapter: cached or fresh customer
OwnerAdapter-->>EntitlementService: entitlement value
end
EntitlementService-->>GovernanceService: filtered Access map
Possibly related PRs
Suggested labels: Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR adds a
Confidence Score: 4/5Safe to merge — the core filtering and memoization logic is correct and all existing callers are backward-compatible with the variadic signature. The implementation correctly scopes ContextMemo stores to request contexts, the sync.Once pattern preserves the previous per-entry dedup guarantee, and the governance path handles both the all-org and selective-filter branches correctly. Two quality gaps hold back a clean score: the new featureKeys filtering path in GetAccess has no unit test (a regression in the lo.Filter block would go undetected), and the concurrent assertions in TestContextMemo_ConcurrentSingleLoad use require from non-test goroutines in a way the testing package does not formally support. openmeter/entitlement/service/access_test.go (missing filter coverage) and pkg/syncx/contextmemo_test.go (goroutine assertion pattern). Important Files Changed
|
| const goroutines = 50 | ||
| var wg sync.WaitGroup | ||
| wg.Add(goroutines) | ||
| for range goroutines { | ||
| go func() { | ||
| defer wg.Done() | ||
| v, err := memo.GetOrLoad(ctx, "k", load) | ||
| require.NoError(t, err) | ||
| require.Equal(t, 99, v) | ||
| }() | ||
| } | ||
| wg.Wait() | ||
|
|
||
| require.Equal(t, int64(1), calls.Load(), "concurrent GetOrLoad for one key loads exactly once") |
There was a problem hiding this comment.
require called from non-test goroutines
require.NoError and require.Equal (which both call t.FailNow) are invoked from spawned goroutines, but the testing.TB docs state that FailNow must be called from the goroutine running the test function. If either assertion fires in a goroutine, runtime.Goexit exits that goroutine (running its deferred wg.Done), the test is marked failed via t.Fail, but the test function is not interrupted — execution continues past wg.Wait() and the top-level assertion at the end may produce a misleading pass/fail signal.
Consider using a channel or sync/errgroup to propagate assertion failures back to the test goroutine, or replace the inner require calls with assert so they mark the test failed without calling FailNow.
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/syncx/contextmemo_test.go
Line: 118-131
Comment:
**`require` called from non-test goroutines**
`require.NoError` and `require.Equal` (which both call `t.FailNow`) are invoked from spawned goroutines, but the `testing.TB` docs state that `FailNow` must be called from the goroutine running the test function. If either assertion fires in a goroutine, `runtime.Goexit` exits that goroutine (running its deferred `wg.Done`), the test is marked failed via `t.Fail`, but the test function is not interrupted — execution continues past `wg.Wait()` and the top-level assertion at the end may produce a misleading pass/fail signal.
Consider using a channel or `sync/errgroup` to propagate assertion failures back to the test goroutine, or replace the inner `require` calls with `assert` so they mark the test failed without calling `FailNow`.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
pkg/syncx/contextmemo_test.go (1)
118-129: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winLine up the goroutines before calling
GetOrLoad.This test is meant to prove the contended path, but the goroutines can effectively serialize after the first cache fill. A start gate makes the coverage much less scheduler-dependent. As per path instructions, “Make sure the tests are comprehensive and cover the changes.”
🧪 Suggested tweak
const goroutines = 50 + start := make(chan struct{}) var wg sync.WaitGroup wg.Add(goroutines) for range goroutines { go func() { defer wg.Done() + <-start v, err := memo.GetOrLoad(ctx, "k", load) require.NoError(t, err) require.Equal(t, 99, v) }() } + close(start) wg.Wait()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/syncx/contextmemo_test.go` around lines 118 - 129, The concurrency test in ContextMemo is not reliably exercising the contended GetOrLoad path because the goroutines can start and serialize after the first load completes. Add a start gate/barrier in the test around the goroutines that call memo.GetOrLoad so all workers begin at the same time, keeping the assertions on load and the “k” key in contextmemo_test.go meaningful under contention.Source: Path instructions
openmeter/entitlement/metered/grant_owner_adapter.go (1)
60-65: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick winUse structured memo keys instead of joined strings.
Namespace + "/" + IDcan alias if either component ever contains/, which would serve the wrong cached entitlement/customer. A typed key also avoids the join allocation.🛡️ Suggested tweak
+type ownerMemoKey struct { + namespace string + id string +} + -var ownerEntitlementMemo = syncx.NewContextMemo[string, *entitlement.Entitlement]() +var ownerEntitlementMemo = syncx.NewContextMemo[ownerMemoKey, *entitlement.Entitlement]() // ownerCustomerMemo dedups the owner-customer lookup made by DescribeOwner across an // entitlement.Service.GetAccess fan-out, which resolves many entitlements of the SAME customer and // would otherwise fetch that customer once per entitlement. Installed by WithCustomerCache. -var ownerCustomerMemo = syncx.NewContextMemo[string, *customer.Customer]() +var ownerCustomerMemo = syncx.NewContextMemo[ownerMemoKey, *customer.Customer]() @@ func (e *entitlementGrantOwner) getEntitlement(ctx context.Context, owner models.NamespacedID) (*entitlement.Entitlement, error) { - return ownerEntitlementMemo.GetOrLoad(ctx, owner.Namespace+"/"+owner.ID, func(ctx context.Context) (*entitlement.Entitlement, error) { + return ownerEntitlementMemo.GetOrLoad(ctx, ownerMemoKey{namespace: owner.Namespace, id: owner.ID}, func(ctx context.Context) (*entitlement.Entitlement, error) { return e.entitlementRepo.GetEntitlement(ctx, owner) }) } @@ func (e *entitlementGrantOwner) getCustomer(ctx context.Context, id customer.CustomerID) (*customer.Customer, error) { - return ownerCustomerMemo.GetOrLoad(ctx, id.Namespace+"/"+id.ID, func(ctx context.Context) (*customer.Customer, error) { + return ownerCustomerMemo.GetOrLoad(ctx, ownerMemoKey{namespace: id.Namespace, id: id.ID}, func(ctx context.Context) (*customer.Customer, error) { return e.customerService.GetCustomer(ctx, customer.GetCustomerInput{CustomerID: &id}) }) }Also applies to: 91-101
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openmeter/entitlement/metered/grant_owner_adapter.go` around lines 60 - 65, The memo caches in ownerEntitlementMemo and ownerCustomerMemo are using joined string keys that can alias if Namespace or ID contains “/”. Update the keying in the owner entitlement/customer lookup paths to use a structured typed key instead of concatenated strings, and apply the same change wherever those memo lookups are built in the related code around DescribeOwner and WithCustomerCache.pkg/syncx/contextmemo.go (1)
63-64: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winAvoid allocating a memo entry on every cache hit.
LoadOrStore(k, &contextMemoEntry[V]{})builds the entry before knowing whetherkalready exists, so hot memo hits still allocate. ALoadfast path keeps hits allocation-free.♻️ Suggested tweak
- v, _ := store.LoadOrStore(k, &contextMemoEntry[V]{}) + v, ok := store.Load(k) + if !ok { + v, _ = store.LoadOrStore(k, &contextMemoEntry[V]{}) + } entry := v.(*contextMemoEntry[V])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/syncx/contextmemo.go` around lines 63 - 64, Avoid allocating a memo entry on cache hits in the context memo logic. Update the LoadOrStore usage in contextMemo to use a Load fast path first, and only create a new contextMemoEntry[V] when the key is actually missing; keep the existing entry retrieval and storage behavior in the contextMemo methods so hot hits remain allocation-free.openmeter/entitlement/service/service.go (1)
349-355: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider switching this small filter to stdlib helpers.
featureKeyscan be collected into a plainmap[string]struct{}and the slice filtered withslices.DeleteFunc, which keeps this hot-path code in line with the repo’sslices/mapspreference.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openmeter/entitlement/service/service.go` around lines 349 - 355, The filter in entitlement service should use stdlib helpers instead of lo-based collection/filtering. Update the logic around featureKeys and the entitlements filtering in service.go to build a plain map[string]struct{} for membership checks and replace lo.Filter with slices.DeleteFunc on entitlements, keeping the behavior identical while aligning with the repo’s slices/maps preference.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/Makefile`:
- Around line 54-57: Update the benchmark comment in the selective query target
to describe the fixed selective path rather than claiming GetAccess still
computes all 50 balances. Adjust the wording near the selective query notes so
it reflects the current behavior after this PR and matches future trace
comparisons, while keeping the reference to the selective benchmark and
GOV_BENCH_OVERCOMPUTE-based setup.
In `@openmeter/entitlement/service/service.go`:
- Around line 357-359: The early return in Entitlement handling currently
returns a zero-value entitlement.Access, which makes Entitlements nil instead of
preserving an empty map. Update the empty-entitlements path in service.go’s
entitlement access construction so it returns an initialized Access with an
empty Entitlements map, matching the previous behavior and keeping the API/JSON
shape stable.
---
Nitpick comments:
In `@openmeter/entitlement/metered/grant_owner_adapter.go`:
- Around line 60-65: The memo caches in ownerEntitlementMemo and
ownerCustomerMemo are using joined string keys that can alias if Namespace or ID
contains “/”. Update the keying in the owner entitlement/customer lookup paths
to use a structured typed key instead of concatenated strings, and apply the
same change wherever those memo lookups are built in the related code around
DescribeOwner and WithCustomerCache.
In `@openmeter/entitlement/service/service.go`:
- Around line 349-355: The filter in entitlement service should use stdlib
helpers instead of lo-based collection/filtering. Update the logic around
featureKeys and the entitlements filtering in service.go to build a plain
map[string]struct{} for membership checks and replace lo.Filter with
slices.DeleteFunc on entitlements, keeping the behavior identical while aligning
with the repo’s slices/maps preference.
In `@pkg/syncx/contextmemo_test.go`:
- Around line 118-129: The concurrency test in ContextMemo is not reliably
exercising the contended GetOrLoad path because the goroutines can start and
serialize after the first load completes. Add a start gate/barrier in the test
around the goroutines that call memo.GetOrLoad so all workers begin at the same
time, keeping the assertions on load and the “k” key in contextmemo_test.go
meaningful under contention.
In `@pkg/syncx/contextmemo.go`:
- Around line 63-64: Avoid allocating a memo entry on cache hits in the context
memo logic. Update the LoadOrStore usage in contextMemo to use a Load fast path
first, and only create a new contextMemoEntry[V] when the key is actually
missing; keep the existing entry retrieval and storage behavior in the
contextMemo methods so hot hits remain allocation-free.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 91b5fa94-f7f0-4708-85ec-dac1b6a7f351
📒 Files selected for processing (9)
e2e/Makefilee2e/governance_bench_test.goopenmeter/entitlement/connector.goopenmeter/entitlement/metered/grant_owner_adapter.goopenmeter/entitlement/service/service.goopenmeter/governance/service/service.goopenmeter/server/server_test.gopkg/syncx/contextmemo.gopkg/syncx/contextmemo_test.go
| # Selective query: seed 50 entitlements/customer but request only 10. Reveals that GetAccess | ||
| # computes all 50 balances (50 ClickHouse queries) while the result uses 10 — cost scales with a | ||
| # customer's TOTAL entitlements, not what the query asks for. The regular cells hide this (they | ||
| # request every seeded feature). Requires GOV_BENCH_OVERCOMPUTE=1 (set by these targets). |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Refresh the over-compute description.
After this PR, this benchmark should exercise the fixed selective path, not say GetAccess still computes all 50 balances. That wording will confuse future trace comparisons.
📝 Suggested wording
-# Selective query: seed 50 entitlements/customer but request only 10. Reveals that GetAccess
-# computes all 50 balances (50 ClickHouse queries) while the result uses 10 — cost scales with a
-# customer's TOTAL entitlements, not what the query asks for. The regular cells hide this (they
-# request every seeded feature). Requires GOV_BENCH_OVERCOMPUTE=1 (set by these targets).
+# Selective query: seed 50 entitlements/customer but request only 10. Exercises the path that used
+# to over-compute all 50 balances, so traces can validate that cost now scales with requested
+# features rather than total seeded entitlements. Requires GOV_BENCH_OVERCOMPUTE=1 (set by these
+# targets).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Selective query: seed 50 entitlements/customer but request only 10. Reveals that GetAccess | |
| # computes all 50 balances (50 ClickHouse queries) while the result uses 10 — cost scales with a | |
| # customer's TOTAL entitlements, not what the query asks for. The regular cells hide this (they | |
| # request every seeded feature). Requires GOV_BENCH_OVERCOMPUTE=1 (set by these targets). | |
| # Selective query: seed 50 entitlements/customer but request only 10. Exercises the path that used | |
| # to over-compute all 50 balances, so traces can validate that cost now scales with requested | |
| # features rather than total seeded entitlements. Requires GOV_BENCH_OVERCOMPUTE=1 (set by these | |
| # targets). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@e2e/Makefile` around lines 54 - 57, Update the benchmark comment in the
selective query target to describe the fixed selective path rather than claiming
GetAccess still computes all 50 balances. Adjust the wording near the selective
query notes so it reflects the current behavior after this PR and matches future
trace comparisons, while keeping the reference to the selective benchmark and
GOV_BENCH_OVERCOMPUTE-based setup.
| if len(entitlements) == 0 { | ||
| return entitlement.Access{}, nil | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Preserve the empty access map shape.
This new early return yields a zero-value Access, so Entitlements becomes nil. Falling through used to return an initialized empty map, which is safer for API/JSON shape and callers that distinguish {} from null.
🐛 Suggested fix
if len(entitlements) == 0 {
- return entitlement.Access{}, nil
+ return entitlement.Access{
+ Entitlements: map[string]entitlement.EntitlementValueWithId{},
+ }, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if len(entitlements) == 0 { | |
| return entitlement.Access{}, nil | |
| } | |
| if len(entitlements) == 0 { | |
| return entitlement.Access{ | |
| Entitlements: map[string]entitlement.EntitlementValueWithId{}, | |
| }, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openmeter/entitlement/service/service.go` around lines 357 - 359, The early
return in Entitlement handling currently returns a zero-value
entitlement.Access, which makes Entitlements nil instead of preserving an empty
map. Update the empty-entitlements path in service.go’s entitlement access
construction so it returns an initialized Access with an empty Entitlements map,
matching the previous behavior and keeping the API/JSON shape stable.
| // don't compute (and issue ClickHouse usage queries for) balances that would be discarded. | ||
| // Empty featureKeys means "all" — the fetch is a single query regardless, so filtering its | ||
| // result in memory is enough; the per-entitlement value calc is the cost this avoids. | ||
| if len(featureKeys) > 0 { |
There was a problem hiding this comment.
I would extend the GetEntitlementsOfCustomer method to allow fitlering by feature keys. It will require side-loading features in the adapter by it is way more efficient then overfetching from db and filtering in memory.
There was a problem hiding this comment.
It seems the ListEntitlements on the adapter does have support for filtering by feature keys/ids so it would require a minimal effort.
Resolve only requested features in governance access queries
What
entitlement.Service.GetAccessnow accepts an optional feature filter; the governance query passes it so only the requested features' entitlements are resolved, instead of every entitlement the customer has.Why
GetAccessfanned out over all of a customer's entitlements — computing a balance (and a ClickHouse usage query) for each — even when the caller only needed a few. So a query asking about 10 features on a customer with 1000 entitlements did ~1000 entitlements' worth of work and discarded 990. Cost scaled with the customer's total entitlement count, not the request — a latent cliff for orgs with many features. (Every prior benchmark hid this by querying all seeded features.)How
GetAccess(ctx, ns, customerID, featureKeys ...string)— variadic; empty = all (unchanged for the customer-access API and the all-features governance path). When set, entitlements are filtered to those feature keys before the balance fan-out, so balances and ClickHouse queries shrink to the requested count.resolveAccesspasses the query'sFeatureKeys.e2ebenchmark variant (GOV_BENCH_OVERCOMPUTE) that decouples seeded entitlements (N) from queried features (M) to expose/guard this, plusmake bench-governance-overcompute[-trace].Verified via traces (1 customer, 50 seeded, 10 queried): per-customer
GetEntitlementBalanceand ClickHouse usage queries 50 → 10, latency 68ms → 20.5ms; ~100× at 10-of-1000. Non-filtering callers unchanged; non-bench tests green.Summary by CodeRabbit
New Features
Bug Fixes
Tests