Restrict Secret informer to bbr-managed labeled Secrets only#226
Restrict Secret informer to bbr-managed labeled Secrets only#226abdallahsamabd wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: abdallahsamabd The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThe diff standardizes the managed Secret label to inference.llm-d.ai/ipp-managed: "true" across docs, tests, and Helm values (defaulting to single-namespace RBAC). It documents RBAC modes in deploy/README. The apikey-injection factory now passes a context into NewAPIKeyInjectionPlugin; NewAPIKeyInjectionPlugin builds a label-filtered controller-runtime cache for Secrets, starts it asynchronously, waits up to 2 minutes for cache sync, and wires reconciliation via WatchesRawSource (replacing the predicate-based event filter). The reconciler removes the predicate and uses the updated managed-label constant. Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Issues
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
| rules: | ||
| # apikey-injection plugin: watches labeled Secrets | ||
| # apikey-injection plugin: watches Secrets that carry the label | ||
| # "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys. |
There was a problem hiding this comment.
we're changing the label since we migrated code to llm-d.
| # "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys. | |
| # "inference.llm-d.ai/ipp-managed=true" to inject provider API keys. |
| # A ClusterRole is required because Kubernetes RBAC does not support | ||
| # label-based scoping; the informer restricts the actual data fetched | ||
| # via a label-selector at the cache level (see newFilteredSecretCache). |
There was a problem hiding this comment.
this is a bit too much information and fragile, e.g., if we rename the function this will point to unexisting function name.
| # A ClusterRole is required because Kubernetes RBAC does not support | |
| # label-based scoping; the informer restricts the actual data fetched | |
| # via a label-selector at the cache level (see newFilteredSecretCache). |
| rules: | ||
| # apikey-injection plugin: watches labeled Secrets | ||
| # apikey-injection plugin: watches Secrets labeled | ||
| # "inference.networking.k8s.io/bbr-managed=true" in this namespace. |
There was a problem hiding this comment.
| # "inference.networking.k8s.io/bbr-managed=true" in this namespace. | |
| # "inference.llm-d.ai/ipp-managed=true" in this namespace. |
|
|
||
| The `apikey-injection` plugin watches Kubernetes Secrets to inject provider API | ||
| keys (e.g. OpenAI, Anthropic) into outbound inference requests. Only Secrets | ||
| labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed. |
There was a problem hiding this comment.
| labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed. | |
| labeled `inference.llm-d.ai/ipp-managed: "true"` are processed. |
| RBAC does not support label-based scoping, so the ClusterRole is broader than | ||
| what the application actually needs. To mitigate this: | ||
|
|
||
| - The informer cache uses a **label selector** so that only `bbr-managed` |
There was a problem hiding this comment.
| - The informer cache uses a **label selector** so that only `bbr-managed` | |
| - The informer cache uses a **label selector** so that only `ipp-managed` |
|
|
||
| - The informer cache uses a **label selector** so that only `bbr-managed` | ||
| Secrets are ever fetched from the API server or stored in process memory. | ||
| - The reconciler applies a **predicate filter** as a second layer of defense. |
There was a problem hiding this comment.
does it need to?
predicate filter was initially implemented as the client side filter. if we introduce server side filtering we don't need predicate..
34f12a2 to
24d5580
Compare
Signed-off-by: Abdallah Samara <abdallahsamabd@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/plugins/apikey-injection/plugin.go`:
- Around line 73-85: The constructor starts a background Secret cache via
newFilteredSecretCache and assigns it to filteredCache but never stops it if
reconciler registration (Complete) fails, leaking the watch; modify
NewAPIKeyInjectionPlugin to stop/close the filteredCache (e.g. call
filteredCache.Stop() or filteredCache.Close()/Cancel() as provided by the cache
type) before returning the fmt.Errorf when the
reconcilerBuilder().Complete(reconciler) error path is taken so the background
goroutine is cancelled on constructor failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c31d82a5-3128-4e71-98c4-4ac729be0d3d
📒 Files selected for processing (13)
deploy/README.mddeploy/payload-processing/templates/rbac.yamldeploy/payload-processing/values.yamldocs/providers/anthropic.mddocs/providers/azure-openai.mddocs/providers/bedrock-openai.mddocs/providers/openai.mddocs/providers/vertex-openai.mdpkg/plugins/apikey-injection/plugin.gopkg/plugins/apikey-injection/reconciler.gotest/e2e/e2e_test.gotest/e2e/reports/3.4/external-model-e2e-report.mdtest/e2e/scripts/e2e-values.yaml
✅ Files skipped from review due to trivial changes (8)
- docs/providers/anthropic.md
- deploy/payload-processing/templates/rbac.yaml
- docs/providers/vertex-openai.md
- docs/providers/azure-openai.md
- test/e2e/scripts/e2e-values.yaml
- docs/providers/bedrock-openai.md
- test/e2e/reports/3.4/external-model-e2e-report.md
- docs/providers/openai.md
🚧 Files skipped from review as they are similar to previous changes (3)
- deploy/README.md
- deploy/payload-processing/values.yaml
- pkg/plugins/apikey-injection/reconciler.go
| filteredCache, err := newFilteredSecretCache(ctx) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| var secretObj client.Object = &corev1.Secret{} | ||
| if err := reconcilerBuilder(). | ||
| Named("apikey-injection-secret-watcher"). | ||
| WatchesRawSource( | ||
| source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}), | ||
| ). | ||
| Complete(reconciler); err != nil { | ||
| return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err) |
There was a problem hiding this comment.
Cancel the filtered cache on constructor failure.
Line 73 starts a background Secret cache on the long-lived plugin context before the controller is fully registered. If Complete(...) fails on Line 84, NewAPIKeyInjectionPlugin returns an error but the cache goroutine keeps listing/watching Secrets until process shutdown. That leaves a leaked watch on the error path and can also pollute tests that construct the plugin repeatedly.
Proposed fix
func NewAPIKeyInjectionPlugin(ctx context.Context, reconcilerBuilder func() *builder.Builder, clientReader client.Reader) (*ApiKeyInjectionPlugin, error) {
+ cacheCtx, cancel := context.WithCancel(ctx)
+
store := newSecretStore()
reconciler := &secretReconciler{
Reader: clientReader,
store: store,
}
- filteredCache, err := newFilteredSecretCache(ctx)
+ filteredCache, err := newFilteredSecretCache(cacheCtx)
if err != nil {
+ cancel()
return nil, err
}
+ cacheStarted := true
+ defer func() {
+ if cacheStarted {
+ cancel()
+ }
+ }()
var secretObj client.Object = &corev1.Secret{}
if err := reconcilerBuilder().
Named("apikey-injection-secret-watcher").
WatchesRawSource(
source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
).
Complete(reconciler); err != nil {
return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)
}
+ cacheStarted = false
return (&ApiKeyInjectionPlugin{📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| filteredCache, err := newFilteredSecretCache(ctx) | |
| if err != nil { | |
| return nil, err | |
| } | |
| var secretObj client.Object = &corev1.Secret{} | |
| if err := reconcilerBuilder(). | |
| Named("apikey-injection-secret-watcher"). | |
| WatchesRawSource( | |
| source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}), | |
| ). | |
| Complete(reconciler); err != nil { | |
| return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err) | |
| cacheCtx, cancel := context.WithCancel(ctx) | |
| store := newSecretStore() | |
| reconciler := &secretReconciler{ | |
| Reader: clientReader, | |
| store: store, | |
| } | |
| filteredCache, err := newFilteredSecretCache(cacheCtx) | |
| if err != nil { | |
| cancel() | |
| return nil, err | |
| } | |
| cacheStarted := true | |
| defer func() { | |
| if cacheStarted { | |
| cancel() | |
| } | |
| }() | |
| var secretObj client.Object = &corev1.Secret{} | |
| if err := reconcilerBuilder(). | |
| Named("apikey-injection-secret-watcher"). | |
| WatchesRawSource( | |
| source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}), | |
| ). | |
| Complete(reconciler); err != nil { | |
| return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err) | |
| } | |
| cacheStarted = false | |
| return (&ApiKeyInjectionPlugin{ |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/plugins/apikey-injection/plugin.go` around lines 73 - 85, The constructor
starts a background Secret cache via newFilteredSecretCache and assigns it to
filteredCache but never stops it if reconciler registration (Complete) fails,
leaking the watch; modify NewAPIKeyInjectionPlugin to stop/close the
filteredCache (e.g. call filteredCache.Stop() or filteredCache.Close()/Cancel()
as provided by the cache type) before returning the fmt.Errorf when the
reconcilerBuilder().Complete(reconciler) error path is taken so the background
goroutine is cancelled on constructor failure.
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Description
Addresses #201: with
multiNamespace: true, the chart grants a ClusterRole onsecretswithget/list/watchcluster-wide, because Kubernetes RBAC cannot scopesecretsby label. The workload only needs credential Secrets labeledinference.llm-d.ai/ipp-managed: "true".Previously,
apikey-injectionregisteredFor(&Secret{})on the manager’s default cache, so the Secret informer could list/watch and retain a broad set of Secrets in memory even though reconciliation only used labeled ones.This PR:
Least privilege by default (Helm) —
upstreamBbr.bbr.multiNamespacedefaults tofalse, so the chart installs a namespacedRole/RoleBindingforsecretsandexternalmodelsin the release namespace.multiNamespace: trueremains an explicit opt-in that still uses a ClusterRole when models/credentials span namespaces.Defense in depth (runtime) —
apikey-injectionuses a dedicated controller-runtime cache whose Secret list/watch uses a label selector forinference.llm-d.ai/ipp-managed=true, so only those Secrets are fetched into that cache.Reconcilestill checks the label afterGetbefore updating the in-memory credential store. The old event predicate was removed as redundant once list/watch is label-scoped.Docs / operator clarity — RBAC template comments and
deploy/README.mddescribe default vs opt-in ClusterRole behavior, link to RBAC good practices / [Security scans] RBAC Analysis: ClusterRole has access to secrets resource #201, and document the ipp-managed label (including migration from the previousbbr-managedlabel in examples).Changes
pkg/plugins/apikey-injection/plugin.go: ReplaceFor(&Secret{})withWatchesRawSource+ label-filtered Secret cache; bounded wait for cache sync; nomanagedLabelPredicate.pkg/plugins/apikey-injection/reconciler.go: Managed label constant →inference.llm-d.ai/ipp-managed; remove predicate; keep post-Getlabel check inReconcile.deploy/payload-processing/values.yaml: DefaultmultiNamespace: falsewith short comments.deploy/payload-processing/templates/rbac.yaml: Comments for Role vs ClusterRole paths and informer label scoping (no fragile Go symbol references).deploy/README.md: RBAC / Secrets access section aligned with defaults, opt-in ClusterRole, and ipp-managed label.docs/providers/*.md,test/e2e/e2e_test.go,test/e2e/reports/3.4/external-model-e2e-report.md: Example Secrets useinference.llm-d.ai/ipp-managed: "true".How Has This Been Tested?
go test ./pkg/...— unit tests pass (includingapikey-injection; tests that usenewTestPlugin()do not exercise cache startup).go build ./...— compiles cleanly.multiNamespace: true, confirm ClusterRole path; confirm BBR still loads credentials for Secrets labeledinference.llm-d.ai/ipp-managed: "true"(existing clusters must re-label secrets that still use the oldbbr-managedkey).Merge criteria:
Summary by CodeRabbit
Documentation
Chores
Tests