Skip to content

Restrict Secret informer to bbr-managed labeled Secrets only#226

Draft
abdallahsamabd wants to merge 1 commit into
opendatahub-io:mainfrom
abdallahsamabd:feat/201
Draft

Restrict Secret informer to bbr-managed labeled Secrets only#226
abdallahsamabd wants to merge 1 commit into
opendatahub-io:mainfrom
abdallahsamabd:feat/201

Conversation

@abdallahsamabd
Copy link
Copy Markdown
Contributor

@abdallahsamabd abdallahsamabd commented May 3, 2026

Description

Addresses #201: with multiNamespace: true, the chart grants a ClusterRole on secrets with get/list/watch cluster-wide, because Kubernetes RBAC cannot scope secrets by label. The workload only needs credential Secrets labeled inference.llm-d.ai/ipp-managed: "true".

Previously, apikey-injection registered For(&Secret{}) on the manager’s default cache, so the Secret informer could list/watch and retain a broad set of Secrets in memory even though reconciliation only used labeled ones.

This PR:

  1. Least privilege by default (Helm)upstreamBbr.bbr.multiNamespace defaults to false, so the chart installs a namespaced Role/RoleBinding for secrets and externalmodels in the release namespace. multiNamespace: true remains an explicit opt-in that still uses a ClusterRole when models/credentials span namespaces.

  2. Defense in depth (runtime)apikey-injection uses a dedicated controller-runtime cache whose Secret list/watch uses a label selector for inference.llm-d.ai/ipp-managed=true, so only those Secrets are fetched into that cache. Reconcile still checks the label after Get before updating the in-memory credential store. The old event predicate was removed as redundant once list/watch is label-scoped.

  3. Docs / operator clarity — RBAC template comments and deploy/README.md describe default vs opt-in ClusterRole behavior, link to RBAC good practices / [Security scans] RBAC Analysis: ClusterRole has access to secrets resource #201, and document the ipp-managed label (including migration from the previous bbr-managed label in examples).

Changes

  • pkg/plugins/apikey-injection/plugin.go: Replace For(&Secret{}) with WatchesRawSource + label-filtered Secret cache; bounded wait for cache sync; no managedLabelPredicate.
  • pkg/plugins/apikey-injection/reconciler.go: Managed label constant → inference.llm-d.ai/ipp-managed; remove predicate; keep post-Get label check in Reconcile.
  • deploy/payload-processing/values.yaml: Default multiNamespace: false with short comments.
  • deploy/payload-processing/templates/rbac.yaml: Comments for Role vs ClusterRole paths and informer label scoping (no fragile Go symbol references).
  • deploy/README.md: RBAC / Secrets access section aligned with defaults, opt-in ClusterRole, and ipp-managed label.
  • docs/providers/*.md, test/e2e/e2e_test.go, test/e2e/reports/3.4/external-model-e2e-report.md: Example Secrets use inference.llm-d.ai/ipp-managed: "true".

How Has This Been Tested?

  • go test ./pkg/... — unit tests pass (including apikey-injection; tests that use newTestPlugin() do not exercise cache startup).
  • go build ./... — compiles cleanly.
  • Manual / cluster: install or upgrade with default values → confirm only Role is created for this chart’s externalmodels reader; with multiNamespace: true, confirm ClusterRole path; confirm BBR still loads credentials for Secrets labeled inference.llm-d.ai/ipp-managed: "true" (existing clusters must re-label secrets that still use the old bbr-managed key).

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Documentation

    • Added RBAC and Secrets access guidance for API key injection.
    • Clarified single-namespace (Role) vs multi-namespace (ClusterRole) modes.
    • Updated provider and example Secrets to use the new label key: inference.llm-d.ai/ipp-managed: "true".
  • Chores

    • Clarified RBAC template comments to explain Secrets access patterns.
    • Default deployment config switched to single-namespace mode (least privilege).
  • Tests

    • Updated e2e values and reports to reflect the new label and multi-namespace test option.

@openshift-ci openshift-ci Bot requested review from jland-redhat and szedan-rh May 3, 2026 12:27
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abdallahsamabd
Once this PR has been reviewed and has the lgtm label, please assign noyitz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

The diff standardizes the managed Secret label to inference.llm-d.ai/ipp-managed: "true" across docs, tests, and Helm values (defaulting to single-namespace RBAC). It documents RBAC modes in deploy/README. The apikey-injection factory now passes a context into NewAPIKeyInjectionPlugin; NewAPIKeyInjectionPlugin builds a label-filtered controller-runtime cache for Secrets, starts it asynchronously, waits up to 2 minutes for cache sync, and wires reconciliation via WatchesRawSource (replacing the predicate-based event filter). The reconciler removes the predicate and uses the updated managed-label constant.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Issues

  • Context deadline: NewAPIKeyInjectionPlugin blocks up to 2 minutes for cache sync. Ensure the supplied ctx has a deadline/timeout and fail fast if absent. (Action: require/derive ctx with timeout at factory call site.) — CWE-672.

  • Goroutine lifecycle: filteredCache.Start(ctx) runs in a goroutine; verify ctx cancellation reliably stops the cache and that no goroutines or client connections leak. Add unit tests that cancel ctx and assert termination. (Action: add tests and runtime shutdown checks.) — CWE-401.

  • Broad ClusterRole for secrets: opting into multi-namespace creates a ClusterRole granting get/list/watch on secrets. Audit which ServiceAccount receives this, restrict to minimal privileges, and document operational risk. (Action: bind to least-privileged SA or require admission.) — CWE-269, CWE-200.

  • Label trust boundary: granting behavior based solely on a mutable Secret label allows privilege escalation if untrusted actors can change labels. Enforce additional checks in Reconcile (owner/creator, namespace provenance, immutable annotation) or require admission controls to prevent unauthorized label mutation. (Action: add server-side verification.) — CWE-639.

  • Blocking startup semantics: synchronous cache sync in plugin construction can delay/stall controller startup. Consider non-blocking construction with readiness gating or make timeout configurable and documented. (Action: expose timeout and add readiness probe.) — CWE-664.

  • API surface change: NewAPIKeyInjectionPlugin now accepts context.Context. Audit and update call sites, add changelog entry and consider semver impact. (Action: update callers and release notes.)

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title refers to restricting the Secret informer to labeled Secrets, but the PR actually changes the label name and defaults multiNamespace to false—broader scope than the title suggests. Update title to reflect the full scope: e.g., 'Migrate to ipp-managed label and default to single-namespace RBAC' or 'Restrict Secret informer to ipp-managed label with single-namespace default.'
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

rules:
# apikey-injection plugin: watches labeled Secrets
# apikey-injection plugin: watches Secrets that carry the label
# "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're changing the label since we migrated code to llm-d.

Suggested change
# "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys.
# "inference.llm-d.ai/ipp-managed=true" to inject provider API keys.

Comment on lines +11 to +13
# A ClusterRole is required because Kubernetes RBAC does not support
# label-based scoping; the informer restricts the actual data fetched
# via a label-selector at the cache level (see newFilteredSecretCache).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit too much information and fragile, e.g., if we rename the function this will point to unexisting function name.

Suggested change
# A ClusterRole is required because Kubernetes RBAC does not support
# label-based scoping; the informer restricts the actual data fetched
# via a label-selector at the cache level (see newFilteredSecretCache).

rules:
# apikey-injection plugin: watches labeled Secrets
# apikey-injection plugin: watches Secrets labeled
# "inference.networking.k8s.io/bbr-managed=true" in this namespace.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# "inference.networking.k8s.io/bbr-managed=true" in this namespace.
# "inference.llm-d.ai/ipp-managed=true" in this namespace.

Comment thread deploy/README.md Outdated

The `apikey-injection` plugin watches Kubernetes Secrets to inject provider API
keys (e.g. OpenAI, Anthropic) into outbound inference requests. Only Secrets
labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed.
labeled `inference.llm-d.ai/ipp-managed: "true"` are processed.

Comment thread deploy/README.md Outdated
RBAC does not support label-based scoping, so the ClusterRole is broader than
what the application actually needs. To mitigate this:

- The informer cache uses a **label selector** so that only `bbr-managed`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The informer cache uses a **label selector** so that only `bbr-managed`
- The informer cache uses a **label selector** so that only `ipp-managed`

Comment thread deploy/README.md Outdated

- The informer cache uses a **label selector** so that only `bbr-managed`
Secrets are ever fetched from the API server or stored in process memory.
- The reconciler applies a **predicate filter** as a second layer of defense.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to?
predicate filter was initially implemented as the client side filter. if we introduce server side filtering we don't need predicate..

@abdallahsamabd abdallahsamabd force-pushed the feat/201 branch 2 times, most recently from 34f12a2 to 24d5580 Compare May 4, 2026 13:45
Signed-off-by: Abdallah Samara <abdallahsamabd@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/plugins/apikey-injection/plugin.go`:
- Around line 73-85: The constructor starts a background Secret cache via
newFilteredSecretCache and assigns it to filteredCache but never stops it if
reconciler registration (Complete) fails, leaking the watch; modify
NewAPIKeyInjectionPlugin to stop/close the filteredCache (e.g. call
filteredCache.Stop() or filteredCache.Close()/Cancel() as provided by the cache
type) before returning the fmt.Errorf when the
reconcilerBuilder().Complete(reconciler) error path is taken so the background
goroutine is cancelled on constructor failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c31d82a5-3128-4e71-98c4-4ac729be0d3d

📥 Commits

Reviewing files that changed from the base of the PR and between 24d5580 and 058721b.

📒 Files selected for processing (13)
  • deploy/README.md
  • deploy/payload-processing/templates/rbac.yaml
  • deploy/payload-processing/values.yaml
  • docs/providers/anthropic.md
  • docs/providers/azure-openai.md
  • docs/providers/bedrock-openai.md
  • docs/providers/openai.md
  • docs/providers/vertex-openai.md
  • pkg/plugins/apikey-injection/plugin.go
  • pkg/plugins/apikey-injection/reconciler.go
  • test/e2e/e2e_test.go
  • test/e2e/reports/3.4/external-model-e2e-report.md
  • test/e2e/scripts/e2e-values.yaml
✅ Files skipped from review due to trivial changes (8)
  • docs/providers/anthropic.md
  • deploy/payload-processing/templates/rbac.yaml
  • docs/providers/vertex-openai.md
  • docs/providers/azure-openai.md
  • test/e2e/scripts/e2e-values.yaml
  • docs/providers/bedrock-openai.md
  • test/e2e/reports/3.4/external-model-e2e-report.md
  • docs/providers/openai.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • deploy/README.md
  • deploy/payload-processing/values.yaml
  • pkg/plugins/apikey-injection/reconciler.go

Comment on lines +73 to 85
filteredCache, err := newFilteredSecretCache(ctx)
if err != nil {
return nil, err
}

var secretObj client.Object = &corev1.Secret{}
if err := reconcilerBuilder().
Named("apikey-injection-secret-watcher").
WatchesRawSource(
source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
).
Complete(reconciler); err != nil {
return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel the filtered cache on constructor failure.

Line 73 starts a background Secret cache on the long-lived plugin context before the controller is fully registered. If Complete(...) fails on Line 84, NewAPIKeyInjectionPlugin returns an error but the cache goroutine keeps listing/watching Secrets until process shutdown. That leaves a leaked watch on the error path and can also pollute tests that construct the plugin repeatedly.

Proposed fix
 func NewAPIKeyInjectionPlugin(ctx context.Context, reconcilerBuilder func() *builder.Builder, clientReader client.Reader) (*ApiKeyInjectionPlugin, error) {
+	cacheCtx, cancel := context.WithCancel(ctx)
+
 	store := newSecretStore()
 	reconciler := &secretReconciler{
 		Reader: clientReader,
 		store:  store,
 	}
 
-	filteredCache, err := newFilteredSecretCache(ctx)
+	filteredCache, err := newFilteredSecretCache(cacheCtx)
 	if err != nil {
+		cancel()
 		return nil, err
 	}
+	cacheStarted := true
+	defer func() {
+		if cacheStarted {
+			cancel()
+		}
+	}()
 
 	var secretObj client.Object = &corev1.Secret{}
 	if err := reconcilerBuilder().
 		Named("apikey-injection-secret-watcher").
 		WatchesRawSource(
 			source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
 		).
 		Complete(reconciler); err != nil {
 		return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)
 	}
+	cacheStarted = false
 
 	return (&ApiKeyInjectionPlugin{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
filteredCache, err := newFilteredSecretCache(ctx)
if err != nil {
return nil, err
}
var secretObj client.Object = &corev1.Secret{}
if err := reconcilerBuilder().
Named("apikey-injection-secret-watcher").
WatchesRawSource(
source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
).
Complete(reconciler); err != nil {
return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)
cacheCtx, cancel := context.WithCancel(ctx)
store := newSecretStore()
reconciler := &secretReconciler{
Reader: clientReader,
store: store,
}
filteredCache, err := newFilteredSecretCache(cacheCtx)
if err != nil {
cancel()
return nil, err
}
cacheStarted := true
defer func() {
if cacheStarted {
cancel()
}
}()
var secretObj client.Object = &corev1.Secret{}
if err := reconcilerBuilder().
Named("apikey-injection-secret-watcher").
WatchesRawSource(
source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
).
Complete(reconciler); err != nil {
return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)
}
cacheStarted = false
return (&ApiKeyInjectionPlugin{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/apikey-injection/plugin.go` around lines 73 - 85, The constructor
starts a background Secret cache via newFilteredSecretCache and assigns it to
filteredCache but never stops it if reconciler registration (Complete) fails,
leaking the watch; modify NewAPIKeyInjectionPlugin to stop/close the
filteredCache (e.g. call filteredCache.Stop() or filteredCache.Close()/Cancel()
as provided by the cache type) before returning the fmt.Errorf when the
reconcilerBuilder().Complete(reconciler) error path is taken so the background
goroutine is cancelled on constructor failure.

@abdallahsamabd abdallahsamabd marked this pull request as draft May 5, 2026 11:37
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 14, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants