Restrict Secret informer to bbr-managed labeled Secrets only by abdallahsamabd · Pull Request #226 · opendatahub-io/ai-gateway-payload-processing

abdallahsamabd · 2026-05-03T12:27:52Z

Description

Addresses #201: with multiNamespace: true, the chart grants a ClusterRole on secrets with get/list/watch cluster-wide, because Kubernetes RBAC cannot scope secrets by label. The workload only needs credential Secrets labeled inference.llm-d.ai/ipp-managed: "true".

Previously, apikey-injection registered For(&Secret{}) on the manager’s default cache, so the Secret informer could list/watch and retain a broad set of Secrets in memory even though reconciliation only used labeled ones.

This PR:

Least privilege by default (Helm) — upstreamBbr.bbr.multiNamespace defaults to false, so the chart installs a namespaced Role/RoleBinding for secrets and externalmodels in the release namespace. multiNamespace: true remains an explicit opt-in that still uses a ClusterRole when models/credentials span namespaces.
Defense in depth (runtime) — apikey-injection uses a dedicated controller-runtime cache whose Secret list/watch uses a label selector for inference.llm-d.ai/ipp-managed=true, so only those Secrets are fetched into that cache. Reconcile still checks the label after Get before updating the in-memory credential store. The old event predicate was removed as redundant once list/watch is label-scoped.
Docs / operator clarity — RBAC template comments and deploy/README.md describe default vs opt-in ClusterRole behavior, link to RBAC good practices / [Security scans] RBAC Analysis: ClusterRole has access to secrets resource #201, and document the ipp-managed label (including migration from the previous bbr-managed label in examples).

Changes

pkg/plugins/apikey-injection/plugin.go: Replace For(&Secret{}) with WatchesRawSource + label-filtered Secret cache; bounded wait for cache sync; no managedLabelPredicate.
pkg/plugins/apikey-injection/reconciler.go: Managed label constant → inference.llm-d.ai/ipp-managed; remove predicate; keep post-Get label check in Reconcile.
deploy/payload-processing/values.yaml: Default multiNamespace: false with short comments.
deploy/payload-processing/templates/rbac.yaml: Comments for Role vs ClusterRole paths and informer label scoping (no fragile Go symbol references).
deploy/README.md: RBAC / Secrets access section aligned with defaults, opt-in ClusterRole, and ipp-managed label.
docs/providers/*.md, test/e2e/e2e_test.go, test/e2e/reports/3.4/external-model-e2e-report.md: Example Secrets use inference.llm-d.ai/ipp-managed: "true".

How Has This Been Tested?

go test ./pkg/... — unit tests pass (including apikey-injection; tests that use newTestPlugin() do not exercise cache startup).
go build ./... — compiles cleanly.
Manual / cluster: install or upgrade with default values → confirm only Role is created for this chart’s externalmodels reader; with multiNamespace: true, confirm ClusterRole path; confirm BBR still loads credentials for Secrets labeled inference.llm-d.ai/ipp-managed: "true" (existing clusters must re-label secrets that still use the old bbr-managed key).

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

Documentation
- Added RBAC and Secrets access guidance for API key injection.
- Clarified single-namespace (Role) vs multi-namespace (ClusterRole) modes.
- Updated provider and example Secrets to use the new label key: inference.llm-d.ai/ipp-managed: "true".
Chores
- Clarified RBAC template comments to explain Secrets access patterns.
- Default deployment config switched to single-namespace mode (least privilege).
Tests
- Updated e2e values and reports to reflect the new label and multi-namespace test option.

openshift-ci · 2026-05-03T12:27:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abdallahsamabd
Once this PR has been reviewed and has the lgtm label, please assign noyitz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-03T12:31:00Z

📝 Walkthrough

Walkthrough

The diff standardizes the managed Secret label to inference.llm-d.ai/ipp-managed: "true" across docs, tests, and Helm values (defaulting to single-namespace RBAC). It documents RBAC modes in deploy/README. The apikey-injection factory now passes a context into NewAPIKeyInjectionPlugin; NewAPIKeyInjectionPlugin builds a label-filtered controller-runtime cache for Secrets, starts it asynchronously, waits up to 2 minutes for cache sync, and wires reconciliation via WatchesRawSource (replacing the predicate-based event filter). The reconciler removes the predicate and uses the updated managed-label constant.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Issues

Context deadline: NewAPIKeyInjectionPlugin blocks up to 2 minutes for cache sync. Ensure the supplied ctx has a deadline/timeout and fail fast if absent. (Action: require/derive ctx with timeout at factory call site.) — CWE-672.
Goroutine lifecycle: filteredCache.Start(ctx) runs in a goroutine; verify ctx cancellation reliably stops the cache and that no goroutines or client connections leak. Add unit tests that cancel ctx and assert termination. (Action: add tests and runtime shutdown checks.) — CWE-401.
Broad ClusterRole for secrets: opting into multi-namespace creates a ClusterRole granting get/list/watch on secrets. Audit which ServiceAccount receives this, restrict to minimal privileges, and document operational risk. (Action: bind to least-privileged SA or require admission.) — CWE-269, CWE-200.
Label trust boundary: granting behavior based solely on a mutable Secret label allows privilege escalation if untrusted actors can change labels. Enforce additional checks in Reconcile (owner/creator, namespace provenance, immutable annotation) or require admission controls to prevent unauthorized label mutation. (Action: add server-side verification.) — CWE-639.
Blocking startup semantics: synchronous cache sync in plugin construction can delay/stall controller startup. Consider non-blocking construction with readiness gating or make timeout configurable and documented. (Action: expose timeout and add readiness probe.) — CWE-664.
API surface change: NewAPIKeyInjectionPlugin now accepts context.Context. Audit and update call sites, add changelog entry and consider semver impact. (Action: update callers and release notes.)

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title refers to restricting the Secret informer to labeled Secrets, but the PR actually changes the label name and defaults multiNamespace to false—broader scope than the title suggests.	Update title to reflect the full scope: e.g., 'Migrate to ipp-managed label and default to single-namespace RBAC' or 'Restrict Secret informer to ipp-managed label with single-namespace default.'

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

nirrozenbaum · 2026-05-03T15:15:28Z

 rules:
-  # apikey-injection plugin: watches labeled Secrets
+  # apikey-injection plugin: watches Secrets that carry the label
+  # "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys.


we're changing the label since we migrated code to llm-d.

Suggested change

# "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys.

# "inference.llm-d.ai/ipp-managed=true" to inject provider API keys.

nirrozenbaum · 2026-05-03T15:16:28Z

+  # A ClusterRole is required because Kubernetes RBAC does not support
+  # label-based scoping; the informer restricts the actual data fetched
+  # via a label-selector at the cache level (see newFilteredSecretCache).


this is a bit too much information and fragile, e.g., if we rename the function this will point to unexisting function name.

Suggested change

# A ClusterRole is required because Kubernetes RBAC does not support

# label-based scoping; the informer restricts the actual data fetched

# via a label-selector at the cache level (see newFilteredSecretCache).

nirrozenbaum · 2026-05-03T15:16:48Z

 rules:
-  # apikey-injection plugin: watches labeled Secrets
+  # apikey-injection plugin: watches Secrets labeled
+  # "inference.networking.k8s.io/bbr-managed=true" in this namespace.


Suggested change

# "inference.networking.k8s.io/bbr-managed=true" in this namespace.

# "inference.llm-d.ai/ipp-managed=true" in this namespace.

nirrozenbaum · 2026-05-03T15:18:41Z

+
+The `apikey-injection` plugin watches Kubernetes Secrets to inject provider API
+keys (e.g. OpenAI, Anthropic) into outbound inference requests. Only Secrets
+labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed.


Suggested change

labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed.

labeled `inference.llm-d.ai/ipp-managed: "true"` are processed.

nirrozenbaum · 2026-05-03T15:19:01Z

+RBAC does not support label-based scoping, so the ClusterRole is broader than
+what the application actually needs. To mitigate this:
+
+- The informer cache uses a **label selector** so that only `bbr-managed`


Suggested change

- The informer cache uses a **label selector** so that only `bbr-managed`

- The informer cache uses a **label selector** so that only `ipp-managed`

nirrozenbaum · 2026-05-03T15:19:50Z

+
+- The informer cache uses a **label selector** so that only `bbr-managed`
+  Secrets are ever fetched from the API server or stored in process memory.
+- The reconciler applies a **predicate filter** as a second layer of defense.


does it need to?
predicate filter was initially implemented as the client side filter. if we introduce server side filtering we don't need predicate..

Signed-off-by: Abdallah Samara <abdallahsamabd@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/plugins/apikey-injection/plugin.go`:
- Around line 73-85: The constructor starts a background Secret cache via
newFilteredSecretCache and assigns it to filteredCache but never stops it if
reconciler registration (Complete) fails, leaking the watch; modify
NewAPIKeyInjectionPlugin to stop/close the filteredCache (e.g. call
filteredCache.Stop() or filteredCache.Close()/Cancel() as provided by the cache
type) before returning the fmt.Errorf when the
reconcilerBuilder().Complete(reconciler) error path is taken so the background
goroutine is cancelled on constructor failure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c31d82a5-3128-4e71-98c4-4ac729be0d3d

📥 Commits

Reviewing files that changed from the base of the PR and between 24d5580 and 058721b.

📒 Files selected for processing (13)

deploy/README.md
deploy/payload-processing/templates/rbac.yaml
deploy/payload-processing/values.yaml
docs/providers/anthropic.md
docs/providers/azure-openai.md
docs/providers/bedrock-openai.md
docs/providers/openai.md
docs/providers/vertex-openai.md
pkg/plugins/apikey-injection/plugin.go
pkg/plugins/apikey-injection/reconciler.go
test/e2e/e2e_test.go
test/e2e/reports/3.4/external-model-e2e-report.md
test/e2e/scripts/e2e-values.yaml

✅ Files skipped from review due to trivial changes (8)

docs/providers/anthropic.md
deploy/payload-processing/templates/rbac.yaml
docs/providers/vertex-openai.md
docs/providers/azure-openai.md
test/e2e/scripts/e2e-values.yaml
docs/providers/bedrock-openai.md
test/e2e/reports/3.4/external-model-e2e-report.md
docs/providers/openai.md

🚧 Files skipped from review as they are similar to previous changes (3)

deploy/README.md
deploy/payload-processing/values.yaml
pkg/plugins/apikey-injection/reconciler.go

coderabbitai · 2026-05-04T17:34:35Z

+	filteredCache, err := newFilteredSecretCache(ctx)
+	if err != nil {
+		return nil, err
+	}
+
+	var secretObj client.Object = &corev1.Secret{}
+	if err := reconcilerBuilder().
+		Named("apikey-injection-secret-watcher").
+		WatchesRawSource(
+			source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),
+		).
+		Complete(reconciler); err != nil {
 		return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel the filtered cache on constructor failure.

Line 73 starts a background Secret cache on the long-lived plugin context before the controller is fully registered. If Complete(...) fails on Line 84, NewAPIKeyInjectionPlugin returns an error but the cache goroutine keeps listing/watching Secrets until process shutdown. That leaves a leaked watch on the error path and can also pollute tests that construct the plugin repeatedly.

Proposed fix

func NewAPIKeyInjectionPlugin(ctx context.Context, reconcilerBuilder func() *builder.Builder, clientReader client.Reader) (*ApiKeyInjectionPlugin, error) { + cacheCtx, cancel := context.WithCancel(ctx) + store := newSecretStore() reconciler := &secretReconciler{ Reader: clientReader, store: store, } - filteredCache, err := newFilteredSecretCache(ctx) + filteredCache, err := newFilteredSecretCache(cacheCtx) if err != nil { + cancel() return nil, err } + cacheStarted := true + defer func() { + if cacheStarted { + cancel() + } + }() var secretObj client.Object = &corev1.Secret{} if err := reconcilerBuilder(). Named("apikey-injection-secret-watcher"). WatchesRawSource( source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}), ). Complete(reconciler); err != nil { return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err) } + cacheStarted = false return (&ApiKeyInjectionPlugin{

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

filteredCache, err := newFilteredSecretCache(ctx)

if err != nil {

return nil, err

}

var secretObj client.Object = &corev1.Secret{}

if err := reconcilerBuilder().

Named("apikey-injection-secret-watcher").

WatchesRawSource(

source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),

).

Complete(reconciler); err != nil {

return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)

cacheCtx, cancel := context.WithCancel(ctx)

store := newSecretStore()

reconciler := &secretReconciler{

Reader: clientReader,

store: store,

}

filteredCache, err := newFilteredSecretCache(cacheCtx)

if err != nil {

cancel()

return nil, err

}

cacheStarted := true

defer func() {

if cacheStarted {

cancel()

}

}()

var secretObj client.Object = &corev1.Secret{}

if err := reconcilerBuilder().

Named("apikey-injection-secret-watcher").

WatchesRawSource(

source.Kind(filteredCache, secretObj, &handler.EnqueueRequestForObject{}),

).

Complete(reconciler); err != nil {

return nil, fmt.Errorf("failed to register Secret reconciler for plugin '%s' - %w", APIKeyInjectionPluginType, err)

}

cacheStarted = false

return (&ApiKeyInjectionPlugin{

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/apikey-injection/plugin.go` around lines 73 - 85, The constructor starts a background Secret cache via newFilteredSecretCache and assigns it to filteredCache but never stops it if reconciler registration (Complete) fails, leaking the watch; modify NewAPIKeyInjectionPlugin to stop/close the filteredCache (e.g. call filteredCache.Stop() or filteredCache.Close()/Cancel() as provided by the cache type) before returning the fmt.Errorf when the reconcilerBuilder().Complete(reconciler) error path is taken so the background goroutine is cancelled on constructor failure.

openshift-ci · 2026-05-14T09:00:43Z

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci Bot requested review from jland-redhat and szedan-rh May 3, 2026 12:27

nirrozenbaum reviewed May 3, 2026

View reviewed changes

abdallahsamabd force-pushed the feat/201 branch 2 times, most recently from 34f12a2 to 24d5580 Compare May 4, 2026 13:45

Restrict Secret informer to bbr-managed labeled Secrets only

058721b

Signed-off-by: Abdallah Samara <abdallahsamabd@gmail.com>

abdallahsamabd force-pushed the feat/201 branch from 24d5580 to 058721b Compare May 4, 2026 17:29

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

abdallahsamabd marked this pull request as draft May 5, 2026 11:37

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026

	# "inference.networking.k8s.io/bbr-managed=true" to inject provider API keys.
	# "inference.llm-d.ai/ipp-managed=true" to inject provider API keys.

	# A ClusterRole is required because Kubernetes RBAC does not support
	# label-based scoping; the informer restricts the actual data fetched
	# via a label-selector at the cache level (see newFilteredSecretCache).

	labeled `inference.networking.k8s.io/bbr-managed: "true"` are processed.
	labeled `inference.llm-d.ai/ipp-managed: "true"` are processed.

	- The informer cache uses a label selector so that only `bbr-managed`
	- The informer cache uses a label selector so that only `ipp-managed`

Conversation

abdallahsamabd commented May 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

How Has This Been Tested?

Merge criteria:

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented May 3, 2026

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Issues

❌ Failed checks (1 warning)

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

nirrozenbaum May 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

abdallahsamabd commented May 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 3, 2026 •

edited

Loading