Skip to content

WIP: OCPCLOUD-3465: feat: implement cluster-wide proxy support via new ProxyController#586

Open
tmshort wants to merge 10 commits into
openshift:mainfrom
tmshort:fix-OCPCLOUD-3465
Open

WIP: OCPCLOUD-3465: feat: implement cluster-wide proxy support via new ProxyController#586
tmshort wants to merge 10 commits into
openshift:mainfrom
tmshort:fix-OCPCLOUD-3465

Conversation

@tmshort

@tmshort tmshort commented Jun 9, 2026

Copy link
Copy Markdown

Add a ProxyController to the capi-installer manager that watches managed
workloads opting into proxy injection via the new annotation
capi-operator.openshift.io/inject-proxy: <container,...>
and applies HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables to the
named containers via Server-Side Apply whenever the cluster-wide Proxy CR
changes.

Design (Option B from OCPCLOUD-3465 Jira):

  • A dedicated controller in the capi-installer manager watches the Proxy
    singleton (proxies/cluster) and all managed objects via the existing
    trackingCache, so it never creates a second informer for those objects.
  • Env vars are applied with a separate SSA field manager (capi-operator-proxy)
    that owns only the three proxy env var entries per container. Boxcutter
    continues to own all other fields on the same workload object without
    conflict.
  • Proxy changes propagate to running pods independently of the revision
    lifecycle, including when the installer has fallen back to an older
    revision (N-1) because the newest revision failed to roll out.
  • Starting with Option B avoids a future field-ownership migration: if
    Option A (inject at render time) had been deployed first, converting to
    Option B later would require force-stealing SSA ownership from boxcutter.

The annotation is added to provider Deployments by manifests-gen (see next
commit). Providers that use different container names or need proxy injection
on additional containers should set the annotation in their upstream manifests.

SetupWithManager now returns the TrackingCache so it can be shared with
SetupProxyController without creating a second managed-object cache.

Summary by CodeRabbit

  • New Features

    • Automatic proxy injection: cluster proxy settings are applied to managed Deployments, DaemonSets, and StatefulSets when annotated, adding HTTP_PROXY/HTTPS_PROXY/NO_PROXY envs.
    • Dedicated proxy reconciler: operator wires a proxy controller to propagate proxy changes to tracked workloads and target specified containers.
  • Tests

    • Test harness metrics endpoint and default TLS updated for stronger TLS (min TLS 1.2 and modern cipher) and explicit metrics binding.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 9, 2026

Copy link
Copy Markdown

@tmshort: This pull request references OCPCLOUD-3465 which is a valid jira issue.

Details

In response to this:

When a cluster is configured with a cluster-wide proxy, CAPI provider
controllers must be able to reach external endpoints through it.

The RevisionController now reads the Proxy CR and stores a PROXY_HASH
substitution in the revision spec. This single value is sufficient to:

  1. Make the revision ContentID change whenever proxy configuration changes,
    causing the revision controller to create a new revision.
  2. Drive two annotations onto the pod templates of every provider
    Deployment, DaemonSet, and StatefulSet in the rendered manifests:
  • config.openshift.io/inject-proxy: manager
    Signals OpenShift's admission webhook to inject HTTP_PROXY /
    HTTPS_PROXY / NO_PROXY into the "manager" container at pod creation.

  • cluster-api.operator.openshift.io/proxy-hash:
    Changes when the proxy changes, so Kubernetes triggers a rolling
    update and running pods pick up the new proxy configuration.

The RevisionController also watches the proxies/cluster singleton so any
proxy change immediately triggers reconciliation. The proxy hash is stored
as a manifest substitution in the revision spec, keeping rendering fully
deterministic: NewInstallerRevisionFromAPI reconstructs the same annotations
from the stored revision, preserving ContentID validation. Older revisions
(no PROXY_HASH substitution) render without proxy annotations and their
ContentID validation continues to pass.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tmshort

tmshort commented Jun 9, 2026

Copy link
Copy Markdown
Author

/wip

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds cluster-wide proxy reconciliation: installer wiring exposes the tracking cache, a new ProxyController syncs configv1.Proxy into tracked workloads' pod-template env, a util reads the Proxy resource, manifests add an inject-proxy annotation, and tests adjust manager metrics/TLS settings.

Changes

Proxy Configuration Reconciliation

Layer / File(s) Summary
Installer tracking cache & wiring
pkg/controllers/installer/installer_controller.go, cmd/capi-operator/main.go, pkg/controllers/installer/suite_test.go
installer.SetupWithManager now returns managedcache.TrackingCache, error; main captures the tracking cache and calls installer.SetupProxyController; tests updated to assert the returned error.
Deployment proxy annotation
manifests-gen/customizations.go
Adds capi-operator.openshift.io/inject-proxy annotation to the operator Deployment pod template for the manager container when absent.
Proxy utility
pkg/util/proxy.go
New GetProxyEnvVars(ctx, cl client.Reader) ([]corev1.EnvVar, error) reads the cluster configv1.Proxy named cluster and returns HTTP_PROXY/HTTPS_PROXY/NO_PROXY env entries when configured, or nil, nil if absent.
Proxy controller implementation
pkg/controllers/installer/proxy_controller.go
New ProxyController watches configv1.Proxy (name cluster) and tracked workloads from the shared tracking cache, computes proxy env vars, and patches pod-template container env using SSA with a dedicated field manager and ForceOwnership; supports per-workload container selection via capi-operator.openshift.io/inject-proxy annotation and aggregates per-workload errors.
Test manager TLS/metrics config updates
pkg/controllers/revision/helpers_test.go
Imports controller-runtime metrics server, sets manager metrics BindAddress to "0", and updates the test manager's default TLS fallback to require TLS1.2 and a modern ECDHE-RSA-AES-GCM cipher suite.

Sequence Diagram

sequenceDiagram
  participant Proxy as configv1.Proxy
  participant ProxyCtrl as ProxyController
  participant Util as GetProxyEnvVars
  participant TrackingCache as managedcache.TrackingCache
  participant Workload as Deployment/DaemonSet/StatefulSet
  Proxy->>ProxyCtrl: watch cluster proxy changes
  ProxyCtrl->>Util: fetch HTTP_PROXY, HTTPS_PROXY, NO_PROXY
  Util-->>ProxyCtrl: return proxy env vars
  ProxyCtrl->>TrackingCache: list tracked workloads
  ProxyCtrl->>Workload: patch pod template containers with proxy env vars via SSA
  Workload-->>ProxyCtrl: acknowledge patch
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning In PR test files, there are assertions like Expect(err).NotTo(HaveOccurred()) without messages (installer/suite_test.go, revision/helpers_test.go, installer/helpers_test.go) and Eventually waits wi... Add meaningful messages to HaveOccurred assertions (e.g., Expect(err).NotTo(HaveOccurred(), "...") ) and ensure all Eventually/Consistently cluster waits specify timeouts (e.g., WithTimeout(defaultEventuallyTimeout)).
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing cluster-wide proxy support via a new ProxyController, with clear reference to the Jira ticket (OCPCLOUD-3465) and WIP status.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR #586 changes only pkg/controllers/revision/helpers_test.go and pkg/controllers/installer/suite_test.go; neither contains any Ginkgo It/Describe/Context/When test titles with dynamic values.
Microshift Test Compatibility ✅ Passed PR changes only controller code plus installer/revision test setup; no new e2e Ginkgo tests were added (no test/e2e files or new Describe/It blocks).
Single Node Openshift (Sno) Test Compatibility ✅ Passed Inspected the PR-mentioned test files (helpers_test.go, installer suite_test.go): no new Ginkgo e2e tests or multi-node/HA assumptions found (no node counts, anti-affinity, drain/failover).
Topology-Aware Scheduling Compatibility ✅ Passed Checked the PR’s described files (proxy_controller.go, proxy.go, customizations.go, installer_controller.go, suite_test.go, main.go, helpers_test.go) for affinity/topologyKey/spread/anti-affinity/n...
Ote Binary Stdout Contract ✅ Passed Scanned cmd/capi-operator/main.go and suite/revision test setup for fmt.Print/Printf/Println, os.Stdout, and klog/log output-to-stdout; none found. main config sets klog.LogToStderr(default true).
Ipv6 And Disconnected Network Test Compatibility ✅ Passed git diff origin/main..HEAD shows only 7 changed files (none under test/e2e); modified files contain no ginkgo It/Describe blocks and no hardcoded IPv4 literal/external URL patterns.
No-Weak-Crypto ✅ Passed Non-vendor repo scan found no RC4/DES/3DES/Blowfish/ECB/MD5/SHA1 crypto usage in PR code; revision TLS test now sets MinVersion TLS1.2 (AES-GCM) and no constant-time token/secret compares exist.
Container-Privileges ✅ Passed Searched PR-related Go/YAML/templates and manifests/capi-operator-manifests for privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN/allowPrivilegeEscalation and runAsUser:0; none found.
No-Sensitive-Data-In-Logs ✅ Passed In PR files, proxy_controller.go logs only generic reconciliation/status and container names (no proxy/env values); pkg/util/proxy.go has no logging; no password/token/HTTP_PROXY values appear in l...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from RadekManak and racheljpg June 9, 2026 20:49
@tmshort tmshort changed the title OCPCLOUD-3465: inject cluster-wide proxy annotations into CAPI provider Deployments WIP: OCPCLOUD-3465: inject cluster-wide proxy annotations into CAPI provider Deployments Jun 9, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@tmshort

tmshort commented Jun 10, 2026

Copy link
Copy Markdown
Author

@mdbooth this is a replacement for #517. Does it match expectations?

@tmshort

tmshort commented Jun 10, 2026

Copy link
Copy Markdown
Author

/assign mdbooth

@mdbooth

mdbooth commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

There's a few problems here. The biggest one is that config.openshift.io/inject-proxy isn't handled by an admission webhook or even a controller watching annotated objects (yeah, this surprised me too). It's handled by CVO's resource builder. Therefore because these are not CVO resources, the proxy env vars won't be applied.

The other big thing is that I don't want the revision/installer controllers to contain any heuristics about providers. This is why for the TLS stuff I made the providers include explicit substitution points for the TLS vars. In this case the heuristic is that we only want to add proxy vars to a D/DS/SS with a container called 'manager'. This isn't true today (OpenStack already packages ORC, which doesn't follow this pattern), and I only expect the exceptions to expand over time.

The takeaways are:

  • We need to implement our own mechanism for applying proxy env vars, because the existing one isn't reusable
  • The new mechanism can (and should) copy the behaviour of the existing mechanism, but if it's annotation based it must use a different name to avoid confusion
  • If using an annotation based mechanism, it must be applied in the provider repo

So perhaps something like an annotation capi-operator.openshift.io/inject-proxy which has identical semantics to config.openshift.io/inject-proxy, but isn't going to confuse anybody into wondering how CVO is operating on a non-CVO object. We could update manifests-gen to apply this automatically to any D/DS/SS with a container called manager, but the provider can also add it manually anywhere else it's required, or even override the manifests-gen generated default on provider deployments if required.

This also means we can't use the PROXY_HASH mechanism, because if we're going to apply the proxy env vars directly they need to be included in the revision in full.

I also had another option in mind. We could create a new controller which watches objects managed by the installer controller which have the new annotation, and adds the env vars after creation. This works because:

  • The installer's trackingCache should be reusable by a second controller
  • Boxcutter uses SSA, so a second controller can add new fields without conflict as long as it doesn't change existing fields (which is not true for TLS)
  • It covers the edge case where we want to apply a new proxy configuration to revision N-1 because revision N is failing to roll out

This may or may not be simpler, not sure. It's something to consider.

I think we need to:

  • Decide on a provider contract
  • Implement it in manifests-gen
  • Roll it out to all providers
  • Implement it in capi-installer

@tmshort

tmshort commented Jun 12, 2026

Copy link
Copy Markdown
Author

Reevaluating based on @mdbooth's comments.

@tmshort tmshort force-pushed the fix-OCPCLOUD-3465 branch from ceacc09 to b269f09 Compare June 12, 2026 18:11
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from mdbooth. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tmshort tmshort changed the title WIP: OCPCLOUD-3465: inject cluster-wide proxy annotations into CAPI provider Deployments WIP: OCPCLOUD-3465: feat: implement cluster-wide proxy support via new ProxyController Jun 12, 2026
@tmshort

tmshort commented Jun 12, 2026

Copy link
Copy Markdown
Author

@CodeRabbit help

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai rate limit to show your current review rate limit status.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
  • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
  • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai emit path instructions to emit generated path-specific review instructions for this repository.
  • @coderabbitai resolve merge conflict to automatically resolve merge conflicts.
  • @coderabbitai autofix to automatically fix issues identified in unresolved review comments.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@tmshort

tmshort commented Jun 12, 2026

Copy link
Copy Markdown
Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/installer/proxy_controller.go`:
- Around line 159-161: The per-object calls to applyProxyVars inside the
workload reconciliation loops currently return on the first error, which stops
processing remaining objects; update each loop that calls applyProxyVars (the
calls at deploy.Name / deploy.Namespace and the similar calls for other workload
kinds) to instead collect errors into a slice and continue processing the rest
of the items, then after the loop return a combined error if any occurred (use
k8s.io/apimachinery/pkg/util/errors.NewAggregate or errors.Join) or nil if none;
ensure you append the returned err from applyProxyVars to the collector and do
not early-return inside the loop.
- Around line 153-157: The code currently skips objects when
ProxyInjectAnnotation is missing (e.g., checks like
deploy.Spec.Template.Annotations[ProxyInjectAnnotation]) which leaves previously
injected proxy env vars owned by "capi-operator-proxy" stale; update the
reconcile logic for Deployments, StatefulSets and DaemonSets to, when the
annotation is absent, explicitly remove env vars injected by this operator
(filter by env var owner/marker and remove from each container/envFrom) and
persist the updated PodTemplate (e.g., handle deploy.Spec.Template,
statefulSet.Spec.Template, daemonSet.Spec.Template) so opting-out cleans up
proxy env vars instead of leaving them behind.

In `@pkg/controllers/installer/suite_test.go`:
- Around line 109-110: The SetupWithManager call in the test asserts a bare
success but lacks context on failure; modify the assertion around
SetupWithManager (the call that assigns _, err = SetupWithManager(mgr,
allProviderProfiles, triggerSource)) to include a descriptive failure message
(e.g., "failed to register installer controller with manager") by passing that
message to the Expect assertion so a BeforeSuite failure clearly points at
installer controller registration rather than surfacing only the raw error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 24a04a88-f1b2-4d76-a9cf-ecad951c51aa

📥 Commits

Reviewing files that changed from the base of the PR and between 05c113e and b269f09.

📒 Files selected for processing (7)
  • cmd/capi-operator/main.go
  • manifests-gen/customizations.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/controllers/installer/proxy_controller.go
  • pkg/controllers/installer/suite_test.go
  • pkg/controllers/revision/helpers_test.go
  • pkg/util/proxy.go

Comment thread pkg/controllers/installer/proxy_controller.go Outdated
Comment thread pkg/controllers/installer/proxy_controller.go Outdated
Comment thread pkg/controllers/installer/suite_test.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/installer/proxy_controller.go`:
- Around line 180-190: When the ProxyInjectAnnotation value changes (e.g. from
"manager,sidecar" to "manager") the current code only patches containers listed
in the new annotation, leaving proxy env vars in containers removed from the
annotation. Update the logic around ProxyInjectAnnotation handling so that after
computing containers := containerNamesFromAnnotation(annotation) you also
compute specContainers := containerNamesFromSpec(item.containers), compute
removed := specContainers minus containers, and call c.applyProxyVars twice:
once for the annotated containers with effectiveVars and once for the removed
containers with nil (to clear stale vars). Use the existing symbols
ProxyInjectAnnotation, containerNamesFromAnnotation, containerNamesFromSpec,
effectiveVars and applyProxyVars to locate and implement the fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 99e7c71d-f60d-4ac5-8223-4d95f98cd930

📥 Commits

Reviewing files that changed from the base of the PR and between b269f09 and 228d14c.

📒 Files selected for processing (2)
  • pkg/controllers/installer/proxy_controller.go
  • pkg/controllers/installer/suite_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/controllers/installer/suite_test.go

Comment thread pkg/controllers/installer/proxy_controller.go Outdated

@mdbooth mdbooth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I think this is most of the way there. I've requested some code cleanup things like use of constants, using a fully qualified field owner, etc.

Big things:

Please can you put the new controller in its own package? This is mostly for consistency. We've got a bunch of controller, and they're all separated in their own package.

I'm pretty sure the SSA handling in the workitem loop isn't right. An SSA fully specifies everything owned by the field owner on the resource. Multiple SSAs with the same field owner will just overwrite each other. Fully construct the desired state (owned fields only) and submit a single SSA. Any which weren't specified will be removed automatically.

No tests. I think this would have highlighed the above. These should be envtests. We have a ton of examples, so Claude will write them for you in a minute.

I think the 'nothing to do' optimisation in applyProxyVars is important.

Comment thread manifests-gen/customizations.go Outdated
return serviceSecretNames
}

const proxyInjectAnnotation = "capi-operator.openshift.io/inject-proxy"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we've got a constant defined somewhere for capi-operator.openshift.io. Please can we use it here?

// container names that should receive HTTP_PROXY, HTTPS_PROXY, and NO_PROXY
// environment variables. The annotation is added by manifests-gen and can be
// overridden by provider manifests.
ProxyInjectAnnotation = "capi-operator.openshift.io/inject-proxy"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use this constant from manifests-gen? I can't remember which way round we configured the dependencies (note that manifests-gen is a separate module).

Comment on lines +230 to +231
apiVersion: "apps/v1",
kind: "Deployment",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro-nit: I'd personally have pulled these from the item, too.

Don't change this unless you're particularly inspired to.

},
}

if err := c.client.Patch(ctx, patch, util.ApplyConfigPatch(patch.Object), client.FieldOwner(proxyFieldManager), client.ForceOwnership); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field owner needs to be qualified (i.e. capi-operator.../thingy). The domain should again use an existing constant. Depending on where the constant is currently, we might have to move it somewhere more sensible.

for _, containerName := range containerNames {
containerPatches = append(containerPatches, map[string]interface{}{
"name": containerName,
"env": envEntries,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd bet this could be simplified to just pass proxyVars directly here and get rid of the proxyEnvEntries function.

Comment on lines +179 to +186
if !hasAnnotation {
// No annotation: clear any stale proxy env vars from all containers.
if err := c.applyProxyVars(ctx, item.apiVersion, item.kind, item.name, item.namespace, allContainers, nil); err != nil {
errs = append(errs, err)
}

continue
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is redundant. If the annotation is not set, all we need is for containerNamesFromAnnotation below to return empty targets and any env vars previously set by us will be cleared when we construct an empty SSA.

Comment on lines +289 to +291
if len(containerNames) == 0 {
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic. We need to be able to submit an empty transaction.

// it applies an empty env list, which removes any env var entries previously
// owned by the proxy controller's SSA field manager. Only the three proxy env
// var entries are owned; all other fields remain owned by boxcutter.
func (c *ProxyController) applyProxyVars(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs an optimisation to prevent a potential apply storm during installation. The controller triggers on modification to any object, but for each modified object reconciles the Proxy object and iterates over all objects. We've got a bit of matrix amplification going here.

That's ok, but we're also unconditionally submitting an SSA for every workitem on every reconcile, even if it's blank. That's not very nice to KAS.

The typed apply configuration objects have an ExtractInto function which will extract all fields managed by a particular field owner. I wonder if an equivalent exists for Unstructured?

We could:

  • construct patch as we're already doing below
  • ExtractInto a new unstructured with the same field owner
  • If patch deepequals existing, I think we can skip it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why these changes were required? Doesn't look like we should be touching these.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was likely a CodeRabbit comment; it ought to be revertable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the BindAddress was changed to explicitly be "0" so that a port can be randomly selected, and subsequently allowing more parallelization of tests.

Comment on lines -65 to -66
// Arbitrarily chosen insecure default tls configuration for tests
config.CipherSuites = []uint16{tls.TLS_RSA_WITH_RC4_128_SHA}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially shouldn't be changing these. Was this code rabbit thinking deliberately insecure test defaults are a security issue by any chance?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was. Since the test have two stages; I couldn't tell if it passed or not. :(

@tmshort tmshort force-pushed the fix-OCPCLOUD-3465 branch from a1d05f6 to 724bbbd Compare June 16, 2026 20:20
@tmshort

tmshort commented Jun 16, 2026

Copy link
Copy Markdown
Author

/test vendor

@tmshort

tmshort commented Jun 17, 2026

Copy link
Copy Markdown
Author

/test e2e-aws-capi-techpreview

Test at least one of the e2es.

@tmshort tmshort force-pushed the fix-OCPCLOUD-3465 branch from 724bbbd to b095b94 Compare June 18, 2026 14:51
@tmshort

tmshort commented Jun 18, 2026

Copy link
Copy Markdown
Author

rebased

@tmshort

tmshort commented Jun 18, 2026

Copy link
Copy Markdown
Author

/test e2e-aws-capi-techpreview

Test at least one of the e2es.

@mdbooth mdbooth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I asked you to move the proxy controller into a separate package same as we've done for the other controller, but I'm questioning that. I was wondering why you didn't also move the tests: it's because you're using the installer controller to create managed resources for the tests.

This wasn't what I expected, but I think I like it better than what I expected: the proxy controller explicitly only ever operates on resources managed by the installer controller. Integrating those in the integration tests absolute makes sense (to me, but I know this is not a universally popular view).

How can we make that relationship more explicit in the structure? For crdcompatibility we put related controllers in sub-packages. Suggest we also do that here? I think I'd leave the tests where they are for now.

I've asked for consideration of another SSA thing (UnstructuredExtractor): either a fix or at least a comment. And either way some test coverage of the case.

Comment on lines +54 to +56
// proxyVarNames are the only env var names this controller ever owns.
// Used for the skip-if-unchanged optimisation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale?

Comment on lines +111 to +117
Watches(&appsv1.Deployment{}, handler.EnqueueRequestsFromMapFunc(toProxy),
builder.WithPredicates(isManagedWorkload)).
Watches(&appsv1.DaemonSet{}, handler.EnqueueRequestsFromMapFunc(toProxy),
builder.WithPredicates(isManagedWorkload)).
Watches(&appsv1.StatefulSet{}, handler.EnqueueRequestsFromMapFunc(toProxy),
builder.WithPredicates(isManagedWorkload)).
Complete(c)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think is right. I think we need to use the tracking cache here for 2 reasons:

  • This will start new D/DS/SS informers rather than using the tracking cache's existing watches, so it uses more memory/KAS capacity
  • This will be subject to the DefaultNamespace cache options we configured in the entrypoint, so we won't watch anything the provider installs outside of the default namespaces

On the latter point, although we expect providers to install things in the default namespace I don't think we want or need to hardcode this requirement.

Comment on lines +89 to +101
// Watch only managed workloads (those labelled by the installer).
// Using standard controller-runtime Watches instead of trackingCache.Source()
// avoids registering a handler on the TrackingCache, which prevents it from
// stopping cleanly and can leave orphaned envtest processes.
managedReq, err := labels.NewRequirement(revisiongenerator.ManagedLabelKey, selection.Exists, nil)
if err != nil {
return fmt.Errorf("building managed label requirement: %w", err)
}

isManagedWorkload := predicate.NewPredicateFuncs(func(obj client.Object) bool {
return labels.NewSelector().Add(*managedReq).Matches(labels.Set(obj.GetLabels()))
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the tracking cache directly I think we don't need (or want) this.

}

// Build the set of containers that should receive proxy env vars.
annotatedSet := make(map[string]struct{})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Set[string] is idiomatic in this codebase

Comment on lines +372 to +381
} else {
// Not annotated, or annotated but proxy was cleared: none of our var
// names may be present (covers both "never had proxy" and "proxy removed").
for _, name := range proxyVarNames() {
if _, ok := envByName[name]; ok {
return false
}
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is quite right, because it asserts that unannotated containers do not have proxy vars. They could have proxy vars which aren't managed by us. I think UnstructuredExtractor could be the answer here. Just be careful to create the Extractor 'globally' e.g. in SetupWithManager, because it has its own discovery cache.

That said, while my preference is to fix this, it should only result in 1 additional reconcile because the resulting SSA will still be correct and will not result in further reconciles. So if the fix starts looking too complex may just add a comment here explaining that it's not quite right and why the impact of that is contained.

}, defaultNodeTimeout)
})

Context("SSA field ownership correctness", func() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you add another test here: does not affect proxy vars set on un-annotated containers.

So: add some proxy vars to the other container with custom values unrelated to the cluster configuration. Annotate the deployment to manage the manager container. Proxy vars should be set on manager and left alone (i.e. not removed) on other.

@mdbooth

mdbooth commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

I've copied an additional clanker review below. I agree with most of that, although I wonder if 'Only uppercase proxy env vars' should be major.

Blocking

None. The feature works correctly on the primary happy paths (full proxy enable/disable/delete).

Major Issues

1. proxyVarsAlreadyApplied() allows stale env vars on partial proxy reconfiguration

https://github.com/openshift/cluster-capi-operator/blob/b269f09a1bae68269e132911e56dea251c83138b/pkg/controllers/proxy
/proxy_controller.go#L365-L380

When a container is annotated and proxyVars is non-empty, the function only checks that the desired vars are present
and correct. It does not check that vars removed from proxyVars are absent.

Scenario: Admin previously had HTTPProxy, HTTPSProxy, and NoProxy all configured (three env vars injected). Admin then
clears only NoProxy from the Proxy status. GetProxyEnvVars() returns two vars. proxyVarsAlreadyApplied() sees both
match and returns true, skipping the apply. Stale NO_PROXY remains on the container indefinitely.

Fix: In the annotated+non-empty branch, additionally verify that no other proxy var names from proxyVarNames() are
present unless they appear in proxyVars:

  desiredNames := make(map[string]struct{}, len(proxyVars))
  for _, pv := range proxyVars {
      desiredNames[pv.Name] = struct{}{}
  }
  for _, name := range proxyVarNames() {
      if _, desired := desiredNames[name]; !desired {
          if _, present := envByName[name]; present {
              return false
          }
      }
  }

2. Test coverage gaps for important behaviour paths

The 6 integration tests cover the core flows well but miss:

  • Proxy URL update (change value, not just create/delete) — regressions in value comparison go undetected
  • Partial proxy field removal (clear one of three) — would not catch Major Add owners file #1 above
  • Multi-container annotation (manager,sidecar) — supported by containerNamesFromAnnotation but untested
  • DaemonSet / StatefulSet workloads — listed in collectWorkloads but only Deployment tested

Additionally, the haveProxyEnvVars() matcher only checks env var names, not values:

https://github.com/openshift/cluster-capi-operator/blob/b269f09a1bae68269e132911e56dea251c83138b/pkg/controllers/insta
ller/proxy_controller_test.go#L100-L107

A proxy URL update test would pass even if the values are wrong. Consider matching both name and value:

  func haveProxyEnvVarsWithURL(url string) OmegaMatcher {
      return HaveField("Env", ContainElements(
          SatisfyAll(HaveField("Name", "HTTP_PROXY"), HaveField("Value", url)),
          SatisfyAll(HaveField("Name", "HTTPS_PROXY"), HaveField("Value", url)),
          // ...
      ))
  }

3. No unit tests for SSA helpers in pkg/controllers/proxy/

buildPatch, proxyVarsAlreadyApplied, and containerNamesFromAnnotation are non-trivial pure functions with edge cases.
They are only reachable through slow envtests in pkg/controllers/installer/. Table-driven unit tests in
pkg/controllers/proxy/ would:

  • Make SSA semantics easier to review and maintain
  • Run much faster than envtests
  • Cover edge cases (empty annotations, whitespace-only values, ValueFrom env vars)

Minor Issues

4. Only uppercase proxy env vars

Some Go HTTP clients (and libraries like net/http's ProxyFromEnvironment) check lowercase http_proxy first. The
OpenShift build API documents both forms. Uppercase-only probably works for current CAPI provider controllers, but
injecting both would be more defensive. Worth a deliberate decision comment if staying uppercase-only.

5. EnvVar.ValueFrom not handled by skip check

In proxyVarsAlreadyApplied, only Value is indexed. If another field manager sets HTTP_PROXY via ValueFrom (e.g. a
SecretKeyRef), the skip check sees a mismatch and applies with ForceOwnership, replacing the ValueFrom. This is
probably fine in practice (nobody uses ValueFrom for proxy URLs), but worth a comment noting the assumption.

6. Stale doc reference in installer_controller.go

The updated SetupWithManager godoc references SetupProxyController, but the actual function is proxy.SetupWithManager.

7. manifests-gen domain constant duplication

Duplicated because manifests-gen is a separate Go module. The comment explaining this is good. No great alternative
exists without creating a tiny shared module, but a CI lint check that the strings match would reduce drift risk.

Nits

  • proxyVarNames() allocates on every call — could be a package-level var. Cost is negligible but the allocation is
    unnecessary.
  • Info-level logging on skip may be noisy during installation — many managed objects trigger proxy reconciliation
    during initial install. Consider V(1) for the "skipping apply" log path.

tmshort and others added 10 commits June 26, 2026 14:50
Add a ProxyController to the capi-installer manager that watches managed
workloads opting into proxy injection via the new annotation
  capi-operator.openshift.io/inject-proxy: <container,...>
and applies HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables to the
named containers via Server-Side Apply whenever the cluster-wide Proxy CR
changes.

Design (Option B from OCPCLOUD-3465 Jira):
- A dedicated controller in the capi-installer manager watches the Proxy
  singleton (proxies/cluster) and all managed objects via the existing
  trackingCache, so it never creates a second informer for those objects.
- Env vars are applied with a separate SSA field manager (capi-operator-proxy)
  that owns only the three proxy env var entries per container. Boxcutter
  continues to own all other fields on the same workload object without
  conflict.
- Proxy changes propagate to running pods independently of the revision
  lifecycle, including when the installer has fallen back to an older
  revision (N-1) because the newest revision failed to roll out.
- Starting with Option B avoids a future field-ownership migration: if
  Option A (inject at render time) had been deployed first, converting to
  Option B later would require force-stealing SSA ownership from boxcutter.

The annotation is added to provider Deployments by manifests-gen (see next
commit). Providers that use different container names or need proxy injection
on additional containers should set the annotation in their upstream manifests.

SetupWithManager now returns the TrackingCache so it can be shared with
SetupProxyController without creating a second managed-object cache.

Refs OCPCLOUD-3465

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…or new SetupWithManager signature

- Add Metrics.BindAddress="0" to revision controller test manager so it
  binds to a random port instead of :8080, eliminating port conflicts when
  consecutive test runs start new managers before the previous one releases
  the port. Matches the pattern already used by every other controller test
  suite in this repo.
- Replace RC4 (TLS_RSA_WITH_RC4_128_SHA / TLS 1.0) with
  TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 / TLS 1.2 in the default test TLS
  config to avoid the No-Weak-Crypto CI check.
- Update installer suite_test.go to handle the new (TrackingCache, error)
  return type of SetupWithManager.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Clear stale proxy env vars on annotation removal: previously, workloads
that lost the inject-proxy annotation kept proxy env vars owned by the
capi-operator-proxy SSA field manager indefinitely. Now when the annotation
is absent the controller applies an empty env list to all containers in the
pod spec, which removes any previously-owned HTTP_PROXY/HTTPS_PROXY/NO_PROXY
entries. Objects that never had the annotation are unaffected (no ownership
to clear).

Continue reconciling remaining workloads after per-object patch errors:
the three separate applyTo* functions (which also caused a duplication lint
failure) are now merged into a single applyToAllWorkloads+collectWorkloads
pair. Errors are accumulated per-object and joined rather than short-
circuiting the rest of the reconcile loop.

Add assertion message to SetupWithManager in suite_test.go so a BeforeSuite
failure names the failing step clearly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…ation

The previous fix only cleaned up stale proxy env vars when the annotation
was entirely removed. If the annotation changed from "manager,sidecar" to
"manager", the sidecar container kept HTTP_PROXY/HTTPS_PROXY/NO_PROXY owned
by capi-operator-proxy indefinitely.

Now for each workload we split containers into two groups:
- Containers named in the annotation: proxy vars applied (or refreshed)
- Containers in the spec but NOT in the annotation: empty env applied,
  removing our SSA field-manager ownership of any proxy vars we set before

This covers three cases cleanly:
- Annotation absent: all containers cleared (existing behaviour)
- Annotation present, unchanged: annotated get vars, non-annotated get
  cleared (no-op if we never owned their env vars)
- Annotation shrinks: dropped containers get cleared, remaining get vars

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
SSA field manager correctness:
The previous implementation sent two separate SSA applies from the same
field manager per workload (one for annotated containers, one for stale
containers). In Kubernetes SSA semantics, each apply from a field manager
is the *complete* desired state: the second apply would cancel the ownership
established by the first. The fix submits a single apply covering ALL
containers in the spec. Annotated containers receive proxy env vars;
non-annotated containers receive an empty env list, which clears any proxy
env vars previously owned by our field manager. As a result the separate
"stale" handling, the redundant !hasAnnotation fast path, and the early
return on empty containerNames are all removed.

Use existing domain constant for annotation and field manager:
- ProxyInjectAnnotation now uses operatorstatus.CAPIOperatorIdentifierDomain
  so the string "capi-operator.openshift.io" is only defined once in the
  main module.
- The SSA field manager now uses operatorstatus.CAPIFieldOwner("proxy"),
  which produces the qualified name "capi-operator.openshift.io/proxy".
- manifests-gen defines a local capiOperatorDomain constant (manifests-gen
  is a standalone module and cannot import from operatorstatus).

Remove proxyEnvEntries helper: env map construction is now inlined inside
applyProxyVars, removing the unnecessary indirection.

Revert inadvertent cipher change in helpers_test.go: the RC4/TLS1.0 default
was intentionally insecure for tests and should not have been changed. Only
the Metrics.BindAddress: "0" fix (needed to prevent port conflicts between
consecutive test runs) is retained.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Add six integration tests that run through the complete proxy injection
lifecycle against a real envtest API server:

1. inject proxy env vars into the annotated "manager" container when the
   Proxy CR is configured, while leaving the unannotated "other" container
   unaffected — this validates the single-SSA-apply correctness (if the
   previous two-call design had been used, the second apply would have
   wiped the first container's vars)
2. clear proxy env vars when the Proxy CR is deleted
3. clear proxy env vars when the Proxy CR is reconfigured to empty values
4. do NOT inject env vars into any container when the inject-proxy
   annotation is absent — even with a configured proxy
5. clear env vars after the inject-proxy annotation is removed via a new
   revision (annotation change propagated through the installer)
6. preserve env vars set by other field managers alongside the proxy vars

Infrastructure changes:
- ProxyController is wired into the installer suite manager alongside the
  InstallerController, sharing the existing TrackingCache (SetupWithManager
  now returns it so SetupProxyController can reuse it)
- Two new provider profiles (proxy-annotated, proxy-not-annotated) provide
  a two-container Deployment that exercises both annotated and plain
  containers in the same object
- markDeploymentAvailable() patches the Deployment status to Available=True
  using komega.UpdateStatus so the installer's deploymentAvailableProbe
  passes and the revision is marked complete
- Proxy CRD added to the shared envtest setup (pkg/test/envtest.go)
- AfterSuite manager-stop wait changed from Eventually to a select with a
  30s timeout so that tracking-cache drain delays under heavy parallel load
  do not fail the suite

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…troller

Registering a handler on the TrackingCache via trackingCache.Source() prevents
the cache from stopping cleanly when the manager context is cancelled. This
caused mgr.Start() to hang indefinitely in tests, which in turn prevented
test.StopEnvTest() from running, leaving orphaned etcd and kube-apiserver
processes after every make unit invocation.

Replace the WatchesRawSource(c.trackingCache.Source(...)) registration with
standard controller-runtime Watches on Deployment, DaemonSet, and StatefulSet,
filtered to objects carrying the ManagedLabelKey. The manager's own cache
handles these watches and shuts down cleanly with the context. The listing step
in collectWorkloads still uses the TrackingCache (correct, since it only caches
managed objects), but the event delivery no longer depends on it.

Also revert the AfterSuite workaround (select with 30s timeout) back to the
original Eventually now that the root cause is fixed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…ged optimisation

Move pkg/controllers/installer/proxy_controller.go to
pkg/controllers/proxy/proxy_controller.go for consistency with the rest of
the codebase where each controller lives in its own package.
The type is renamed from ProxyController to Controller and the entry point
from SetupProxyController to SetupWithManager, following the existing
convention (e.g. revision.RevisionController / revision.SetupWithManager).

Add the "skip if nothing to do" optimisation that mdbooth requested.
Before submitting the SSA apply, proxyVarsAlreadyApplied() compares the
desired proxy state (derived from the annotation + current Proxy CR) against
what the tracking cache reports as the current container state:

  - annotated container with proxy configured: all desired vars must be
    present with the correct values
  - annotated container with proxy cleared, or unannotated container: none
    of the proxy var names (HTTP_PROXY / HTTPS_PROXY / NO_PROXY) may be
    present

If the state already matches, the SSA apply is skipped entirely, avoiding
unnecessary KAS round-trips during installation when every newly-applied
managed object triggers a ProxyController reconcile.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
After rebasing onto main, the installer/revision/proxy controllers moved
from cmd/capi-operator to the new cmd/capi-installer binary (PR openshift#581).
Wire the proxy controller setup into the new binary and reset
cmd/capi-operator/main.go to the upstream version (which now only runs
InstallerDeploymentController).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Use trackingCache.Source() via sub-reconciler pattern:
The ProxyController no longer registers as a standalone controller.
Instead it becomes a sub-reconciler called directly from the
InstallerController's Reconcile(), so both share a single
trackingCache.Source() registration and avoid the double-registration
shutdown issue that previously caused orphaned envtest processes.
The InstallerController gains a Proxy CR watch so that proxy changes
still trigger reconciliation.

Use UnstructuredExtractor for the skip-if-unchanged optimisation:
The previous proxyVarsAlreadyApplied() checked the full container env,
which caused an infinite reconcile loop when another field manager set
proxy var names on an unannotated container (the check would keep seeing
those vars and triggering applies, but SSA would never remove them since
they're owned by a different manager). The new check uses
UnstructuredExtractor to compare only what our field manager owns against
the desired patch, which is both correct and efficient.

Use sets.Set[string] for annotatedSet (idiomatic in this codebase).

Remove stale orphaned comment about proxyVarNames.

Add test: verify proxy vars set by other field managers on unannotated
containers are not removed by our controller's SSA apply.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
@tmshort tmshort force-pushed the fix-OCPCLOUD-3465 branch from b095b94 to 26a862e Compare June 26, 2026 19:01
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@tmshort: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants