Add TelemetryConfigRef support to MCPRemoteProxy#4719
Add TelemetryConfigRef support to MCPRemoteProxy#4719ChrisJBurns wants to merge 6 commits intomainfrom
Conversation
Brings MCPRemoteProxy to parity with MCPServer's telemetry API by adding a TelemetryConfigRef field for referencing shared MCPTelemetryConfig resources. Includes CEL mutual exclusivity validation with the deprecated inline Telemetry field, TelemetryConfigHash in status for change detection, and condition type/reason constants. Part of #4620 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Namespace-scoped fetch for MCPTelemetryConfig referenced by an MCPRemoteProxy. Returns (nil, nil) when the ref is nil or the resource is not found, matching the MCPServer getTelemetryConfigForMCPServer contract. Part of #4620 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…roxy Wire up the reconciler to validate referenced MCPTelemetryConfig resources, track config hashes in status, and reconcile when the underlying MCPTelemetryConfig changes. Follows the same handler pattern as handleToolConfig and the MCPServer telemetry handler. Part of #4620 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Config When building the RunConfig, resolve telemetry from the referenced MCPTelemetryConfig first and fall back to the deprecated inline Telemetry field. Adds ctx parameter to createRunConfigFromMCPRemoteProxy to support the API fetch, matching the MCPServer runconfig pattern. Part of #4620 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-generated from MCPRemoteProxy type changes: adds telemetryConfigRef to CRD schema with CEL mutual exclusivity validation, telemetryConfigHash to status, and updated API reference docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract resolveToolConfig and addTelemetryOptions from createRunConfigFromMCPRemoteProxy to bring cyclomatic complexity below the threshold. Rename shadowed ctx to apiCtx for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4719 +/- ##
==========================================
+ Coverage 68.71% 68.73% +0.02%
==========================================
Files 517 517
Lines 54817 54936 +119
==========================================
+ Hits 37666 37759 +93
- Misses 14252 14274 +22
- Partials 2899 2903 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
jhrozek
left a comment
There was a problem hiding this comment.
Code review from automated analysis. Five findings — three require changes in files not touched by this PR (prose below), two have inline comments on the diff.
[HIGH] Missing +kubebuilder:rbac annotation for mcptelemetryconfigs
File: cmd/thv-operator/controllers/mcpremoteproxy_controller.go (around line 48–51)
The MCPRemoteProxyReconciler now lists and gets MCPTelemetryConfig objects (in mapTelemetryConfigToMCPRemoteProxy and via GetTelemetryConfigForMCPRemoteProxy), but there is no +kubebuilder:rbac marker for mcptelemetryconfigs. task operator-manifests generates the ClusterRole from these markers — without this one, the controller gets permission-denied errors at runtime the first time any proxy uses TelemetryConfigRef.
MCPServerReconciler already has:
// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcptelemetryconfigs,verbs=get;list;watchAdd the same line to MCPRemoteProxyReconciler alongside the existing mcptoolconfigs/mcpoidcconfigs markers.
[HIGH] MCPTelemetryConfigReconciler.findReferencingWorkloads does not scan MCPRemoteProxy
File: cmd/thv-operator/controllers/mcptelemetryconfig_controller.go line 259 (not changed in this PR)
findReferencingWorkloads only calls FindWorkloadRefsFromMCPServers. It never looks at MCPRemoteProxy, so:
- Deletion protection is broken: an operator can delete an
MCPTelemetryConfigreferenced only by a proxy — the finalizer check passes. status.referencingWorkloadswill never include proxy names.- The reverse-direction watch (proxy changes → re-evaluate
MCPTelemetryConfig) is also missing fromSetupWithManager.
MCPOIDCConfigReconciler and MCPExternalAuthConfigReconciler were both extended to include MCPRemoteProxy when those ref fields were added. The same pattern needs to be applied here.
[HIGH] Missing CA bundle volume mount for TelemetryConfigRef in proxy deployment
File: cmd/thv-operator/controllers/mcpremoteproxy_deployment.go (buildVolumesForProxy, not changed in this PR)
addTelemetryOptions correctly calls ctrlutil.TelemetryCABundleFilePath(telCfg) and embeds the path in the RunConfig. At pod startup the proxy runner opens that path via os.ReadFile to build a custom TLS config for the OTLP collector.
However, buildVolumesForProxy never calls ctrlutil.AddTelemetryCABundleVolumes. The ConfigMap is never mounted into the pod. Any user who sets spec.openTelemetry.caBundleRef in their MCPTelemetryConfig will get a file-not-found error at proxy startup.
MCPServerReconciler.deploymentForMCPServer already handles this correctly — the same block needs to be added to deploymentForMCPRemoteProxy.
| } | ||
|
|
||
| if tt.expectNoCondition { | ||
| for _, c := range tt.proxy.Status.Conditions { |
There was a problem hiding this comment.
[HIGH] Test assertions read in-memory state, not persisted state
All assertions in this loop read from tt.proxy.Status.* — the same pointer that meta.SetStatusCondition / meta.RemoveStatusCondition already mutated before r.Status().Update() was called. These assertions would pass even if the Status().Update() call were deleted entirely. WithStatusSubresource is correctly set up but has no practical effect because fakeClient.Get is never called.
The pattern from TestMCPServerReconciler_handleOIDCConfig_ConditionPersistedOnRecovery is the right model: after calling the handler, re-fetch and assert on the persisted object:
persisted := &mcpv1alpha1.MCPRemoteProxy{}
require.NoError(t, fakeClient.Get(ctx, types.NamespacedName{
Name: tt.proxy.Name, Namespace: tt.proxy.Namespace,
}, persisted))
// assert on persisted.Status.* instead of tt.proxy.Status.*While fixing this, also add a recovery-from-False scenario: proxy has a hash matching the config's hash but a stale False condition — assert the True condition is persisted. This covers the needsUpdate logic at line 714 which is currently untested.
|
|
||
| if proxy.Spec.TelemetryConfigRef == nil { | ||
| // No MCPTelemetryConfig referenced, clear any stored hash | ||
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) |
There was a problem hiding this comment.
[MEDIUM] Condition removal not persisted when TelemetryConfigHash is already empty
When TelemetryConfigRef is removed from a proxy whose hash is already "" (e.g., a stale False condition remains from a prior error), meta.RemoveStatusCondition modifies the in-memory slice but r.Status().Update is skipped. The controller re-fetches on the next reconcile, discarding the change — the stale condition stays in the cluster.
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) | |
| if proxy.Spec.TelemetryConfigRef == nil { | |
| // No MCPTelemetryConfig referenced, clear any stored hash and condition. | |
| condRemoved := meta.FindStatusCondition(proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) != nil | |
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) | |
| if condRemoved || proxy.Status.TelemetryConfigHash != "" { | |
| proxy.Status.TelemetryConfigHash = "" | |
| if err := r.Status().Update(ctx, proxy); err != nil { | |
| return fmt.Errorf("failed to clear MCPTelemetryConfig hash from status: %w", err) | |
| } | |
| } | |
| return nil | |
| } |
| } | ||
|
|
||
| if tt.expectNoCondition { | ||
| for _, c := range tt.proxy.Status.Conditions { |
There was a problem hiding this comment.
[HIGH] Test assertions read in-memory state, not persisted state
All assertions here read from tt.proxy.Status.* — the same pointer that meta.SetStatusCondition / meta.RemoveStatusCondition already mutated before r.Status().Update() was called. These assertions would pass even if the Status().Update() call were deleted entirely. WithStatusSubresource is correctly set up but has no practical effect because fakeClient.Get is never called to verify what was actually persisted.
The pattern from TestMCPServerReconciler_handleOIDCConfig_ConditionPersistedOnRecovery is the right model — re-fetch from fakeClient after the handler call and assert on the result:
persisted := &mcpv1alpha1.MCPRemoteProxy{}
require.NoError(t, fakeClient.Get(ctx, types.NamespacedName{
Name: tt.proxy.Name, Namespace: tt.proxy.Namespace,
}, persisted))
// assert on persisted.Status.* instead of tt.proxy.Status.*While fixing this, also add a recovery-from-False scenario: proxy has a hash matching the config's hash but a stale False condition — assert the True condition is persisted after the handler runs. This covers the needsUpdate logic at line 714 which is currently untested.
|
|
||
| if proxy.Spec.TelemetryConfigRef == nil { | ||
| // No MCPTelemetryConfig referenced, clear any stored hash | ||
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) |
There was a problem hiding this comment.
[MEDIUM] Condition removal not persisted when TelemetryConfigHash is already empty
When TelemetryConfigRef is removed from a proxy whose hash is already "" (e.g., a stale False condition remains from a prior error cycle), meta.RemoveStatusCondition modifies the in-memory slice but r.Status().Update is skipped. The controller re-fetches from the API server on the next reconcile, discarding the in-memory change — the stale condition stays in the cluster indefinitely.
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) | |
| if proxy.Spec.TelemetryConfigRef == nil { | |
| // No MCPTelemetryConfig referenced, clear any stored hash and condition. | |
| condRemoved := meta.FindStatusCondition(proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) != nil | |
| meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) | |
| if condRemoved || proxy.Status.TelemetryConfigHash != "" { | |
| proxy.Status.TelemetryConfigHash = "" | |
| if err := r.Status().Update(ctx, proxy); err != nil { | |
| return fmt.Errorf("failed to clear MCPTelemetryConfig hash from status: %w", err) | |
| } | |
| } | |
| return nil | |
| } |
Summary
MCPRemoteProxy lacked the ability to reference shared
MCPTelemetryConfigresources, forcing users to duplicate inline telemetry configuration across every proxy instance. This brings MCPRemoteProxy to parity with MCPServer's telemetry API by addingTelemetryConfigRefsupport.Closes #4620
Type of change
Changes
cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.goTelemetryConfigReffield with CEL mutual exclusivity validation, deprecate inlineTelemetry, addTelemetryConfigHashstatus field and condition constantscmd/thv-operator/pkg/controllerutil/config.goGetTelemetryConfigForMCPRemoteProxy()helper — namespace-scoped fetch returning (nil, nil) for not-foundcmd/thv-operator/controllers/mcpremoteproxy_controller.gohandleTelemetryConfig()handler, call invalidateAndHandleConfigs(), addmapTelemetryConfigToMCPRemoteProxy()watch mapper, registerMCPTelemetryConfigwatch inSetupWithManager()cmd/thv-operator/controllers/mcpremoteproxy_runconfig.goTelemetryConfigRefover inlineTelemetryin RunConfig; extractaddTelemetryOptionsandresolveToolConfighelpers to fix cyclomatic complexityTest plan
task lint-fixpasses (0 issues)task testpasses (all operator unit tests)go build ./...passesTestGetTelemetryConfigForMCPRemoteProxy(4 cases),TestHandleTelemetryConfig_MCPRemoteProxy(5 cases),TestMapTelemetryConfigToMCPRemoteProxyTestCreateRunConfigFromMCPRemoteProxytests still pass with updatedctxparameterSpecial notes for reviewers
resolveToolConfigextraction inmcpremoteproxy_runconfig.gois a refactor needed to keep cyclomatic complexity under the lint threshold (was already at 15 pre-change). No behavioral change.Generated with Claude Code