Skip to content

Add TelemetryConfigRef support to MCPRemoteProxy#4719

Open
ChrisJBurns wants to merge 6 commits intomainfrom
chrisburns/mcpremoteproxy-telemetry-config-ref
Open

Add TelemetryConfigRef support to MCPRemoteProxy#4719
ChrisJBurns wants to merge 6 commits intomainfrom
chrisburns/mcpremoteproxy-telemetry-config-ref

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

Summary

MCPRemoteProxy lacked the ability to reference shared MCPTelemetryConfig resources, forcing users to duplicate inline telemetry configuration across every proxy instance. This brings MCPRemoteProxy to parity with MCPServer's telemetry API by adding TelemetryConfigRef support.

Closes #4620

Type of change

  • New feature

Changes

File Change
cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go Add TelemetryConfigRef field with CEL mutual exclusivity validation, deprecate inline Telemetry, add TelemetryConfigHash status field and condition constants
cmd/thv-operator/pkg/controllerutil/config.go Add GetTelemetryConfigForMCPRemoteProxy() helper — namespace-scoped fetch returning (nil, nil) for not-found
cmd/thv-operator/controllers/mcpremoteproxy_controller.go Add handleTelemetryConfig() handler, call in validateAndHandleConfigs(), add mapTelemetryConfigToMCPRemoteProxy() watch mapper, register MCPTelemetryConfig watch in SetupWithManager()
cmd/thv-operator/controllers/mcpremoteproxy_runconfig.go Prefer TelemetryConfigRef over inline Telemetry in RunConfig; extract addTelemetryOptions and resolveToolConfig helpers to fix cyclomatic complexity

Test plan

  • task lint-fix passes (0 issues)
  • task test passes (all operator unit tests)
  • go build ./... passes
  • New unit tests: TestGetTelemetryConfigForMCPRemoteProxy (4 cases), TestHandleTelemetryConfig_MCPRemoteProxy (5 cases), TestMapTelemetryConfigToMCPRemoteProxy
  • Existing TestCreateRunConfigFromMCPRemoteProxy tests still pass with updated ctx parameter

Special notes for reviewers

  • The resolveToolConfig extraction in mcpremoteproxy_runconfig.go is a refactor needed to keep cyclomatic complexity under the lint threshold (was already at 15 pre-change). No behavioral change.
  • CEL validation is tested via the generated CRD manifest; integration test coverage for MCPRemoteProxy CEL rules can follow in a separate PR.
  • VirtualMCPServer telemetry config ref is out of scope (separate issue).

Generated with Claude Code

ChrisJBurns and others added 6 commits April 9, 2026 23:56
Brings MCPRemoteProxy to parity with MCPServer's telemetry API by adding
a TelemetryConfigRef field for referencing shared MCPTelemetryConfig
resources. Includes CEL mutual exclusivity validation with the deprecated
inline Telemetry field, TelemetryConfigHash in status for change
detection, and condition type/reason constants.

Part of #4620

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Namespace-scoped fetch for MCPTelemetryConfig referenced by an
MCPRemoteProxy. Returns (nil, nil) when the ref is nil or the resource
is not found, matching the MCPServer getTelemetryConfigForMCPServer
contract.

Part of #4620

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…roxy

Wire up the reconciler to validate referenced MCPTelemetryConfig
resources, track config hashes in status, and reconcile when the
underlying MCPTelemetryConfig changes. Follows the same handler pattern
as handleToolConfig and the MCPServer telemetry handler.

Part of #4620

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Config

When building the RunConfig, resolve telemetry from the referenced
MCPTelemetryConfig first and fall back to the deprecated inline
Telemetry field. Adds ctx parameter to createRunConfigFromMCPRemoteProxy
to support the API fetch, matching the MCPServer runconfig pattern.

Part of #4620

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-generated from MCPRemoteProxy type changes: adds
telemetryConfigRef to CRD schema with CEL mutual exclusivity validation,
telemetryConfigHash to status, and updated API reference docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract resolveToolConfig and addTelemetryOptions from
createRunConfigFromMCPRemoteProxy to bring cyclomatic complexity below
the threshold. Rename shadowed ctx to apiCtx for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 65.71429% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.73%. Comparing base (d851c69) to head (ec43d0b).

Files with missing lines Patch % Lines
...-operator/controllers/mcpremoteproxy_controller.go 68.18% 22 Missing and 6 partials ⚠️
...v-operator/controllers/mcpremoteproxy_runconfig.go 53.84% 12 Missing and 6 partials ⚠️
cmd/thv-operator/pkg/controllerutil/config.go 84.61% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4719      +/-   ##
==========================================
+ Coverage   68.71%   68.73%   +0.02%     
==========================================
  Files         517      517              
  Lines       54817    54936     +119     
==========================================
+ Hits        37666    37759      +93     
- Misses      14252    14274      +22     
- Partials     2899     2903       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review from automated analysis. Five findings — three require changes in files not touched by this PR (prose below), two have inline comments on the diff.


[HIGH] Missing +kubebuilder:rbac annotation for mcptelemetryconfigs

File: cmd/thv-operator/controllers/mcpremoteproxy_controller.go (around line 48–51)

The MCPRemoteProxyReconciler now lists and gets MCPTelemetryConfig objects (in mapTelemetryConfigToMCPRemoteProxy and via GetTelemetryConfigForMCPRemoteProxy), but there is no +kubebuilder:rbac marker for mcptelemetryconfigs. task operator-manifests generates the ClusterRole from these markers — without this one, the controller gets permission-denied errors at runtime the first time any proxy uses TelemetryConfigRef.

MCPServerReconciler already has:

// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcptelemetryconfigs,verbs=get;list;watch

Add the same line to MCPRemoteProxyReconciler alongside the existing mcptoolconfigs/mcpoidcconfigs markers.


[HIGH] MCPTelemetryConfigReconciler.findReferencingWorkloads does not scan MCPRemoteProxy

File: cmd/thv-operator/controllers/mcptelemetryconfig_controller.go line 259 (not changed in this PR)

findReferencingWorkloads only calls FindWorkloadRefsFromMCPServers. It never looks at MCPRemoteProxy, so:

  1. Deletion protection is broken: an operator can delete an MCPTelemetryConfig referenced only by a proxy — the finalizer check passes.
  2. status.referencingWorkloads will never include proxy names.
  3. The reverse-direction watch (proxy changes → re-evaluate MCPTelemetryConfig) is also missing from SetupWithManager.

MCPOIDCConfigReconciler and MCPExternalAuthConfigReconciler were both extended to include MCPRemoteProxy when those ref fields were added. The same pattern needs to be applied here.


[HIGH] Missing CA bundle volume mount for TelemetryConfigRef in proxy deployment

File: cmd/thv-operator/controllers/mcpremoteproxy_deployment.go (buildVolumesForProxy, not changed in this PR)

addTelemetryOptions correctly calls ctrlutil.TelemetryCABundleFilePath(telCfg) and embeds the path in the RunConfig. At pod startup the proxy runner opens that path via os.ReadFile to build a custom TLS config for the OTLP collector.

However, buildVolumesForProxy never calls ctrlutil.AddTelemetryCABundleVolumes. The ConfigMap is never mounted into the pod. Any user who sets spec.openTelemetry.caBundleRef in their MCPTelemetryConfig will get a file-not-found error at proxy startup.

MCPServerReconciler.deploymentForMCPServer already handles this correctly — the same block needs to be added to deploymentForMCPRemoteProxy.

}

if tt.expectNoCondition {
for _, c := range tt.proxy.Status.Conditions {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Test assertions read in-memory state, not persisted state

All assertions in this loop read from tt.proxy.Status.* — the same pointer that meta.SetStatusCondition / meta.RemoveStatusCondition already mutated before r.Status().Update() was called. These assertions would pass even if the Status().Update() call were deleted entirely. WithStatusSubresource is correctly set up but has no practical effect because fakeClient.Get is never called.

The pattern from TestMCPServerReconciler_handleOIDCConfig_ConditionPersistedOnRecovery is the right model: after calling the handler, re-fetch and assert on the persisted object:

persisted := &mcpv1alpha1.MCPRemoteProxy{}
require.NoError(t, fakeClient.Get(ctx, types.NamespacedName{
    Name: tt.proxy.Name, Namespace: tt.proxy.Namespace,
}, persisted))
// assert on persisted.Status.* instead of tt.proxy.Status.*

While fixing this, also add a recovery-from-False scenario: proxy has a hash matching the config's hash but a stale False condition — assert the True condition is persisted. This covers the needsUpdate logic at line 714 which is currently untested.


if proxy.Spec.TelemetryConfigRef == nil {
// No MCPTelemetryConfig referenced, clear any stored hash
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Condition removal not persisted when TelemetryConfigHash is already empty

When TelemetryConfigRef is removed from a proxy whose hash is already "" (e.g., a stale False condition remains from a prior error), meta.RemoveStatusCondition modifies the in-memory slice but r.Status().Update is skipped. The controller re-fetches on the next reconcile, discarding the change — the stale condition stays in the cluster.

Suggested change
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
if proxy.Spec.TelemetryConfigRef == nil {
// No MCPTelemetryConfig referenced, clear any stored hash and condition.
condRemoved := meta.FindStatusCondition(proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) != nil
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
if condRemoved || proxy.Status.TelemetryConfigHash != "" {
proxy.Status.TelemetryConfigHash = ""
if err := r.Status().Update(ctx, proxy); err != nil {
return fmt.Errorf("failed to clear MCPTelemetryConfig hash from status: %w", err)
}
}
return nil
}

}

if tt.expectNoCondition {
for _, c := range tt.proxy.Status.Conditions {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Test assertions read in-memory state, not persisted state

All assertions here read from tt.proxy.Status.* — the same pointer that meta.SetStatusCondition / meta.RemoveStatusCondition already mutated before r.Status().Update() was called. These assertions would pass even if the Status().Update() call were deleted entirely. WithStatusSubresource is correctly set up but has no practical effect because fakeClient.Get is never called to verify what was actually persisted.

The pattern from TestMCPServerReconciler_handleOIDCConfig_ConditionPersistedOnRecovery is the right model — re-fetch from fakeClient after the handler call and assert on the result:

persisted := &mcpv1alpha1.MCPRemoteProxy{}
require.NoError(t, fakeClient.Get(ctx, types.NamespacedName{
    Name: tt.proxy.Name, Namespace: tt.proxy.Namespace,
}, persisted))
// assert on persisted.Status.* instead of tt.proxy.Status.*

While fixing this, also add a recovery-from-False scenario: proxy has a hash matching the config's hash but a stale False condition — assert the True condition is persisted after the handler runs. This covers the needsUpdate logic at line 714 which is currently untested.


if proxy.Spec.TelemetryConfigRef == nil {
// No MCPTelemetryConfig referenced, clear any stored hash
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Condition removal not persisted when TelemetryConfigHash is already empty

When TelemetryConfigRef is removed from a proxy whose hash is already "" (e.g., a stale False condition remains from a prior error cycle), meta.RemoveStatusCondition modifies the in-memory slice but r.Status().Update is skipped. The controller re-fetches from the API server on the next reconcile, discarding the in-memory change — the stale condition stays in the cluster indefinitely.

Suggested change
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
if proxy.Spec.TelemetryConfigRef == nil {
// No MCPTelemetryConfig referenced, clear any stored hash and condition.
condRemoved := meta.FindStatusCondition(proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated) != nil
meta.RemoveStatusCondition(&proxy.Status.Conditions, mcpv1alpha1.ConditionTypeMCPRemoteProxyTelemetryConfigRefValidated)
if condRemoved || proxy.Status.TelemetryConfigHash != "" {
proxy.Status.TelemetryConfigHash = ""
if err := r.Status().Update(ctx, proxy); err != nil {
return fmt.Errorf("failed to clear MCPTelemetryConfig hash from status: %w", err)
}
}
return nil
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add TelemetryConfigRef support to MCPRemoteProxy

3 participants