feat: expose metrics for MCP calls #38
Conversation
Walkthrough

Adds Prometheus metrics for MCP tool calls (a counter and a histogram), registers a /metrics HTTP endpoint, injects a metrics middleware into the MCP server request path, updates go.mod for the Prometheus and k8s dependencies, extends the e2e tests to validate metrics, and adjusts the build/task files and compose/README examples.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant MCPServer as MCP Server
    participant MetricsMW as Metrics Middleware
    participant Prometheus as Prometheus Collector
    participant HTTP as /metrics Endpoint
    Client->>MCPServer: Send MCP tool call
    MCPServer->>MetricsMW: Invoke middleware (method, params)
    MetricsMW->>MetricsMW: start timer, inspect params
    MetricsMW->>MCPServer: Call next handler
    MCPServer-->>MetricsMW: Return result, error
    MetricsMW->>Prometheus: Increment MCPCallsTotal (method, tool, success)
    MetricsMW->>Prometheus: Observe MCPCallDuration (method, tool, success)
    MetricsMW-->>Client: Return result, error
    Client->>HTTP: GET /metrics
    HTTP->>Prometheus: Gather metrics
    Prometheus-->>HTTP: Return exposition
    HTTP-->>Client: Metrics payload
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
draft until codeready-toolchain/toolchain-e2e#1243 and #37 have been reviewed and merged
the `mcp_calls_total` and `mcp_call_duration_seconds` metrics are collected via the SDK's `mcp.Middleware` and exposed on the `/metrics` endpoint of the server

Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
(force-pushed 5fe0cdb to e845c33)
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @internal/metrics/metrics.go:
- Line 10: The comment above the metric is incorrect: it references
"ToolCallsTotal" but the variable is named "MCPCallsTotal"; update the comment
to accurately describe MCPCallsTotal (e.g., "MCPCallsTotal counts total MCP
invocations by method/name") so the comment and the variable MCPCallsTotal in
internal/metrics/metrics.go match.
- Around line 19-26: The MCPCallDuration histogram uses seconds buckets but the
middleware is observing duration.Milliseconds(), causing wrong bucketing; in the
middleware function that records MCPCallDuration (the code that computes a
duration variable and calls MCPCallDuration.WithLabelValues(...).Observe(...)),
replace the use of duration.Milliseconds() with duration.Seconds() (no cast
needed) so the observed value is in seconds and aligns with
prometheus.DefBuckets.
In @internal/server/middleware.go:
- Line 34: The metric uses MCPCallDuration (named mcp_call_duration_seconds) but
records duration.Milliseconds(), causing a 1000x mismatch; update the
observation to use duration.Seconds(), which already returns a float64, so no
cast is needed (i.e., replace duration.Milliseconds() with duration.Seconds())
in the metrics.MCPCallDuration.WithLabelValues(method, tool,
strconv.FormatBool(success)).Observe(...) call so the recorded unit matches
seconds.
In @test/e2e/server_test.go:
- Line 249: The require.NoError call can panic because cmd.ProcessState may be
nil; modify the failure message around require.NoError(t, err, ...) to first
check cmd.ProcessState != nil and include cmd.ProcessState.ExitCode() only when
non-nil (e.g., compute a local variable like exitInfo := "<no process state>" or
an int default and conditionally format the message), otherwise omit or
substitute a safe placeholder; update the call that currently references
cmd.ProcessState.ExitCode() so it uses the guarded value and prevents nil
dereference.
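The last item above can be sketched as follows. This is a hedged, dependency-free illustration: `exitInfo` is a hypothetical helper name, not taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// exitInfo returns a printable exit description, guarding against a nil
// ProcessState (e.g. when the process never started or has not yet exited).
// The helper name is illustrative, not from the PR.
func exitInfo(ps *os.ProcessState) string {
	if ps == nil {
		return "<no process state>"
	}
	return fmt.Sprintf("exit code %d", ps.ExitCode())
}

func main() {
	cmd := exec.Command("true")
	err := cmd.Run()
	// Safe even if cmd.ProcessState is nil:
	fmt.Printf("command finished: err=%v, %s\n", err, exitInfo(cmd.ProcessState))
	fmt.Println(exitInfo(nil)) // "<no process state>"
}
```

The failure message passed to `require.NoError` would then use `exitInfo(cmd.ProcessState)` instead of calling `cmd.ProcessState.ExitCode()` directly.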
🧹 Nitpick comments (2)
go.mod (1)
12-12: Version v1.22.0 is stable and secure.

`prometheus/client_golang` v1.22.0 is a stable release (April 2025) with no known vulnerabilities; the major CVE-2022-21698 was fixed in v1.11.1. Consider upgrading to v1.23.2 (latest stable, September 2025) at a future maintenance window if alignment with the latest release is desired.

internal/server/middleware.go (1)
23-26: Consider using a default value for non-tool requests.

When the request is not a `CallToolParams`, the `tool` variable remains an empty string. While Prometheus accepts empty label values, using a default like `"N/A"` or `"none"` would improve metric queryability and clarity.

♻️ Proposed refactor

```diff
-	var tool string
+	tool := "N/A"
 	if p, ok := req.GetParams().(*mcp.CallToolParamsRaw); ok {
 		tool = p.Name
 	}
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (7)
- README.md
- cmd/start_server.go
- go.mod
- internal/metrics/metrics.go
- internal/server/middleware.go
- internal/server/server.go
- test/e2e/server_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
internal/server/middleware.go (1)
internal/metrics/metrics.go (2)
MCPCallsTotal (11-17), MCPCallDuration (19-26)
internal/server/server.go (1)
internal/server/middleware.go (1)
NewMetricsMiddleware (13-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Test
🔇 Additional comments (11)
README.md (1)
117-117: LGTM! Necessary for container accessibility.

Setting `ARGOCD_MCP_SERVER_LISTEN_HOST=0.0.0.0` ensures the server binds to all network interfaces, making it accessible from outside the container. This aligns well with the port mapping configuration.

internal/server/server.go (1)

27-27: LGTM! Metrics middleware integration is well-placed.

The metrics middleware is correctly positioned to capture all incoming MCP requests. Placing it after prompts but before tools ensures comprehensive coverage of both tool invocations and prompt handling.
cmd/start_server.go (2)
62-62: LGTM! Log key renamed for clarity.

Renaming the log key from "url" to "argocd-url" improves clarity and consistency with the flag name.
86-87: Consider access control for the metrics endpoint.

The `/metrics` endpoint exposes operational metrics without authentication. While this is standard for internal Kubernetes deployments where metrics are scraped within a trusted network, verify this aligns with your security requirements. If the server is exposed to untrusted networks, consider adding authentication or restricting access to the metrics endpoint.
go.mod (1)
16-16: The k8s.io/client-go version override is intentional—no conflict exists.

Line 16 declares `k8s.io/client-go v0.33.0`, and line 182 replaces it with `v0.32.2`. This is standard Go practice using replace directives. The effective version is unambiguous: `v0.33.0 => v0.32.2`. All k8s.io packages are consistently pinned to v0.32.2, suggesting a deliberate compatibility strategy. If the reasoning for this downgrade isn't already documented, consider adding a comment in go.mod explaining why v0.33.0 is required but v0.32.2 is used.

test/e2e/server_test.go (4)
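For reference, the declare-then-replace pattern described above looks like this in a go.mod file. This is a hedged sketch: the comment text is hypothetical, and the PR's actual go.mod may word things differently.

```gomod
require k8s.io/client-go v0.33.0

// Hypothetical rationale comment: pin to v0.32.2 for compatibility with the
// other k8s.io modules, which are all at v0.32.2.
replace k8s.io/client-go => k8s.io/client-go v0.32.2
```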
15-16: LGTM!

The new imports for metrics verification (`toolchaintests` and `rest`) are correctly added and align with the metrics testing requirements.

Also applies to: 23-23

175-180: LGTM!

Metrics verification is consistently applied across both success and error scenarios. The pattern correctly validates that the `mcp_calls_total` metric is incremented with appropriate label values for each test case.

Also applies to: 195-200

210-210: LGTM!

The test case renames to include "-unreachable" improve clarity, and the change from `t.Errorf` to `t.Logf` on line 267 appropriately handles expected exit errors during teardown.

Also applies to: 214-214, 267-267

106-112: Verification of toolchaintests.GetMetricValue API usage is inconclusive.

The code consistently uses the flattened array format `[]string{"method", "tools/call", "name", "unhealthyApplications", "success", "true"}` across all metric queries in the test file. However, the actual `GetMetricValue` implementation in the external `github.com/codeready-toolchain/toolchain-e2e/testsupport/metrics` package could not be accessed to confirm the expected label format against the API specification. Verify this against the toolchain-e2e library's actual API documentation to ensure compliance.

internal/server/middleware.go (2)
13-22: LGTM!

The middleware structure follows the standard pattern correctly: it logs the request, records the start time, invokes the next handler, and measures the duration.

27-31: LGTM!

The success determination logic correctly accounts for both Go errors (`err == nil`) and MCP protocol errors (`!r.IsError`), ensuring accurate metrics labeling for both transport-level and business-logic failures.
rsoaresd
left a comment
Great work 🚀
Just a minor question, for learning purposes
```go
// also, check the metrics when the server runs on HTTP
if td.name == "http" {
	// get the metrics
	metric, err := toolchaintests.GetMetricValue(&rest.Config{}, "http://"+MCPServerListen, `mcp_calls_total`, []string{"method", "tools/call", "name", "unhealthyApplications", "success", "true"})
```
is there a reason why we are not checking `mcp_call_duration_seconds`? Sorry if it is a noob question, just want to understand
good point, let me add a check for that metric ;)
alexeykazakov
left a comment
Looks good but take a look at the CodeRabbitAI comments :)
Any plans for adding metrics to the sandbox MCP servers too?
```go
var (
	// ToolCallsTotal counts total tool invocation by tool name
```
```diff
-	// ToolCallsTotal counts total tool invocation by tool name
+	// MCPCallsTotal counts total tool invocation by tool name
```
I will, definitely ;)
well, @mfrancisc has already done a round of metrics here, and I will port my work here into https://github.com/codeready-toolchain/devsandbox-observability-mcp-server once this PR is merged
```go
// Metrics endpoint
mux.Handle("/metrics", promhttp.Handler())
```
I think we usually expose the metrics using its own server and a different port, but I guess this is fine for the MCP server since it is not exposed publicly.
I've opened a discussion on Slack for this specific question
```go
MCPCallsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "mcp_calls_total",
		Help: "Total number of MCP calls",
	},
	[]string{"method", "name", "success"},
)
```
I think we should have separate metrics for tool calls and method calls. For example, as in https://github.com/codeready-toolchain/mcp-server-devsandbox/blob/master/pkg/metrics/metrics.go:

- the `mcp_method_calls_total` will count invocations of standard MCP protocol operations like `initialize`, `ping`, `tools/list`
- the `mcp_tool_calls_total` will count calls per tool
Also, it would be nice to have `mcp_tool_call_errors_total` counting errors per tool, so that we can have insights into tools that are failing (maybe because the LLM can't figure out how to call them properly, or for any other reason).
but since we have labels on `mcp_calls_total`, we can filter by method (`initialize`, `tools/call`, etc.), then for tools by name (`unhealthyApplications`, `unhealthyApplicationResources`, etc.), and also by success (true, false).
I mean, I don't see the benefits of having multiple metrics.
Also, that's what is usually done with HTTP metrics: you have labels for the method (get, post) and the return code (200, 404, 500, etc.)
I thought that having separate metrics for tools and methods would help with creating separate dashboards, since I guess we would mostly be interested in looking at the tools charts (more than the methods ones), but maybe I'm wrong.
I believe we can have a single metric and use the labels to focus on tool calls per name, etc.
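The single-metric-plus-labels argument above can be made concrete with a few queries. These are hypothetical PromQL sketches, assuming the `method`/`name`/`success` labels from this PR:

```promql
# all tool calls, per tool
sum by (name) (rate(mcp_calls_total{method="tools/call"}[5m]))

# errors per tool (covers the mcp_tool_call_errors_total use case via a label filter)
sum by (name) (rate(mcp_calls_total{method="tools/call",success="false"}[5m]))

# protocol-level operations only (initialize, ping, tools/list, ...)
sum by (method) (rate(mcp_calls_total{method!="tools/call"}[5m]))
```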
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
Actionable comments posted: 0
🧹 Nitpick comments (2)
internal/metrics/metrics.go (2)
19-27: Consider custom histogram buckets (DefBuckets may not fit the MCP latency profile).

`prometheus.DefBuckets` can be too coarse/short for RPC-like calls (or too coarse at the low end). If you have expected SLOs (e.g., p95 ~ 200ms, but occasional 30s), tune the buckets accordingly so the histogram is actually useful.

Possible bucket tuning (example)

```diff
 MCPCallDuration = promauto.NewHistogramVec(
 	prometheus.HistogramOpts{
 		Name: "mcp_call_duration_seconds",
 		Help: "Duration of MCP calls in seconds",
-		Buckets: prometheus.DefBuckets,
+		// Adjust based on expected MCP latency distribution / SLOs.
+		Buckets: []float64{0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30},
 	},
 	[]string{"method", "name", "success"},
 )
```
1-6: The current implementation is sound and avoids the risks mentioned.

The metric names (`mcp_calls_total` and `mcp_call_duration_seconds`) are registered only once in the codebase, eliminating duplicate registration concerns. Label cardinality is well-controlled: the `name` label is populated by `tool` (bounded to the available tools), not unbounded user input, and `success` is a boolean. If embedding or custom testing scenarios require a custom `prometheus.Registerer` in the future, it can be refactored then—the current default-registry approach is appropriate for this service.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/metrics/metrics.go
🔇 Additional comments (1)
internal/metrics/metrics.go (1)
8-17: The `name` label cardinality is bounded and not a concern.

The `name` label only takes values from the two registered tools (`"unhealthyApplications"` and `"unhealthyApplicationResources"`) or an empty string for non-`tools/call` methods. There is no risk of unbounded cardinality or memory explosion. While using a sentinel value like `"__none__"` instead of an empty string would be slightly clearer for code readability, this is not a functional issue.

Likely an incorrect or invalid review comment.
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
mfrancisc
left a comment
Looks good 👍
Thanks for answering my questions and for the discussion !
rajivnathan
left a comment
Assuming you still plan to add the test for `mcp_call_duration_seconds`?
yes, it's still something I plan to add, indeed! I've switched to #40 for now but I'll come back to this one right after that ;)
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/e2e/server_test.go`:
- Around line 266-268: The test helper valueInLastBucket should guard against an
empty buckets slice and avoid unsafe uint64->int conversion; update
valueInLastBucket to return 0 (or an error) when len(buckets) == 0, read the
uint64 value from buckets[len(buckets)-1].CumulativeCount into a uint64
variable, then clamp it to the runtime int range (compare against math.MaxInt)
before converting to int so you never overflow on 32-bit systems; reference the
valueInLastBucket function and buckets[..].CumulativeCount when making this
change.
🧹 Nitpick comments (1)
test/e2e/server_test.go (1)
253-264: Consider parameterizing `mcpServerURL` or making it a constant.

The static analysis tool flags that `mcpServerURL` always receives `"http://localhost:50081"`. While this is minor for test code, you could either extract it as a constant or accept the lint warning with a `//nolint:unparam` directive if this is intentional.

♻️ Option: Extract as a constant

```diff
+const mcpServerMetricsURL = "http://localhost:50081"
+
-func getMetrics(t *testing.T, mcpServerURL string, labels map[string]string) (int, []*io_prometheus_client.Bucket) {
+func getMetrics(t *testing.T, labels map[string]string) (int, []*io_prometheus_client.Bucket) {
 	labelStrings := make([]string, 0, 2*len(labels))
 	for k, v := range labels {
 		labelStrings = append(labelStrings, k)
 		labelStrings = append(labelStrings, v)
 	}
-	mcpCallsTotalMetric, err := toolchaintests.GetMetricValue(&rest.Config{}, mcpServerURL, `mcp_calls_total`, labelStrings)
+	mcpCallsTotalMetric, err := toolchaintests.GetMetricValue(&rest.Config{}, mcpServerMetricsURL, `mcp_calls_total`, labelStrings)
 	require.NoError(t, err)
-	mcpCallsDurationSecondsBuckets, err := toolchaintests.GetHistogramBuckets(&rest.Config{}, mcpServerURL, `mcp_call_duration_seconds`, labelStrings)
+	mcpCallsDurationSecondsBuckets, err := toolchaintests.GetHistogramBuckets(&rest.Config{}, mcpServerMetricsURL, `mcp_call_duration_seconds`, labelStrings)
 	require.NoError(t, err)
 	return int(mcpCallsTotalMetric), mcpCallsDurationSecondsBuckets
 }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (6)
- .github/workflows/cd.yml
- README.md
- cmd/start_server.go
- go.mod
- taskfile.yaml
- test/e2e/server_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
🧰 Additional context used
🪛 GitHub Actions: ci
test/e2e/server_test.go
[error] 267-267: Gosec: G115: integer overflow conversion uint64 -> int. Potential overflow when converting uint64 to int in test/e2e/server_test.go:267.
🪛 GitHub Check: Lint
test/e2e/server_test.go
[failure] 253-253:
getMetrics - mcpServerURL always receives "http://localhost:50081" (unparam)
[failure] 267-267:
G115: integer overflow conversion uint64 -> int (gosec)
🔇 Additional comments (8)
.github/workflows/cd.yml (1)
24-24: LGTM!

The containerfile reference is correctly updated from `Containerfile.yaml` to `Containerfile`, consistent with the corresponding change in `taskfile.yaml`.

taskfile.yaml (2)
17-27: LGTM!

Good modernization: updating from `podman-compose` to `podman compose` (Compose v2 syntax) and shifting the build strategy to container-based builds with `podman compose build`.
39-39: LGTM!

The Containerfile reference is correctly updated to match `.github/workflows/cd.yml`.

cmd/start_server.go (2)
15-15: LGTM!

Correct import for exposing Prometheus metrics via HTTP.

86-87: LGTM!

Standard approach for exposing Prometheus metrics. The `/metrics` endpoint correctly uses `promhttp.Handler()` from the Prometheus client library.

test/e2e/server_test.go (2)
10-19: LGTM!

Imports are correctly added for Prometheus metrics testing utilities.

49-97: LGTM!

Good test pattern: capturing metrics before the call, executing the tool call, then verifying both that `mcp_calls_total` incremented by 1 and that the `mcp_call_duration_seconds` histogram recorded the call. The HTTP-transport guard is appropriate since stdio doesn't expose metrics.

go.mod (1)
8-17: LGTM!

Dependencies are correctly added to support the Prometheus metrics feature:

- `toolchain-e2e` for metrics test utilities
- `prometheus/client_golang` for metrics instrumentation
- `k8s.io/client-go`, required by the test utilities
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
… use `docker-compose` Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
Signed-off-by: Xavier Coulon <xcoulon@redhat.com>
the `mcp_calls_total` and `mcp_call_duration_seconds` metrics are collected via the SDK's `mcp.Middleware` and exposed on the `/metrics` endpoint of the server

Summary by CodeRabbit
New Features
Updates
Tests
Chores