Proxy audit middleware records outcome=success for JSON-RPC application-level errors in HTTP 200 responses #4678

@gmogmzGithub

Bug description

The proxy audit middleware determines the outcome field exclusively from the HTTP status code (pkg/audit/auditor.go:308-322, determineOutcome()). Because MCP servers return application-level errors inside HTTP 200 responses per the JSON-RPC spec, the audit log records outcome=success even when the response body contains a JSON-RPC error object.

This means failures like expired tokens, API errors, and invalid parameters are invisible in the audit trail.

Question for maintainers: Is this transport-only outcome determination a deliberate design choice (keeping the audit middleware protocol-agnostic), or is it a gap worth fixing? I'm happy to discuss the approach before submitting a PR.

Steps to reproduce

  1. Run an MCP server through ToolHive with audit enabled (--audit)
  2. Trigger a tool call that results in an application-level error (e.g., use an expired PAT, request a non-existent resource)
  3. The MCP server returns HTTP 200 with a JSON-RPC error body:
{
  "jsonrpc": "2.0",
  "id": "...",
  "error": {
    "code": -32603,
    "message": "GitLab API error: 401 Unauthorized"
  }
}
  4. Check the proxy audit log: the event is recorded as outcome=success
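
For anyone who wants to see the mismatch without a real MCP server, here is a minimal sketch using only the Go standard library. The handler stands in for an MCP server that wraps an upstream 401 in a JSON-RPC error, and statusOnlyOutcome is a hypothetical simplification of a status-code-only check, not the actual ToolHive function. The "non_success" label is also made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// errorBody is the JSON-RPC error payload from the reproduction steps.
const errorBody = `{"jsonrpc":"2.0","id":"1","error":{"code":-32603,"message":"GitLab API error: 401 Unauthorized"}}`

// fakeMCPHandler is a stand-in for an MCP server: it reports an
// application-level failure inside an HTTP 200 response.
func fakeMCPHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_, _ = io.WriteString(w, errorBody)
}

// statusOnlyOutcome is a hypothetical, simplified status-code-only check.
// The "non_success" label is illustrative and not a ToolHive constant.
func statusOnlyOutcome(statusCode int) string {
	if statusCode >= 200 && statusCode < 300 {
		return "success"
	}
	return "non_success"
}

// probe calls the fake server in-process and returns the status-based
// outcome plus whether the body mentions an "error" member.
func probe() (outcome string, bodyHasError bool) {
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(`{"jsonrpc":"2.0","id":"1","method":"tools/call"}`))
	rec := httptest.NewRecorder()
	fakeMCPHandler(rec, req)
	return statusOnlyOutcome(rec.Code), strings.Contains(rec.Body.String(), `"error"`)
}

func main() {
	outcome, bodyHasError := probe()
	fmt.Printf("outcome=%s body_has_error=%t\n", outcome, bodyHasError)
}
```

The status-based outcome and the body disagree, which is the behavior described in this report.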

Expected behavior

The audit log should distinguish between a true success (HTTP 200 with a valid JSON-RPC result) and an application-level failure (HTTP 200 with a JSON-RPC error field). For example, it could record outcome=application_error along with the JSON-RPC error code and a truncated message.
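
A rough sketch of that distinction, assuming the full response body is available as a single JSON object. All names here (classify, truncate, the outcome constants, the 64-rune limit) are hypothetical and only illustrate the expected behavior; they are not existing ToolHive identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical outcome labels. "success" matches the value quoted in this
// issue; "application_error" is the proposed value and does not exist yet.
const (
	outcomeSuccess          = "success"
	outcomeApplicationError = "application_error"
)

// rpcError mirrors the error object defined by JSON-RPC 2.0.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// rpcResponse keeps only the member needed for classification.
type rpcResponse struct {
	Error *rpcError `json:"error"`
}

// truncate shortens s to at most maxRunes runes so audit metadata stays small.
func truncate(s string, maxRunes int) string {
	r := []rune(s)
	if len(r) <= maxRunes {
		return s
	}
	return string(r[:maxRunes]) + "..."
}

// classify inspects the body of an HTTP 200 response. Bodies that are not a
// single JSON object (batches, event streams) fall back to success here.
func classify(body []byte) (outcome string, code int, message string) {
	var resp rpcResponse
	if err := json.Unmarshal(body, &resp); err != nil || resp.Error == nil {
		return outcomeSuccess, 0, ""
	}
	return outcomeApplicationError, resp.Error.Code, truncate(resp.Error.Message, 64)
}

func main() {
	body := []byte(`{"jsonrpc":"2.0","id":"1","error":{"code":-32603,"message":"GitLab API error: 401 Unauthorized"}}`)
	outcome, code, msg := classify(body)
	fmt.Printf("outcome=%s code=%d message=%q\n", outcome, code, msg)
}
```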

Actual behavior

All HTTP 200 responses are logged as outcome=success, regardless of the JSON-RPC response body content.

Relevant code

pkg/audit/auditor.go:

// Line 308-322
func (*Auditor) determineOutcome(statusCode int) string {
    switch {
    case statusCode >= 200 && statusCode < 300:
        return OutcomeSuccess
    // ...
    }
}

The response body is only captured when IncludeResponseData is enabled (lines 200-202), and even then it is only used for data logging in addEventData() (line 250), never for outcome determination.

Observed impact

During an internal MCP stability pilot with multiple users and workloads, we observed:

  • Expired authentication tokens persisting for 5+ days with zero visibility in audit logs. All tool calls showed outcome=success in the proxy log while the MCP log contained repeated code=-32603, "401 Unauthorized" errors.
  • API errors (400 Bad Request, 502 Bad Gateway) wrapped in JSON-RPC error responses, all recorded as successful.
  • File-not-found and routing errors across multiple MCP servers (GitLab, Slack, Google Workspace), invisible in proxy audit.

In total, we identified 3,700+ silent failures across 5 workloads in the pilot data.

Environment

  • ToolHive: latest main branch
  • OS: macOS

Additional context

If the team agrees this is worth addressing, I'd like to submit a PR. The approach I have in mind:

  1. Always buffer a small portion of the response body (even when IncludeResponseData is off) — enough to detect a JSON-RPC error field
  2. In logAuditEvent(), after determining the HTTP-based outcome, check the buffered body for a top-level "error" field when the outcome is success
  3. Add OutcomeApplicationError as a new outcome constant and record the JSON-RPC error code/message as metadata

This would be additive — existing outcome values remain unchanged, and consumers that don't know about application_error won't break.

Metadata

Labels: api, audit, bug, go
