feat(runtime-api): mTLS-aware live inventory probing #338

Merged
Agent-Hellboy merged 2 commits into main from services-api/live_inventory_mtls on Jun 29, 2026
Conversation

@Agent-Hellboy (Owner)

Summary

  • Project authMode and trustDomain through controlplane.ServerInfo.
  • Live inventory skips spoofable X-MCP-Human-ID / X-MCP-Agent-ID headers when the target server uses auth.mode: mtls.
  • For mtls servers, probe via the public HTTPS ingress using a session-bound client certificate (issued through the existing cert-manager path, or env-mounted cert files) instead of the in-cluster HTTP header shortcut.
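
The header-skip rule in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `serverInfo`, its `IngressHost` field, the `/mcp` path, the in-cluster URL shape, the GET method, and `newProbeRequest` are all hypothetical. Only the two header names and the `mtls` mode value come from the description above.

```go
package main

import (
	"fmt"
	"net/http"
)

// serverInfo is a hypothetical stand-in for controlplane.ServerInfo; only the
// fields this sketch needs are modelled.
type serverInfo struct {
	Name        string
	Namespace   string
	AuthMode    string // "mtls" or a header-based mode
	IngressHost string // assumed: public host used for HTTPS probing
}

const (
	humanIDHeader = "X-MCP-Human-ID"
	agentIDHeader = "X-MCP-Agent-ID"
)

// newProbeRequest builds a probe request. For mtls servers it targets the
// HTTPS ingress and sets no identity headers, because identity is carried by
// the client certificate presented during the TLS handshake. For other modes
// it uses the in-cluster HTTP address and sets the identity headers.
func newProbeRequest(s serverInfo, humanID, agentID string) (*http.Request, error) {
	if s.AuthMode == "mtls" {
		return http.NewRequest(http.MethodGet, "https://"+s.IngressHost+"/mcp", nil)
	}
	url := fmt.Sprintf("http://%s.%s.svc.cluster.local/mcp", s.Name, s.Namespace)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(humanIDHeader, humanID)
	req.Header.Set(agentIDHeader, agentID)
	return req, nil
}

func main() {
	s := serverInfo{Name: "demo", Namespace: "mcp", AuthMode: "mtls", IngressHost: "mcp.example.com"}
	req, err := newProbeRequest(s, "human-1", "agent-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.Scheme, req.Header.Get(humanIDHeader) == "")
}
```

The point of the branch is that a header set by the prober is indistinguishable from one set by any other in-cluster caller, so on mtls servers the headers are left out entirely rather than sent and ignored.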

Dependencies

Test plan

  • go test -race -count=1 ./pkg/controlplane/...
  • go test -race -count=1 ./internal/runtimeapi/... (in services/runtime-api)

Part of post-#331 follow-up queue (#5). Next: live data-path E2E (mtls scenario).

Made with Cursor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for mTLS authentication when probing live inventory by adding AuthMode and TrustDomain to ServerInfo, refactoring the adapter certificate issuance logic for reuse, and implementing mTLS probe client creation. Feedback focuses on a critical performance improvement: caching the generated mTLS *http.Client instances to avoid recreating them on every probe request, which would otherwise bypass connection pooling, cause socket exhaustion, and trigger redundant Kubernetes API calls.

Comment on lines 248 to 252
	client           *http.Client
	baseURLForServer func(controlplane.ServerInfo) string
	now              func() time.Time
	access           *AccessService
}

high

To prevent recreating the *http.Client and *http.Transport on every single probe request (which bypasses connection pooling and leads to high CPU/latency and socket exhaustion), we should cache the mTLS clients. Let's add a mutex and a map to cache these clients in mcpLiveInventoryProber.

	client           *http.Client
	baseURLForServer func(controlplane.ServerInfo) string
	now              func() time.Time
	access           *AccessService
	mu               sync.Mutex
	mtlsClients      map[string]*cachedClient
}

type cachedClient struct {
	client    *http.Client
	expiresAt time.Time
}

Comment on lines 100 to 103
			prober = &mcpLiveInventoryProber{
				client: &http.Client{Timeout: liveInventoryProbeTimeout},
				access: s.access,
			}

high

Initialize the mtlsClients cache map when instantiating mcpLiveInventoryProber.

			prober = &mcpLiveInventoryProber{
				client:      &http.Client{Timeout: liveInventoryProbeTimeout},
				access:      s.access,
				mtlsClients: make(map[string]*cachedClient),
			}

Comment on lines +47 to +73
func (p *mcpLiveInventoryProber) mtlsProbeClient(ctx context.Context, server controlplane.ServerInfo) (*http.Client, error) {
	if certFile := strings.TrimSpace(os.Getenv(liveInventoryClientCertEnv)); certFile != "" {
		keyFile := strings.TrimSpace(os.Getenv(liveInventoryClientKeyEnv))
		if keyFile == "" {
			return nil, fmt.Errorf("mTLS live inventory client key file is not configured")
		}
		return tlsHTTPClientFromFiles(certFile, keyFile, strings.TrimSpace(os.Getenv(liveInventoryClientCAEnv)), liveInventoryProbeTimeout)
	}
	if p.access == nil {
		return nil, fmt.Errorf("mTLS live inventory certificate issuer is not configured")
	}
	keyPEM, csrPEM, _, err := certauth.BuildSessionCSR(server.TrustDomain, server.Namespace, liveInventoryProbeSessionName())
	if err != nil {
		return nil, err
	}
	certPEM, caPEM, err := p.access.issueSessionCertificate(
		ctx,
		server.Namespace,
		liveInventoryProbeSessionName(),
		server.TrustDomain,
		string(csrPEM),
	)
	if err != nil {
		return nil, err
	}
	return tlsHTTPClientFromPEM(keyPEM, []byte(certPEM), []byte(caPEM), liveInventoryProbeTimeout)
}

high

Implement the caching logic in mtlsProbeClient. This ensures that we reuse the same *http.Client (and its underlying connection pool) until the certificate is close to its expiration time. This avoids making redundant Kubernetes API calls to create CertificateRequest resources and generating new private keys/CSRs every 30 seconds.

func (p *mcpLiveInventoryProber) mtlsProbeClient(ctx context.Context, server controlplane.ServerInfo) (*http.Client, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.mtlsClients == nil {
		p.mtlsClients = make(map[string]*cachedClient)
	}

	cacheKey := server.Namespace + "/" + server.Name
	if cached, ok := p.mtlsClients[cacheKey]; ok && p.currentTime().Add(2*time.Minute).Before(cached.expiresAt) {
		return cached.client, nil
	}

	var client *http.Client
	var expiresAt time.Time
	var err error

	if certFile := strings.TrimSpace(os.Getenv(liveInventoryClientCertEnv)); certFile != "" {
		keyFile := strings.TrimSpace(os.Getenv(liveInventoryClientKeyEnv))
		if keyFile == "" {
			return nil, fmt.Errorf("mTLS live inventory client key file is not configured")
		}
		client, expiresAt, err = tlsHTTPClientFromFiles(certFile, keyFile, strings.TrimSpace(os.Getenv(liveInventoryClientCAEnv)), liveInventoryProbeTimeout)
		if err != nil {
			return nil, err
		}
	} else {
		if p.access == nil {
			return nil, fmt.Errorf("mTLS live inventory certificate issuer is not configured")
		}
		keyPEM, csrPEM, _, err := certauth.BuildSessionCSR(server.TrustDomain, server.Namespace, liveInventoryProbeSessionName())
		if err != nil {
			return nil, err
		}
		certPEM, caPEM, err := p.access.issueSessionCertificate(
			ctx,
			server.Namespace,
			liveInventoryProbeSessionName(),
			server.TrustDomain,
			string(csrPEM),
		)
		if err != nil {
			return nil, err
		}
		client, expiresAt, err = tlsHTTPClientFromPEM(keyPEM, []byte(certPEM), []byte(caPEM), liveInventoryProbeTimeout)
		if err != nil {
			return nil, err
		}
	}

	p.mtlsClients[cacheKey] = &cachedClient{
		client:    client,
		expiresAt: expiresAt,
	}
	return client, nil
}
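
A stripped-down version of the caching idea above, separated from certificate issuance so the expiry logic can be read and exercised on its own. This is only a sketch under assumptions: `clientCache`, `newClientCache`, `get`, and `demoBuildCounts` are hypothetical names, the clock is injected, and the step that releases the replaced client's idle connections is an addition that the suggestion above does not include.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type cachedClient struct {
	client    *http.Client
	expiresAt time.Time
}

// clientCache is a hypothetical, simplified per-key cache of HTTP clients
// whose lifetime is bounded by the client certificate's expiry.
type clientCache struct {
	mu      sync.Mutex
	entries map[string]*cachedClient
	now     func() time.Time
	margin  time.Duration // rebuild when less than this much validity remains
}

func newClientCache(now func() time.Time, margin time.Duration) *clientCache {
	return &clientCache{entries: make(map[string]*cachedClient), now: now, margin: margin}
}

// get returns the cached client for key while its certificate has more than
// margin of validity left. Otherwise it calls build, stores the result, and
// closes the idle connections of the client being replaced.
func (c *clientCache) get(key string, build func() (*http.Client, time.Time, error)) (*http.Client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	old, ok := c.entries[key]
	if ok && c.now().Add(c.margin).Before(old.expiresAt) {
		return old.client, nil
	}
	client, expiresAt, err := build()
	if err != nil {
		return nil, err
	}
	if ok {
		old.client.CloseIdleConnections()
	}
	c.entries[key] = &cachedClient{client: client, expiresAt: expiresAt}
	return client, nil
}

// demoBuildCounts drives the cache with a fake clock and reports how many
// times build ran after two immediate lookups and after a lookup made close
// to expiry.
func demoBuildCounts() (afterReuse, afterRefresh int) {
	current := time.Date(2026, 6, 29, 12, 0, 0, 0, time.UTC)
	cache := newClientCache(func() time.Time { return current }, 2*time.Minute)

	builds := 0
	build := func() (*http.Client, time.Time, error) {
		builds++
		// Pretend every issued certificate is valid for ten minutes.
		return &http.Client{Transport: &http.Transport{}}, current.Add(10 * time.Minute), nil
	}

	cache.get("ns/server", build)
	cache.get("ns/server", build)
	afterReuse = builds

	current = current.Add(9 * time.Minute) // one minute of validity left
	cache.get("ns/server", build)
	return afterReuse, builds
}

func main() {
	reuse, refresh := demoBuildCounts()
	fmt.Println("builds after reuse:", reuse, "builds after refresh:", refresh)
}
```

One design point worth weighing: both this sketch and the suggestion above hold the mutex while the client is built, so a slow certificate issuance for one server blocks probes for every other server until it returns.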

Comment on lines +75 to +101
func tlsHTTPClientFromFiles(certFile, keyFile, caFile string, timeout time.Duration) (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if caFile != "" {
		caPEM, err := os.ReadFile(caFile)
		if err != nil {
			return nil, fmt.Errorf("read mTLS client CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	return &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}, nil
}

high

Modify tlsHTTPClientFromFiles to parse and return the leaf certificate's expiration time (NotAfter) so that we can cache the client until the certificate expires.

func tlsHTTPClientFromFiles(certFile, keyFile, caFile string, timeout time.Duration) (*http.Client, time.Time, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	var expiresAt time.Time
	if len(cert.Certificate) > 0 {
		if leaf, err := x509.ParseCertificate(cert.Certificate[0]); err == nil {
			expiresAt = leaf.NotAfter
		}
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if caFile != "" {
		caPEM, err := os.ReadFile(caFile)
		if err != nil {
			return nil, time.Time{}, fmt.Errorf("read mTLS client CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, time.Time{}, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	client := &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}
	return client, expiresAt, nil
}

Comment on lines +103 to +125
func tlsHTTPClientFromPEM(keyPEM, certPEM, caPEM []byte, timeout time.Duration) (*http.Client, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	return &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}, nil
}

high

Modify tlsHTTPClientFromPEM to parse and return the leaf certificate's expiration time (NotAfter) so that we can cache the client until the certificate expires.

func tlsHTTPClientFromPEM(keyPEM, certPEM, caPEM []byte, timeout time.Duration) (*http.Client, time.Time, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	var expiresAt time.Time
	if len(cert.Certificate) > 0 {
		if leaf, err := x509.ParseCertificate(cert.Certificate[0]); err == nil {
			expiresAt = leaf.NotAfter
		}
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, time.Time{}, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	client := &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}
	return client, expiresAt, nil
}
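
To check the `NotAfter` extraction in isolation, the following sketch generates a throwaway self-signed certificate in memory and reads its expiry back. `selfSignedPair`, `leafNotAfter`, and `demoRoundTrip` are hypothetical helpers written for this illustration, not functions from this PR. Unlike the suggestion above, `leafNotAfter` returns an error when the leaf cannot be parsed; with a zero `expiresAt` the cache's freshness check would never pass, so the client would be rebuilt on every probe without any visible failure.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// selfSignedPair returns a PEM-encoded client certificate and key that expire
// at notAfter. It exists only to give the demo something to parse.
func selfSignedPair(notAfter time.Time) (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "live-inventory-probe-demo"},
		NotBefore:    notAfter.Add(-2 * time.Hour),
		NotAfter:     notAfter,
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// leafNotAfter loads the key pair and returns the leaf certificate's expiry.
// It uses cert.Leaf when the standard library has already populated it and
// falls back to parsing the first DER block otherwise.
func leafNotAfter(certPEM, keyPEM []byte) (time.Time, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return time.Time{}, fmt.Errorf("load key pair: %w", err)
	}
	if cert.Leaf != nil {
		return cert.Leaf.NotAfter, nil
	}
	if len(cert.Certificate) == 0 {
		return time.Time{}, fmt.Errorf("key pair contains no certificates")
	}
	leaf, err := x509.ParseCertificate(cert.Certificate[0])
	if err != nil {
		return time.Time{}, fmt.Errorf("parse leaf certificate: %w", err)
	}
	return leaf.NotAfter, nil
}

// demoRoundTrip reports whether the expiry read back matches the one issued.
// X.509 validity times carry whole seconds, hence the truncation.
func demoRoundTrip() (bool, error) {
	want := time.Now().Add(time.Hour).Truncate(time.Second)
	certPEM, keyPEM, err := selfSignedPair(want)
	if err != nil {
		return false, err
	}
	got, err := leafNotAfter(certPEM, keyPEM)
	if err != nil {
		return false, err
	}
	return got.Equal(want), nil
}

func main() {
	ok, err := demoRoundTrip()
	fmt.Println("expiry round-trips:", ok, "error:", err)
}
```

Recent Go releases may already fill in `cert.Leaf` when the key pair is loaded, in which case the explicit `x509.ParseCertificate` call is only a fallback for older toolchains.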

Agent-Hellboy and others added 2 commits June 29, 2026 17:20
Project auth.mode and trustDomain through ServerInfo so live inventory can
avoid spoofable governance headers on mtls servers. For mtls targets, probe
via the public HTTPS ingress with a session-bound client certificate issued
through the shared cert-manager path (or env-mounted cert files), instead of
the in-cluster HTTP header identity shortcut used for header-mode servers.

Co-authored-by: Cursor <cursoragent@cursor.com>

Reuse the per-server mTLS *http.Client (and its connection pool) until
the client certificate nears expiry instead of rebuilding the transport
on every probe. For issuer-backed certs this also avoids re-issuing a
CertificateRequest every probe interval. tlsHTTPClientFrom{Files,PEM}
now return the leaf NotAfter so the cache can expire entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent-Hellboy force-pushed the services-api/live_inventory_mtls branch from 29bfc85 to e319e45 on June 29, 2026 11:55
Agent-Hellboy changed the base branch from pkg/identity_certauth to main on June 29, 2026 11:55
Agent-Hellboy merged commit 3a6c551 into main on Jun 29, 2026