feat(runtime-api): mTLS-aware live inventory probing #338
Code Review
This pull request introduces support for mTLS authentication when probing live inventory by adding AuthMode and TrustDomain to ServerInfo, refactoring the adapter certificate issuance logic for reuse, and implementing mTLS probe client creation. Feedback focuses on a critical performance improvement: caching the generated mTLS *http.Client instances to avoid recreating them on every probe request, which would otherwise bypass connection pooling, cause socket exhaustion, and trigger redundant Kubernetes API calls.
	client           *http.Client
	baseURLForServer func(controlplane.ServerInfo) string
	now              func() time.Time
	access           *AccessService
}
To avoid recreating the *http.Client and *http.Transport on every probe request, we should cache the mTLS clients. A fresh transport starts with an empty connection pool, so every probe pays for a new TCP connection and TLS handshake (extra CPU and latency), and discarded transports can keep idle sockets open, risking socket exhaustion. Let's add a mutex and a map of cached clients to mcpLiveInventoryProber:
	client           *http.Client
	baseURLForServer func(controlplane.ServerInfo) string
	now              func() time.Time
	access           *AccessService
	mu               sync.Mutex
	mtlsClients      map[string]*cachedClient
}

type cachedClient struct {
	client    *http.Client
	expiresAt time.Time
}

prober = &mcpLiveInventoryProber{
	client: &http.Client{Timeout: liveInventoryProbeTimeout},
	access: s.access,
}
func (p *mcpLiveInventoryProber) mtlsProbeClient(ctx context.Context, server controlplane.ServerInfo) (*http.Client, error) {
	if certFile := strings.TrimSpace(os.Getenv(liveInventoryClientCertEnv)); certFile != "" {
		keyFile := strings.TrimSpace(os.Getenv(liveInventoryClientKeyEnv))
		if keyFile == "" {
			return nil, fmt.Errorf("mTLS live inventory client key file is not configured")
		}
		return tlsHTTPClientFromFiles(certFile, keyFile, strings.TrimSpace(os.Getenv(liveInventoryClientCAEnv)), liveInventoryProbeTimeout)
	}
	if p.access == nil {
		return nil, fmt.Errorf("mTLS live inventory certificate issuer is not configured")
	}
	keyPEM, csrPEM, _, err := certauth.BuildSessionCSR(server.TrustDomain, server.Namespace, liveInventoryProbeSessionName())
	if err != nil {
		return nil, err
	}
	certPEM, caPEM, err := p.access.issueSessionCertificate(
		ctx,
		server.Namespace,
		liveInventoryProbeSessionName(),
		server.TrustDomain,
		string(csrPEM),
	)
	if err != nil {
		return nil, err
	}
	return tlsHTTPClientFromPEM(keyPEM, []byte(certPEM), []byte(caPEM), liveInventoryProbeTimeout)
}
Implement the caching logic in mtlsProbeClient. This ensures that we reuse the same *http.Client (and its underlying connection pool) until the certificate is close to its expiration time. This avoids making redundant Kubernetes API calls to create CertificateRequest resources and generating new private keys/CSRs every 30 seconds.
func (p *mcpLiveInventoryProber) mtlsProbeClient(ctx context.Context, server controlplane.ServerInfo) (*http.Client, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.mtlsClients == nil {
		p.mtlsClients = make(map[string]*cachedClient)
	}
	cacheKey := server.Namespace + "/" + server.Name
	// Reuse the cached client until it is within two minutes of its certificate's NotAfter.
	cached, ok := p.mtlsClients[cacheKey]
	if ok && p.currentTime().Add(2*time.Minute).Before(cached.expiresAt) {
		return cached.client, nil
	}
	var client *http.Client
	var expiresAt time.Time
	var err error
	if certFile := strings.TrimSpace(os.Getenv(liveInventoryClientCertEnv)); certFile != "" {
		keyFile := strings.TrimSpace(os.Getenv(liveInventoryClientKeyEnv))
		if keyFile == "" {
			return nil, fmt.Errorf("mTLS live inventory client key file is not configured")
		}
		client, expiresAt, err = tlsHTTPClientFromFiles(certFile, keyFile, strings.TrimSpace(os.Getenv(liveInventoryClientCAEnv)), liveInventoryProbeTimeout)
		if err != nil {
			return nil, err
		}
	} else {
		if p.access == nil {
			return nil, fmt.Errorf("mTLS live inventory certificate issuer is not configured")
		}
		// Compute the session name once so the CSR and the issuance request use the same value.
		sessionName := liveInventoryProbeSessionName()
		keyPEM, csrPEM, _, err := certauth.BuildSessionCSR(server.TrustDomain, server.Namespace, sessionName)
		if err != nil {
			return nil, err
		}
		certPEM, caPEM, err := p.access.issueSessionCertificate(
			ctx,
			server.Namespace,
			sessionName,
			server.TrustDomain,
			string(csrPEM),
		)
		if err != nil {
			return nil, err
		}
		client, expiresAt, err = tlsHTTPClientFromPEM(keyPEM, []byte(certPEM), []byte(caPEM), liveInventoryProbeTimeout)
		if err != nil {
			return nil, err
		}
	}
	// Release idle connections held by the client being replaced; its transport is not reused.
	if ok {
		cached.client.CloseIdleConnections()
	}
	p.mtlsClients[cacheKey] = &cachedClient{
		client:    client,
		expiresAt: expiresAt,
	}
	return client, nil
}

func tlsHTTPClientFromFiles(certFile, keyFile, caFile string, timeout time.Duration) (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if caFile != "" {
		caPEM, err := os.ReadFile(caFile)
		if err != nil {
			return nil, fmt.Errorf("read mTLS client CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	return &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}, nil
}
Modify tlsHTTPClientFromFiles to parse and return the leaf certificate's expiration time (NotAfter) so that we can cache the client until the certificate expires.
func tlsHTTPClientFromFiles(certFile, keyFile, caFile string, timeout time.Duration) (*http.Client, time.Time, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	// Leave expiresAt zero if the leaf cannot be parsed; the cache then never reuses this client.
	var expiresAt time.Time
	if len(cert.Certificate) > 0 {
		if leaf, err := x509.ParseCertificate(cert.Certificate[0]); err == nil {
			expiresAt = leaf.NotAfter
		}
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if caFile != "" {
		caPEM, err := os.ReadFile(caFile)
		if err != nil {
			return nil, time.Time{}, fmt.Errorf("read mTLS client CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, time.Time{}, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	client := &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}
	return client, expiresAt, nil
}

func tlsHTTPClientFromPEM(keyPEM, certPEM, caPEM []byte, timeout time.Duration) (*http.Client, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	return &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}, nil
}
Modify tlsHTTPClientFromPEM to parse and return the leaf certificate's expiration time (NotAfter) so that we can cache the client until the certificate expires.
func tlsHTTPClientFromPEM(keyPEM, certPEM, caPEM []byte, timeout time.Duration) (*http.Client, time.Time, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("load mTLS client key pair: %w", err)
	}
	// Leave expiresAt zero if the leaf cannot be parsed; the cache then never reuses this client.
	var expiresAt time.Time
	if len(cert.Certificate) > 0 {
		if leaf, err := x509.ParseCertificate(cert.Certificate[0]); err == nil {
			expiresAt = leaf.NotAfter
		}
	}
	tlsConfig := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, time.Time{}, fmt.Errorf("parse mTLS client CA bundle")
		}
		tlsConfig.RootCAs = pool
	}
	client := &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}
	return client, expiresAt, nil
}

Project auth.mode and trustDomain through ServerInfo so live inventory can avoid spoofable governance headers on mtls servers. For mtls targets, probe via the public https ingress with a session-bound client certificate issued through the shared cert-manager path (or env-mounted cert files), instead of the in-cluster HTTP header identity shortcut used for header-mode servers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Reuse the per-server mTLS *http.Client (and its connection pool) until
the client certificate nears expiry instead of rebuilding the transport
on every probe. For issuer-backed certs this also avoids re-issuing a
CertificateRequest every probe interval. tlsHTTPClientFrom{Files,PEM}
now return the leaf NotAfter so the cache can expire entries.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
29bfc85 to e319e45
Summary
- authMode and trustDomain through controlplane.ServerInfo.
- X-MCP-Human-ID/X-MCP-Agent-ID headers when the target server uses auth.mode: mtls.

Dependencies
- pkg/identity_certauth) for shared pkg/identity/pkg/certauth helpers.

Test plan
- go test -race -count=1 ./pkg/controlplane/...
- go test -race -count=1 ./internal/runtimeapi/... (in services/runtime-api)

Part of the post-#331 follow-up queue (#5). Next: live data-path E2E (mtls scenario).

Made with Cursor