Skip to content

Commit 2f521ef

Browse files
committed
fix(site-agent): use mTLS for Flow gRPC and protect temporal-certs secret
Three related issues caused the site-agent to fail connecting to Flow gRPC: 1. FLOW_GRPC_SEC_OPT was missing from the site-agent Helm chart values, so the Flow gRPC client defaulted to FlowServerTLS (1). Flow requires mTLS (RequireAndVerifyClientCert); the server rejected the connection with "tls: certificate required". Fix: add FLOW_GRPC_SEC_OPT: "2" as the chart default so FlowMutualTLS is used by default. 2. The FlowMutualTLS case in NewFlowGrpcClient used tls.Config{Certificates: ...} to present the client cert. Go's TLS stack only selects a cert from that field when the cert's issuer matches the server's CertificateRequest acceptable CA list; when there is no match it silently sends no cert, and the server again rejects with "certificate required". Fix: use GetClientCertificate instead, which unconditionally returns the cert — the same approach used in rest-api/flow/pkg/certs/certs.go TLSConfig(). 3. temporal-certs-secret.yaml created the secret with empty placeholder values on every helm upgrade, overwriting certs written by the bootstrap process. On the next pod restart the site-agent found the OTP gone, re-attempted bootstrap (which fails because the OTP was already consumed), and blocked forever — preventing FlowGrpc().Start() from running. Fix: add helm.sh/resource-policy: keep so upgrades never touch this secret after initial creation.
1 parent 45cb7a1 commit 2f521ef

3 files changed

Lines changed: 19 additions & 3 deletions

File tree

helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yaml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,18 @@
11
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
22
# SPDX-License-Identifier: Apache-2.0
33

4-
# Temporal client TLS certs for site-agent — placeholder values populated by bootstrap
4+
# Temporal client TLS certs for site-agent — placeholder values populated by bootstrap.
5+
# resource-policy: keep prevents helm upgrade from overwriting certs written by the
6+
# bootstrap process; deleting and re-installing the release creates a fresh placeholder.
57
apiVersion: v1
68
kind: Secret
79
metadata:
810
name: {{ .Values.secrets.temporalClientCerts }}
911
namespace: {{ include "nico-rest-site-agent.namespace" . }}
1012
labels:
1113
{{- include "nico-rest-site-agent.labels" . | nindent 4 }}
14+
annotations:
15+
helm.sh/resource-policy: keep
1216
type: Opaque
1317
stringData:
1418
otp: ""

helm/rest/nico-rest-site-agent/values.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,8 @@ envConfig:
8282
ENABLE_TLS: "true"
8383
NICO_ADDRESS: ""
8484
NICO_SEC_OPT: "0"
85+
FLOW_GRPC_ENABLED: "false"
86+
FLOW_GRPC_SEC_OPT: "2"
8587
CLUSTER_ID: ""
8688
TEMPORAL_HOST: "temporal-frontend.temporal"
8789
TEMPORAL_PORT: "7233"

rest-api/site-workflow/pkg/grpc/client/flow_client.go

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -158,9 +158,19 @@ func NewFlowGrpcClient(config *FlowGrpcClientConfig) (client *FlowGrpcClient, er
158158
if !capool.AppendCertsFromPEM(cabytes) {
159159
return nil, fmt.Errorf("FlowGrpcClient: Failed to append CA cert to CA pool")
160160
}
161+
// Use GetClientCertificate (not Certificates) to unconditionally present
162+
// the client cert. With Certificates, Go's TLS stack only selects a cert
163+
// whose issuer matches the acceptable CA list from the server's
164+
// CertificateRequest; when no match is found it silently sends no cert,
165+
// causing the server to reject with "tls: certificate required".
166+
// GetClientCertificate bypasses that matching and always returns the cert,
167+
// leaving verification to the server — the same approach used in
168+
// rest-api/flow/pkg/certs/certs.go TLSConfig().
161169
mutualTLSConfig := &tls.Config{
162-
Certificates: []tls.Certificate{clientCert},
163-
RootCAs: capool,
170+
GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
171+
return &clientCert, nil
172+
},
173+
RootCAs: capool,
164174
}
165175
creds := credentials.NewTLS(mutualTLSConfig)
166176

0 commit comments

Comments
 (0)