fix(site-agent): use mTLS for Flow gRPC and protect temporal-certs secret#2942
fix(site-agent): use mTLS for Flow gRPC and protect temporal-certs secret#2942shayan1995 wants to merge 1 commit into
Conversation
Summary by CodeRabbit
WalkthroughThe Helm chart now preserves the Temporal client TLS placeholder secret across upgrades and adds Flow gRPC environment values. The Flow gRPC client now returns its client certificate through ChangesFlow deployment and startup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-27 00:16:55 UTC | Commit: 22fb1d3 |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
rest-api/site-workflow/pkg/grpc/client/flow_client.go (1)
169-173: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueOptional hardening: consider pinning
MinVersion. Static analysis notes the absence ofMinVersion. As a client, Go already negotiates a TLS 1.2 floor, so this is not a defect — but explicitly settingMinVersion: tls.VersionTLS13would make the security posture intentional rather than implicit, provided the Flow server supports it.🔒 Optional diff
mutualTLSConfig := &tls.Config{ + MinVersion: tls.VersionTLS13, GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) { return &clientCert, nil }, RootCAs: capool, }Please confirm the Flow gRPC server negotiates TLS 1.3 before pinning, to avoid handshake regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/site-workflow/pkg/grpc/client/flow_client.go` around lines 169 - 173, The mutual TLS setup in flow client construction should explicitly set a minimum TLS version if the Flow gRPC server supports it. Review the tls.Config used with mutualTLSConfig and, before pinning, confirm server-side TLS 1.3 negotiation is available; if it is, add MinVersion to the tls.Config to make the security posture explicit without regressing handshakes.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm-prereqs/setup.sh`:
- Around line 807-816: The Docker config JSON in the registry pull secret setup
is being built with printf in a way that can emit invalid JSON when
REGISTRY_PULL_USERNAME or REGISTRY_PULL_SECRET contains special characters.
Update the shell logic in the registry secret construction block to use a proper
JSON encoder or equivalent escaping step before base64 encoding, keeping the
existing NICO_FLOW_ARGS and image-pull-secret wiring intact.
In `@helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yaml`:
- Around line 4-6: Clarify the keep-policy behavior in the Secret template
comments for temporal-certs-secret.yaml: `helm.sh/resource-policy: keep`
preserves the Secret on uninstall, so a reinstall will reuse the existing object
unless it is manually deleted first. Update the explanatory text near the kept
Secret to reflect this behavior accurately, using the Secret template and its
bootstrap placeholder context as the reference point.
---
Nitpick comments:
In `@rest-api/site-workflow/pkg/grpc/client/flow_client.go`:
- Around line 169-173: The mutual TLS setup in flow client construction should
explicitly set a minimum TLS version if the Flow gRPC server supports it. Review
the tls.Config used with mutualTLSConfig and, before pinning, confirm
server-side TLS 1.3 negotiation is available; if it is, add MinVersion to the
tls.Config to make the security posture explicit without regressing handshakes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b4eeeb77-48f7-4aa0-836d-db000ecd1635
📒 Files selected for processing (6)
helm-prereqs/setup.shhelm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yamlhelm/rest/nico-rest-site-agent/values.yamlrest-api/site-agent/pkg/components/managers/flowgrpc/init.gorest-api/site-agent/pkg/components/managers/flowgrpc/metrics.gorest-api/site-workflow/pkg/grpc/client/flow_client.go
…cret
Three related issues caused the site-agent to fail connecting to Flow gRPC:
1. FLOW_GRPC_SEC_OPT was missing from the site-agent Helm chart values, so
the Flow gRPC client defaulted to FlowServerTLS (1). Flow requires mTLS
(RequireAndVerifyClientCert); the server rejected the connection with
"tls: certificate required". Fix: add FLOW_GRPC_SEC_OPT: "2" as the
chart default so FlowMutualTLS is used by default.
2. The FlowMutualTLS case in NewFlowGrpcClient used tls.Config{Certificates: ...}
to present the client cert. Go's TLS stack only selects a cert from that
field when the cert's issuer matches the server's CertificateRequest acceptable
CA list; when there is no match it silently sends no cert, and the server
again rejects with "certificate required". Fix: use GetClientCertificate
instead, which unconditionally returns the cert — the same approach used in
rest-api/flow/pkg/certs/certs.go TLSConfig().
3. temporal-certs-secret.yaml created the secret with empty placeholder values
on every helm upgrade, overwriting certs written by the bootstrap process.
On the next pod restart the site-agent found the OTP gone, re-attempted
bootstrap (which fails because the OTP was already consumed), and blocked
forever — preventing FlowGrpc().Start() from running. Fix: add
helm.sh/resource-policy: keep so upgrades never touch this secret after
initial creation.
22fb1d3 to
2f521ef
Compare
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yaml (1)
4-6: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winCorrect the reinstall behavior note for the kept Secret.
Line 6 still implies a release delete/reinstall creates a fresh placeholder. With
helm.sh/resource-policy: keep, Helm skips deletion when an operation would remove the resource, so the kept Secret can survive and be reused unless the Secret itself is deleted first. (v3-1-0.helm.sh)As per path instructions,
helm/**: “Review Helm charts for template correctness, values compatibility, Kubernetes API compatibility, security context, RBAC, probes, and upgrade behavior.”Suggested wording
-# bootstrap process; deleting and re-installing the release creates a fresh placeholder. +# bootstrap process. The kept Secret also survives uninstall; delete the Secret +# explicitly before reinstalling if a fresh placeholder is required.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yaml` around lines 4 - 6, The comment in temporal-certs-secret.yaml overstates reinstall behavior for the kept Secret. Update the explanatory text around the helm.sh/resource-policy: keep note to say that Helm preserves the Secret across delete/reinstall flows unless the Secret itself is removed first, and avoid claiming a fresh placeholder is automatically created. Use the existing Temporal client TLS certs / bootstrap placeholder wording so the guidance matches the template’s actual behavior.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yaml`:
- Around line 4-6: The comment in temporal-certs-secret.yaml overstates
reinstall behavior for the kept Secret. Update the explanatory text around the
helm.sh/resource-policy: keep note to say that Helm preserves the Secret across
delete/reinstall flows unless the Secret itself is removed first, and avoid
claiming a fresh placeholder is automatically created. Use the existing Temporal
client TLS certs / bootstrap placeholder wording so the guidance matches the
template’s actual behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 43ce878f-8cda-407c-9e1f-e31238e9e4e8
📒 Files selected for processing (3)
helm/rest/nico-rest-site-agent/templates/temporal-certs-secret.yamlhelm/rest/nico-rest-site-agent/values.yamlrest-api/site-workflow/pkg/grpc/client/flow_client.go
🚧 Files skipped from review as they are similar to previous changes (1)
- helm/rest/nico-rest-site-agent/values.yaml
Three fixes for site-agent → Flow gRPC connectivity:
flow_client.go: switchFlowMutualTLSfromtls.Config{Certificates}toGetClientCertificate— Go silently drops the cert when the issuer doesn't match the server's CA list, causingtls: certificate requiredvalues.yaml: addFLOW_GRPC_SEC_OPT: "2"as chart default (was unset, defaulted to server-TLS only)temporal-certs-secret.yaml: addhelm.sh/resource-policy: keep— template was overwriting bootstrap-written certs on everyhelm upgrade, causing bootstrap to retry forever and block Flow gRPC startupRelated issues
Type of Change
Breaking Changes
Testing
Verified on dev6: site-agent pods reach
1/1 Runningwith 0 restarts and logs showFlowGrpcClient: Successfully connected to server.helm upgradeno longer overwritestemporal-client-site-agent-certs.Additional Notes