Commit f8acf95

Merge pull request #106 from restatedev/feat/configure-canary-image
feat: allow canary image to be configured in helm
2 parents 6864e7e + 7342dfd commit f8acf95

9 files changed

Lines changed: 102 additions & 2 deletions

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -260,6 +260,8 @@ kubectl delete pod -n restate-operator -l app=restate-operator
 - `image.repository` - Image repository (default: `ghcr.io/restatedev/restate-operator`)
 - `image.pullPolicy` - Pull policy (default: `IfNotPresent`)
 - `awsPodIdentityAssociationCluster` - Enables EKS Pod Identity support
+- `gcpWorkloadIdentity` - Enables GCP Workload Identity via Config Connector
+- `canaryImage` - Container image for canary jobs (default: `busybox:uclibc`); must provide `grep` and `wget`
 - `operatorNamespace` - Namespace where operator runs
 - `operatorLabelName/Value` - Labels for network policy selectors

README.md

Lines changed: 21 additions & 0 deletions
@@ -642,6 +642,27 @@ cluster name, by setting `awsPodIdentityAssociationCluster` in the helm chart. I
 be installed or the operator will fail to start. Then, you may provide `awsPodIdentityAssociationRoleArn` in
 the `RestateCluster` spec.

+### Canary Image
+
+Both EKS Pod Identity and GCP Workload Identity use a canary job to validate that credentials are available before
+starting the Restate cluster. By default, this uses the `busybox:uclibc` image from Docker Hub. In environments where
+nodes cannot pull from Docker Hub (e.g. air-gapped or restricted registries), you can override this with the
+`canaryImage` Helm value:
+
+```yaml
+canaryImage: my-private-registry.example.com/busybox:uclibc
+```
+
+The simplest approach is to mirror the default image:
+
+```bash
+docker pull busybox:uclibc
+docker tag busybox:uclibc my-private-registry.example.com/busybox:uclibc
+docker push my-private-registry.example.com/busybox:uclibc
+```
+
+If using a different base image, it must provide `grep` and `wget`.
+
 ### EKS Security Groups for Pods

 [EKS Security Groups for Pods](https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html) allows

charts/restate-operator-helm/templates/deployment.yaml

Lines changed: 4 additions & 0 deletions
@@ -64,6 +64,10 @@ spec:
 - name: GCP_WORKLOAD_IDENTITY
   value: "true"
 {{- end }}
+{{- if .Values.canaryImage }}
+- name: CANARY_IMAGE
+  value: {{ .Values.canaryImage }}
+{{- end }}
 {{- if .Values.clusterDns }}
 - name: CLUSTER_DNS
   value: {{ .Values.clusterDns }}
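
Review note: the `{{- if .Values.canaryImage }}` guard means the chart renders `CANARY_IMAGE` only when the Helm value is set, and the operator binary otherwise falls back to its own default. A minimal sketch of that behaviour, assuming Helm's usual truthiness rules (null and empty string are both false). The helper names `rendered_env` and `effective_image` are hypothetical and do not exist in the repository.

```rust
/// Default used by the operator binary when CANARY_IMAGE is absent
/// (mirrors `default_value` in src/main.rs in this commit).
const DEFAULT_CANARY_IMAGE: &str = "busybox:uclibc";

/// Hypothetical model of the template guard: returns the env entry the chart
/// would render, or None when the Helm value is null or empty.
fn rendered_env(canary_image: Option<&str>) -> Option<(&'static str, String)> {
    match canary_image {
        Some(image) if !image.is_empty() => Some(("CANARY_IMAGE", image.to_string())),
        _ => None,
    }
}

/// Hypothetical helper: the image the operator ends up using for a given Helm value.
fn effective_image(canary_image: Option<&str>) -> String {
    rendered_env(canary_image)
        .map(|(_, image)| image)
        .unwrap_or_else(|| DEFAULT_CANARY_IMAGE.to_string())
}
```

With `canaryImage: null` (the chart default) no env entry is rendered, so existing deployments keep using `busybox:uclibc`.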

charts/restate-operator-helm/values.yaml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ podAnnotations: {}
 awsPodIdentityAssociationCluster: null
 gcpWorkloadIdentity: null
 clusterDns: null # defaults to "cluster.local" in the operator binary
+canaryImage: null # defaults to "busybox:uclibc"; image must provide grep and wget

 podSecurityContext:
   fsGroup: 2000
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Release Notes for Issue #94: Configurable canary image
+
+## New Feature
+
+### What Changed
+The container image used for PIA and Workload Identity canary jobs is now
+configurable via the `canaryImage` Helm value, `CANARY_IMAGE` environment
+variable, or `--canary-image` CLI flag. Previously `busybox:uclibc` was
+hardcoded, which fails in environments that cannot pull from Docker Hub.
+
+### Why This Matters
+Air-gapped or restricted environments require all images to be pulled from
+a private registry. The hardcoded image caused canary pods to enter
+ImagePullBackOff, blocking RestateCluster reconciliation.
+
+### Impact on Users
+- **Existing deployments**: No impact. The default remains `busybox:uclibc`.
+- **Restricted environments**: Can now point to a private registry mirror.
+
+### Migration Guidance
+If your nodes cannot pull from Docker Hub, set the canary image in your
+Helm values:
+
+```yaml
+canaryImage: my-registry.example.com/busybox:uclibc
+```
+
+The simplest approach is to mirror the default image to your private registry:
+
+```bash
+docker pull busybox:uclibc
+docker tag busybox:uclibc my-registry.example.com/busybox:uclibc
+docker push my-registry.example.com/busybox:uclibc
+```
+
+If using a different image, it must provide `grep` and `wget` (used by the
+AWS PIA and GCP Workload Identity canary jobs respectively).
+
+### Related Issues
+- Issue #94: Cannot configure image URI for PIA canary pods
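
Review note: the release notes state that a replacement image must provide `grep` (PIA canary) and `wget` (Workload Identity canary). The sketch below models that requirement as a pre-flight check on a list of tools known to be in a candidate image. The `Canary` enum and `missing_tools` helper are hypothetical illustrations; the operator in this commit performs no such check. Only the job names and tool names come from the diff.

```rust
/// The two canary jobs named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Canary {
    PodIdentity,
    WorkloadIdentity,
}

impl Canary {
    /// Job name as it appears in compute.rs in this commit.
    fn job_name(self) -> &'static str {
        match self {
            Canary::PodIdentity => "restate-pia-canary",
            Canary::WorkloadIdentity => "restate-wi-canary",
        }
    }

    /// Binary the canary command invokes, per the release notes.
    fn required_tool(self) -> &'static str {
        match self {
            Canary::PodIdentity => "grep",
            Canary::WorkloadIdentity => "wget",
        }
    }
}

/// Hypothetical pre-flight check: which required tools are absent from the
/// list of tools known to exist in a candidate image.
fn missing_tools(available: &[&str]) -> Vec<&'static str> {
    [Canary::PodIdentity, Canary::WorkloadIdentity]
        .iter()
        .map(|canary| canary.required_tool())
        .filter(|tool| !available.contains(tool))
        .collect()
}
```

An image that ships only `grep` would pass the PIA canary but fail the Workload Identity one, which is why both tools are listed as requirements.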

src/controllers/mod.rs

Lines changed: 6 additions & 0 deletions
@@ -53,9 +53,13 @@ pub struct State {

     /// The cluster DNS suffix (e.g. "cluster.local")
     pub cluster_dns: String,
+
+    /// The container image to use for canary jobs
+    pub canary_image: String,
 }

 /// State wrapper around the controller outputs for the web server
+#[allow(clippy::too_many_arguments)]
 impl State {
     pub fn new(
         aws_pod_identity_association_cluster: Option<String>,
@@ -65,6 +69,7 @@ impl State {
         operator_label_value: Option<String>,
         tunnel_client_default_image: String,
         cluster_dns: String,
+        canary_image: String,
     ) -> Self {
         Self {
             diagnostics: Arc::new(RwLock::new(Diagnostics::default())),
@@ -76,6 +81,7 @@ impl State {
             operator_label_value,
             tunnel_client_default_image,
             cluster_dns,
+            canary_image,
         }
     }

src/controllers/restatecluster/controller.rs

Lines changed: 3 additions & 0 deletions
@@ -76,6 +76,8 @@ pub(super) struct Context {
     pub gcp_workload_identity: bool,
     /// The cluster DNS suffix (e.g. "cluster.local")
     pub cluster_dns: String,
+    /// The container image to use for canary jobs
+    pub canary_image: String,
     /// Diagnostics read by the web server
     pub diagnostics: Arc<RwLock<Diagnostics>>,
     /// Prometheus metrics
@@ -108,6 +110,7 @@ impl Context {
             secret_provider_class_installed,
             gcp_workload_identity: state.gcp_workload_identity,
             cluster_dns: state.cluster_dns.clone(),
+            canary_image: state.canary_image.clone(),
             diagnostics: state.diagnostics.clone(),
             metrics,
         })

src/controllers/restatecluster/reconcilers/compute.rs

Lines changed: 16 additions & 2 deletions
@@ -557,6 +557,7 @@ pub async fn reconcile_compute(
         spec.compute.tolerations.as_ref(),
         &job_api,
         &pod_api,
+        &ctx.canary_image,
     )
     .await?;

@@ -672,6 +673,7 @@ pub async fn reconcile_compute(
         base_metadata,
         spec.compute.tolerations.as_ref(),
         &job_api,
+        &ctx.canary_image,
     )
     .await?;

@@ -825,6 +827,8 @@ async fn apply_pod_identity_association(
 struct CanaryConfig {
     /// Job name, e.g. "restate-pia-canary"
     name: &'static str,
+    /// Container image to use for the canary pod
+    image: String,
     /// Command to run in the canary container
     command: Vec<String>,
     /// Reason prefix for NotReady conditions, e.g. "PodIdentityAssociation"
@@ -859,7 +863,7 @@ fn canary_job_spec(
     service_account_name: Some("restate".into()),
     containers: vec![Container {
         name: "canary".into(),
-        image: Some("busybox:uclibc".into()),
+        image: Some(config.image.clone()),
         command: Some(config.command.clone()),
         ..Default::default()
     }],
@@ -965,9 +969,11 @@ async fn check_pia(
     tolerations: Option<&Vec<Toleration>>,
     job_api: &Api<Job>,
     pod_api: &Api<Pod>,
+    canary_image: &str,
 ) -> Result<(), Error> {
     let config = CanaryConfig {
         name: "restate-pia-canary",
+        image: canary_image.into(),
         command: vec![
             "grep".into(),
             "-q".into(),
@@ -1182,9 +1188,11 @@ async fn check_workload_identity(
     base_metadata: &ObjectMeta,
     tolerations: Option<&Vec<Toleration>>,
     job_api: &Api<Job>,
+    canary_image: &str,
 ) -> Result<(), Error> {
     let config = CanaryConfig {
         name: "restate-wi-canary",
+        image: canary_image.into(),
         command: vec![
             "wget".into(),
             "--header".into(),
@@ -1680,6 +1688,7 @@ mod tests {
     fn test_canary_job_spec_structure() {
         let config = CanaryConfig {
             name: "test-canary",
+            image: "my-registry/busybox:latest".into(),
             command: vec!["echo".into(), "hello".into()],
             reason_prefix: "Test",
             failure_message: "test failed",
@@ -1697,7 +1706,10 @@ mod tests {

         let container = &pod_spec.containers[0];
         assert_eq!(container.name, "canary");
-        assert_eq!(container.image.as_deref(), Some("busybox:uclibc"));
+        assert_eq!(
+            container.image.as_deref(),
+            Some("my-registry/busybox:latest")
+        );
         assert_eq!(
             container.command.as_ref().unwrap(),
             &vec!["echo".to_string(), "hello".to_string()]
@@ -1708,6 +1720,7 @@ mod tests {
     fn test_canary_job_spec_label() {
         let config = CanaryConfig {
             name: "my-canary",
+            image: "busybox:uclibc".into(),
             command: vec!["true".into()],
             reason_prefix: "Test",
             failure_message: "",
@@ -1734,6 +1747,7 @@ mod tests {
         }];
         let config = CanaryConfig {
             name: "test-canary",
+            image: "busybox:uclibc".into(),
             command: vec!["true".into()],
             reason_prefix: "Test",
             failure_message: "",
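
Review note: the compute.rs changes thread the image from `ctx.canary_image` through `check_pia` / `check_workload_identity` into `CanaryConfig`, and `canary_job_spec` copies it onto the container. The following is a stripped-down sketch of that data flow. `Container` here is a simplified stand-in for the Kubernetes type, `CanaryConfig` is trimmed to three fields, and `canary_container` and `pia_config` are hypothetical helpers rather than functions from the repository.

```rust
/// Simplified stand-in for the Kubernetes Container type; only the fields
/// touched by this commit are modelled.
#[derive(Debug, PartialEq)]
struct Container {
    name: String,
    image: Option<String>,
    command: Option<Vec<String>>,
}

/// Trimmed copy of CanaryConfig from the diff (reason_prefix and
/// failure_message omitted for brevity).
struct CanaryConfig {
    name: &'static str,
    image: String,
    command: Vec<String>,
}

/// Stand-in for the container portion of canary_job_spec: the image now comes
/// from the config instead of a hardcoded "busybox:uclibc".
fn canary_container(config: &CanaryConfig) -> Container {
    Container {
        name: "canary".into(),
        image: Some(config.image.clone()),
        command: Some(config.command.clone()),
    }
}

/// Stand-in for the config construction in check_pia: the image arrives as
/// &str (from ctx.canary_image) and is converted to an owned String.
fn pia_config(canary_image: &str) -> CanaryConfig {
    CanaryConfig {
        name: "restate-pia-canary",
        image: canary_image.into(),
        // Only the first two arguments are visible in the diff hunk;
        // the rest of the command is not shown there and is left out here.
        command: vec!["grep".into(), "-q".into()],
    }
}
```

Taking `&str` at the function boundary and cloning into the config keeps `Context` as the single owner of the configured value, at the cost of one small allocation per reconcile.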

src/main.rs

Lines changed: 9 additions & 0 deletions
@@ -61,6 +61,14 @@ struct Arguments {
         default_value = "cluster.local"
     )]
     cluster_dns: String,
+
+    #[arg(
+        long = "canary-image",
+        env = "CANARY_IMAGE",
+        value_name = "IMAGE",
+        default_value = "busybox:uclibc"
+    )]
+    canary_image: String,
 }

 #[get("/metrics")]
@@ -109,6 +117,7 @@ async fn main() -> anyhow::Result<()> {
         args.operator_label_value,
         args.tunnel_client_default_image,
         args.cluster_dns,
+        args.canary_image,
     );

     let client = Client::try_default()
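
Review note: the new `#[arg(...)]` declares three sources for the image: the `--canary-image` flag, the `CANARY_IMAGE` environment variable, and the `busybox:uclibc` default. A minimal sketch of the resulting precedence, assuming clap's usual resolution order (explicit flag first, then environment variable, then default). `resolve_canary_image` is a hypothetical function written with the standard library only; it is not how the operator parses arguments.

```rust
/// Default declared via `default_value` in the diff above.
const DEFAULT_CANARY_IMAGE: &str = "busybox:uclibc";

/// Hypothetical model of the resolution order: an explicit flag wins over
/// the environment variable, which wins over the built-in default.
fn resolve_canary_image(flag: Option<&str>, env: Option<&str>) -> String {
    flag.or(env).unwrap_or(DEFAULT_CANARY_IMAGE).to_string()
}
```

For Helm users only the environment variable path is exercised, since the chart sets `CANARY_IMAGE` rather than passing the flag.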
