Merged
2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -260,6 +260,8 @@ kubectl delete pod -n restate-operator -l app=restate-operator
- `image.repository` - Image repository (default: `ghcr.io/restatedev/restate-operator`)
- `image.pullPolicy` - Pull policy (default: `IfNotPresent`)
- `awsPodIdentityAssociationCluster` - Enables EKS Pod Identity support
- `gcpWorkloadIdentity` - Enables GCP Workload Identity via Config Connector
- `canaryImage` - Container image for canary jobs (default: `busybox:uclibc`); must provide `grep` and `wget`
- `operatorNamespace` - Namespace where operator runs
- `operatorLabelName/Value` - Labels for network policy selectors

21 changes: 21 additions & 0 deletions README.md
@@ -642,6 +642,27 @@ cluster name, by setting `awsPodIdentityAssociationCluster` in the helm chart. I
be installed or the operator will fail to start. Then, you may provide `awsPodIdentityAssociationRoleArn` in
the `RestateCluster` spec.

### Canary Image

Both EKS Pod Identity and GCP Workload Identity use a canary job to validate that credentials are available before
starting the Restate cluster. By default, this uses the `busybox:uclibc` image from Docker Hub. In environments where
nodes cannot pull from Docker Hub (e.g. air-gapped or restricted registries), you can override this with the
`canaryImage` Helm value:

```yaml
canaryImage: my-private-registry.example.com/busybox:uclibc
```

The simplest approach is to mirror the default image:

```bash
docker pull busybox:uclibc
docker tag busybox:uclibc my-private-registry.example.com/busybox:uclibc
docker push my-private-registry.example.com/busybox:uclibc
```

If using a different base image, it must provide `grep` and `wget`.

### EKS Security Groups for Pods

[EKS Security Groups for Pods](https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html) allows
4 changes: 4 additions & 0 deletions charts/restate-operator-helm/templates/deployment.yaml
@@ -64,6 +64,10 @@ spec:
- name: GCP_WORKLOAD_IDENTITY
value: "true"
{{- end }}
{{- if .Values.canaryImage }}
- name: CANARY_IMAGE
value: {{ .Values.canaryImage }}
{{- end }}
{{- if .Values.clusterDns }}
- name: CLUSTER_DNS
value: {{ .Values.clusterDns }}
1 change: 1 addition & 0 deletions charts/restate-operator-helm/values.yaml
@@ -16,6 +16,7 @@ podAnnotations: {}
awsPodIdentityAssociationCluster: null
gcpWorkloadIdentity: null
clusterDns: null # defaults to "cluster.local" in the operator binary
canaryImage: null # defaults to "busybox:uclibc"; image must provide grep and wget

podSecurityContext:
fsGroup: 2000
40 changes: 40 additions & 0 deletions release-notes/unreleased/94-configurable-canary-image.md
@@ -0,0 +1,40 @@
# Release Notes for Issue #94: Configurable canary image

## New Feature

### What Changed
The container image used for PIA and Workload Identity canary jobs is now
configurable via the `canaryImage` Helm value, `CANARY_IMAGE` environment
variable, or `--canary-image` CLI flag. Previously `busybox:uclibc` was
hardcoded, which fails in environments that cannot pull from Docker Hub.
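
The three configuration paths above resolve to a single value. The sketch below illustrates the expected precedence, assuming the usual behavior of an argument declared with both an environment variable and a default: an explicit CLI flag wins, then the environment variable, then the built-in default. `resolve_canary_image` is a hypothetical helper written for illustration only; the operator itself relies on its argument parser rather than on code like this.

```rust
/// Default image, matching the default declared for `--canary-image`.
const DEFAULT_CANARY_IMAGE: &str = "busybox:uclibc";

/// Hypothetical helper (not part of the operator): picks the canary image
/// from an optional CLI flag value and an optional environment value.
/// Assumed precedence: CLI flag, then environment variable, then default.
fn resolve_canary_image(cli_flag: Option<&str>, env_value: Option<&str>) -> String {
    cli_flag
        .or(env_value)
        .unwrap_or(DEFAULT_CANARY_IMAGE)
        .to_string()
}

fn main() {
    // Read CANARY_IMAGE the way the Helm chart would set it on the pod;
    // no CLI flag is passed in this example.
    let env_value = std::env::var("CANARY_IMAGE").ok();
    let image = resolve_canary_image(None, env_value.as_deref());
    println!("canary image: {image}");
}
```

Because the Helm chart only sets `CANARY_IMAGE` when `canaryImage` is non-null, leaving the Helm value unset falls through to the default.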

### Why This Matters
Air-gapped or restricted environments require all images to be pulled from
a private registry. The hardcoded image caused canary pods to enter
ImagePullBackOff, blocking RestateCluster reconciliation.

### Impact on Users
- **Existing deployments**: No impact. The default remains `busybox:uclibc`.
- **Restricted environments**: Can now point to a private registry mirror.

### Migration Guidance
If your nodes cannot pull from Docker Hub, set the canary image in your
Helm values:

```yaml
canaryImage: my-registry.example.com/busybox:uclibc
```

The simplest approach is to mirror the default image to your private registry:

```bash
docker pull busybox:uclibc
docker tag busybox:uclibc my-registry.example.com/busybox:uclibc
docker push my-registry.example.com/busybox:uclibc
```

If using a different image, it must provide `grep` and `wget` (used by the
AWS PIA and GCP Workload Identity canary jobs respectively).

### Related Issues
- Issue #94: Cannot configure image URI for PIA canary pods
6 changes: 6 additions & 0 deletions src/controllers/mod.rs
@@ -53,9 +53,13 @@ pub struct State {

/// The cluster DNS suffix (e.g. "cluster.local")
pub cluster_dns: String,

/// The container image to use for canary jobs
pub canary_image: String,
}

/// State wrapper around the controller outputs for the web server
#[allow(clippy::too_many_arguments)]
impl State {
pub fn new(
aws_pod_identity_association_cluster: Option<String>,
@@ -65,6 +69,7 @@ impl State {
operator_label_value: Option<String>,
tunnel_client_default_image: String,
cluster_dns: String,
canary_image: String,
) -> Self {
Self {
diagnostics: Arc::new(RwLock::new(Diagnostics::default())),
@@ -76,6 +81,7 @@ impl State {
operator_label_value,
tunnel_client_default_image,
cluster_dns,
canary_image,
}
}

3 changes: 3 additions & 0 deletions src/controllers/restatecluster/controller.rs
@@ -76,6 +76,8 @@ pub(super) struct Context {
pub gcp_workload_identity: bool,
/// The cluster DNS suffix (e.g. "cluster.local")
pub cluster_dns: String,
/// The container image to use for canary jobs
pub canary_image: String,
/// Diagnostics read by the web server
pub diagnostics: Arc<RwLock<Diagnostics>>,
/// Prometheus metrics
@@ -108,6 +110,7 @@ impl Context {
secret_provider_class_installed,
gcp_workload_identity: state.gcp_workload_identity,
cluster_dns: state.cluster_dns.clone(),
canary_image: state.canary_image.clone(),
diagnostics: state.diagnostics.clone(),
metrics,
})
18 changes: 16 additions & 2 deletions src/controllers/restatecluster/reconcilers/compute.rs
@@ -557,6 +557,7 @@ pub async fn reconcile_compute(
spec.compute.tolerations.as_ref(),
&job_api,
&pod_api,
&ctx.canary_image,
)
.await?;

@@ -672,6 +673,7 @@ pub async fn reconcile_compute(
base_metadata,
spec.compute.tolerations.as_ref(),
&job_api,
&ctx.canary_image,
)
.await?;

@@ -825,6 +827,8 @@ async fn apply_pod_identity_association(
struct CanaryConfig {
/// Job name, e.g. "restate-pia-canary"
name: &'static str,
/// Container image to use for the canary pod
image: String,
/// Command to run in the canary container
command: Vec<String>,
/// Reason prefix for NotReady conditions, e.g. "PodIdentityAssociation"
@@ -859,7 +863,7 @@ fn canary_job_spec(
service_account_name: Some("restate".into()),
containers: vec![Container {
name: "canary".into(),
- image: Some("busybox:uclibc".into()),
+ image: Some(config.image.clone()),
command: Some(config.command.clone()),
..Default::default()
}],
@@ -965,9 +969,11 @@ async fn check_pia(
tolerations: Option<&Vec<Toleration>>,
job_api: &Api<Job>,
pod_api: &Api<Pod>,
canary_image: &str,
) -> Result<(), Error> {
let config = CanaryConfig {
name: "restate-pia-canary",
image: canary_image.into(),
command: vec![
"grep".into(),
"-q".into(),
@@ -1182,9 +1188,11 @@ async fn check_workload_identity(
base_metadata: &ObjectMeta,
tolerations: Option<&Vec<Toleration>>,
job_api: &Api<Job>,
canary_image: &str,
) -> Result<(), Error> {
let config = CanaryConfig {
name: "restate-wi-canary",
image: canary_image.into(),
command: vec![
"wget".into(),
"--header".into(),
@@ -1680,6 +1688,7 @@ mod tests {
fn test_canary_job_spec_structure() {
let config = CanaryConfig {
name: "test-canary",
image: "my-registry/busybox:latest".into(),
command: vec!["echo".into(), "hello".into()],
reason_prefix: "Test",
failure_message: "test failed",
@@ -1697,7 +1706,10 @@ mod tests {

let container = &pod_spec.containers[0];
assert_eq!(container.name, "canary");
- assert_eq!(container.image.as_deref(), Some("busybox:uclibc"));
+ assert_eq!(
+ container.image.as_deref(),
+ Some("my-registry/busybox:latest")
+ );
assert_eq!(
container.command.as_ref().unwrap(),
&vec!["echo".to_string(), "hello".to_string()]
@@ -1708,6 +1720,7 @@ mod tests {
fn test_canary_job_spec_label() {
let config = CanaryConfig {
name: "my-canary",
image: "busybox:uclibc".into(),
command: vec!["true".into()],
reason_prefix: "Test",
failure_message: "",
@@ -1734,6 +1747,7 @@ mod tests {
}];
let config = CanaryConfig {
name: "test-canary",
image: "busybox:uclibc".into(),
command: vec!["true".into()],
reason_prefix: "Test",
failure_message: "",
9 changes: 9 additions & 0 deletions src/main.rs
@@ -61,6 +61,14 @@ struct Arguments {
default_value = "cluster.local"
)]
cluster_dns: String,

#[arg(
long = "canary-image",
env = "CANARY_IMAGE",
value_name = "IMAGE",
default_value = "busybox:uclibc"
)]
canary_image: String,
}

#[get("/metrics")]
@@ -109,6 +117,7 @@ async fn main() -> anyhow::Result<()> {
args.operator_label_value,
args.tunnel_client_default_image,
args.cluster_dns,
args.canary_image,
);

let client = Client::try_default()