Skip to content

OSAC-669: Apply PinnedImageSet to handle LZ registry failover#381

Open
danielerez wants to merge 1 commit into
mainfrom
OSAC-669_PinnedImageSets
Open

OSAC-669: Apply PinnedImageSet to handle LZ registry failover#381
danielerez wants to merge 1 commit into
mainfrom
OSAC-669_PinnedImageSets

Conversation

@danielerez
Copy link
Copy Markdown
Contributor

@danielerez danielerez commented May 20, 2026

To ensure the mgmt cluster can recover from node reboots and crashes even without an available registry, image resilience is necessary. This can be achieved by pinning the required images using the PinnedImageSet functionality, as detailed in the OpenShift documentation on Pinning and Preloading Images [*].
This action stops CRI-O from triggering Garbage Collection for the images, and thus ensuring the the required images exist for getting the cluster back to an healthy status.

[*] https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/machine_configuration/machine-config-pin-preload-images-about#machine-config-pin-preload-images-about

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable master PinnedImageSet that applies after Quay mirror operations
    • Control whether PinnedImageSet is applied via quay_pinnedimageset_enabled configuration; automatically skipped during mirror_dry_run
    • Automatically collects image digest references from master nodes and applies them as a pinned image set with a maximum limit

Review Change Stack

To ensure the mgmt cluster can recover from node reboots and crashes even
without an available registry, image resilience is necessary.
This can be achieved by pinning the required images using the PinnedImageSet
functionality, as detailed in the OpenShift documentation on Pinning and
Preloading Images [*].
This action stops CRI-O from triggering Garbage Collection for the images,
and thus ensuring the the required images exist for getting the cluster back
to an healthy status.

[*] https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/machine_configuration/machine-config-pin-preload-images-about#machine-config-pin-preload-images-about
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Walkthrough

The PR adds a feature to pin container images from Kubernetes master nodes in disconnected Quay deployments. It introduces a new Jinja2 template for PinnedImageSet manifests, an Ansible task to collect image digests from master nodes via oc debug and crictl, a task to generate and apply the manifest, integration into the Quay finish workflow, and configuration toggles to enable or disable the feature.

Changes

Master Node Image PinnedImageSet Feature

Layer / File(s) Summary
PinnedImageSet manifest template
templates/pinnedimageset.yaml.j2
Template dynamically generates OpenShift PinnedImageSet YAML with role metadata and pinned image list from collected digest refs.
Master node image digest collection
playbooks/tasks/collect_master_node_image_digests.yaml
Collects container image digests from all master nodes by enumerating nodes, running oc debug with crictl, parsing JSON output with embedded Python to normalize refs in name@sha256 format, deduplicating, enforcing max count limits, and writing results to shared file.
PinnedImageSet generation and application
operators/quay-operator/quay_pinnedimageset.yaml
Wires digest collection into workflow, emits debug statistics, conditionally renders manifest from template with collected images, and applies to cluster via oc apply with operator kubeconfig.
Quay finish workflow integration
operators/quay-operator/quay_disconnected_finish.yaml
Integrates PinnedImageSet generation/application into the Quay disconnected finish task, gated by quay_pinnedimageset_enabled flag and mirror_dry_run state, replacing prior release signature step.
Configuration and deployment controls
config/global.example.yaml, scripts/deployment/deploy_phase.sh
Adds example configuration option for quay_pinnedimageset_enabled and deployment script logic to conditionally disable the feature via ENCLAVE_QUAY_PINNEDIMAGESET_ENABLED environment variable.

Sequence Diagram

sequenceDiagram
  participant DeployScript as Deploy Script
  participant QuayFinish as Quay Finish Workflow
  participant DigestCollection as Digest Collection Task
  participant MasterNode as Master Node (oc debug)
  participant PinnedGeneration as PinnedImageSet Generation
  participant Cluster as OpenShift Cluster
  
  DeployScript->>QuayFinish: quay_pinnedimageset_enabled flag
  alt quay_pinnedimageset_enabled and not mirror_dry_run
    QuayFinish->>DigestCollection: Include collection task
    DigestCollection->>MasterNode: oc debug to run crictl images
    MasterNode-->>DigestCollection: JSON image list with digests
    DigestCollection-->>DigestCollection: Parse, normalize, deduplicate
    DigestCollection-->>QuayFinish: Collected pinned_images list
    QuayFinish->>PinnedGeneration: Render template with images
    PinnedGeneration-->>PinnedGeneration: Generate pinnedimageset-master.yaml
    PinnedGeneration->>Cluster: oc apply pinnedimageset-master.yaml
    Cluster-->>PinnedGeneration: Applied
  else Disabled or dry-run
    QuayFinish-->>QuayFinish: Skip PinnedImageSet step
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: implementing PinnedImageSet functionality to handle registry failover resilience, which is the core objective of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch OSAC-669_PinnedImageSets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added deployment Deployment-related changes operators Operator installation/config labels May 20, 2026
@danielerez danielerez requested review from eliorerz and maorfr May 20, 2026 13:00
@maorfr
Copy link
Copy Markdown
Collaborator

maorfr commented May 20, 2026

can we obtain the list of images using oc adm release info?

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@playbooks/tasks/collect_master_node_image_digests.yaml`:
- Around line 120-121: The shell line currently in the task injects '{{
pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default)
| to_json }}' into a single-quoted command which breaks if the value contains a
single quote; instead set an environment variable (e.g. PINNED_IMAGE_EXCLUDE)
for that JSON by adding an env: PINNED_IMAGE_EXCLUDE: "{{
pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default)
| to_json }}" to the same Ansible task and update the command to use the env var
(reference "$PINNED_IMAGE_EXCLUDE" in double-quotes) rather than inlining the
JSON string.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5318c4c4-48f2-41c5-9b40-c52fc2606416

📥 Commits

Reviewing files that changed from the base of the PR and between f36ecc6 and f32c347.

📒 Files selected for processing (6)
  • config/global.example.yaml
  • operators/quay-operator/quay_disconnected_finish.yaml
  • operators/quay-operator/quay_pinnedimageset.yaml
  • playbooks/tasks/collect_master_node_image_digests.yaml
  • scripts/deployment/deploy_phase.sh
  • templates/pinnedimageset.yaml.j2

Comment on lines +120 to +121
'{{ pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default) | to_json }}' \
|| true)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid direct shell interpolation of JSON config.

The JSON argument is injected into a single-quoted shell string. If pinned_image_exclude_contains contains a ', the command can break (and can become shell-injectable). Pass this value through an environment variable instead.

Suggested hardening
 - name: Collect container image digest refs from each master node
   ansible.builtin.shell: |
@@
     refs="$(python3 -c '
@@
-    ' \
-      "$tmp" \
-      '{{ pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default) | to_json }}' \
+    ' \
+      "$tmp" \
+      "${PINNED_IMAGE_EXCLUDE_JSON:-[]}" \
       || true)"
@@
   environment:
     KUBECONFIG: "{{ workingDir }}/ocp-cluster/auth/kubeconfig"
+    PINNED_IMAGE_EXCLUDE_JSON: "{{ pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default) | to_json }}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@playbooks/tasks/collect_master_node_image_digests.yaml` around lines 120 -
121, The shell line currently in the task injects '{{
pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default)
| to_json }}' into a single-quoted command which breaks if the value contains a
single quote; instead set an environment variable (e.g. PINNED_IMAGE_EXCLUDE)
for that JSON by adding an env: PINNED_IMAGE_EXCLUDE: "{{
pinned_image_exclude_contains | default(_pinned_image_exclude_contains_default)
| to_json }}" to the same Ansible task and update the command to use the env var
(reference "$PINNED_IMAGE_EXCLUDE" in double-quotes) rather than inlining the
JSON string.

@danielerez
Copy link
Copy Markdown
Contributor Author

danielerez commented May 20, 2026

can we obtain the list of images using oc adm release info?

Yeah, that was my first attempt:)
But from my testings, it's problematic to pin all images in the release.
Besides taking forever to pin numerous images, the main issue is failure in specific images.
I.e. images with single manifests are failing on MCO prefetch (as it's using 'podman manifest inspect - which exits with error 125). So I've added them to an exclude list: https://github.com/rh-ecosystem-edge/enclave/pull/381/changes#diff-732cb47bb018901457d4500d498c66d1e25bc0995845f2b9de84b4b454119d39R127
The alternative done here is to simply pin all the images already pulled by the master nodes. Thus, we can ensure the prefetching process by MCO when pinning images would be quick and successful (and avoid caching uneeded images).

# Default excludes skip openshift-pipelines/*, /openshift/graph-image (Cincinnati graph-data image),
# /redhat/community-operator-index, /redhat/certified-operator-index, /redhat/redhat-marketplace-index,
# /redhat/redhat-operator-index, /rhel9/support-tools (single-manifest refs → exit 125).
- name: Collect container image digest refs from each master node
Copy link
Copy Markdown
Contributor

@rporres rporres May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline scripts are complicated to debug and maintain. Although it is not exactly a reconcile like command, we can add it as a reconcile command for the moment and think about we want to organize the whole enclave-cli afterwards. We already have a cli command that is called from a playbook, see https://github.com/rh-ecosystem-edge/enclave/blob/main/playbooks/tasks/configure_operator.yaml#L146

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployment Deployment-related changes operators Operator installation/config

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants