| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "System Roles support for image mode (bootc) builds" |
| 4 | +section: Blog |
| 5 | +date: 2025-06-25T09:45:00 |
| 6 | +author: Martin Pitt |
| 7 | +category: announcement |
| 8 | +--- |
| 9 | + |
| 10 | +## Goal |
| 11 | + |
| 12 | +Image mode, also known as "bootable containers" or "bootc", is an exciting new way to |
| 13 | +build and deploy operating systems. A bootable container image can be used to |
| 14 | +install or upgrade a real or virtual machine, similar to container images for |
| 15 | +applications. This is currently supported for |
| 16 | +[Red Hat Enterprise Linux 9/10](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html-single/using_image_mode_for_rhel_to_build_deploy_and_manage_operating_systems/index) |
| 17 | +and [Fedora/CentOS](https://docs.fedoraproject.org/en-US/bootc/), but also in |
| 18 | +other projects like [universal-blue](https://universal-blue.org/). |
| 19 | + |
| 20 | +With system roles being the supported high-level API to set up |
| 21 | +Fedora/RHEL/CentOS systems, we want to make them compatible with image mode |
| 22 | +builds. In particular, we need to make them detect the "non-booted" environment |
| 23 | +and adjust their behaviour: they must not, for example, try to start systemd units or talk to |
| 24 | +network services, but defer all of that to the first boot. We also need to add |
| 25 | +full bootc end-to-end integration tests to ensure this keeps working in the |
| 26 | +future on all supported platforms. |
| 27 | + |
| 28 | +## Build process |
| 29 | + |
| 30 | +Running system roles during an image build can work in two ways. Both ought to work, and which one you choose depends |
| 31 | +on your available infrastructure and preferences. |
| 32 | + |
| 33 | +### Treat a container build as an Ansible host |
| 34 | + |
| 35 | +Start a container build with e.g. |
| 36 | + |
| 37 | +```sh |
| 38 | +buildah from --name buildc quay.io/centos-bootc/centos-bootc:stream10 |
| 39 | +``` |
| 40 | + |
| 41 | +Create an inventory for the [buildah connector](https://docs.ansible.com/ansible/latest/collections/containers/podman/buildah_connection.html): |
| 42 | + |
| 43 | +``` |
| 44 | +buildc ansible_host=buildc ansible_connection=buildah ansible_become=false ansible_remote_tmp=/tmp |
| 45 | +``` |
| 46 | + |
| 47 | +Then run the system-roles playbooks on the "outside" against that inventory. |
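A minimal sketch of such a playbook, run from the host against the `buildc` inventory entry. The file name `setup.yml`, the choice of the `firewall` role, and the `fedora.linux_system_roles` collection name are assumptions for illustration; any role that carries the `containerbuild` tag should work the same way.

```yaml
# setup.yml (hypothetical): runs on the host, targets the buildah working container
- name: Configure the image during the container build
  hosts: buildc
  vars:
    # Illustrative role input: allow HTTPS through the firewall
    firewall:
      - service: https
        state: enabled
  roles:
    # Assumption: the upstream collection is installed on the host
    - fedora.linux_system_roles.firewall
```

Running it could look like `ansible-playbook -i inventory setup.yml`, where `inventory` is the file created above, followed by `buildah commit buildc <image-name>` to save the result as an image.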
| 48 | + |
| 49 | +That matches the spirit of Ansible and is cleaner as Ansible itself and |
| 50 | +system-roles do not need to be installed into the container. This is the |
| 51 | +approach outlined in ["Building Container Images with Buildah and |
| 52 | +Ansible"](https://blog.tomecek.net/post/building-containers-with-buildah-and-ansible/) |
| 53 | +and [Ansible and Podman Can Play Together |
| 54 | +Now](https://blog.tomecek.net/post/ansible-and-podman-can-play-together-now/) |
| 55 | +and implemented in the |
| 56 | +[ansible-bender](https://github.com/ansible-community/ansible-bender) proof of |
| 57 | +concept (⚠️ Warning: currently unmaintained). |
| 58 | + |
| 59 | +### Install Ansible and the system roles into the container |
| 60 | + |
| 61 | +The `Containerfile` looks roughly like this: |
| 62 | + |
| 63 | +``` |
| 64 | +FROM quay.io/centos-bootc/centos-bootc:stream10 |
| 65 | +RUN dnf -y install ansible-core rhel-system-roles |
| 66 | +COPY ./setup.yml . |
| 67 | +RUN ansible-playbook setup.yml |
| 68 | +``` |
| 69 | + |
| 70 | +Everything happens inside of the image build, and the playbooks run against |
| 71 | +`localhost`. This could use a [multi-stage |
| 72 | +build](https://docs.docker.com/build/building/multi-stage/) to avoid having |
| 73 | +Ansible and the roles in the final image. This is entirely self-contained and |
| 74 | +thus works well in automatic container build pipelines. |
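The `setup.yml` that the `Containerfile` copies is not shown above. A sketch of what it might contain, assuming the roles are installed under their legacy `rhel-system-roles.<name>` names; the role name is a placeholder, since it depends on how the package ships the roles:

```yaml
# setup.yml (hypothetical): runs inside the image build, so the target is localhost
- name: Apply system roles inside the container build
  hosts: localhost
  connection: local
  become: false
  roles:
    # Assumption: legacy role name as installed by the rhel-system-roles package
    - rhel-system-roles.firewall
```

Compared to the buildah connection approach, no inventory is needed here, because Ansible provides an implicit `localhost`.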
| 75 | + |
| 76 | +⚠️ Warning: Unfortunately this is currently broken for many/most roles because |
| 77 | +of an Ansible bug: [`service:` fails in a container build environment](https://github.com/ansible/ansible/issues/85380). |
| 78 | +Once that is fixed, this approach will work well and might often be the |
| 79 | +preferred choice. |
| 80 | + |
| 81 | +## Status |
| 82 | + |
| 83 | +This effort is tracked in the [RHEL-78157](https://issues.redhat.com/browse/RHEL-78157) epic. |
| 84 | +At the time of writing, 15 roles are already supported; the other 22 still need to be updated. |
| 85 | + |
| 86 | +Roles which support image mode builds have the `containerbuild` tag, which you |
| 87 | +can see in the [Ansible Galaxy view](https://galaxy.ansible.com/ui/standalone/roles/linux-system-roles/firewall/) (expand the tag list at the top), or in the source code in [meta/main.yml](https://github.com/linux-system-roles/firewall/blob/main/meta/main.yml). |
| 88 | + |
| 89 | +Note that some roles also have a `container` tag, which means that they are |
| 90 | +tested and supported in a running system container (i.e. a docker/podman |
| 91 | +container with the `/sbin/init` entry point, or LXC/nspawn etc.), but not |
| 92 | +during a non-booted container build. |
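For orientation, the tags live in the `galaxy_tags` list of a role's `meta/main.yml`. A sketch of what a role supporting both modes might declare; all other fields and tags are omitted here:

```yaml
# meta/main.yml (excerpt, illustrative)
galaxy_info:
  galaxy_tags:
    - containerbuild  # supported during a non-booted image mode (bootc) build
    - container       # tested and supported in a booted system container
```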
| 93 | + |
| 94 | +## Steps for converting a role |
| 95 | + |
| 96 | +Helping out with that effort is very much appreciated! If you are interested in |
| 97 | +making a particular role compatible with image mode builds, please follow these steps: |
| 98 | + |
| 99 | +1. Clone the role's upstream git repository. Make sure that its `meta/main.yml` |
| 100 | + file does _not_ yet have a `containerbuild` tag – if it does, the role was |
| 101 | + already converted. In that case, please update the status in the epic. |
| 102 | + |
| 103 | +1. Familiarize yourself with the purpose of the role, have a look at README.md, |
| 104 | + and think about whether running the role in a container generally makes |
| 105 | + sense. That should be the case for most of them, but e.g. `storage` is |
| 106 | + hardware specific and for the most part does not make sense in a container |
| 107 | + build environment. |
| 108 | + |
| 109 | +1. Make sure your developer machine can run tests in general. Do the |
| 110 | + [integration test setup](https://github.com/linux-system-roles/tox-lsr?tab=readme-ov-file#integration-test-setup) and also read the following sections about running QEMU and container tests. |
| 111 | + E.g. running a QEMU test should work: |
| 112 | + ```sh |
| 113 | + tox -e qemu-ansible-core-2.16 -- --image-name centos-9 --log-level=debug -- tests/tests_default.yml |
| 114 | + ``` |
| 115 | + |
| 116 | +1. Do an initial run of the default test (or another one) during a bootc container build, to get a first impression: |
| 117 | + ```sh |
| 118 | + LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc tests/tests_default.yml |
| 119 | + ``` |
| 120 | + |
| 121 | +1. The most common causes of failures are `service_facts:`, which simply |
| 122 | + does not work in a container, and trying to set the `state:` of a unit in |
| 123 | + `service:`. The existing PRs linked from [RHEL-78157](https://issues.redhat.com/browse/RHEL-78157) |
| 124 | + have plenty of examples of what to do with these. |
| 125 | + |
| 126 | + The [logging role PR](https://github.com/linux-system-roles/logging/pull/444) |
| 127 | + is a good example of the standard approach: add a |
| 128 | + `__rolename_is_booted` flag to the role variables, and use it to |
| 129 | + conditionalize operations and tests which |
| 130 | + can't work in a container. E.g. the above `service: state:` can be fixed |
| 131 | + with |
| 132 | + ```yaml |
| 133 | + state: "{{ 'started' if __myrole_is_booted else omit }}" |
| 134 | + ``` |
| 135 | + |
| 136 | + `service_facts:` can be replaced with `systemctl is-enabled` or similar, see e.g. the corresponding |
| 137 | + [mssql fix](https://github.com/linux-system-roles/mssql/commit/e9d16e0eafaf1859f65e28a00c3de6a5283b2536) or |
| 138 | + [firewall fix](https://github.com/linux-system-roles/firewall/commit/e88b15ea3821b6b90443d1c9f76987bafdad5595). |
| 139 | + |
| 140 | + Do these "standard recipe" fixes to clear away the easy noise. |
| 141 | + |
| 142 | +1. Create a branch on your fork, and add a |
| 143 | + [temporary commit to run tests on branch pushes](https://github.com/martinpitt/lsr-selinux/commit/58c1065b4751f13a9201ca767b7eaa0f09aaa92b), and another commit to |
| 144 | + [enable tests on container builds and in system containers](https://github.com/martinpitt/lsr-selinux/commit/56b18070f67c04d6f37a62bfa50f27cefd0a0779). |
| 145 | + With that you can iterate on your branch and get testing feedback without |
| 146 | + creating a lot of PR noise for other developers on the project. Push to your |
| 147 | + fork, go to the Actions page, and wait for the first test result. |
| 148 | + |
| 149 | +1. As described above, the `container` tag means that the role is supported and |
| 150 | + works in (booted) system containers. In most cases this is fairly easy to |
| 151 | + fix, and nice to have, as running tests and iterating is faster, and |
| 152 | + debugging is also a bit easier. In some cases running in system containers |
| 153 | + is hard (like in the selinux or podman roles); in that case don't bother and |
| 154 | + remove that tag again. |
| 155 | + |
| 156 | +1. Go through the other failures. You can download the log archive and/or run |
| 157 | + the individual tests locally. The following command helps for easier debugging – it |
| 158 | + keeps the container running for inspection after a failure, and removes |
| 159 | + containers and temp files from the previous run: |
| 160 | + |
| 161 | + ```sh |
| 162 | + buildah rm --all; rm -rf /tmp/runcontainer.*; LSR_DEBUG=1 LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc tests/tests_default.yml |
| 163 | + ``` |
| 164 | + |
| 165 | + You can enter the container and debug with `buildah run tests_default bash`. |
| 166 | + The container name corresponds to the test name; check `buildah ps`. |
| 167 | + |
| 168 | +1. Fix the role and tests until you get a green result. Finally clean up and |
| 169 | + sort your commits into |
| 170 | + [fix: Skip runtime operations in non-systemd environments](https://github.com/linux-system-roles/postgresql/commit/089421478730b6bff88c42f1ac56eec9836ae852), |
| 171 | + and [feat: Support this role in container builds](https://github.com/linux-system-roles/postgresql/commit/fea9e802473805344d6e062f99961a4231b4f129). |
| 172 | + Any role specific or more intrusive and self-contained change should be in |
| 173 | + separate commits before these. |
| 174 | + |
| 175 | +1. Add an end-to-end integration test which ensures that running the role |
| 176 | + during a container build actually works as intended in a QEMU deployment. |
| 177 | + If there is an existing integration test which has representative complexity |
| 178 | + and calls the role just once (i.e. tests one scenario), you can convert it |
| 179 | + like |
| 180 | + [sudo's bootc e2e test](https://github.com/linux-system-roles/sudo/commit/2a1569f846b24e427ba4bbe078ee5ce7bf81e13d). |
| 181 | + If there is no existing test, you can also add a specific bootc e2e test |
| 182 | + like in |
| 183 | + [this demo PR](https://github.com/linux-system-roles/sudo/pull/58/commits/42df7f14e54813e4d6d97bbc9d388f59cc25e09d) |
| 184 | + or the |
| 185 | + [postgresql role](https://github.com/linux-system-roles/postgresql/commit/18be022885c3953678c70278f7503f0df3283f04). |
| 186 | + |
| 187 | +1. To locally run the bootc e2e test, see [Image mode testing tox-lsr docs](https://github.com/linux-system-roles/tox-lsr?tab=readme-ov-file#image-mode-testing). |
| 188 | + |
| 189 | +1. Push the e2e test to your branch, iterate until green. |
| 190 | + |
| 191 | +1. Send a PR, link it from the Jira epic, get it landed, update the list in the |
| 192 | + Jira epic again. |
| 193 | + |
| 194 | +1. Celebrate 🎉 and brag about your contribution! |
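The "standard recipe" from the conversion steps above could be sketched as follows. This is an illustration rather than the code of any particular role: `myrole`, `myrole.service`, and the register variable names are hypothetical, and detecting the non-booted environment through `systemctl is-system-running` printing `offline` is an assumption about how the flag gets computed.

```yaml
# tasks/main.yml (excerpt, hypothetical role "myrole")

- name: Ask systemd whether it is running
  command: systemctl is-system-running
  register: __myrole_system_running
  changed_when: false
  failed_when: false  # a non-zero exit code is expected when not booted

- name: Remember whether runtime operations are possible
  set_fact:
    # Assumption: systemctl prints "offline" when systemd is not running
    __myrole_is_booted: "{{ __myrole_system_running.stdout != 'offline' }}"

- name: Enable the unit, and start it only on a booted system
  service:
    name: myrole.service
    enabled: true
    state: "{{ 'started' if __myrole_is_booted else omit }}"

# Replacement for service_facts: query the unit file state directly
- name: Check whether the unit is enabled
  command: systemctl is-enabled myrole.service
  register: __myrole_unit_enabled
  changed_when: false
  failed_when: false  # exits non-zero for disabled units
```

During a container build the unit is then only enabled, and it starts on the first boot of the deployed image.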