Commit 55e2f17

martinpitt authored and richm committed

blog: Add System Roles support for image mode (bootc) builds

1 parent c924e1e

1 file changed, 194 additions, 0 deletions

---
layout: post
title: "System Roles support for image mode (bootc) builds"
section: Blog
date: 2025-06-25T09:45:00
author: Martin Pitt
category: announcement
---

## Goal

Image mode, also known as "bootable containers" or "bootc", is an exciting new way to
build and deploy operating systems. A bootable container image can be used to
install or upgrade a real or virtual machine, much like container images are used for
applications. This is currently supported on
[Red Hat Enterprise Linux 9/10](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html-single/using_image_mode_for_rhel_to_build_deploy_and_manage_operating_systems/index)
and [Fedora/CentOS](https://docs.fedoraproject.org/en-US/bootc/), as well as in
other projects like [universal-blue](https://universal-blue.org/).

System roles are the supported high-level API for setting up
Fedora/RHEL/CentOS systems, so we want to make them compatible with image mode
builds. In particular, they need to detect the "non-booted" environment
and adjust their behaviour so that they do not, for example, try to start systemd units or talk to
network services, and instead defer all of that to the first boot. We also need to add
full bootc end-to-end integration tests to ensure that this keeps working in the
future on all supported platforms.
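
One way to detect the "non-booted" environment is to ask systemd for its state. The following is a minimal shell sketch, not code taken from any role. It assumes that `systemctl is-system-running` prints `offline` when systemd is not running as PID 1, which is the situation during a container build. `is_booted` is a hypothetical helper name:

```sh
# Hypothetical helper: succeed when systemd runs as PID 1 (booted system),
# fail during a non-booted container build.
is_booted() {
    # is-system-running exits non-zero for every state except "running",
    # so ignore the exit code and only look at the printed state.
    # Assumption: without systemd as PID 1 it prints "offline".
    state=$(systemctl is-system-running 2>/dev/null || true)
    # An empty state (e.g. no systemctl at all) also counts as not booted
    [ -n "$state" ] && [ "$state" != "offline" ]
}

# Example: only touch runtime state when it is actually available
if is_booted; then
    echo "booted: units can be started now"
else
    echo "not booted: defer starting units to the first boot"
fi
```

States such as `degraded` deliberately count as booted. Systemd is running in that state, so units can still be started. Only `offline` means that runtime operations have to wait for the first boot.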

## Build process

There are two ways to do this. Both should work; which one you choose depends
on your available infrastructure and preferences.

### Treat a container build as an Ansible host

Start a container build, for example:

```sh
buildah from --name buildc quay.io/centos-bootc/centos-bootc:stream10
```

Create an inventory for the [buildah connection plugin](https://docs.ansible.com/ansible/latest/collections/containers/podman/buildah_connection.html):

```ini
buildc ansible_host=buildc ansible_connection=buildah ansible_become=false ansible_remote_tmp=/tmp
```

Then run the system roles playbooks from the "outside" against that inventory, and finally
turn the configured working container into an image with `buildah commit`.

This approach matches the spirit of Ansible and is cleaner, as neither Ansible itself nor the
system roles need to be installed into the container. It is the
approach outlined in ["Building Container Images with Buildah and
Ansible"](https://blog.tomecek.net/post/building-containers-with-buildah-and-ansible/)
and ["Ansible and Podman Can Play Together
Now"](https://blog.tomecek.net/post/ansible-and-podman-can-play-together-now/),
and implemented in the
[ansible-bender](https://github.com/ansible-community/ansible-bender) proof of
concept (⚠️ Warning: currently unmaintained).
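
Put together, this flow could look like the following shell sketch, meant to be sourced into a shell. It assumes that buildah, ansible-core and the `containers.podman` collection (which ships the buildah connection plugin) are installed on the build host. The function names, `setup.yml` and the `localhost/my-bootc` tag are hypothetical, and the playbook is assumed to target the `buildc` host (or `all`):

```sh
#!/bin/sh
# Sketch of the "container build as an Ansible host" flow.
# Assumptions: buildah, ansible-core and containers.podman are installed.
set -eu

# Write a one-line inventory ($2) that targets the buildah working container $1
write_inventory() {
    printf '%s ansible_host=%s ansible_connection=buildah ansible_become=false ansible_remote_tmp=/tmp\n' \
        "$1" "$1" > "$2"
}

# $1: base image, $2: playbook, $3: tag for the resulting image
build_bootc_image() {
    name=buildc
    inventory=$(mktemp)
    buildah from --name "$name" "$1"
    write_inventory "$name" "$inventory"
    # Run the system roles from the "outside" against the working container
    ansible-playbook -i "$inventory" "$2"
    # Turn the configured working container into an image, then clean up
    buildah commit "$name" "$3"
    buildah rm "$name"
    rm -f "$inventory"
}
```

For example: `build_bootc_image quay.io/centos-bootc/centos-bootc:stream10 setup.yml localhost/my-bootc`. Writing the inventory to a temporary file keeps the working directory clean.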

### Install Ansible and the system roles into the container

The `Containerfile` looks roughly like this:

```dockerfile
FROM quay.io/centos-bootc/centos-bootc:stream10
# Install Ansible and the system roles into the build
RUN dnf -y install ansible-core rhel-system-roles
# setup.yml is a playbook that applies the roles to localhost
COPY ./setup.yml .
RUN ansible-playbook setup.yml
```

Everything happens inside the image build, and the playbooks run against
`localhost`. A [multi-stage
build](https://docs.docker.com/build/building/multi-stage/) could be used to avoid
shipping Ansible and the roles in the final image. This approach is entirely self-contained and
thus works well in automated container build pipelines.

⚠️ Warning: Unfortunately, this is currently broken for many, if not most, roles because
of an Ansible bug: [`service:` fails in a container build environment](https://github.com/ansible/ansible/issues/85380).
Once that is fixed, this approach will work well and may often be the
preferred choice.
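
The `setup.yml` playbook is not shown above. The following is a minimal sketch of writing one and building the image. The firewall role is picked purely for illustration; given the Ansible bug above, it may not succeed yet. It assumes podman is installed, the `Containerfile` above is in the current directory, and the `rhel-system-roles` package provides the `redhat.rhel_system_roles` collection. `write_setup_playbook`, `build_image` and `localhost/my-bootc` are hypothetical names:

```sh
# Hypothetical helper: write a minimal playbook for the Containerfile above
write_setup_playbook() {
    cat > "$1" <<'EOF'
- name: Configure the image during the container build
  hosts: localhost
  roles:
    - redhat.rhel_system_roles.firewall
EOF
}

# Build the image ($1 is the tag) from the Containerfile in the current directory
build_image() {
    write_setup_playbook setup.yml
    podman build -t "$1" .
}
```

For example: `build_image localhost/my-bootc`.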

## Status

This effort is tracked in the [RHEL-78157](https://issues.redhat.com/browse/RHEL-78157) epic.
At the time of writing, 15 roles are already supported, while the remaining 22 still need to be updated.

Roles that support image mode builds have the `containerbuild` tag, which you
can see in the [Ansible Galaxy view](https://galaxy.ansible.com/ui/standalone/roles/linux-system-roles/firewall/) (expand the tag list at the top), or in the source code in [meta/main.yml](https://github.com/linux-system-roles/firewall/blob/main/meta/main.yml).

Note that some roles also have a `container` tag. This means that they are
tested and supported in a running system container (i.e. a docker/podman
container with the `/sbin/init` entry point, or LXC, nspawn, etc.); this tag
does not cover a non-booted container build.
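
A simple grep is usually enough to check a local checkout for these tags. This sketch assumes that `meta/main.yml` lists the tags under `galaxy_tags` as a block-style list with one `- tag` per line. It is a heuristic, not a YAML parser, and `has_galaxy_tag` is a hypothetical name:

```sh
# Hypothetical helper: succeed if meta/main.yml ($1) lists the galaxy tag $2.
# Heuristic, not a YAML parser: expects one "- tag" entry per line.
has_galaxy_tag() {
    # Match the whole line so that "container" does not match "containerbuild"
    grep -Eq "^[[:space:]]*-[[:space:]]*$2[[:space:]]*\$" "$1"
}
```

For example: `has_galaxy_tag meta/main.yml containerbuild && echo "already converted"`. Anchoring the whole line matters here. A plain substring search for `container` would also match `containerbuild`.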

## Steps for converting a role

Helping out with that effort is very much appreciated! If you are interested in
making a particular role compatible with image mode builds, please follow these steps:

1. Clone the role's upstream git repository. Make sure that its `meta/main.yml`
   file does _not_ yet have a `containerbuild` tag; if it does, the role was
   already converted. In that case, please update the status in the epic.

1. Familiarize yourself with the purpose of the role, read its README.md,
   and think about whether running the role in a container generally makes
   sense. That should be the case for most of them, but e.g. `storage` is
   hardware specific and for the most part does not make sense in a container
   build environment.

1. Make sure your developer machine can run tests in general. Do the
   [integration test setup](https://github.com/linux-system-roles/tox-lsr?tab=readme-ov-file#integration-test-setup) and also read the following sections about running QEMU and container tests.
   For example, running a QEMU test should work:
   ```sh
   tox -e qemu-ansible-core-2.16 -- --image-name centos-9 --log-level=debug -- tests/tests_default.yml
   ```

1. Do an initial run of the default test (or another one) during a bootc container build to get a first impression:
   ```sh
   LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc tests/tests_default.yml
   ```

1. The most common causes of failures are `service_facts:`, which simply
   does not work in a container, and trying to set the `state:` of a unit in
   `service:`. The existing PRs linked from [RHEL-78157](https://issues.redhat.com/browse/RHEL-78157)
   have plenty of examples of how to deal with these.

   The [logging role PR](https://github.com/linux-system-roles/logging/pull/444)
   is a good example of the standard approach: add a
   `__rolename_is_booted` flag to the role variables, and use it to
   conditionalize operations and tests that
   can't work in a container. For example, the `service: state:` mentioned above can be fixed
   with
   ```yaml
   state: "{{ 'started' if __rolename_is_booted else omit }}"
   ```

   `service_facts:` can be replaced with `systemctl is-enabled` or similar; see e.g. the corresponding
   [mssql fix](https://github.com/linux-system-roles/mssql/commit/e9d16e0eafaf1859f65e28a00c3de6a5283b2536) or
   [firewall fix](https://github.com/linux-system-roles/firewall/commit/e88b15ea3821b6b90443d1c9f76987bafdad5595).

   Apply these "standard recipe" fixes first to clear away the easy noise.

1. Create a branch on your fork and add a
   [temporary commit to run tests on branch pushes](https://github.com/martinpitt/lsr-selinux/commit/58c1065b4751f13a9201ca767b7eaa0f09aaa92b), as well as another commit to
   [enable tests on container builds and in system containers](https://github.com/martinpitt/lsr-selinux/commit/56b18070f67c04d6f37a62bfa50f27cefd0a0779).
   With that, you can iterate on your branch and get testing feedback without
   creating a lot of PR noise for other developers on the project. Push to your
   fork, go to the Actions page, and wait for the first test result.

1. As described above, the `container` tag means that the role is supported and
   works in (booted) system containers. In most cases this is fairly easy to
   fix and nice to have, as running tests and iterating is faster, and
   debugging is also a bit easier. In some cases running in system containers
   is hard (like in the selinux or podman roles); in that case, don't bother and
   remove that tag again.

1. Go through the other failures. You can download the log archive and/or run
   the individual tests locally. The following command makes debugging easier: it
   keeps the container running for inspection after a failure, and first removes
   all buildah containers and temporary files left over from previous runs:

   ```sh
   buildah rm --all; rm -rf /tmp/runcontainer.*
   LSR_DEBUG=1 LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc tests/tests_default.yml
   ```

   You can enter the container and debug with `buildah run tests_default bash`.
   The container name corresponds to the test name; check `buildah ps`.

1. Fix the role and tests until you get a green result. Finally, clean up and
   sort your commits into
   [fix: Skip runtime operations in non-systemd environments](https://github.com/linux-system-roles/postgresql/commit/089421478730b6bff88c42f1ac56eec9836ae852)
   and [feat: Support this role in container builds](https://github.com/linux-system-roles/postgresql/commit/fea9e802473805344d6e062f99961a4231b4f129).
   Any role-specific, more intrusive, or self-contained changes should go into
   separate commits before these.

1. Add an end-to-end integration test which ensures that running the role
   during a container build actually works as intended in a QEMU deployment.
   If there is an existing integration test which has representative complexity
   and calls the role just once (i.e. tests one scenario), you can convert it
   as was done for
   [sudo's bootc e2e test](https://github.com/linux-system-roles/sudo/commit/2a1569f846b24e427ba4bbe078ee5ce7bf81e13d).
   If there is no existing test, you can instead add a dedicated bootc e2e test,
   as in
   [this demo PR](https://github.com/linux-system-roles/sudo/pull/58/commits/42df7f14e54813e4d6d97bbc9d388f59cc25e09d)
   or the
   [postgresql role](https://github.com/linux-system-roles/postgresql/commit/18be022885c3953678c70278f7503f0df3283f04).

1. To run the bootc e2e test locally, see the [Image mode testing](https://github.com/linux-system-roles/tox-lsr?tab=readme-ov-file#image-mode-testing) section of the tox-lsr docs.

1. Push the e2e test to your branch and iterate until it is green.

1. Send a PR, link it from the Jira epic, get it landed, and update the list in the
   Jira epic again.
1. Celebrate 🎉 and brag about your contribution!
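
The local debugging loop from the steps above can be wrapped in a few shell functions. This is a convenience sketch, not part of tox-lsr. The function names are hypothetical and the commands are the ones shown above. Deriving the container name from the test file name assumes the naming seen in the `tests_default` example:

```sh
# Hypothetical helpers around the tox-lsr commands shown above.

# Name of the buildah working container for a test file, assuming it is the
# file's base name without .yml (tests/tests_default.yml -> tests_default)
test_container_name() {
    basename "$1" .yml
}

# Run one test ($1) during a bootc container build, keeping the container
# around for inspection after a failure (LSR_DEBUG=1)
debug_bootc_test() {
    # Careful: this removes *all* buildah working containers on this machine
    buildah rm --all
    rm -rf /tmp/runcontainer.*
    LSR_DEBUG=1 LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false \
        tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc "$1"
}

# Open a shell in the container left behind by a failed test ($1)
enter_test_container() {
    buildah run "$(test_container_name "$1")" bash
}
```

For example, run `debug_bootc_test tests/tests_default.yml`, and after a failure run `enter_test_container tests/tests_default.yml`.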
