
Commit 86d1f8c

cgwalters authored and jeckersb committed
docs: Add plans for composefs-to-ostree pipeline
Update the experimental-unified-storage doc and add rustdocs to describe the planned upcoming three-store architecture (containers-storage -> composefs -> ostree). This will eliminate tar serialization and share physical disk blocks.

Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
1 parent 84190cb, commit 86d1f8c

5 files changed

Lines changed: 211 additions & 32 deletions

crates/lib/src/bootc_composefs/repo.rs

Lines changed: 37 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,40 @@
1+
//! Composefs repository lifecycle and OCI pull paths.
2+
//!
3+
//! This module owns how OCI images get into the composefs object store.
4+
//! There are two pull paths, selected by the `use_unified` flag:
5+
//!
6+
//! ## Direct pull (`use_unified = false`)
7+
//!
8+
//! `pull_composefs_direct` fetches from the source transport (registry, OCI
9+
//! dir, etc.) straight into the composefs repo via `composefs_oci::pull` with
10+
//! default options. No containers-storage involvement.
11+
//!
12+
//! ## Unified pull (`use_unified = true`)
13+
//!
14+
//! `pull_composefs_unified` is the two-stage path that populates all three
15+
//! stores (see [`crate::store`] for the architecture overview):
16+
//!
17+
//! **Stage 1** — Pull into bootc-owned containers-storage via
18+
//! `CStorage::pull_with_progress` (or `pull_from_host_storage` if the image
19+
//! already exists in the default podman store, saving a network round-trip).
20+
//!
21+
//! **Stage 2** — `composefs_oci::pull` with `LocalFetchOpt::ZeroCopy` and
22+
//! `storage_root` pointing at the containers-storage directory. composefs-ctl
23+
//! walks the overlay `diff/` directories and FICLONEs each file into the
24+
//! composefs object store keyed by its SHA-512 fsverity digest. On a
25+
//! reflink-capable filesystem this is near-instantaneous and consumes no
26+
//! additional disk space.
27+
//!
28+
//! The caller provides `storage_path` as an absolute filesystem path string
29+
//! (not a `Dir` fd) because composefs-ctl passes it to a child skopeo process.
30+
//! It is derived from the physical root fd via `/proc/self/fd/{fd}` readlink.
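
The `/proc/self/fd/{fd}` derivation described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this commit: the helper names `path_from_dir_fd` and `storage_path_string` are hypothetical, and a plain `std::fs::File` stands in for whatever directory handle bootc actually holds.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};

/// Resolve an already-open directory to an absolute path by reading the
/// `/proc/self/fd/<fd>` symlink that the Linux kernel maintains per process.
fn path_from_dir_fd(dir: &File) -> io::Result<PathBuf> {
    let link = format!("/proc/self/fd/{}", dir.as_raw_fd());
    fs::read_link(link)
}

/// Hypothetical helper: open `root` and return its resolved location as a
/// `String`, the form a child process would need on its command line.
fn storage_path_string(root: &Path) -> io::Result<String> {
    // Opening a directory read-only is permitted on Linux.
    let dir = File::open(root)?;
    let path = path_from_dir_fd(&dir)?;
    path.into_os_string().into_string().map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidData, "path is not valid UTF-8")
    })
}
```

The resolved path reflects where the directory lives at the time of the call, with symlinks already followed, which is why a string obtained this way can be handed to a separate process that cannot inherit the fd-relative view.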
31+
//!
32+
//! ## Entry points
33+
//!
34+
//! - [`pull_composefs_repo`] — upgrade/switch on a composefs-booted system.
35+
//! - [`initialize_composefs_repository`] — `bootc install` with the composefs
36+
//! backend.
37+
1 38
use fn_error_context::context;
2 39
use std::sync::Arc;
3 40

crates/lib/src/deploy.rs

Lines changed: 44 additions & 2 deletions
@@ -1,6 +1,48 @@
1-
//! # Write deployments merging image with configmap
1+
//! Pull dispatch and deployment staging for the ostree backend.
2 2
//!
3-
//! Create a merged filesystem tree with the image and mounted configmaps.
3+
//! ## Planned Pull paths
4+
//!
5+
//! The top-level entry point for upgrade/switch will eventually select
6+
//! among three paths based on the `unified` flag and filesystem capability:
7+
//!
8+
//! - **Unified + reflinks** (`unified = true`, XFS/btrfs): `pull_via_composefs_unified`
9+
//! — the planned three-store pipeline. Pulls into containers-storage first, then
10+
//! ZeroCopy into composefs, then synthesizes the ostree commit via FICLONE.
11+
//! See [`crate::store`] for the architecture diagram.
12+
//!
13+
//! - **Non-unified + reflinks** (`unified = false`, XFS/btrfs): `pull_via_composefs`
14+
//! — fetches from registry directly into composefs (no containers-storage),
15+
//! then synthesizes the ostree commit via FICLONE.
16+
//!
17+
//! - **No reflinks** (ext4): `pull` — the legacy ostree-native tar importer
18+
//! (`ostree_container::store::ImageImporter`).
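
The three planned paths listed above reduce to a small decision table. The sketch below is only an illustration of that table: the enum `PullPath` and the function `select_pull_path` are hypothetical names, while the strings returned by `entry_point` are the function names quoted in the rustdoc. It assumes that a filesystem without reflinks always takes the legacy path, whatever the `unified` flag says.

```rust
/// Hypothetical enum naming the three planned pull paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PullPath {
    /// containers-storage, then composefs, then ostree.
    ComposefsUnified,
    /// Registry straight into composefs, then ostree.
    Composefs,
    /// The ostree-native tar importer.
    LegacyTar,
}

impl PullPath {
    /// Name of the function the rustdoc associates with each path.
    fn entry_point(self) -> &'static str {
        match self {
            PullPath::ComposefsUnified => "pull_via_composefs_unified",
            PullPath::Composefs => "pull_via_composefs",
            PullPath::LegacyTar => "pull",
        }
    }
}

/// Pick a path from the `unified` flag and the reflink capability.
fn select_pull_path(unified: bool, reflinks_supported: bool) -> PullPath {
    match (reflinks_supported, unified) {
        // Assumption: without reflinks the flag is irrelevant.
        (false, _) => PullPath::LegacyTar,
        (true, true) => PullPath::ComposefsUnified,
        (true, false) => PullPath::Composefs,
    }
}
```

Keeping the decision in one pure function makes the fallback behavior on ext4 easy to test without touching a real filesystem.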
19+
//!
20+
//! ## Planned composefs → ostree synthesis
21+
//!
22+
//! The synthesis plan relies on `import_from_composefs_repo` from
23+
//! `ostree_ext::container::composefs_import` to walk the composefs
24+
//! filesystem tree and for each regular file:
25+
//!
26+
//! 1. Reads uid/gid/mode/xattrs from composefs metadata. SELinux labels are
27+
//! computed in bulk before the walk via `selabel()` and looked up per-file;
28+
//! a NUL terminator is appended because composefs-rs omits it but the kernel
29+
//! stores it.
30+
//! 2. Computes the ostree content checksum in-memory (SHA-256 of
31+
//! `uid:gid:mode:xattrs:file-content`).
32+
//! 3. Issues `ioctl(FICLONE)` from the composefs object fd into a new `O_TMPFILE`
33+
//! in the ostree object directory.
34+
//! 4. Applies metadata (`fchown`, `fchmod`, `fsetxattr`) and links the tmpfile
35+
//! into the ostree content-addressed path.
36+
//!
37+
//! `/etc` is remapped to `usr/etc`; virtual toplevel paths (`proc`, `sys`,
38+
//! `dev`, etc.) are excluded — matching the ostree-container tar importer.
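
The path handling in the last paragraph can be sketched as a pure mapping function. This is an assumption-laden illustration: `map_to_ostree_path` and `EXCLUDED_TOPLEVEL` are hypothetical names, and the exclusion list contains only the three directories the rustdoc names explicitly, since the rest of the list is not given in this commit.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Only the virtual toplevel paths named in the rustdoc; the real list is
/// longer but is not spelled out in this commit.
const EXCLUDED_TOPLEVEL: &[&str] = &["proc", "sys", "dev"];

/// Map an image path to its location in the ostree commit, or `None` if the
/// path is excluded from the import.
fn map_to_ostree_path(path: &Path) -> Option<PathBuf> {
    // Work with a path relative to the image root.
    let rel = path.strip_prefix("/").unwrap_or(path);
    let mut components = rel.components();
    let first = match components.next() {
        Some(Component::Normal(name)) => name,
        // Empty or unusual paths are passed through unchanged in this sketch.
        _ => return Some(rel.to_path_buf()),
    };
    if EXCLUDED_TOPLEVEL.iter().any(|e| first == OsStr::new(*e)) {
        return None;
    }
    if first == OsStr::new("etc") {
        let rest = components.as_path();
        let base = PathBuf::from("usr/etc");
        return Some(if rest.as_os_str().is_empty() {
            base
        } else {
            base.join(rest)
        });
    }
    Some(rel.to_path_buf())
}
```

Matching on the first path component, rather than on a string prefix, keeps a directory such as `/etcetera` from being mistaken for `/etc`.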
39+
//!
40+
//! ## Auto-detection
41+
//!
42+
//! `image_exists_in_unified_storage` checks whether the target image is already
43+
//! present in bootc-owned containers-storage. Call sites use this to select
44+
//! `unified = true` automatically without requiring an explicit flag from the
45+
//! user once `bootc image set-unified` has been run.
4 46
5 47
use std::collections::HashSet;
6 48
use std::io::{BufRead, Write};

crates/lib/src/image.rs

Lines changed: 13 additions & 2 deletions
@@ -1,6 +1,17 @@
1-
//! # Controlling bootc-managed images
2-
//!
3 1
//! APIs for operating on container images in the bootc storage.
2+
//!
3+
//! ## `bootc image set-unified`
4+
//!
5+
//! `set_unified_entrypoint` dispatches to `set_unified` (ostree backend) or
6+
//! `set_unified_composefs` (composefs backend). Both pull the currently booted
7+
//! image into bootc-owned containers-storage so that future upgrade/switch
8+
//! operations can use the unified storage path.
9+
//!
10+
//! In the planned three-store architecture (see [`crate::store`]), this will
11+
//! require a reflink-capable filesystem (XFS or btrfs) by default to enable
12+
//! block sharing. The planned `--allow-copy` flag will opt into a byte copy
13+
//! for environments like ext4 where podman access to the OS image matters
14+
//! more than disk efficiency.
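
One way to picture the planned default and the `--allow-copy` opt-in is as a small gate in front of the pull. The sketch below is hypothetical throughout: `CopyStrategy` and `plan_set_unified` are invented names, and the error text is illustrative rather than bootc's actual message.

```rust
/// Hypothetical outcome of the gate: how data would reach the stores.
#[derive(Debug, PartialEq, Eq)]
enum CopyStrategy {
    /// Share extents via reflinks (XFS, btrfs).
    Reflink,
    /// Duplicate the bytes (for example on ext4).
    ByteCopy,
}

/// Decide whether `set-unified` may proceed, given the filesystem capability
/// and whether the user passed the planned `--allow-copy` flag.
fn plan_set_unified(reflinks_supported: bool, allow_copy: bool) -> Result<CopyStrategy, String> {
    match (reflinks_supported, allow_copy) {
        (true, _) => Ok(CopyStrategy::Reflink),
        (false, true) => Ok(CopyStrategy::ByteCopy),
        (false, false) => Err(
            "filesystem does not support reflinks; pass --allow-copy to accept a byte copy"
                .to_string(),
        ),
    }
}
```

Failing by default on a filesystem without reflinks keeps the extra disk usage an explicit choice instead of a silent cost.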
4 15
5 16
use anyhow::{Context, Result, bail};
6 17
use bootc_utils::CommandRunExt;

crates/lib/src/store/mod.rs

Lines changed: 73 additions & 1 deletion
@@ -1,5 +1,77 @@
1 1
//! The [`Storage`] type holds references to three different types of
2-
//! storage:
2+
//! storage that together implement the unified storage model.
3+
//!
4+
//! # Planned three-store architecture
5+
//!
6+
//! The planned architecture for unified storage involves three content stores that
7+
//! share physical disk blocks on a reflink-capable filesystem (XFS, btrfs):
8+
//!
9+
//! 1. **bootc-owned containers-storage** at `/sysroot/ostree/bootc/storage`
10+
//! (overlay driver) — the image is accessible to podman and shares layers
11+
//! with Logically Bound Images.
12+
//! 2. **composefs object store** at `/sysroot/composefs/objects/`
13+
//! (SHA-512 content-addressed) — used by composefs-boot to mount the
14+
//! rootfs as EROFS. Populated from containers-storage via `FICLONE`
15+
//! (`composefs_oci::pull` with `ZeroCopy`).
16+
//! 3. **ostree bare repo** at `/sysroot/ostree/repo/objects/`
17+
//! (SHA-256 content-addressed) — provides deployment, rollback, fsck, and
18+
//! delta updates. Populated from the composefs object store via `FICLONE`
19+
//! (`import_from_composefs_repo`).
20+
//!
21+
//! Each `FICLONE` ioctl lets the kernel mark source and destination extents as
22+
//! copy-on-write siblings with no userspace data movement. On ext4 (no
23+
//! reflinks), each step falls back to a byte copy.
24+
//!
25+
//! ## Implementation Plan
26+
//!
27+
//! The containers-storage → composefs step (arrow 1) is already implemented
28+
//! for the composefs boot backend in `crates/lib/src/bootc_composefs/repo.rs`
29+
//! via `pull_composefs_unified`.
30+
//!
31+
//! Wiring all three steps together for the ostree backend is the major planned work.
32+
//! The composefs → ostree step (arrow 2) was proven by the `composefs-to-ostree`
33+
//! spike branch. The planned implementation for the ostree backend will:
34+
//!
35+
//! 1. Perform a lazy cached probe (`reflinks_supported`) at install time.
36+
//! 2. Pull into containers-storage first (Stage 1).
37+
//! 3. Use `composefs_oci::pull` with `LocalFetchOpt::ZeroCopy` to populate composefs (Stage 2).
38+
//! 4. Finally, synthesize the ostree commit by walking the composefs tree,
39+
//! reading metadata, computing SELinux labels, computing the ostree checksum,
40+
//! and `FICLONE`ing into the ostree bare repo (Stage 3).
41+
//!
42+
//! ## Long-term: Global composefs store
43+
//!
44+
//! The ultimate planned state (the "composefs-as-storage" plan) is to have podman's
45+
//! composefs backend natively write objects to `/sysroot/composefs` directly, bypassing
46+
//! even `containers-storage`. This would mean flatpak, podman, and bootc all share exactly
47+
//! one global pool of content-addressed, deduplicated files.
48+
//!
49+
//! ## Why composefs in the middle
50+
//!
51+
//! The old unified storage path (containers-storage → skopeo tar → ostree)
52+
//! serialized layers twice. composefs-ctl's `ZeroCopy` pull mode instead walks
53+
//! the overlay `diff/` directories and FICLONEs each file into the composefs
54+
//! object store keyed by SHA-512 fsverity digest — no tar involved.
55+
//! See [container-libs#144](https://github.com/containers/container-libs/issues/144).
56+
//!
57+
//! ## Why reflink and not hardlink between composefs and ostree
58+
//!
59+
//! composefs is content-addressed by SHA-512 of raw bytes: two paths with
60+
//! identical content share one composefs inode. ostree bare mode stores
61+
//! uid/gid/mode/xattrs including `security.selinux` on each inode. Two files
62+
//! with the same bytes but different SELinux labels produce different ostree
63+
//! checksums but share one composefs object. One inode can hold only one
64+
//! `security.selinux` value, so hardlinking would silently corrupt labels.
65+
//! Reflink gives each ostree object its own inode while sharing disk extents.
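
The keying difference behind this argument can be shown with a toy model. The sketch below is not either project's real checksum: it uses the standard library's `DefaultHasher` as a stand-in for SHA-512 and SHA-256, and `FileMeta`, `content_key`, and `metadata_key` are hypothetical names. It only demonstrates that a key over raw bytes ignores the SELinux label while a key over metadata plus bytes does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the per-file metadata that ostree bare mode
/// stores on the inode.
#[derive(Hash)]
struct FileMeta<'a> {
    uid: u32,
    gid: u32,
    mode: u32,
    selinux_label: &'a str,
}

/// Stand-in for a composefs-style key: depends on the raw bytes only.
fn content_key(content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

/// Stand-in for an ostree-style key: depends on metadata and bytes.
fn metadata_key(meta: &FileMeta<'_>, content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    meta.hash(&mut h);
    content.hash(&mut h);
    h.finish()
}
```

Two files with the same bytes and different labels therefore map to one `content_key` and two `metadata_key` values. A hardlink would force both ostree objects onto the single inode behind the shared content key, which can carry only one label.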
66+
//!
67+
//! ## Reflink probe
68+
//!
69+
//! The reflink probe is performed lazily and cached. It creates
70+
//! two anonymous temporary files (via `O_TMPFILE`, no
71+
//! cleanup needed), writes one byte to the source, and attempts
72+
//! `ioctl(FICLONE)`. Returns `true` on success, `false` on `EOPNOTSUPP` or
73+
//! `EXDEV`. The probe directory is `composefs/objects` if it already exists,
74+
//! otherwise the physical root itself.
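
A probe along these lines can be sketched without external crates. Several things here are assumptions rather than the planned implementation: the function names are hypothetical, the sketch uses named scratch files that it deletes afterwards instead of `O_TMPFILE` (whose flag value varies by architecture), the `FICLONE` request number is the x86_64 and aarch64 value, the `ioctl` prototype is glibc's, and the cached wrapper treats unexpected errors as "no reflinks".

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::sync::OnceLock;

// Declared by hand to stay std-only; matches glibc's prototype.
unsafe extern "C" {
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// `_IOW(0x94, 9, int)` from `<linux/fs.h>` on x86_64 and aarch64.
const FICLONE: c_ulong = 0x4004_9409;
const EXDEV: i32 = 18;
const EOPNOTSUPP: i32 = 95;

/// Try to reflink a one-byte file inside `dir`.
fn probe_reflinks(dir: &Path) -> io::Result<bool> {
    let pid = std::process::id();
    let src_path = dir.join(format!(".reflink-probe-src-{pid}"));
    let dst_path = dir.join(format!(".reflink-probe-dst-{pid}"));
    let result = (|| -> io::Result<bool> {
        let mut src = OpenOptions::new()
            .read(true)
            .write(true)
            .create_new(true)
            .open(&src_path)?;
        src.write_all(b"x")?;
        let dst = OpenOptions::new().write(true).create_new(true).open(&dst_path)?;
        // SAFETY: both descriptors stay open for the duration of the call.
        let rc = unsafe { ioctl(dst.as_raw_fd(), FICLONE, src.as_raw_fd()) };
        if rc == 0 {
            return Ok(true);
        }
        let err = io::Error::last_os_error();
        match err.raw_os_error() {
            Some(EOPNOTSUPP) | Some(EXDEV) => Ok(false),
            _ => Err(err),
        }
    })();
    // Named files need explicit cleanup, unlike O_TMPFILE.
    let _ = fs::remove_file(&src_path);
    let _ = fs::remove_file(&dst_path);
    result
}

static REFLINKS: OnceLock<bool> = OnceLock::new();

/// Lazy, cached wrapper: the probe runs at most once per process.
fn reflinks_supported_cached(dir: &Path) -> bool {
    *REFLINKS.get_or_init(|| probe_reflinks(dir).unwrap_or(false))
}
```

Caching in a `OnceLock` means the directory argument only matters on the first call, which fits a probe tied to a single physical root.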
3 75
//!
4 76
//! # OSTree
5 77
//!

docs/src/experimental-unified-storage.md

Lines changed: 44 additions & 27 deletions
@@ -7,10 +7,13 @@ Tracking issue: <https://github.com/bootc-dev/bootc/issues/20>
7 7

8 8
## Overview
9 9

10-
Unified storage is an experimental feature that allows bootc to fetch and store
11-
the default OS image in the same [containers/storage](https://github.com/containers/storage)
12-
backend used for [logically bound images](logically-bound-images.md) (and by podman).
13-
This enables several benefits:
10+
Unified storage is the goal of having all storage for bootc be "unified" with the storage
11+
used by a container runtime, such as podman.
12+
13+
Currently, bootc uses either ostree or composefs. [Logically bound images](logically-bound-images.md)
14+
use the podman container storage.
15+
16+
## Goals
14 17

15 18
- Direct support for zstd:chunked: Container images using zstd:chunked compression
16 19
can be efficiently pulled with deduplication
@@ -21,20 +24,6 @@ This enables several benefits:
21 24
- When used with `bootc image cmd build`, can support direct build into the bootc-owned
22 25
storage without a copy from the podman (or other app container) storage.
23 26

24-
## Background
25-
26-
Historically, bootc has used two separate storage backends:
27-
28-
1. **ostree**: For the booted host OS image, via [ostree-rs-ext](https://github.com/ostreedev/ostree-rs-ext/)
29-
2. **containers/storage**: For logically bound images (LBIs)
30-
31-
This split created challenges: the booted image couldn't be easily accessed
32-
by podman, and container layer sharing between the host and LBIs wasn't possible.
33-
34-
Unified storage addresses this by pulling the host image into the bootc-owned
35-
container storage (`/usr/lib/bootc/storage`) first, then importing from there
36-
into ostree and setting it up for booting (e.g. performing SELinux labeling).
37-
38 27
## Current status
39 28

40 29
**Status**: Experimental. The unified storage feature is under active development.
@@ -56,16 +45,10 @@ from its container storage into ostree, or when copying between different
56 45
container storage instances, each layer is fully re-serialized even when both
57 46
storages are on the same filesystem.
58 47

59-
With reflink support (as proposed in that issue), copies between storages on
60-
the same filesystem would be nearly instantaneous and use no additional disk
61-
space. Without it, unified storage works but involves redundant I/O and
62-
temporary disk space usage proportional to layer sizes. This is particularly
63-
noticeable with large non-chunked layers.
64-
65 48
The architectural fix requires separating metadata from data in the copy path,
66 49
allowing file descriptors to be passed and reflinked rather than streamed
67-
through tar. This is related to the composefs approach of content-addressed
68-
storage with distinct metadata and data channels.
50+
through tar. This will be solved by putting [composefs-rs](https://github.com/containers/composefs-rs)
51+
in the middle to orchestrate zero-copy pulls. See [Future plans: composefs-to-ostree](#future-plans-composefs-to-ostree).
69 52

70 53
## Enabling unified storage
71 54

@@ -153,7 +136,41 @@ podman --storage-opt=additionalimagestore=/usr/lib/bootc/storage run localhost/b
153 136
Unified storage is complementary to the [composefs backend](experimental-composefs.md).
154 137
While unified storage changes *how images are pulled* (using containers/storage),
155 138
the composefs backend changes *how the filesystem is stored and verified*.
156-
These features can potentially be combined in the future.
139+
140+
## Future plans: composefs-to-ostree
141+
142+
These features will be combined in upcoming work to build a "composefs-first"
143+
import pipeline. In this planned model, containers/storage will pull the image,
144+
composefs will import it via reflinks (`FICLONE`), and then ostree will
145+
synthesize its commit by `FICLONE`ing from the composefs objects.
146+
147+
This will eliminate tar serialization entirely, meaning only one physical copy
148+
of the image data will exist on disk, shared across all three stores.
149+
150+
## Future plans: composefs-as-storage
151+
152+
Looking further ahead, the ultimate evolution of unified storage is to make the host's `/sysroot/composefs` object store the single, global source of truth for all content-addressed files on the system.
153+
154+
Instead of `containers/storage` maintaining its own copy of application image layers and merely sharing the *host* OS layers, podman's composefs backend could be configured to write objects directly into `/sysroot/composefs` on bootc-managed systems.
155+
156+
This means there would be exactly one storage pool for:
157+
158+
1. The bootc host OS image
159+
2. Logically bound app containers
160+
3. Standard Podman app containers
161+
4. Flatpak apps (by having flatpak's system helper write to the same object store)
162+
163+
Every file across the entire system—whether part of the base OS, a containerized database, or a desktop application—would be deduplicated automatically and perfectly at the object level via fsverity digests.
164+
165+
### Implementation notes
166+
167+
For developers, the internal design and target architecture for this three-store
168+
unified storage model is documented in the rustdoc comments of the relevant source files:
169+
170+
- `crates/lib/src/store/mod.rs` — the target three-store architecture and reflink behavior
171+
- `crates/lib/src/bootc_composefs/repo.rs` — composefs unified pull path stages
172+
- `crates/lib/src/deploy.rs` — pull dispatch and ostree backend synthesis
173+
- `crates/lib/src/image.rs` — `bootc image set-unified` entrypoints
157 174

158 175
## Limitations
159 176

0 commit comments
