
Tritonadm nocloud import#27

Merged: nshalman merged 50 commits into main from tritonadm-nocloud-import on May 6, 2026

@nshalman nshalman commented May 5, 2026

Portions generated with: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewed By: Travis Paul <tpaul@edgecast.io>
Reviewed by: Carlos Neira <cneira@edgecast.io>
[root@headnode (coal) ~]# tritonadm image fetch-nocloud --vendor alpine --release latest --target imgapi
Fetching Alpine releases.json ...
Downloading nocloud_alpine-3.23.4-x86_64-uefi-cloudinit-r0.qcow2
  URL: https://dl-cdn.alpinelinux.org/alpine/v3.23/releases/cloud/nocloud_alpine-3.23.4-x86_64-uefi-cloudinit-r0.qcow2
Hashing source image ...
Fetching https://dl-cdn.alpinelinux.org/alpine/v3.23/releases/cloud/nocloud_alpine-3.23.4-x86_64-uefi-cloudinit-r0.qcow2.sha512
Checksum OK (sha512): 849196e26640b33fb1f106d46cebdf341c715f7f49363564eacece8097464b6af7041618e94c4eb1a006394663252a86c0fb3158fee5d0ffb094899730679a71
Creating zvol: zones/tritonadm-nocloud-63a7870c-b7bd-4d3b-84ba-e8ca3420e703 (214 MiB virtual)
Writing image to zvol (224395264 bytes from qcow2) ...
Snapshotting zvol ...
Exporting ZFS stream → /var/tmp/tritonadm/nocloud/image/alpine-3.23/alpine-3.23-3.23.4.x86_64.zfs ...
Compressing image ...
Destroying zvol: zones/tritonadm-nocloud-63a7870c-b7bd-4d3b-84ba-e8ca3420e703

Build complete.
  Image:    /var/tmp/tritonadm/nocloud/image/alpine-3.23/alpine-3.23-3.23.4.x86_64.zfs.gz
  Manifest: /var/tmp/tritonadm/nocloud/image/alpine-3.23/alpine-3.23-3.23.4.json
  UUID:     2eb2ec14-55c6-5795-8ff6-710617f90e97

Importing image manifest 2eb2ec14-55c6-5795-8ff6-710617f90e97...
Imported: alpine-3.23-nocloud v3.23.4
Uploading image file...
Image file uploaded.
Activating image...
Image 2eb2ec14-55c6-5795-8ff6-710617f90e97 imported and activated.
[root@headnode (coal) ~]# imgadm avail | grep alpine
2eb2ec14-55c6-5795-8ff6-710617f90e97  alpine-3.23-nocloud             3.23.4                                        linux    zvol          2026-05-06

Vendored crates — reviewer guidance

Two third-party crates under libs/, both still load-bearing: qcow for 11 qcow2-shipping vendors, vmdk for OmniOS.

libs/qcow — panda-re/qcow-rs 1.2.0 (MIT), vendored in 0032523. Three deviations from upstream:

  • Drop zlib-ng-compat from the flate2 dep so flate2 uses pure-Rust miniz_oxide instead of CMake-built libz-sys. Upstream hardcodes the feature with no opt-out and Cargo's [patch] can't
    rewrite transitive features — hence the fork.
  • Silence unused_parens and clippy::all inside the crate (the modular-bitfield macro trips them on doc-commented fields).
  • #[ignore] one upstream test with a hardcoded path under /home/jamcleod/....

libs/vmdk — strozfriedberg/vmdk-rs 0.1.1 (Apache-2.0), vendored in 7b5eb5d. Three upstream layers stripped (we drive local file reads into a zvol, nothing more), taking transitive deps
from 318 → 112:

  • s3source.rs + the s3:// URL arm — drops rust-s3 / aws-sigv4 / hyper signing tree.
  • foyercache.rs — drops foyer + tempfile; the existing dummycache.rs covers us.
  • vmdkverify CLI — drops clap / bytesize / tracing-subscriber / hex; sha-1 demoted to dev-deps.

The async BytesSource trait architecture is untouched, so upstream merges replay as deletions, not rewires. All 32 fixture tests pass.

Common: both excluded from arch-lint.toml, both have [lints] overrides silencing workspace-wide strict lints, licenses preserved at libs/{qcow,vmdk}/LICENSE. Reviewer shortcut: git show
0032523 -- libs/qcow/Cargo.toml and git show 7b5eb5d -- libs/vmdk/Cargo.toml.

nshalman and others added 24 commits May 5, 2026 15:27
Adds a new `tritonadm image fetch-nocloud --vendor <name> --release <token>`
subcommand that fetches a CloudInit nocloud image from an upstream
vendor and converts it into a SmartOS/Triton zvol image + IMGAPI
manifest, in-process — no `qemu-img` dependency.

This is the Rust translation of the bash pipeline in
`target/triton-nocloud-images/build.sh`, with the goal of letting
SmartOS hosts ingest stock vendor images on demand instead of receiving
ones we have repackaged. POC ships Ubuntu (noble/jammy/focal/oracular,
plus `latest`); other vendors follow the same `VendorProfile` trait
shape.

Pipeline: download → TLS-fetched SHA256SUMS verify → open qcow2 in
memory via the `qcow` crate → create zvol of the qcow's actual virtual
disk size → stream decoded clusters into `/dev/zvol/rdsk/<ds>` → snap →
send → gzip → typed IMGAPI manifest. Two `indicatif` progress bars
(download + zvol write).

Zone-aware: GZ defaults to the `zones` parent dataset; NGZ requires a
delegated dataset (`zones/<zone>/data` with `zoned=on`) and bails
otherwise. `--dataset` overrides for either.

Design rationale and follow-ups in
`docs/design/tritonadm-nocloud-import.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CWD-relative defaults are brittle when running in the GZ from a
read-only or unexpected directory. Move both --workdir and
--output-dir defaults to a stable absolute location under
/var/tmp/tritonadm/nocloud/{cache,image}/<vendor>-<series>/.
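
A minimal sketch of how such defaults could be derived. The helper name `default_dirs` is hypothetical, and the mapping of `--workdir` to `cache/` and `--output-dir` to `image/` is an assumption read off the brace order above.

```rust
use std::path::PathBuf;

/// Root shared by the cache and image trees, as named in the commit message.
const ROOT: &str = "/var/tmp/tritonadm/nocloud";

/// Hypothetical helper: returns (workdir, output_dir) for a vendor/series pair.
/// Assumes --workdir maps to `cache/` and --output-dir maps to `image/`.
fn default_dirs(vendor: &str, series: &str) -> (PathBuf, PathBuf) {
    let leaf = format!("{vendor}-{series}");
    let workdir = PathBuf::from(ROOT).join("cache").join(&leaf);
    let output_dir = PathBuf::from(ROOT).join("image").join(&leaf);
    (workdir, output_dir)
}
```

Because both paths are absolute, the result does not depend on the caller's current directory.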

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Ubuntu vendor profile now consults
https://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:download.json
(the same machine-readable feed cloud-init / MAAS / OpenStack
consume) to resolve `latest` and named codenames. This replaces the
hardcoded series table — though the table is kept as an air-gapped
fallback if streams is unreachable.

Three wins over the hardcoded path:

- `latest` is self-updating: when a new LTS ships, `--release latest`
  picks it up with no tool update.
- The manifest `version` is the canonical upstream build serial
  (e.g. `20260321`) instead of today's date, so two runs against the
  same upstream produce identical manifest versions.
- The streams JSON includes the sha256, so the verifier is now
  `Sha256Pinned` from one TLS roundtrip rather than a second roundtrip
  to fetch `SHA256SUMS`.

6 new unit tests against a small fixture cover latest-LTS selection,
codename and version-token resolution, non-LTS skipping, and URL
construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manifest UUIDs are now `v5(NAMESPACE, source_image_sha256_hex)`, where
NAMESPACE is itself a stable v5 UUID derived from the URL
`https://tritondatacenter.com/tritonadm/nocloud`. Two runs against
the same upstream image produce the same manifest UUID, regardless of
when or where they run, which lets IMGAPI dedupe correctly.

The `Verifier` trait signature changes accordingly: it now takes a
precomputed sha256 hex string instead of a path, since the pipeline
needs the hash for both verification and UUID derivation and we don't
want to hash a 600 MB file twice.

Adds the `v5` feature to the workspace `uuid` dependency. Three new
unit tests cover stability, distinctness, and version-tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four robustness improvements for the nocloud build pipeline:

1. **Recognizable dataset prefix.** Build zvols are now named
   `<parent>/tritonadm-nocloud-<uuid>` instead of `<parent>/<uuid>`,
   so `zfs list | grep tritonadm-nocloud` is unambiguous and the next
   improvement can scope itself safely.

2. **Startup sweep.** Before creating a new build zvol, list the
   parent dataset for any leftover `tritonadm-nocloud-*` children
   from a previous interrupted run (SIGKILL, crash, host reboot)
   and destroy them.

3. **SIGINT handler.** A spawned task watches for Ctrl-C and sets a
   shared cancel flag. The download loop, the qcow→zvol copy loop,
   and the verifier all check this flag and bail cleanly, which lets
   the normal cleanup path (zfs destroy of the in-flight dataset)
   run before exit. Child shellouts (zfs send, gzip) inherit our
   process group so they receive SIGINT directly from the TTY.

4. **Cache mismatch retry.** If a cached file fails verification —
   common when the upstream serial moves between runs but the URL
   path-derived filename collides — log a warning, delete the cache,
   redownload once. Previous behavior bailed on first mismatch.
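
The cancel-flag pattern from item 3 can be sketched with the standard library alone. `copy_cancellable` is a hypothetical name, and the real loops also drive progress bars; this only shows the flag check that lets the caller's cleanup path run.

```rust
use std::io::{self, Read, Write};
use std::sync::atomic::{AtomicBool, Ordering};

/// Copy `src` to `dst`, checking the shared cancel flag before each chunk.
/// Returning an error (rather than exiting) lets the caller destroy the
/// in-flight dataset before the process ends.
fn copy_cancellable<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    cancel: &AtomicBool,
) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        if cancel.load(Ordering::Relaxed) {
            return Err(io::Error::new(io::ErrorKind::Other, "build cancelled"));
        }
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

In this sketch the signal-watching task would hold a clone of an `Arc<AtomicBool>` and store `true` on Ctrl-C.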

Verified manually: a stale dataset named `tritonadm-nocloud-DEADBEEF-stale-test`
was correctly detected and swept on the next run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`--dry-run` resolves vendor metadata and prints the full build plan
without downloading, hashing, or writing anything. For vendors whose
metadata feed includes the upstream sha256 (e.g. Ubuntu's Simple
Streams), the plan also shows the future manifest UUID — derivable
from the sha256 alone, since UUIDs are now v5(NS, sha256). For
vendors that fetch the hash at verification time, the plan notes
that the UUID becomes available after download.

Adds `expected_sha256: Option<String>` to ResolvedImage so the
streams path can surface the value while the SHA256SUMS-fallback
path leaves it None.

Also files several known limitations of the current implementation
in the design doc — parallel-build collisions, SIGKILL cleanup
gaps, and the deferred vendor/format/target list — so the next
iteration has a clean starting point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related robustness improvements that make concurrent builds safer.

The startup sweep now skips datasets younger than one hour, since
those are most likely owned by an actively-running concurrent build
rather than crash leftovers. Older datasets that turn out to be busy
are still detected (the destroy fails) and logged as
"busy/refused; leaving in place" rather than reported as cleaned up.
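
As a rough illustration of the sweep filter, assuming the caller has already computed each dataset's age (for example from its ZFS creation time). The function name is hypothetical.

```rust
use std::time::Duration;

const PREFIX: &str = "tritonadm-nocloud-";
const MIN_AGE: Duration = Duration::from_secs(60 * 60);

/// `parent` is the dataset being swept; `dataset` is a full ZFS name.
/// Only direct children with the build prefix that are at least one hour
/// old are candidates for destruction.
fn is_sweep_candidate(parent: &str, dataset: &str, age: Duration) -> bool {
    let child = match dataset.strip_prefix(parent).and_then(|s| s.strip_prefix('/')) {
        Some(c) => c,
        None => return false,
    };
    !child.contains('/') && child.starts_with(PREFIX) && age >= MIN_AGE
}
```

A candidate can still be busy; per the message above, a failed destroy is logged and the dataset is left in place.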

Same-(vendor, release) builds are now serialized via a
`std::fs::File::try_lock` (flock LOCK_EX | LOCK_NB) on
`<workdir>/.lock`. Since `File::try_lock` was stabilized in Rust
1.89, no third-party crate or `unsafe` block is needed. The lock
lives on the FD; the kernel releases it on any process exit, so a
SIGKILL'd run never leaves a stuck lock behind. Different
(vendor, release) pairs use different workdirs and run in parallel
without contention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the free-form `vendor: String` CLI argument with a `Vendor`
enum that derives `clap::ValueEnum`. The --help output now lists
known vendors automatically (`[possible values: ubuntu]`), bad
values are rejected before any I/O, and shell completion picks
them up. Adding a vendor is a single new variant — no parallel
match in lookup() needed because lookup is now infallible.

The variant→string mapping is derived from `serde::Serialize`
with `rename_all = "kebab-case"`, and `Display` delegates to the
existing `enum_to_display` helper, matching the pattern used
elsewhere in tritonadm. No string-matching boilerplate per variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Debian publishes generic cloud images at
`https://cloud.debian.org/images/cloud/<codename>/latest/`, with a
sibling `SHA512SUMS` file (SHA-512, not SHA-256). The new vendor
profile picks the `genericcloud` qcow2 — its cloud-init auto-detects
SmartOS's NoCloud datasource on bhyve.

Verifier work to support this:

- Generalize the SHA-256 sums-file parser into `parse_sums_file`
  (hash-agnostic; whatever hex appears in the first column wins).
- Add `Sha512SumsTls` alongside `Sha256SumsTls`. They share
  `fetch_and_parse_sums` for the URL fetch + filename match.
- Generalize `sha256_file` over a `Digest` type parameter and add a
  `sha512_file` companion.
- The `Verifier` trait now takes both `&Path` and the precomputed
  sha256 hex. Verifiers in the SHA-256 family ignore the path;
  `Sha512SumsTls` ignores the precomputed sha256 and hashes the
  file with SHA-512. The pipeline still pre-computes sha256 for
  manifest-UUID derivation regardless.
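
A hash-agnostic parser for Linux-style sums files might look like the sketch below. It is not the PR's `parse_sums_file`; the name `find_hash` and the handling of the coreutils `*` binary marker are assumptions, and filenames containing whitespace are not supported.

```rust
/// Look up `filename` in a `<hex>  <filename>` listing and return its hash.
/// The hash length is not checked, so SHA-256 and SHA-512 listings both work.
fn find_hash<'a>(sums: &'a str, filename: &str) -> Option<&'a str> {
    sums.lines().find_map(|line| {
        let mut cols = line.split_whitespace();
        let hex = cols.next()?;
        let name = cols.next()?;
        // GNU coreutils marks binary-mode entries with a leading '*'.
        let name = name.strip_prefix('*').unwrap_or(name);
        let is_hex = hex.bytes().all(|b| b.is_ascii_hexdigit());
        (is_hex && name == filename).then_some(hex)
    })
}
```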

Releases supported in the table: trixie (current stable, default
for `--release latest`), bookworm (oldstable), bullseye (older
oldstable / LTS). Resolution accepts codename, major version
("13"), or `latest`.

Verified end-to-end on this builder zone: a fresh
`--vendor debian --release trixie` build runs in ~2m18s, derives
a stable v5 UUID from the file's sha256, and produces a valid
IMGAPI manifest pair.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hardcoded `RELEASES`/`LATEST_STABLE` table with a fetch
of `https://deb.debian.org/debian/dists/<suite>/Release` — the same
plain-RFC822 file apt itself uses to know what `stable` means today.

The user can now pass any of:

- `latest` — alias for `stable`
- symbolic suites — `stable`, `oldstable`, `oldoldstable`, `testing`,
  `unstable`
- codenames — `trixie`, `bookworm`, `bullseye`, `forky`, `sid`, ...

Each resolves at upstream by fetching `dists/<suite>/Release` and
parsing the `Codename` and `Version` header fields. The major number
for the cloud-images filename is parsed from the version string
(e.g. `"13.4"` → `13`). Bogus tokens get a clear 404 from the
Release-file fetch.
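
The header parsing described here is small enough to sketch. `parse_release` is a hypothetical name, and the sketch assumes the body of `dists/<suite>/Release` has already been fetched as text.

```rust
/// Returns (codename, version, major) from a `Release` file's header fields.
/// Yields None when either field is missing or the major is not numeric.
fn parse_release(text: &str) -> Option<(String, String, u32)> {
    let field = |key: &str| {
        text.lines().find_map(|line| {
            let (k, v) = line.split_once(':')?;
            (k == key).then(|| v.trim().to_string())
        })
    };
    let codename = field("Codename")?;
    let version = field("Version")?;
    // "13.4" -> 13, used for the cloud-images filename.
    let major = version.split('.').next()?.parse().ok()?;
    Some((codename, version, major))
}
```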

The manifest `version` field is now the Debian point-release
(e.g. `"13.4"`) instead of today's date, so two builds against the
same point release produce identical manifest versions. Output
files are named `debian-<codename>-<version>.x86_64.zfs.gz`
accordingly.

No offline-fallback path: we're downloading several hundred MB of
image data over the same network the Release file lives on, so the
extra ~3 KB fetch isn't a meaningful new failure mode.

Verified live for `latest`, `stable`, `oldstable`, `trixie`, and
`bookworm`; bogus tokens fail with the expected error chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concrete reasons our Release-file-based resolver doesn't
handle development suites today (no Version field, different URL
prefix, different filename pattern), captured in the design doc
so we can come back to it without re-deriving the failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves via Alpine's release feed at
`https://alpinelinux.org/releases.json` — same file Alpine's web
site renders the "current stable" badge from. Accepts:

- `latest` — newest release in the `latest_stable` branch
- branch (`3.23` or `v3.23`) — newest release in that branch
- full version (`3.23.4`) — exact match

The image lives at
`https://dl-cdn.alpinelinux.org/alpine/v<branch>/releases/cloud/nocloud_alpine-<version>-x86_64-uefi-cloudinit-r0.qcow2`.
The verifier is a new `Sha512SidecarTls` — Alpine ships a per-image
`<file>.sha512` containing only the bare hex hash on a single line,
which is a different shape from a `<HASH>SUMS`-style listing.
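
For illustration, the image URL can be derived from a full version string alone. The function name is hypothetical, and the sketch assumes the branch is always the first two version components.

```rust
/// Build the nocloud image URL for a full Alpine version such as "3.23.4".
/// Returns None unless the version has exactly three dot-separated parts.
fn alpine_image_url(version: &str) -> Option<String> {
    let version = version.strip_prefix('v').unwrap_or(version);
    let mut parts = version.split('.');
    let (major, minor, _patch) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    Some(format!(
        "https://dl-cdn.alpinelinux.org/alpine/v{major}.{minor}/releases/cloud/\
         nocloud_alpine-{version}-x86_64-uefi-cloudinit-r0.qcow2"
    ))
}
```

The sidecar checked by `Sha512SidecarTls` is then the same URL with `.sha512` appended.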

Verified end-to-end on this builder zone: a fresh
`--vendor alpine --release latest` run finishes in ~42s (Alpine's
qcow2 is small) and produces a stable v5 UUID derived from the
file's sha256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`make format` (= `cargo fmt`) across the nocloud module tree.
No behavior changes; tests and clippy still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `Xz` source format end-to-end. Streams the decompressed
bytes straight into the zvol via `lzma-rs` (pure Rust, no liblzma
C dep), so there's no intermediate `.raw` file in the cache and
disk pressure stays bounded by zvol space alone.

To size the zvol correctly without decompressing first, the
pipeline parses the xz Stream Footer + Index up-front:

1. `seek(End - 12)`, `read_exact(12)` for the Footer.
2. Decode the `Backward Size` field to find the Index location.
3. `seek(End - (12 + index_size))`, read the Index, sum the
   per-record `Uncompressed Size` VLIs.

Two seeks and a few hundred bytes total — no full-file scan.
Single-stream xz is supported (the case for every cloud image
we've seen); multi-stream concatenated xz would need to walk
backward across stream boundaries.
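
The three steps above can be sketched as pure functions over the bytes read from the end of the file. This follows the published .xz format (12-byte footer, Backward Size stored as `size / 4 - 1`, 7-bit little-endian VLIs) but omits the CRC32 checks and Stream Padding handling a real reader needs; the function names are hypothetical.

```rust
/// Decode one xz variable-length integer (7 bits per byte, little-endian,
/// high bit set means more bytes follow). Returns (value, bytes consumed).
fn decode_vli(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(9) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            // A multi-byte encoding must not end in a zero byte.
            if b == 0 && i > 0 {
                return None;
            }
            return Some((value, i + 1));
        }
    }
    None
}

/// Size in bytes of the Index, taken from the 12-byte Stream Footer
/// (CRC32, Backward Size, Stream Flags, magic "YZ").
fn index_size_from_footer(footer: &[u8; 12]) -> Option<u64> {
    if &footer[10..12] != b"YZ" {
        return None;
    }
    let stored = u32::from_le_bytes([footer[4], footer[5], footer[6], footer[7]]);
    Some((u64::from(stored) + 1) * 4)
}

/// Sum the Uncompressed Size of every record in the Index.
fn uncompressed_size_from_index(index: &[u8]) -> Option<u64> {
    if *index.first()? != 0x00 {
        return None; // missing Index Indicator
    }
    let mut pos = 1;
    let (records, n) = decode_vli(index.get(pos..)?)?;
    pos += n;
    let mut total = 0u64;
    for _ in 0..records {
        let (_unpadded, n) = decode_vli(index.get(pos..)?)?;
        pos += n;
        let (uncompressed, n) = decode_vli(index.get(pos..)?)?;
        pos += n;
        total = total.checked_add(uncompressed)?;
    }
    Some(total)
}
```

With these, the Index starts at `file_len - 12 - index_size`, and the returned total is the size to give the zvol.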

A `ProgressWriter` wrapper around the zvol file reports
per-byte progress to indicatif and short-circuits on the SIGINT
cancel flag, so the streaming write integrates with the existing
signal handler.

Adds the FreeBSD vendor profile that uses this:

- Resolves `latest` by GETting
  `https://download.freebsd.org/releases/VM-IMAGES/` and picking
  the numerically highest `X.Y-RELEASE/` entry. Explicit version
  tokens (`15.0`, `15.0-RELEASE`) are also accepted.
- URL: `releases/VM-IMAGES/<ver>-RELEASE/amd64/Latest/FreeBSD-<ver>-RELEASE-amd64-BASIC-CLOUDINIT-zfs.raw.xz`.
- Verifier: new `Sha256BsdSumsTls`, which parses the BSD-traditional
  `SHA256 (filename) = hex` format used in FreeBSD's
  `CHECKSUM.SHA256` files (different shape from Linux SUMS files).
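
A sketch of parsing the BSD-style format, assuming one `SHA256 (filename) = hex` entry per line. `find_bsd_hash` is a hypothetical name, not the PR's parser.

```rust
/// Find `filename` in BSD-style `SHA256 (filename) = hex` lines.
fn find_bsd_hash<'a>(sums: &'a str, filename: &str) -> Option<&'a str> {
    sums.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("SHA256 (")?;
        // Split on the last ") = " so parentheses inside a filename survive.
        let (name, hex) = rest.rsplit_once(") = ")?;
        (name == filename).then_some(hex)
    })
}
```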

Verified end-to-end: `--vendor freebsd --release latest`
streamed a 6178 MiB zvol in ~10 min on this builder zone (lzma-rs
is pure-Rust slow vs. liblzma; acceptable for the POC).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Talos's nocloud images come from the dynamic Image Factory at
`https://factory.talos.dev/image/<schematic>/v<version>/nocloud-amd64.raw.xz`,
not from the upstream GitHub release. The factory does not publish
per-image sha256 or sha512 sidecars (the obvious URL paths return
HTTP 402), and the `sha256sum.txt` in the GitHub release covers only
metal/ISO assets, not factory-built images.

So the Talos vendor uses a new `TlsTrustOnly` verifier that explicitly
notes the trust model rather than silently skipping. For operators
who want a real hash check, the new general-purpose
`--expected-sha256 <hex>` CLI flag overrides whatever verifier the
vendor chose with `Sha256Pinned`. This works for any vendor — handy
for one-off pinning, audit-trail builds, or test fixtures.

The default (empty) Talos schematic is baked in:
`376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba`.
A future `--schematic <id>` flag could let users build customized
Talos images, but the POC ships the vanilla case.

Release resolution: `latest` consults
`https://api.github.com/repos/siderolabs/talos/releases/latest` and
strips the `v` prefix from `tag_name`. Explicit `1.12.7` /
`v1.12.7` are also accepted. Talos rejects cloud-init ssh-key
injection (kubelet/etcd is the only access path), so `ssh_key` is
`false` in the manifest requirements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Travis pointed out that Talos's Image Factory API documents
.sha256 / .sha512 checksum endpoints — but Sidero gates them
behind enterprise licensing on the public `factory.talos.dev`
(free-tier requests return `HTTP 402 enterprise not enabled`).
Self-hosted and enterprise factories return the checksum
normally.

So the Talos vendor now probes `<image>.sha256` at resolve time:

- 200 with body → use `Sha256SumsTls` (the response is the same
  Linux-style `<hex>  <filename>` format we already parse).
- 402 / any other status / network error → fall back to
  `TlsTrustOnly` with a note that explains the enterprise gating
  and recommends `--expected-sha256` for hash pinning.
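
The probe decision reduces to a small match. The enum below is a stand-in for the PR's verifier types (only the variant names come from the text), and the probe result is modeled as an optional status and body so the sketch needs no HTTP client.

```rust
/// Stand-in for the real verifier types; field shapes are assumptions.
#[derive(Debug, PartialEq)]
enum Verifier {
    Sha256SumsTls { sums_body: String },
    TlsTrustOnly { note: &'static str },
}

/// `probe` is the outcome of GET `<image>.sha256`:
/// Some((status, body)), or None on a network error.
fn choose_verifier(probe: Option<(u16, String)>) -> Verifier {
    match probe {
        Some((200, body)) if !body.trim().is_empty() => {
            Verifier::Sha256SumsTls { sums_body: body }
        }
        // 402, any other status, an empty body, or a network error.
        _ => Verifier::TlsTrustOnly {
            note: "checksum endpoint unavailable; pass --expected-sha256 to pin a hash",
        },
    }
}
```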

Public-factory users see exactly the same behavior as before;
enterprise users get free hash verification with no extra flags.

The factory's normal image-download response also doesn't leak a
checksum via headers (no `Digest`, `ETag`, or `X-Content-SHA256`),
so the .sha256 endpoint is the only mechanism we have.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a defense-in-depth check after the download loop: if the server
sent a Content-Length and the bytes we wrote don't match it, bail
with a clear "download truncated" error rather than feed a short
file into the verifier (which would catch it as a checksum
mismatch — correct outcome, less actionable message).

reqwest/hyper should already error on body truncation when
Content-Length is set, but the explicit comparison costs nothing
and catches HTTP/2 stream resets, mid-stream proxy interruption,
and servers that lie about the size. No-op when Content-Length
isn't sent (chunked transfer or streaming response).
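
The comparison itself is tiny. A sketch, with a hypothetical function name and error text modeled on the "download truncated" wording above:

```rust
/// Compare the advertised Content-Length (if any) with the bytes written.
fn check_download_length(content_length: Option<u64>, written: u64) -> Result<(), String> {
    match content_length {
        Some(expected) if expected != written => Err(format!(
            "download truncated: expected {expected} bytes, wrote {written}"
        )),
        // Sizes match, or no Content-Length was sent: nothing to check.
        _ => Ok(()),
    }
}
```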

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`make format` (= `cargo fmt`) across the nocloud module tree.
Whitespace and line-breaking tweaks; no behavior changes. 46 unit
tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two arch-lint no-error-swallowing warnings and three doc-lint
implementation-detail leaks were standing between the tree and a
clean `make quick-check`.

- `set_cookie_header` in triton-api-server now returns
  `Result<(), HttpError>` and propagates instead of logging-and-
  swallowing; all four call sites updated.
- `triton-cli logout` collapses its `if let Err` log-only block
  into a `match` whose `Err` arm both logs and recovers into
  `revoked = false`, which drives the user-facing message.
- The schemars/Progenitor-rationale paragraphs on `Datacenters`,
  `Services`, `MetadataObject`, and `Tags` move from `///` doc
  comments to `//` regular comments so they no longer leak into
  the generated OpenAPI specs as user-facing schema descriptions.

OpenAPI specs regenerated to reflect the trimmed descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream `qcow` 1.2.0 hardcodes flate2's `zlib-ng-compat` feature on
its dependency line and exposes no `[features]` knob to opt out. That
forces a CMake build of zlib-ng (libz-sys), which breaks any host
without `cmake` installed. Cargo's `[patch]` swaps a crate's source
but cannot rewrite a transitive dependency's feature flags, so a fork
is the cleanest path.

This vendors panda-re/qcow-rs 1.2.0 (MIT) into `libs/qcow` and rewires
the workspace dependency to the path. Three deviations from upstream:

- `Cargo.toml`: drop the `zlib-ng-compat` feature so flate2 falls back
  to its pure-Rust `miniz_oxide` backend. No more libz-sys / cmake.
- `Cargo.toml`: `[lints.rust] unused_parens = "allow"` and
  `[lints.clippy] all = "allow"` so the parent workspace's strict lint
  regime does not policy-check vendored upstream code. The bitfield
  macro expansion trips `unused_parens` on doc-commented fields.
- `tests/parse.rs`: `#[ignore]` the upstream integration test, which
  hardcodes a fixture path under `/home/jamcleod/.panda/...` that does
  not exist outside the original author's machine.

`arch-lint.toml` excludes `libs/qcow` (vendored, not ours).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the delivery-mode flag promised in the design doc's command
surface table:

  - `file` (default): leave artifacts in --output-dir and print the
    suggested `imgadm install` invocation.
  - `smartos`: shell `imgadm install -m <manifest> -f <gz>` against
    the local SmartOS image store. GZ-only; rejected in NGZs with a
    pointer to --target file + manual install.
  - `imgapi`: push to IMGAPI via the existing `tritonadm image
    import` code path. The Import body is factored into a shared
    `import_manifest_and_file` helper that both subcommands call,
    so origin-chain handling, manifest preservation, compression
    detection, and activation are kept in one place.

Also refreshes the design doc to match the current implementation:
xz is no longer "deferred" (lzma-rs streaming + Stream-Footer
size read), the duplicate ubuntu/freebsd/talos rows in the vendor
table are gone, the POC status is replaced with a snapshot of the
five vendors / three formats / 46 tests we ship today, and the
two follow-up lists are consolidated into one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nshalman nshalman marked this pull request as ready for review May 6, 2026 01:21
nshalman and others added 5 commits May 5, 2026 22:11
The code-layout block still listed `mod.rs`, `privileged.rs`, and only
ubuntu.rs; the actual layout has `nocloud.rs` + a per-vendor tree
(alpine/debian/freebsd/talos/ubuntu) and a `zfs.rs` shellout module.
The "Two implementations: PfexecPrivileged / FakePrivileged" sentence
was leftover from before the Privileged trait was removed. The
"Trust chain" section was scoped to Ubuntu only; rewrite it as a
per-vendor table so readers can see at a glance which sums file
each vendor uses.

Drop the stale `Xz/Raw not yet emitted` comment (and the now-false
`#[allow(dead_code)]` on Xz) in `SourceFormat` — Xz is emitted by
freebsd and talos. Raw still has no emitter, so its allow stays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vendor resolution is just HTTP, so it runs anywhere — moving it
above the SmartOS-specific preflights lets `--dry-run` exercise
release resolution and verifier wiring on a dev box.

Going one step further: when `uname -v` doesn't start with
`joyent_`, force `--dry-run` and print a stderr notice. The build
itself still requires zfs(8) + a delegated dataset, but the
common dev-box use case (`tritonadm image fetch-nocloud --vendor
ubuntu --release latest`) now does something useful instead of
erroring out.
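
A sketch of the forced dry-run rule, assuming the caller has already captured the output of `uname -v`. The function name and the notice wording are hypothetical.

```rust
/// Decide whether the run is a dry run. Off SmartOS (no `joyent_` prefix in
/// `uname -v`) the answer is always true, with a notice on stderr.
fn effective_dry_run(uname_v: &str, requested: bool) -> bool {
    let on_smartos = uname_v.starts_with("joyent_");
    if !on_smartos && !requested {
        eprintln!("not a SmartOS host (uname -v = {uname_v:?}); forcing --dry-run");
    }
    requested || !on_smartos
}
```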

The dry-run dataset display falls back to a placeholder string
when `default_dataset()` can't run (no `zonename` on macOS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the Fedora Cloud_Base x86_64 qcow2 from the canonical
`releases.json` feed at https://fedoraproject.org/releases.json,
which carries the upstream sha256 inline — same shape as Ubuntu
Simple Streams, so the verifier is a plain `Sha256Pinned` and
`--dry-run` can show the manifest UUID without downloading.

Accepts `latest` (highest numeric major), bare integers (`42`),
and the conventional `f42` / `Fedora-42` prefixes. The manifest
version uses the build-bearing portion of the filename (e.g.
`44-1.7`) so distinct rebuilds of the same major dedupe sensibly.
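
The accepted token shapes can be normalized as in the sketch below. The `Release` enum and function name are hypothetical; resolving `Latest` against the feed happens elsewhere.

```rust
#[derive(Debug, PartialEq)]
enum Release {
    Latest,
    Major(u32),
}

/// Accepts `latest`, bare integers (`42`), `f42`, and `Fedora-42`.
fn parse_release_token(token: &str) -> Option<Release> {
    if token == "latest" {
        return Some(Release::Latest);
    }
    let digits = token
        .strip_prefix("Fedora-")
        .or_else(|| token.strip_prefix('f'))
        .unwrap_or(token);
    digits.parse().ok().map(Release::Major)
}
```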

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AlmaLinux publishes GenericCloud qcow2 images per major at
`https://repo.almalinux.org/almalinux/<n>/cloud/x86_64/images/`,
with a `-latest.x86_64.qcow2` rolling pointer in each major's
images directory. The sibling `CHECKSUM` is Linux-style
(`<sha256>  <filename>`) and lists both the latest pointer and its
dated alias under the same hash, so we resolve the dated build
once at metadata time and verify with a plain `Sha256Pinned`.

`--release latest` walks the auto-generated index at `/almalinux/`
to pick the highest major; `--release 8` / `9` / `10` pins to a
specific major. The manifest version uses the build identifier
(e.g. `9.7-20260501`) so distinct rebuilds dedupe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rocky publishes GenericCloud-Base qcow2 images per major at
`https://download.rockylinux.org/pub/rocky/<n>/images/x86_64/`,
with a per-file BSD-style `<filename>.CHECKSUM` sidecar — same
shape as FreeBSD's CHECKSUM.SHA256, so we reuse the existing
`Sha256BsdSumsTls` verifier. We pick the highest-versioned
dated `-Base-` build by parsing the directory listing,
ignoring rolling pointers and the LVM flavor.

`--release latest` walks `/pub/rocky/` to pick the highest
major (currently 10); `--release 8`/`9`/`10` pins to a major.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nshalman and others added 17 commits May 5, 2026 22:41
Promote `verify::parse_sums_file` and `verify::parse_bsd_sums_file`
to `pub(super)` so vendor profiles can call them directly. Drop
alma's inline copy of the Linux-style parser, which existed only
because the module-private originals weren't reachable from
sibling modules. The next commit needs the BSD parser from rocky's
release-resolution path; sharing it now keeps the two parsers
co-located with their tests in `verify.rs`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rocky's per-file `.CHECKSUM` sidecar is small and TLS-fetched, so
pulling it during release resolution costs one extra round-trip
in exchange for showing the upstream sha256 (and the derived
stable manifest UUID) in `--dry-run` output. The verifier swaps
from `Sha256BsdSumsTls` to `Sha256Pinned` since the hash is now
known at metadata time — same shape as Fedora and AlmaLinux.
The BSD-style parser comes from the now-shared
`verify::parse_bsd_sums_file`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Arch publishes per-build cloud-image directories at
`https://geo.mirror.pkgbuild.com/images/`, named
`v<YYYYMMDD>.<arch-build-id>/`, plus a `latest/` symlink. Each dir
has an `Arch-Linux-x86_64-cloudimg-<date>.<build>.qcow2` and a
Linux-style `<file>.SHA256` sidecar — same shape as the other
sidecar-bearing vendors, so we list the directory, pick the
highest version (lex sort works for the date-prefixed names),
fetch the sidecar at resolve time, and pin the hash with
`Sha256Pinned`.
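
Picking the newest build by lexicographic order might look like this, assuming the directory listing has already been reduced to entry names. The function name and the exact shape check (`v`, eight date digits, a dot, a numeric build id) are assumptions.

```rust
/// Return the highest `v<YYYYMMDD>.<build>` entry, ignoring everything else
/// (including the `latest/` symlink). Lexicographic max works because the
/// names are date-prefixed with fixed-width dates.
fn latest_build<'a>(entries: &[&'a str]) -> Option<&'a str> {
    entries
        .iter()
        .copied()
        .map(|e| e.trim_end_matches('/'))
        .filter(|e| {
            e.strip_prefix('v')
                .and_then(|rest| rest.split_once('.'))
                .is_some_and(|(date, build)| {
                    date.len() == 8
                        && date.bytes().all(|b| b.is_ascii_digit())
                        && !build.is_empty()
                        && build.bytes().all(|b| b.is_ascii_digit())
                })
        })
        .max()
}
```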

`--release latest` picks the newest build; explicit
`v20260501.523211` / `20260501.523211` tokens pin to a specific
build. The series is the literal `rolling` so manifest names
read `arch-rolling-nocloud` rather than `arch-arch-nocloud`.

Detached GPG signatures (`<file>.sig`, `<file>.SHA256.sig`) are
published alongside but not yet consumed; the existing GPG-
verifier follow-up will pick those up across all sidecar vendors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Oracle's cloud-init-enabled KVM templates live at
`https://yum.oracle.com/templates/OracleLinux/OL<n>/u<u>/x86_64/`
but the only machine-readable index is the human-targeted landing
page at `oracle-linux-templates.html`. Per-image checksums are
embedded in the table HTML, paired with image links inside the
same `<tr>`. We split the HTML on `</tr>`, regex out the kvm-image
href and its sibling `kvm-sha256` `<tt>`, and use the result.

For x86_64 there is exactly one `kvm-b<build>.qcow2` per release
and Oracle's convention is that this image is cloud-init enabled
(the aarch64 builds split into separate `kvm` and `kvm-cloud`
variants, but x86_64 doesn't). Manifest version is
`<major>.<update>-b<build>` (e.g. `9.7-b269`).

`--release latest` walks the page rows to pick the highest major;
`--release 8` / `9` / `10` (with optional `OL` prefix) pins to a
major. Trust roots in TLS to `yum.oracle.com`; `Sha256Pinned`
verifier means dry-run shows the manifest UUID. HTML parsing is
fragile by nature — the parser bails clearly if the page layout
changes, leaving `--expected-sha256` as the manual escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CentOS Stream publishes GenericCloud qcow2 images per stream at
`https://cloud.centos.org/centos/<n>-stream/x86_64/images/`,
with per-file BSD-style `<filename>.SHA256SUM` sidecars. The
release-resolution path lists `/centos/` for `<n>-stream/` dirs,
picks the highest dated build, and pre-fetches the sidecar so the
upstream sha256 is available at metadata time and `--dry-run`
shows the manifest UUID.

`cloud.centos.org` is fronted by CloudFront, which 403s requests
with no User-Agent, so every GET sets the same
`tritonadm-fetch-nocloud` identifier the Talos profile already uses.

`--release latest` picks the highest active stream (currently 10);
`--release 9` / `9-stream` / `8` pin to a specific stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Leap publishes per-version Minimal-VM Cloud qcow2 images at
`https://download.opensuse.org/distribution/leap/<X>.<Y>/appliances/`
with sibling Linux-style `.sha256` sidecars. `download.opensuse.org`
runs MirrorCache, which exposes `?json=1` directory listings — we
use that for both the version index and the per-version appliance
list rather than scraping HTML.

Filename naming changed between Leap 15.x and 16.x (the latter
dropped the `openSUSE-` prefix and the inner `<X.Y.Z>` block);
the resolver handles both. `--release latest` walks versions
descending and falls through empty appliance dirs (Leap 16.1 is
currently empty), so it correctly returns 16.0. The manifest
version stitches `<X.Y>` and the Build tag (e.g. `16.0-Build16.2`).
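
The fall-through behaviour of `--release latest` can be sketched as below, assuming the directory listings have already been fetched into memory. `latest_with_images` and `manifest_version` are hypothetical names and the file names are placeholders.

```rust
/// Parse "<X>.<Y>" into a numerically comparable pair.
fn parse_version(v: &str) -> Option<(u32, u32)> {
    let (major, minor) = v.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Walk versions in descending order and return the first one whose
/// appliance listing is non-empty.
fn latest_with_images(listings: &[(String, Vec<String>)]) -> Option<String> {
    let mut parsed: Vec<((u32, u32), &String, &Vec<String>)> = listings
        .iter()
        .filter_map(|(v, files)| parse_version(v).map(|p| (p, v, files)))
        .collect();
    parsed.sort_by(|a, b| b.0.cmp(&a.0)); // newest first
    parsed
        .into_iter()
        .find(|(_, _, files)| !files.is_empty())
        .map(|(_, v, _)| v.clone())
}

/// Stitch `<X.Y>` and the Build tag into the manifest version.
fn manifest_version(version: &str, build_tag: &str) -> String {
    format!("{}-{}", version, build_tag)
}

fn main() {
    let listings = vec![
        ("15.6".to_string(), vec!["a.qcow2".to_string()]),
        ("16.0".to_string(), vec!["b.qcow2".to_string()]),
        ("16.1".to_string(), vec![]), // empty appliance dir is skipped
    ];
    println!("{:?}", latest_with_images(&listings));
    println!("{}", manifest_version("16.0", "Build16.2"));
}
```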

Tumbleweed is intentionally skipped — `/tumbleweed/appliances/`
ships only MicroOS-flavored immutable images today, which use
Combustion/Ignition rather than cloud-init nocloud.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Darwin caps `sun_path` at 104 bytes, and `t.TempDir()` on macOS
sits under `/var/folders/<hash>/T/<TestName><N>/001/`; such paths
routinely exceed the cap, so `net.Listen("unix", …)` returns
EINVAL and the test fails with `bind: invalid argument`.

Skip with `t.Skip()` when `runtime.GOOS == "darwin"` rather than
fight the path-length limit; the same code is exercised by Linux
CI which is the canonical environment for this package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The qcow2 reader returns zero-filled buffers for unallocated
clusters, so cloud images with sparse virtual disks (Ubuntu's
~3.6 GB cloudimg has ~600 MB of real data; Fedora's is similar)
were paying for gigabytes of redundant zero writes.

In `copy_with_progress`, check whether each 1 MiB chunk is all
zeros; if it is, seek the zvol's char device forward instead of
writing. Fresh ZFS zvols are sparse, so unwritten regions stay
unallocated (no on-disk block) and the resulting `zfs send`
stream skips them entirely. The all-zero check is a single
linear scan per MiB; rustc autovectorises it.

Applies to qcow2 and raw paths, both of which use
`copy_with_progress`. The xz path goes through
`lzma_rs::xz_decompress` driving a `Write` impl, so this
optimization doesn't naturally fit there; it is left as a
follow-up.
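
A minimal sketch of the seek-over-zeros idea, assuming a generic `Read` source and a `Write + Seek` destination. `copy_sparse` and `read_full` are hypothetical names; the real `copy_with_progress` also reports progress and targets the zvol device, neither of which is modelled here.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

const CHUNK: usize = 1024 * 1024; // 1 MiB, the chunk size named in the text

/// Fill `buf` as far as the source allows; returns bytes read (0 at EOF).
fn read_full<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match src.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Copy `src` to `dst`, seeking past all-zero chunks instead of writing them.
/// Returns (bytes_written, bytes_skipped).
fn copy_sparse<R: Read, W: Write + Seek>(src: &mut R, dst: &mut W) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; CHUNK];
    let (mut written, mut skipped) = (0u64, 0u64);
    loop {
        let n = read_full(src, &mut buf)?;
        if n == 0 {
            break;
        }
        let chunk = &buf[..n];
        if chunk.iter().all(|&b| b == 0) {
            // Leave a hole: advance the output position without writing.
            dst.seek(SeekFrom::Current(n as i64))?;
            skipped += n as u64;
        } else {
            dst.write_all(chunk)?;
            written += n as u64;
        }
    }
    Ok((written, skipped))
}

fn main() -> io::Result<()> {
    let mut input = vec![1u8; CHUNK];
    input.extend(vec![0u8; CHUNK]);
    input.extend(vec![2u8; CHUNK]);
    let mut dst = Cursor::new(Vec::new());
    let (written, skipped) = copy_sparse(&mut Cursor::new(input), &mut dst)?;
    println!("written={} skipped={}", written, skipped);
    Ok(())
}
```

On a fixed-size block device a trailing hole needs no special handling; on a regular file the caller would have to set the final length explicitly, since seeking alone does not extend the file.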

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenBSD images come from the bsd-cloud-image.org-blessed builds
at https://github.com/hcartiaux/openbsd-cloud-image. We always
pick the `min` flavor — slim cloud-image variant suited to
NoCloud, per the upstream README.

Release discovery uses the GitHub Releases API:
`latest` hits `/releases/latest`; explicit `7.8` / `v7.8`
walks the release list and matches the leading `v<X>.<Y>` of
each tag (tags look like `v7.8_2025-10-22-09-25`). The asset's
`.sha256` sidecar is a Linux-style line whose embedded filename
is `images/openbsd-min.qcow2` rather than the asset name, so
we skip filename matching and grab the first hex64 token —
same single-hash-sidecar pattern Alpine uses.

The full upstream tag (date-bearing) is the manifest version,
so distinct rebuilds of the same `<X>.<Y>` dedupe sensibly.
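
The tag matching and the "first hex64 token" rule can be sketched as follows. Both helper names are hypothetical, and the sidecar line in `main` uses a made-up digest.

```rust
/// True when `tag` (e.g. "v7.8_2025-10-22-09-25") belongs to `release`,
/// given as either "7.8" or "v7.8".
fn tag_matches_release(tag: &str, release: &str) -> bool {
    let want = release.strip_prefix('v').unwrap_or(release);
    let rest = match tag.strip_prefix('v') {
        Some(r) => r,
        None => return false,
    };
    // The leading version ends at the first char that is not a digit or dot.
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    &rest[..end] == want
}

/// First whitespace-separated token that is exactly 64 hex characters,
/// ignoring whatever filename the sidecar line embeds.
fn first_hex64(sidecar: &str) -> Option<&str> {
    sidecar
        .split_whitespace()
        .find(|t| t.len() == 64 && t.bytes().all(|b| b.is_ascii_hexdigit()))
}

fn main() {
    let sidecar = format!("{}  images/openbsd-min.qcow2\n", "0f".repeat(32));
    println!("{}", tag_matches_release("v7.8_2025-10-22-09-25", "7.8"));
    println!("{:?}", first_hex64(&sidecar));
}
```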

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the release-resolution side of the OmniOS pipeline:
list `https://downloads.omnios.org/media/<channel>/`, pick the
highest-versioned `omnios-*.cloud.vmdk` (lex sort handles the
LTS `r…r` refresh suffix), pre-fetch the bare-hash `<file>.sha256`
sidecar, and pin the hash with `Sha256Pinned`. Channels are
`stable`, `lts`, `bloody`; `latest` aliases to `stable`.
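
As a rough illustration of the lexicographic pick, assuming the directory listing has already been reduced to a list of file names. `pick_latest_cloud_vmdk` is a hypothetical name and the release numbers in `main` are illustrative only.

```rust
/// Keep only `omnios-*.cloud.vmdk` names and return the lexicographically
/// greatest one. `str` ordering in Rust is byte-wise, which is what a
/// plain lex sort means here.
fn pick_latest_cloud_vmdk<'a>(names: &[&'a str]) -> Option<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| n.starts_with("omnios-") && n.ends_with(".cloud.vmdk"))
        .max()
}

fn main() {
    let names = [
        "omnios-r151052.cloud.vmdk",
        "omnios-r151054.cloud.vmdk",
        "omnios-r151054.cloud.vmdk.sha256", // sidecar, filtered out
    ];
    println!("{:?}", pick_latest_cloud_vmdk(&names));
}
```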

Adds `SourceFormat::Vmdk`, with the pipeline's read_virtual_size
and write_to_zvol arms returning a clear "not yet implemented"
error pointing at the pending vmdk-reader work. `--dry-run`
already short-circuits before those, so today's macOS / NGZ
smoke tests resolve metadata, sha256, and manifest UUID
end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `libs/vmdk` as a vendored fork of strozfriedberg/vmdk-rs.
The trim drops upstream layers we don't need for a single-pass
file read driving a zvol:

  - `s3source.rs` and the `s3://` arm in `vmdk_reader::source_for_url`
    (eliminates the rust-s3 client + its aws-sigv4 / hyper / signing
    transitive tree)
  - `foyercache.rs` (eliminates `foyer`, `foyer-common`, `tempfile`)
    in favor of the existing `dummycache.rs` impl already shipped
    by upstream
  - the `vmdkverify` CLI binary (eliminates `clap`, `bytesize`,
    `tracing-subscriber`, `hex`, plus `sha-1` from `[dependencies]`
    — moved to `[dev-dependencies]` since `test_helper.rs` uses it)

Net: 318 transitive crates → 112 (~65 % reduction). The async
`BytesSource` trait architecture is preserved so future merges
from upstream replay as a list of file deletions plus a Cargo.toml
edit, not a rewire of internals.

`make package-test` against the bundled `data/` fixtures passes
all 32 tests (vmfs thick + thin, two-gb sparse/flat, monolithic
sparse/flat, stream-optimized with and without markers).
`libs/vmdk` is added to `arch-lint.toml`'s exclude list so the
vendored code isn't held to the parent workspace's strict
no-sync-io / require-thiserror lints, matching the precedent
set for `libs/qcow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`read_virtual_size` now opens the VMDK with `vmdkrs::VmdkReader`
(inside `tokio::task::spawn_blocking` because the crate spins its
own internal tokio runtime) and reads `image_size` from the
header chain.

The convert step uses a small `VmdkReadAdapter` that turns the
crate's offset-addressed `read_at_offset` API into a streaming
`Read` impl, so the existing `copy_with_progress` loop drives it
exactly like qcow2 and raw — and the all-zeros sparse-skip
optimization picks up unallocated grains for free.
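
The adapter pattern can be sketched with a stand-in trait. `ReadAt`, `StreamAdapter` and `MemDisk` are hypothetical and do not reproduce the vmdk-rs API or the PR's `VmdkReadAdapter`; they only show how an offset-addressed reader becomes a streaming `Read`.

```rust
use std::io::{self, Read};

/// Hypothetical offset-addressed reader, standing in for the crate's API.
trait ReadAt {
    fn len(&self) -> u64;
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize>;
}

/// Tracks a cursor so an offset-addressed source can be read sequentially.
struct StreamAdapter<T: ReadAt> {
    inner: T,
    pos: u64,
}

impl<T: ReadAt> Read for StreamAdapter<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.inner.len().saturating_sub(self.pos);
        if remaining == 0 || buf.is_empty() {
            return Ok(0); // end of the virtual disk
        }
        let want = buf.len().min(remaining.min(usize::MAX as u64) as usize);
        let n = self.inner.read_at(self.pos, &mut buf[..want])?;
        self.pos += n as u64;
        Ok(n)
    }
}

/// In-memory source used only to exercise the adapter.
struct MemDisk(Vec<u8>);

impl ReadAt for MemDisk {
    fn len(&self) -> u64 {
        self.0.len() as u64
    }
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize> {
        let start = (offset as usize).min(self.0.len());
        let n = buf.len().min(self.0.len() - start);
        buf[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let mut reader = StreamAdapter { inner: MemDisk(vec![7u8; 4096]), pos: 0 };
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    println!("streamed {} bytes", out.len());
    Ok(())
}
```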

OmniOS `--vendor omnios --release stable` (and `latest`/`lts`/
`bloody`) now resolves end-to-end on the dry-run path; on
SmartOS the full pipeline can write the zvol from the upstream
`.cloud.vmdk` directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SmartOS publishes per-release VMware VM tarballs at
`https://us-central.manta.mnx.io/Joyent_Dev/public/SmartOS/<rel>/`,
with a sibling `latest` text file pointing at the current dated
release directory. Each tarball ships a `SmartOS.vmwarevm/`
directory containing a monolithicFlat `SmartOS.vmdk` descriptor
plus the `smartos.img` extent.

This image does **not** support cloud-init NoCloud — SmartOS
provisions guests via `mdata-get`. Including it here is
ouroboros mode: the same machinery that fetches Linux/BSD
nocloud images can also turn the upstream SmartOS image into a
Triton-importable manifest. The `os` field reports `illumos`
(matching OmniOS) so consumers don't mistake this for a
Triton-native zone image, and `ssh_key=false` since cloud-init
key injection won't run.

Mechanically, this adds a `Smartos` vendor (latest pointer +
explicit timestamp tokens, sha256sums.txt verifier),
`SourceFormat::VmdkInTarGz` with an extraction step that
gunzips+untars into a sibling `.extracted/` directory and
points the existing vmdk-rs reader at the smallest `.vmdk`
(the descriptor), and `flate2`/`tar` workspace dependencies.
Idempotent: a re-run reuses the extracted dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design narrative is unchanged — these are catch-up edits so
the doc matches what's been shipped on the branch:

  - Vendor list expands from 5 to 15 (alma, arch, centos-stream,
    fedora, omnios, openbsd, opensuse, oracle, rocky, smartos
    added; trust-chain table grows in lockstep). Each row
    documents release-discovery + verifier strategy.
  - Pipeline source-format steps cover `Vmdk` (via vendored
    libs/vmdk) and `VmdkInTarGz` (extract → smallest .vmdk =
    descriptor). Notes the all-zeros sparse-skip in
    copy_with_progress so reviewers know cloud-image sparseness
    propagates to the zfs send stream.
  - Code layout block lists every per-vendor module and calls out
    the two vendored libs (libs/qcow, libs/vmdk).
  - Command-surface flag table picks up `--target`, `--dataset`,
    `--expected-sha256`, `--dry-run`; `--keep-cache` was never
    actually shipped, so it's removed; `--profile-dir` is marked
    not-yet-implemented. Documents the auto-promote-to-dry-run
    on non-SmartOS hosts.
  - Status section: vendor count, source-format list, test count
    (46→99), and the SmartOS/OmniOS `os=illumos` + ouroboros
    notes. `imgadm install` flag order normalized to
    `-m <manifest> -f <gz>` to match the code.
  - Outstanding follow-ups: same three (Sha256SumsGpg, TOML
    profiles, Debian sid) plus the xz sparse-skip note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pick_verifier wildcarded every non-success response into TlsTrustOnly,
so a transport error or unexpected HTTP status would silently downgrade
verification from sha256-pinned to TLS-only. Match the documented 404
explicitly and propagate any other outcome as an error.
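
The shape of the fix, sketched with hypothetical types: `SidecarFetch` and this `pick_verifier` signature are illustrative, not the PR's code. The status that means "no sidecar published" is passed in rather than hard-coded, and the value used in `main` is only a placeholder.

```rust
#[derive(Debug, PartialEq)]
enum Verifier {
    Sha256Pinned(String),
    TlsTrustOnly,
}

/// Hypothetical outcome of fetching a checksum sidecar.
enum SidecarFetch {
    Found(String),
    Status(u16),
    Transport(String),
}

/// Only the one documented "missing" status may downgrade to TLS-only;
/// every other failure is surfaced as an error instead of being swallowed.
fn pick_verifier(fetch: SidecarFetch, missing_status: u16) -> Result<Verifier, String> {
    match fetch {
        SidecarFetch::Found(hex) => Ok(Verifier::Sha256Pinned(hex)),
        SidecarFetch::Status(code) if code == missing_status => Ok(Verifier::TlsTrustOnly),
        SidecarFetch::Status(code) => Err(format!("unexpected HTTP status {} for sidecar", code)),
        SidecarFetch::Transport(e) => Err(format!("transport error fetching sidecar: {}", e)),
    }
}

fn main() {
    let documented_missing = 404; // placeholder for the vendor's documented status
    println!("{:?}", pick_verifier(SidecarFetch::Status(500), documented_missing));
    println!("{:?}", pick_verifier(SidecarFetch::Status(404), documented_missing));
}
```

Because the match has no wildcard arm, adding a new `SidecarFetch` variant later forces an explicit decision at compile time.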

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the VMware-VM tarball detour in favor of the gzipped raw USB
image (smartos-<rel>-USB.img.gz) — same wire bytes you'd dd to a
USB stick, no untar / VMDK descriptor / extent indirection. The new
RawGz source format streams flate2 straight into the zvol via
copy_with_progress so all-zero chunks stay sparse, and virtual size
is read from the gzip ISIZE trailer (SmartOS USB images are well
under 4 GiB so the modulus does not bite). Removes VmdkInTarGz and
its tar+vmdk extract helpers along with the tar workspace dep on
this crate.
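
Reading the ISIZE trailer amounts to taking the last four bytes of the file as a little-endian `u32`. A minimal sketch, with `gzip_isize` as a hypothetical name; it assumes a single-member gzip file and, as noted above, the value is the uncompressed length modulo 2^32.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read the gzip ISIZE field: the final four bytes, little-endian.
/// Errors if the input is shorter than four bytes.
fn gzip_isize<R: Read + Seek>(f: &mut R) -> io::Result<u32> {
    f.seek(SeekFrom::End(-4))?;
    let mut tail = [0u8; 4];
    f.read_exact(&mut tail)?;
    Ok(u32::from_le_bytes(tail))
}

fn main() -> io::Result<()> {
    // Fake file: some leading bytes followed by an ISIZE of 2 GiB.
    let mut data: Vec<u8> = vec![0x1f, 0x8b, 8, 0, 0, 0, 0, 0];
    data.extend_from_slice(&[0x00, 0x00, 0x00, 0x80]);
    println!("{}", gzip_isize(&mut Cursor::new(data))?);
    Ok(())
}
```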

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nshalman nshalman requested a review from a team May 6, 2026 15:34
Wrap test_bugview_service_e2e in a 60s tokio::time::timeout. On timeout,
skip in CI and fail locally — Jenkins occasionally hangs the test for
>15min for environmental reasons we can't reproduce on developer machines,
and we'd rather skip-with-warning than block CI on noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread cli/tritonadm/src/commands/image/nocloud/vendor/alma/releases.rs Outdated
Six top-level profiles (alma, rocky, oracle, centosstream, fedora,
opensuse) all built the same `ResolvedImage` shape — linux qcow2,
ssh_key, `Sha256Pinned` driven by an upstream-extracted hash. Pull
that into a `PinnedQcow2` builder. Three release-discovery modules
(alma, rocky, centosstream) all listed a parent index and regexed
`(\d+)/?` subdirs; pull that into a shared `vendor::dirlist` helper
with optional User-Agent (cloud.centos.org's CloudFront still 403s
without one).
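
A std-only approximation of the subdirectory extraction, for illustration. The commit describes a regex; this sketch instead splits on `href="` and keeps values that are digits with an optional trailing slash. `numeric_subdirs` is a hypothetical name and the HTML in `main` is invented.

```rust
/// Collect numeric directory names (e.g. "9/" or "10") from href
/// attributes in an index page, sorted ascending and de-duplicated.
fn numeric_subdirs(html: &str) -> Vec<u32> {
    let mut out: Vec<u32> = html
        .split("href=\"")
        .skip(1) // text before the first href is not a link target
        .filter_map(|rest| rest.split('"').next())
        .filter_map(|href| href.strip_suffix('/').unwrap_or(href).parse::<u32>().ok())
        .collect();
    out.sort_unstable();
    out.dedup();
    out
}

fn main() {
    let html = r#"<a href="../">..</a> <a href="8/">8/</a> <a href="9/">9/</a> <a href="10">10</a>"#;
    println!("{:?}", numeric_subdirs(html));
}
```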

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nshalman and others added 2 commits May 6, 2026 16:38
The TOML-profile section now spells out every field with its type
and required/optional status, names the supported `format` enum,
and documents the `file://` escape hatch for images that arrive in
unsupported wrapper formats (`.bz2`, `.zst`) or from origins without
a published per-file checksum.

Three example profiles live under docs/design/examples/nocloud-vendors/:

- alma-9.4-pinned.toml: pins the final AlmaLinux 9.4 GenericCloud
  build at vault.almalinux.org with the real upstream sha256.
  Demonstrates "I want a specific older build of a built-in vendor"
  — the live built-in only enumerates 9.7+.
- plan9-fossil-stanleylieber.toml: file:// + qcow2. Operator
  workflow (curl, bunzip2, sha256sum) is inline as a comment block.
- plan9-9legacy.toml: file:// + format = "raw". Different upstream
  from the stanleylieber profile; shows two profiles for the same
  os can coexist and exercises the raw-format path.

Pure design — none of this runs yet. The TOML loader, the
--vendor-toml flag, and file:// handling in the pipeline are
follow-up code work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a TOML profile loader and file:// scheme support to the
fetch-nocloud pipeline. Together these make the schema and worked
examples in docs/design/examples/nocloud-vendors/ executable, so
operators can pin specific upstream builds (e.g. AlmaLinux 9.4
from the vault) or stage decompressed artifacts locally for
formats the pipeline doesn't decode (Plan 9 .img.bz2, etc.).

- vendor/custom_toml.rs: load a TOML file into ResolvedImage with
  a Sha256Pinned verifier. deny_unknown_fields, normalize_sha256
  (64 hex, case-insensitive), tests for happy path / unknown
  format / short-or-non-hex sha256 / unknown field / missing
  required field / invalid url.
- vendor::SourceFormat: derive serde::Deserialize (snake_case);
  drop the now-misleading allow(dead_code) on Raw.
- pipeline::ensure_verified_source: returns (PathBuf, String).
  For file:// URLs, branches into verify_local_file: hash in
  place, run the verifier, no download / no caching / no
  redownload-on-cache-fail.
- Mutually-exclusive CLI flags: --vendor + --release vs
  --vendor-toml PATH, enforced by clap.
- print_plan: shows "Source file (read in place)" for file://
  sources instead of a misleading workdir cache path.

Adds the toml crate to workspace deps. End-to-end smoke tested
via the alma-9.4-pinned and plan9-9legacy example profiles.
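
The sha256 normalization rule (64 hex characters, case-insensitive) might look roughly like this. The signature, the error strings, and the whitespace trimming are assumptions, not the loader's actual behaviour.

```rust
/// Validate a sha256 string and return it lowercased.
/// Trimming surrounding whitespace is an assumption of this sketch.
fn normalize_sha256(input: &str) -> Result<String, String> {
    let s = input.trim();
    if s.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", s.len()));
    }
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("sha256 contains non-hex characters".to_string());
    }
    Ok(s.to_ascii_lowercase())
}

fn main() {
    println!("{:?}", normalize_sha256(&"AB".repeat(32)));
    println!("{:?}", normalize_sha256("abc123"));
}
```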

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nshalman nshalman merged commit 727713f into main May 6, 2026
5 checks passed
@nshalman nshalman deleted the tritonadm-nocloud-import branch May 6, 2026 22:40