feat: Add support for Python package ecosystem #2164

Draft
EyeCantCU wants to merge 12 commits into chainguard-dev:main from EyeCantCU:libraries

@EyeCantCU
Member

Also standardizes how other (non-APK) package ecosystems are introduced.

EyeCantCU and others added 10 commits April 3, 2026 13:45
Add a declarative ecosystem package system that allows installing
packages from non-APK ecosystems (starting with Python/PyPI) directly
into OCI images without shelling out to pip or any other tool.

Packages are resolved via the PEP 503 Simple Repository API, downloaded
as wheels, and extracted directly into the filesystem. The new
`ecosystems.python` config block supports custom indexes, version
constraints, and auto-detection of the installed Python version.
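For reference, a minimal Go sketch of the PEP 503 lookup step. The normalization rule (runs of `-`, `_` and `.` collapse to a single `-`, then lowercase) and the `<index>/<name>/` project-page layout come from PEP 503; the helper names `normalizeName` and `projectURL` are hypothetical and not taken from this PR.

```go
package main

import (
	"regexp"
	"strings"
)

// pep503Sep matches runs of the separator characters PEP 503 collapses.
var pep503Sep = regexp.MustCompile(`[-_.]+`)

// normalizeName applies PEP 503 name normalization: runs of "-", "_" and "."
// become a single "-", and the result is lowercased.
func normalizeName(name string) string {
	return strings.ToLower(pep503Sep.ReplaceAllString(name, "-"))
}

// projectURL builds the Simple API project page URL for a package.
// PEP 503 project pages live at <index>/<normalized-name>/ (trailing slash).
func projectURL(index, name string) string {
	return strings.TrimSuffix(index, "/") + "/" + normalizeName(name) + "/"
}
```

With this, `projectURL("https://pypi.org/simple", "PyYAML")` yields `https://pypi.org/simple/pyyaml/`.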

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use the PyPI JSON API (pypi.org/pypi/{name}/{version}/json) to resolve
packages and discover transitive dependencies, instead of downloading
entire wheels just to read their METADATA files. The JSON API returns
clean requires_dist lists and wheel URLs with checksums in a single
request.
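A sketch of decoding that JSON response, assuming only the PyPI fields `info.requires_dist` and `urls[]` (`filename`, `url`, `packagetype`, `digests.sha256`). The types and `parseRelease` are hypothetical, and the HTTP fetch itself is left out.

```go
package main

import "encoding/json"

// releaseFile is one entry of the "urls" array in a release document.
type releaseFile struct {
	Filename    string `json:"filename"`
	URL         string `json:"url"`
	PackageType string `json:"packagetype"` // "bdist_wheel" or "sdist"
	Digests     struct {
		SHA256 string `json:"sha256"`
	} `json:"digests"`
}

// release models the subset of /pypi/{name}/{version}/json used here.
type release struct {
	Info struct {
		Name         string   `json:"name"`
		Version      string   `json:"version"`
		RequiresDist []string `json:"requires_dist"` // null when there are no deps
	} `json:"info"`
	URLs []releaseFile `json:"urls"`
}

// wheelFile is one downloadable wheel plus its expected checksum.
type wheelFile struct {
	Filename, URL, SHA256 string
}

// parseRelease returns the requires_dist entries and the wheels of a release.
func parseRelease(data []byte) ([]string, []wheelFile, error) {
	var rel release
	if err := json.Unmarshal(data, &rel); err != nil {
		return nil, nil, err
	}
	var wheels []wheelFile
	for _, f := range rel.URLs {
		if f.PackageType != "bdist_wheel" {
			continue // skip sdists; only wheels are extracted
		}
		wheels = append(wheels, wheelFile{Filename: f.Filename, URL: f.URL, SHA256: f.Digests.SHA256})
	}
	return rel.Info.RequiresDist, wheels, nil
}
```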

Falls back to the PEP 503 Simple API for non-PyPI indexes (private
registries), though without transitive resolution in that case.

Also adds environment marker evaluation (extra, os_name, sys_platform,
etc.) to correctly filter conditional dependencies, and pre-release
filtering to avoid resolving alpha/beta/rc versions unless pinned.
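To illustrate both filters, a deliberately small Go sketch. `evalSimpleMarker` handles only `==`/`!=` clauses joined by `and` and reports everything else as unsupported, so it is not a full PEP 508 evaluator; `isPreRelease` approximates PEP 440 pre-release and dev-release detection with a regular expression. All names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitRequirement separates a requires_dist entry such as
// `foo>=1.0; extra == "test"` into its requirement and marker parts.
func splitRequirement(entry string) (req, marker string) {
	req, marker, _ = strings.Cut(entry, ";")
	return strings.TrimSpace(req), strings.TrimSpace(marker)
}

// evalSimpleMarker evaluates `name == "value"` / `name != "value"` clauses
// joined by "and". Version comparisons, "or" and parentheses are rejected.
func evalSimpleMarker(marker string, env map[string]string) (bool, error) {
	if marker == "" {
		return true, nil // no marker: the dependency always applies
	}
	if strings.Contains(marker, " or ") || strings.ContainsAny(marker, "()") {
		return false, fmt.Errorf("unsupported marker %q", marker)
	}
	for _, clause := range strings.Split(marker, " and ") {
		var op string
		switch {
		case strings.Contains(clause, "=="):
			op = "=="
		case strings.Contains(clause, "!="):
			op = "!="
		default:
			return false, fmt.Errorf("unsupported clause %q", clause)
		}
		name, value, _ := strings.Cut(clause, op)
		name = strings.TrimSpace(name)
		value = strings.Trim(strings.TrimSpace(value), `"'`)
		got, ok := env[name]
		if !ok {
			return false, fmt.Errorf("unknown marker variable %q", name)
		}
		if (got == value) != (op == "==") {
			return false, nil // one false clause makes the conjunction false
		}
	}
	return true, nil
}

// preReleaseRe spots PEP 440 pre/dev segments after a digit
// (a/alpha, b/beta, c/rc, pre/preview, dev), case-insensitively.
var preReleaseRe = regexp.MustCompile(`(?i)\d[-_.]?(?:a|b|c|rc|pre|dev)`)

// isPreRelease reports whether a version should be skipped unless pinned.
func isPreRelease(version string) bool {
	if i := strings.IndexByte(version, '+'); i >= 0 {
		version = version[:i] // ignore the local version label
	}
	return preReleaseRe.MatchString(version)
}
```

Returning an error for unsupported markers, rather than guessing true or false, keeps a simplified evaluator from silently pulling in or dropping dependencies.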

Tested with torch==2.6.0, whose 24 transitive dependencies are all
resolved automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename the package directory and Go package from "pip" to "python" to
match the ecosystem name used in YAML config. Update all import paths
and log messages accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove readMetadata and parseRequiresDist, which are no longer used
after switching to the PyPI JSON API for dependency discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When `venv` is set in the python ecosystem config, packages are
installed into a virtual environment with proper pyvenv.cfg and
bin/python symlinks. The image environment is automatically configured
with VIRTUAL_ENV and PATH prepended with the venv bin directory.
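A minimal sketch of the venv wiring, assuming a venv at `/opt/venv` and a base interpreter in `/usr/bin` (both illustrative). `home`, `include-system-site-packages` and `version` are the core keys CPython's `venv` module writes to `pyvenv.cfg`; `pyvenvCfg` and `venvEnv` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pyvenvCfg renders the core pyvenv.cfg keys. home is the directory that
// holds the base interpreter (e.g. /usr/bin).
func pyvenvCfg(home, pythonVersion string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "home = %s\n", home)
	b.WriteString("include-system-site-packages = false\n")
	fmt.Fprintf(&b, "version = %s\n", pythonVersion)
	return b.String()
}

// venvEnv returns the image environment so the venv's interpreter and
// console scripts take precedence over the system ones.
func venvEnv(venv, currentPath string) map[string]string {
	bin := path.Join(venv, "bin")
	p := bin
	if currentPath != "" {
		p = bin + ":" + currentPath // prepend, keeping the existing PATH
	}
	return map[string]string{"VIRTUAL_ENV": venv, "PATH": p}
}
```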

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… dep resolution

Write SPDX 2.3 SBOMs into dist-info/sboms/sbom.spdx.json for
Chainguard-sourced packages, enabling `chainctl libraries verify` to
confirm provenance. Parse data-provenance and data-signature attributes
from Simple API HTML and thread them through to ResolvedPackage.

Add transitive dependency resolution for non-PyPI indexes by downloading
wheels and parsing METADATA for Requires-Dist entries.

Also fixes an off-by-one bug in parseSimpleIndex tag extraction that
caused data-requires-python (and provenance/signature) attributes to be
attributed to the wrong link.
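The fix boils down to reading each link's attributes only from that link's own opening tag. A regexp-based Go sketch of that scoping, assuming well-formed, double-quoted attributes (a production parser would use a real HTML tokenizer); the type and function names are hypothetical. PEP 503 requires `data-requires-python` to be HTML-escaped, hence the unescaping.

```go
package main

import (
	"html"
	"regexp"
)

// simpleLink is one <a> entry from a Simple API project page.
type simpleLink struct {
	Href           string
	Text           string
	RequiresPython string // data-requires-python, HTML-unescaped
	Provenance     string // data-provenance
	Signature      string // data-signature
}

var (
	anchorRe = regexp.MustCompile(`(?s)<a\s([^>]*)>(.*?)</a>`)
	attrRe   = regexp.MustCompile(`([a-zA-Z][a-zA-Z0-9-]*)\s*=\s*"([^"]*)"`)
)

// parseSimpleLinks extracts links and their data-* attributes. Attributes
// are read only from each anchor's own opening tag, so they can never be
// credited to a neighbouring link.
func parseSimpleLinks(page string) []simpleLink {
	var links []simpleLink
	for _, m := range anchorRe.FindAllStringSubmatch(page, -1) {
		attrs := map[string]string{}
		for _, a := range attrRe.FindAllStringSubmatch(m[1], -1) {
			attrs[a[1]] = html.UnescapeString(a[2])
		}
		links = append(links, simpleLink{
			Href:           attrs["href"],
			Text:           html.UnescapeString(m[2]),
			RequiresPython: attrs["data-requires-python"],
			Provenance:     attrs["data-provenance"],
			Signature:      attrs["data-signature"],
		})
	}
	return links
}
```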

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extend the existing origin-based layering strategy to support ecosystem
packages (e.g. Python pip packages) as separate layers, without treating
them as APK packages.

The approach generalizes file ownership in the filesystem to a string-based
"owner" concept. APK files get their owner from the existing tar entry
package metadata; ecosystem files get tagged via SetCurrentOwner during
installation. The splitLayers function routes files to layers using the
Owner() interface, which works for both.
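The owner-tagging idea can be shown with a toy in-memory filesystem. `toyFS` is a hypothetical stand-in, not apko's tarfs; it only mirrors the `SetCurrentOwner`, `OwnerSize` and `Owner()` names described here, and the `python/idna` owner string format is an assumption.

```go
package main

// toyFile records a file's size and the owner active when it was written.
type toyFile struct {
	size  int64
	owner string
}

// toyFS is a minimal stand-in for an in-memory filesystem.
type toyFS struct {
	currentOwner string
	files        map[string]toyFile
}

func newToyFS() *toyFS { return &toyFS{files: map[string]toyFile{}} }

// SetCurrentOwner attributes every subsequent write to owner.
func (fs *toyFS) SetCurrentOwner(owner string) { fs.currentOwner = owner }

// WriteFile records a file and stamps it with the current owner.
func (fs *toyFS) WriteFile(name string, data []byte) {
	fs.files[name] = toyFile{size: int64(len(data)), owner: fs.currentOwner}
}

// Owner returns the file's owner ("" if untagged or missing).
func (fs *toyFS) Owner(name string) string { return fs.files[name].owner }

// OwnerSize sums the bytes attributed to one owner.
func (fs *toyFS) OwnerSize(owner string) int64 {
	var total int64
	for _, f := range fs.files {
		if f.owner == owner {
			total += f.size
		}
	}
	return total
}

// installWheel tags files per package around extraction, then clears the
// owner so later writes are not misattributed.
func installWheel(fs *toyFS, owner string, files map[string][]byte) {
	fs.SetCurrentOwner(owner)
	defer fs.SetCurrentOwner("")
	for name, data := range files {
		fs.WriteFile(name, data)
	}
}
```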

Key changes:
- tarfs: Add owner field to nodes, SetCurrentOwner/OwnerSize on memFS,
  Owner() method on memFileInfo that returns APK pkg name or ecosystem owner
- ecosystem: Add OwnerTagger interface, OwnerName() on ResolvedPackage,
  InstalledSize populated after install. Installers tag files themselves.
- layers: Generalize group to carry owners[] alongside pkgs[]. Factor
  groupByOriginAndSize into groupAPKByOrigin + applyBudget so ecosystem
  groups participate in the shared budget without APK-specific logic.
- python installer: Tags files per-package around wheel extraction
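A sketch of a shared layer budget over owner groups. The budget rule is not spelled out here, so this assumes a simple one: the largest groups keep dedicated layers and the remainder is folded into a single shared layer. `group` and `applyBudget` echo the names above but are hypothetical, as are the `apk:`/`python/` owner strings in the example.

```go
package main

import (
	"cmp"
	"slices"
)

// group is a layer candidate: APK packages sharing an origin, or one
// ecosystem owner.
type group struct {
	owners []string
	size   int64
}

// applyBudget keeps the largest groups as their own layers and merges the
// rest so the result has at most maxLayers entries (maxLayers < 1: no limit).
func applyBudget(groups []group, maxLayers int) []group {
	if maxLayers < 1 || len(groups) <= maxLayers {
		return groups
	}
	sorted := slices.Clone(groups)
	slices.SortStableFunc(sorted, func(a, b group) int { return cmp.Compare(b.size, a.size) })
	out := slices.Clone(sorted[:maxLayers-1])
	var rest group
	for _, g := range sorted[maxLayers-1:] {
		rest.owners = append(rest.owners, g.owners...)
		rest.size += g.size
	}
	return append(out, rest)
}
```

Because APK origin groups and ecosystem owners share one `group` type, neither side needs special-casing once they enter the budget.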

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix appendAssign: use apkGroups directly instead of allGroups
- Replace map loop with maps.Copy
- Fix import ordering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Both ecosystem callers were passing nil for the authenticator, meaning
private Python indexes requiring authentication would fail. Use
bc.o.Auth (from options) in the build path and auth.DefaultAuthenticators
in the lock path, matching how APK repository auth is handled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: RJ Sampson <rj.sampson@chainguard.dev>

EyeCantCU and others added 2 commits April 6, 2026 16:31
Test data had wheel URLs pointing to files.example.com, causing
DNS lookups that Harden-Runner blocks. Rewrite all test URLs to
point back to the httptest server, which also serves dummy wheel
responses for dependency extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stop hardcoding platform tag lists per architecture. Instead, parse
wheel platform tags dynamically by checking the machine suffix and
prefix (musllinux_, manylinux, linux_). Detect the image libc from
/etc/os-release (ID=alpine → musl, otherwise glibc) and only accept
wheels matching the correct libc. Replace the scoring system with
simple binary-over-pure-python preference.
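A Go sketch of the libc detection and tag matching described above. It assumes plain `linux_` tags are treated as glibc, ignores Python and ABI tags, and uses hypothetical function names; it shows the approach, not the PR's code.

```go
package main

import (
	"bufio"
	"strings"
)

// detectLibc reads /etc/os-release content: ID=alpine means musl,
// anything else is treated as glibc.
func detectLibc(osRelease string) string {
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		if v, ok := strings.CutPrefix(sc.Text(), "ID="); ok {
			if strings.Trim(v, `"'`) == "alpine" {
				return "musl"
			}
			break
		}
	}
	return "glibc"
}

// platformTags returns the (possibly compressed, dot-separated) platform
// tags of a wheel named name-ver(-build)?-py-abi-plat.whl.
func platformTags(filename string) []string {
	parts := strings.Split(strings.TrimSuffix(filename, ".whl"), "-")
	return strings.Split(parts[len(parts)-1], ".")
}

// wheelMatches reports whether a wheel fits arch/libc and whether it is binary.
func wheelMatches(filename, arch, libc string) (ok, binary bool) {
	for _, tag := range platformTags(filename) {
		if tag == "any" {
			return true, false // pure-Python wheel
		}
		if !strings.HasSuffix(tag, "_"+arch) {
			continue // check the machine suffix first
		}
		switch {
		case strings.HasPrefix(tag, "musllinux_"):
			if libc == "musl" {
				return true, true
			}
		case strings.HasPrefix(tag, "manylinux"), strings.HasPrefix(tag, "linux_"):
			if libc == "glibc" { // assumption: plain linux_ tags mean glibc
				return true, true
			}
		}
	}
	return false, false
}

// pickWheel prefers a matching binary wheel and falls back to pure Python.
func pickWheel(files []string, arch, libc string) string {
	pure := ""
	for _, f := range files {
		ok, binary := wheelMatches(f, arch, libc)
		if ok && binary {
			return f
		}
		if ok && pure == "" {
			pure = f
		}
	}
	return pure
}
```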

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants