ENH: adopt PEP 770, add SBOM to wheel #63479

Closed
fangchenli wants to merge 9 commits into pandas-dev:main from fangchenli:feat/pep770-sbom

@fangchenli (Member) commented Dec 24, 2025

We use auditwheel and delvewheel to generate SBOMs for bundled native libraries, and the generate_sbom.py script to generate an SBOM for vendored code. We then inject the vendored-code SBOM into the wheels.

Needs mesonbuild/meson-python#843 to land first.

fangchenli added the Enhancement and Build (Library building on various platforms) labels Dec 24, 2025
fangchenli marked this pull request as ready for review December 24, 2025 20:40

@mroeschke (Member) left a comment

Thanks for starting this (and sorry for the lack of review).

I'm supportive of adding SBOMs to our wheels, but I'm a little hesitant about maintaining all the supporting scripts to generate them for now. Based on numpy/numpy#29465, we might need more support from other tools before moving forward here?

Use auditwheel/delvewheel to generate SBOMs for bundled native libraries.
Add scripts/generate_sbom.py driven by LICENSES/vendored.toml to produce
a CycloneDX SBOM for pandas' vendored code, and scripts/cibw_repair_wheel.py
to inject it into the repaired wheel under .dist-info/sboms/.

The custom injection script is transitional; it can be removed once
meson-python grows native PEP 770 support (mesonbuild/meson-python#763).

Switch the PEP 770 SBOM injection mechanism from a custom
repair-wheel-command script to Meson's python.dist_info_install_dir()
helper (Meson >=1.12.0) wired up via meson-python (>=0.20.0).

The build-time SBOM generation now lives in a custom_target() in
meson.build. meson-python recognises the {py_distinfo} install
placeholder and routes the output into pandas-*.dist-info/sboms/ as
the wheel is packed -- no post-build wheel surgery, no custom
repair-wheel-command override.

Net effect on the PR:
  - delete scripts/cibw_repair_wheel.py (~200 lines)
  - drop the four repair-wheel-command overrides from pyproject.toml
    (revert to upstream cibuildwheel defaults)
  - add a 10-line custom_target() to meson.build
  - bump build-system requires for meson and meson-python

Addresses the reviewer concern about maintaining custom wheel-injection
machinery: the only pandas-side code is now the SBOM generator
(scripts/generate_sbom.py) and the vendored-component manifest
(LICENSES/vendored.toml).

Two blockers the earlier rework introduced:

1. The custom_target output was named 'pandas-vendored.cdx.json', but
   the Validate SBOM CI step in .github/workflows/wheels.yml looks for
   '*/sboms/pandas.cdx.json'. Rename the generator output to match, so
   the vendored-code SBOM has one canonical location.

2. Dropping the custom repair-wheel-command override from the
   [*pyodide*] cibuildwheel override accidentally took the
   `repair-wheel-command = ""` line upstream had for Pyodide with it.
   Without that empty override, cibuildwheel falls back to its Linux
   default (auditwheel), which cannot repair emscripten wheels. Restore
   the empty override with a comment explaining why.
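
For reference, a rough Python equivalent of the wheel check that the Validate SBOM step performs, assuming it only needs to confirm that exactly one entry matches '*/sboms/pandas.cdx.json' inside a top-level .dist-info directory and that the entry parses as CycloneDX JSON. The `find_sbom` name and the top-level .dist-info requirement are assumptions; the actual step in .github/workflows/wheels.yml may be written differently.

```python
import fnmatch
import json
import zipfile

# Glob quoted from the Validate SBOM step described above.
SBOM_PATTERN = "*/sboms/pandas.cdx.json"


def find_sbom(wheel_path: str) -> dict:
    """Return the parsed vendored-code SBOM from a wheel, raising if it is absent."""
    with zipfile.ZipFile(wheel_path) as wheel:
        matches = [
            name
            for name in wheel.namelist()
            # fnmatchcase: no OS-dependent case folding; '*' also matches '/'.
            if fnmatch.fnmatchcase(name, SBOM_PATTERN)
            # Extra assumption: the SBOM must sit in a top-level .dist-info/ dir.
            and name.split("/", 1)[0].endswith(".dist-info")
        ]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one SBOM, found {matches!r}")
        sbom = json.loads(wheel.read(matches[0]))
    if sbom.get("bomFormat") != "CycloneDX":
        raise ValueError(f"{matches[0]} is not a CycloneDX document")
    return sbom
```

Under this check, a wheel that still ships only `pandas-vendored.cdx.json` fails, which is the name mismatch blocker 1 describes.
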

Five review-round-two findings, fixed together because they all touch
the SBOM pipeline:

- Restore repair-wheel-command = "delvewheel repair -w {dest_dir}
  {wheel}" under [tool.cibuildwheel.windows]. The rework dropped the
  custom script override without restoring the upstream default, so
  DLL-dependent Windows wheels were going un-bundled.
- Correct the ultrajson license expression in LICENSES/vendored.toml
  from "BSD-3-Clause" to "BSD-3-Clause AND TCL". The vendored ujson
  code carries TCL-licensed material from its double-to-ascii
  routine, as already noted in pyproject.toml's license-files list
  and LICENSES/ULTRAJSON_LICENSE.
- Make the generated SBOM reproducible: honour SOURCE_DATE_EPOCH for
  the metadata timestamp (matching meson-python's wheel timestamp
  convention), and derive the serialNumber deterministically from a
  SHA-256 of the manifest bytes + pandas version rather than uuid4().
  Same vendored.toml + same version now yields byte-identical SBOM
  output across builds.
- Add a bom-ref to metadata.component and reuse that exact value in
  dependencies[0].ref. CycloneDX dependency-graph consumers expect
  dependencies[].ref to resolve to a bom-ref in the BOM.
- Restore LICENSES/PYUPGRADE_LICENSE and its pyproject license-files
  entry, plus a pyupgrade component in LICENSES/vendored.toml. The
  earlier deletion was premature -- pyupgrade-derived code still
  lives in scripts/validate_unwanted_patterns.py, so the license
  file needs to ship alongside it. Audit and removal of that code
  is a separate exercise.
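
The reproducibility and bom-ref fixes above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the helper names are hypothetical, the `"pandas"` root bom-ref is a placeholder, and truncating the SHA-256 digest to 16 bytes with RFC 4122 version bits is just one way to produce the `urn:uuid:` form that CycloneDX expects for `serialNumber`.

```python
import hashlib
import os
import uuid
from datetime import datetime, timezone


def sbom_timestamp() -> str:
    """UTC ISO-8601 timestamp, pinned by SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    moment = (
        datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        if epoch is not None
        else datetime.now(timezone.utc)
    )
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def sbom_serial_number(manifest_bytes: bytes, pandas_version: str) -> str:
    """Deterministic urn:uuid derived from SHA-256(manifest bytes + version)."""
    digest = hashlib.sha256(manifest_bytes + pandas_version.encode("utf-8")).digest()
    # Keep 16 bytes and set RFC 4122 variant/version bits so the value parses
    # as a UUID. version=4 is used only because older Pythons reject custom
    # versions here; the bits are hash-derived, not random.
    return uuid.UUID(bytes=digest[:16], version=4).urn


def link_root_component(sbom: dict, root_ref: str = "pandas") -> dict:
    """Give metadata.component a bom-ref and reuse it in dependencies[0].ref."""
    sbom["metadata"]["component"]["bom-ref"] = root_ref
    children = [c["bom-ref"] for c in sbom.get("components", [])]
    sbom["dependencies"] = [{"ref": root_ref, "dependsOn": children}]
    return sbom
```

Because both the timestamp and the serial number depend only on inputs fixed at build time, two builds from the same vendored.toml and version emit identical bytes, and the shared `root_ref` guarantees that `dependencies[0].ref` resolves to a bom-ref in the same BOM.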

An earlier iteration of this PR required a new meson helper
(python.dist_info_install_dir()) that would have landed in meson 1.12.0.
Per reviewer feedback on the meson side, that helper is being dropped
in favour of a meson-python-only mechanism: meson-python now detects
files staged under {py_purelib}/<name>-<version>.dist-info/... in the
install plan and reroutes them into the wheel's own .dist-info/.

Pandas-side changes:

- meson.build: compute the distinfo directory locally from
  meson.project_name() / meson.project_version() and install the SBOM
  custom_target output to py.get_install_dir() / distinfo / 'sboms'.
- pyproject.toml: revert the meson>=1.12.0 pin; works with any meson
  version pandas already supports.

No new upstream meson dependency. Still requires meson-python >= 0.20.0
for the dist-info prefix detection.
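
A hypothetical illustration of the path arithmetic involved: the install path pandas computes in meson.build, and the prefix test meson-python is described as applying to the install plan. The function names are invented for this sketch, which is not meson-python's implementation. It also assumes the project name and version need no wheel-filename normalization, which holds for a plain `pandas` / `3.1.0` but not for every project.

```python
from pathlib import PurePosixPath


def distinfo_dirname(project_name: str, project_version: str) -> str:
    """'<name>-<version>.dist-info', as assembled from project name and version."""
    return f"{project_name}-{project_version}.dist-info"


def sbom_install_relpath(project_name: str, project_version: str) -> str:
    """SBOM destination relative to the directory py.get_install_dir() returns."""
    return f"{distinfo_dirname(project_name, project_version)}/sboms/pandas.cdx.json"


def routes_to_dist_info(relpath: str, project_name: str, project_version: str) -> bool:
    """True if an install-dir-relative path sits under this project's .dist-info/."""
    parts = PurePosixPath(relpath).parts
    # Require at least one path component below the .dist-info directory itself.
    return len(parts) > 1 and parts[0] == distinfo_dirname(project_name, project_version)
```

In this sketch the prefix embeds the version, so a path built with a different version string than the one the wheel uses would not be rerouted.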

Record the new SBOM shipping at
<distname>-<version>.dist-info/sboms/pandas.cdx.json for pandas 3.1.0.
Generated at build time from LICENSES/vendored.toml.

The original entry was 45 words; pandas whatsnew bullets are typically
one short sentence (~20 words). Compact to fit.

This is a scaffold commit so CI can resolve the build dependency
against mesonbuild/meson-python#843 while that PR is under review.

The pandas SBOM routing relies on path-prefix detection added in that
meson-python branch; once it lands in a released meson-python (0.20.0
or later), this commit should be dropped and the
"meson-python>=0.20.0,<1" pin restored.

NOT FOR MERGE.

@jbrockmendel (Member)

Can we Mothball this and set a reminder somewhere to revisit it on date XYZ?

@jbrockmendel (Member)

Mothballing.

jbrockmendel added the Mothballed (Temporarily-closed PR the author plans to return to) label Jun 3, 2026

Labels

Build (Library building on various platforms), Enhancement, Mothballed (Temporarily-closed PR the author plans to return to)

3 participants