feat: add GHA workflow to build opencv wheels without ffmpeg#4335

Merged
lawrence-u10d merged 6 commits into main from feat/opencv-wheel-build-workflow
Apr 13, 2026

@lawrence-u10d lawrence-u10d commented Apr 12, 2026

Summary

  • Adds a workflow_dispatch GHA workflow that builds opencv-python-headless from source with WITH_FFMPEG=OFF on amd64 and arm64
  • Wheels are uploaded as a GitHub release so Dockerfiles can download them at build time
  • Eliminates the 14 bundled ffmpeg CVEs present in the stock PyPI wheels

Test plan

  • Trigger the workflow manually with the default inputs (opencv-python-headless==4.12.0.88)
  • Verify wheels are produced for both architectures
  • Verify the GitHub release is created with both wheels attached
  • Confirm the built wheels have no .libs directory (no bundled ffmpeg)
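
The last check in the test plan can be scripted. The following is a minimal sketch, assuming that bundled shared libraries live under a top-level directory whose name ends in `.libs` inside the wheel archive (the layout the test plan refers to). The function name `bundled_lib_entries` is hypothetical and not part of the workflow.

```python
import zipfile


def bundled_lib_entries(wheel_path):
    """Return wheel members stored under any directory ending in '.libs'.

    A wheel is a zip archive, so the check only needs the member names.
    An empty result means no bundled-library directory was found.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name
            for name in wheel.namelist()
            # Look at the directory components only, not the file name.
            if any(part.endswith(".libs") for part in name.split("/")[:-1])
        ]
```

Calling `bundled_lib_entries` on each built wheel and requiring an empty list would fail the check as soon as any vendored library directory appears.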

🤖 Generated with Claude Code


Note

Low Risk
Low risk: changes are confined to GitHub Actions workflows, with the main impact being additional CI/release automation and a simple tag-based guard on PyPI publishing.

Overview
Adds a new workflow_dispatch GitHub Actions workflow to build opencv-contrib-python-headless wheels from source for amd64 and arm64 with WITH_FFMPEG=OFF, validate the resulting cv2 install has no bundled .libs and includes key contrib modules, and publish the wheels via a GitHub Release.

Updates the existing release workflow to skip tags prefixed with opencv-, preventing these wheel-only releases from triggering the normal PyPI publish job.

Reviewed by Cursor Bugbot for commit f03df11. Bugbot is set up for automated code reviews on this repo. Configure here.

… ffmpeg

Builds opencv-python-headless from source with WITH_FFMPEG=OFF on both
amd64 and arm64 using Chainguard wolfi-base. Wheels are uploaded as a
GitHub release so Dockerfiles can pull them at build time, eliminating
the 14 bundled ffmpeg CVEs from the stock PyPI wheels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
socket-security Bot commented Apr 12, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 | 100 | 100 | 100 | 100 | 60 |

Comment thread .github/workflows/build-opencv-wheels.yml
lawrence-u10d and others added 2 commits April 13, 2026 09:19
The build-opencv-wheels workflow creates GitHub releases tagged
`opencv-<version>` to publish prebuilt wheels for Docker consumption.
Those releases would otherwise trigger release.yml and fail the
package-version validation, producing a spurious PyPI publish failure
for every OpenCV wheel build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on-headless

The PyPI opencv-python and opencv-contrib-python packages, which are
pulled in transitively via unstructured-paddleocr (and unstructured-
inference on py<3.12), ship the same bundled ffmpeg CVEs that the
workflow was originally built to eliminate. Building only the plain
opencv-python-headless variant left ~2/3 of the CVE surface untouched.

Switch the build to opencv-contrib-python-headless, which is a strict
superset of the other three Python-level APIs (core + contrib modules,
no GUI / X11). A single wheel can then be used to replace all four
opencv-* package names in downstream Dockerfiles, eliminating every
bundled ffmpeg CVE.

Validated locally on arm64 against wolfi-base:
- wheel is 22MB (vs 60-76MB PyPI wheels), linked against glibc 2.43
- cv2.getBuildInformation() reports GUI: NONE, no FFMPEG section,
  GStreamer: NO, only cv2.abi3.so on disk (no .libs/)
- all contrib modules present (ximgproc, aruco, xfeatures2d, text,
  bgsegm, dnn_superres)
- full PaddleOCR smoke test passes end-to-end (detection + recognition
  + angle classification on a real document image), with the wheel
  substituted for opencv-python, opencv-python-headless, and
  opencv-contrib-python

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
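
The build-information check described in the commit message above could be automated roughly as follows. This is a sketch, not the workflow's actual validation step: it assumes the summary text contains a line of the form `FFMPEG: YES ...` or `FFMPEG: NO` when the section is present, and the helper name `ffmpeg_enabled` is hypothetical.

```python
import re


def ffmpeg_enabled(build_info):
    """Return True only if an 'FFMPEG:' line in the summary reports YES.

    A missing FFMPEG line (as in the build described above) counts as disabled.
    """
    match = re.search(r"^\s*FFMPEG:\s*(\S+)", build_info, flags=re.MULTILINE)
    return bool(match) and match.group(1).upper().startswith("YES")
```

In an environment where the wheel is installed, `ffmpeg_enabled(cv2.getBuildInformation())` would be expected to return `False`.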

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2d16d94. Configure here.

Comment thread .github/workflows/build-opencv-wheels.yml
Without this flag gh defaults to marking the release as "Latest", which
would cause these auxiliary wheel releases to displace the actual
unstructured package release on the repo's Releases page and confuse
downstream tools that key off the "latest" release endpoint.

Addresses cursor bugbot feedback on PR #4335.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lawrence-u10d added a commit that referenced this pull request Apr 13, 2026
…stall

The previous opencv substitution had two bugs:

1. It pointed at opencv_python_headless-*.whl, but #4335 was updated to
   build opencv-contrib-python-headless wheels (a strict superset that
   covers opencv-python, opencv-python-headless, and opencv-contrib-
   python in one wheel). The download URL would 404 against the new
   release.

2. It used `uv pip install --reinstall-package opencv-python ...` with a
   single wheel whose metadata name is opencv-contrib-python-headless.
   --reinstall-package silently no-ops when the wheel's name doesn't
   match the named package, so opencv-python and opencv-contrib-python
   would have remained at their PyPI versions (still bundling the
   ffmpeg CVEs we were trying to eliminate).

Switch to explicit uninstall of all four opencv-* variants followed by
`uv pip install --no-deps` of the contrib-headless wheel. This pattern
was validated end-to-end against unstructured-paddleocr (model load +
detection + recognition on a real document image).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
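
The uninstall-then-install pattern from the commit message above can be sketched as two commands run in order. This is an illustration only: the four package names are taken from the discussion in this PR, while `substitution_commands`, `substitute_opencv`, and the `run` parameter are hypothetical names. How `uv pip uninstall` behaves when a listed package is not installed is not covered here.

```python
import subprocess

# The four PyPI distributions that all provide the cv2 module.
OPENCV_VARIANTS = [
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
]


def substitution_commands(wheel_path):
    """Build the two commands: remove every variant, then install the wheel."""
    return [
        ["uv", "pip", "uninstall", *OPENCV_VARIANTS],
        # --no-deps keeps the resolver from pulling a PyPI variant back in.
        ["uv", "pip", "install", "--no-deps", wheel_path],
    ]


def substitute_opencv(wheel_path, run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for command in substitution_commands(wheel_path):
        run(command, check=True)
```

Uninstalling by name first avoids relying on the wheel's metadata name matching any package that is already installed, which is the failure mode the commit message describes for `--reinstall-package`.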
lawrence-u10d and others added 2 commits April 13, 2026 17:05
ubuntu-latest-arm-X-cores doesn't ship docker preinstalled. Install via
the official get.docker.com script and chmod the socket so the runner
user can use docker without a session re-login.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The chainguard apk mirror occasionally returns mid-install errors when
fetching packages (e.g. py3.12-numpy). Wrap apk update/add in a 3-
attempt retry loop, mirroring the pattern already used in
unstructured/Dockerfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
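
The retry pattern mentioned in the commit message above is shown here in Python as a generic sketch. The workflow itself presumably wraps the `apk` commands in shell, so this is not its actual code. The delay value and the names `retry`, `action`, and `sleep` are assumptions made for illustration.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action() up to `attempts` times, re-raising the last failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error.
            sleep(delay_seconds)  # Pause before the next attempt.
```

A transient mirror error on the first or second attempt is absorbed, while a persistent failure still fails the step after the third attempt.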
@lawrence-u10d lawrence-u10d merged commit d0aa8eb into main Apr 13, 2026
12 checks passed
@lawrence-u10d lawrence-u10d deleted the feat/opencv-wheel-build-workflow branch April 13, 2026 22:06
shaneholloman pushed a commit to shaneholloman/unstructured that referenced this pull request Apr 20, 2026
…es (Unstructured-IO#4336)

## Summary
- After `uv sync`, the Dockerfile downloads a source-built
`opencv-contrib-python-headless` wheel (compiled with `WITH_FFMPEG=OFF`,
`ENABLE_CONTRIB=1`, `ENABLE_HEADLESS=1`) from the GitHub release and
substitutes it for all PyPI opencv-python variants
- The contrib-headless variant is a strict superset of the cv2 API (core
+ contrib modules, no GUI), so a single wheel replaces `opencv-python`,
`opencv-python-headless`, and `opencv-contrib-python`, eliminating 14
bundled ffmpeg CVEs
- Validated end-to-end: PaddleOCR model load + detection + recognition
on a real document image succeeds with the substituted wheel on
wolfi-base arm64

## Dependencies
> **Depends on Unstructured-IO#4335** — the GHA workflow that builds and publishes the
opencv wheels must be merged and run first so the GitHub release exists
for the Dockerfile to download from. ✅ Merged and release published.

## Test plan
- [x] Merge and run the workflow in Unstructured-IO#4335 to create the wheel release
- [x] Local smoke test: PaddleOCR OCR on a real document image with
substituted wheel (wolfi-base arm64)
- [ ] Build the main `Dockerfile` and verify opencv imports work without
`.libs`
- [ ] Confirm no ffmpeg-related CVEs remain in a container scan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Medium risk because it changes a core transitive native dependency in
the Docker image (OpenCV), which could affect OCR/inference behavior or
break builds if the wheel/arch tag mismatch occurs, despite hash
verification.
> 
> **Overview**
> **Hardens the Docker image against ffmpeg CVEs** by replacing all PyPI
`opencv-*` packages after `uv sync` with a downloaded, hash-verified
`opencv-contrib-python-headless` wheel built with `WITH_FFMPEG=OFF`.
> 
> The Docker build now selects the wheel by architecture
(`x86_64`/`aarch64`), uninstalls any existing OpenCV variants, installs
the pinned wheel version, and bumps the library version/changelog to
`0.22.22`.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
a58f842. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
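
The architecture selection and hash verification summarized above could look roughly like the sketch below. The Dockerfile's real logic is not shown in this conversation, so everything here is an assumption: SHA-256 as the hash, the alias table, and the helper names `wheel_arch_tag` and `verify_sha256`.

```python
import hashlib
import platform


def wheel_arch_tag(machine=None):
    """Map a machine name to the wheel architecture tag (x86_64 or aarch64)."""
    aliases = {
        "x86_64": "x86_64",
        "amd64": "x86_64",
        "aarch64": "aarch64",
        "arm64": "aarch64",
    }
    name = (machine or platform.machine()).lower()
    if name not in aliases:
        raise ValueError(f"unsupported architecture: {name}")
    return aliases[name]


def verify_sha256(path, expected_hex):
    """Raise ValueError unless the file's SHA-256 digest matches expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_hex.lower():
        raise ValueError(f"hash mismatch for {path}")
```

Failing loudly on an unknown architecture or a digest mismatch keeps a wrong or tampered wheel from being installed silently.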

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>