feat: add GHA workflow to build opencv wheels without ffmpeg #4335
Merged
lawrence-u10d merged 6 commits into main on Apr 13, 2026
Conversation
… ffmpeg Builds opencv-python-headless from source with WITH_FFMPEG=OFF on both amd64 and arm64 using Chainguard wolfi-base. Wheels are uploaded as a GitHub release so Dockerfiles can pull them at build time, eliminating the 14 bundled ffmpeg CVEs from the stock PyPI wheels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The build-opencv-wheels workflow creates GitHub releases tagged `opencv-<version>` to publish prebuilt wheels for Docker consumption. Those releases would otherwise trigger release.yml and fail the package-version validation, producing a spurious PyPI publish failure for every OpenCV wheel build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
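The tag-based guard described above can be illustrated with a small Python sketch. The actual guard lives in release.yml and is presumably a workflow condition rather than Python; the function name and the example version tag below are hypothetical.

```python
def should_publish_to_pypi(tag: str, skip_prefixes: tuple[str, ...] = ("opencv-",)) -> bool:
    """Return False for auxiliary wheel-release tags, True for everything else.

    Illustrative only: mirrors the idea of skipping `opencv-<version>` tags so
    they never reach package-version validation or the PyPI publish step.
    """
    # str.startswith accepts a tuple, so more prefixes can be added later.
    return not tag.startswith(skip_prefixes)


if __name__ == "__main__":
    for tag in ("0.22.22", "opencv-4.12.0.88"):
        print(tag, "publish" if should_publish_to_pypi(tag) else "skip")
```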
…on-headless

The PyPI opencv-python and opencv-contrib-python packages, which are pulled in transitively via unstructured-paddleocr (and unstructured-inference on py<3.12), ship the same bundled ffmpeg CVEs that the workflow was originally built to eliminate. Building only the plain opencv-python-headless variant left ~2/3 of the CVE surface untouched.

Switch the build to opencv-contrib-python-headless, which is a strict superset of the other three Python-level APIs (core + contrib modules, no GUI / X11). A single wheel can then be used to replace all four opencv-* package names in downstream Dockerfiles, eliminating every bundled ffmpeg CVE.

Validated locally on arm64 against wolfi-base:
- wheel is 22MB (vs 60-76MB PyPI wheels), linked against glibc 2.43
- cv2.getBuildInformation() reports GUI: NONE, no FFMPEG section, GStreamer: NO, only cv2.abi3.so on disk (no .libs/)
- all contrib modules present (ximgproc, aruco, xfeatures2d, text, bgsegm, dnn_superres)
- full PaddleOCR smoke test passes end-to-end (detection + recognition + angle classification on a real document image), with the wheel substituted for opencv-python, opencv-python-headless, and opencv-contrib-python

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
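The build-information check listed above could be automated along these lines. This is a minimal sketch, not the workflow's actual validation step: it assumes the dump from `cv2.getBuildInformation()` uses `Key: value` lines, and `SAMPLE_INFO` is a hand-written hypothetical excerpt rather than real output.

```python
def build_info_flags(info: str) -> dict[str, str]:
    """Collect 'Key: value' pairs from a cv2.getBuildInformation() style dump."""
    flags: dict[str, str] = {}
    for line in info.splitlines():
        key, sep, value = line.partition(":")
        # Skip section headers such as "Video I/O:" that carry no value.
        if sep and value.strip():
            flags[key.strip()] = value.strip()
    return flags


def looks_ffmpeg_free(info: str) -> bool:
    """True when FFMPEG and GStreamer are absent or off and no GUI backend is built."""
    flags = build_info_flags(info)
    ffmpeg_on = flags.get("FFMPEG", "NO").upper().startswith("YES")
    gstreamer_on = flags.get("GStreamer", "NO").upper().startswith("YES")
    gui_none = flags.get("GUI", "NONE").upper() == "NONE"
    return not ffmpeg_on and not gstreamer_on and gui_none


# Hypothetical excerpt for illustration; not copied from a real build.
SAMPLE_INFO = """
  GUI:                           NONE
  Video I/O:
    GStreamer:                   NO
"""

if __name__ == "__main__":
    print(looks_ffmpeg_free(SAMPLE_INFO))
```

Against a real install, the same function would be called as `looks_ffmpeg_free(cv2.getBuildInformation())`.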
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2d16d94.
Without this flag gh defaults to marking the release as "Latest", which would cause these auxiliary wheel releases to displace the actual unstructured package release on the repo's Releases page and confuse downstream tools that key off the "latest" release endpoint. Addresses cursor bugbot feedback on PR #4335. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lawrence-u10d added a commit that referenced this pull request Apr 13, 2026
…stall

The previous opencv substitution had two bugs:

1. It pointed at opencv_python_headless-*.whl, but #4335 was updated to build opencv-contrib-python-headless wheels (a strict superset that covers opencv-python, opencv-python-headless, and opencv-contrib-python in one wheel). The download URL would 404 against the new release.
2. It used `uv pip install --reinstall-package opencv-python ...` with a single wheel whose metadata name is opencv-contrib-python-headless. --reinstall-package silently no-ops when the wheel's name doesn't match the named package, so opencv-python and opencv-contrib-python would have remained at their PyPI versions (still bundling the ffmpeg CVEs we were trying to eliminate).

Switch to explicit uninstall of all four opencv-* variants followed by `uv pip install --no-deps` of the contrib-headless wheel. This pattern was validated end-to-end against unstructured-paddleocr (model load + detection + recognition on a real document image).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
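The uninstall-then-install pattern from this commit message can be sketched as follows. The Dockerfile itself presumably runs these as shell commands; the Python wrapper, the function names, and the wheel path are hypothetical, and how `uv pip uninstall` behaves when a listed package is absent is not verified here.

```python
import subprocess

# The four PyPI distribution names that all provide the `cv2` module.
OPENCV_VARIANTS = [
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
]


def substitution_commands(wheel_path: str) -> list[list[str]]:
    """Build the two commands: remove every variant, then install one wheel."""
    return [
        ["uv", "pip", "uninstall", *OPENCV_VARIANTS],
        # --no-deps keeps the resolver from pulling a PyPI opencv back in.
        ["uv", "pip", "install", "--no-deps", wheel_path],
    ]


def substitute_opencv(wheel_path: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in substitution_commands(wheel_path):
        subprocess.run(cmd, check=True)
```

Naming the packages explicitly avoids relying on `--reinstall-package`, which matches on the distribution name rather than on the `cv2` module the wheel provides.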
qued approved these changes Apr 13, 2026
ubuntu-latest-arm-X-cores doesn't ship docker preinstalled. Install via the official get.docker.com script and chmod the socket so the runner user can use docker without a session re-login. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The chainguard apk mirror occasionally returns mid-install errors when fetching packages (e.g. py3.12-numpy). Wrap apk update/add in a 3-attempt retry loop, mirroring the pattern already used in unstructured/Dockerfile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
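The retry idea generalizes beyond apk. Below is a minimal Python sketch of a 3-attempt loop; the workflow and Dockerfile presumably implement it in shell, and the delay value, function name, and injectable `runner`/`sleep` parameters are assumptions made so the logic can be exercised without a real package mirror.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 5.0,  # assumed value; the source does not state one
    runner: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0, up to `attempts` times.

    Returns the attempt number that succeeded; raises RuntimeError if every
    attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # pause before retrying a flaky mirror
    raise RuntimeError(f"command failed after {attempts} attempts: {list(cmd)}")
```

A call such as `run_with_retry(["apk", "update"])` would then tolerate up to two transient failures before giving up.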
shaneholloman pushed a commit to shaneholloman/unstructured that referenced this pull request Apr 20, 2026
…es (Unstructured-IO#4336)

## Summary

- After `uv sync`, the Dockerfile downloads a source-built `opencv-contrib-python-headless` wheel (compiled with `WITH_FFMPEG=OFF`, `ENABLE_CONTRIB=1`, `ENABLE_HEADLESS=1`) from the GitHub release and substitutes it for all PyPI opencv-python variants
- The contrib-headless variant is a strict superset of the cv2 API (core + contrib modules, no GUI), so a single wheel replaces `opencv-python`, `opencv-python-headless`, and `opencv-contrib-python`, eliminating 14 bundled ffmpeg CVEs
- Validated end-to-end: PaddleOCR model load + detection + recognition on a real document image succeeds with the substituted wheel on wolfi-base arm64

## Dependencies

> **Depends on Unstructured-IO#4335**: the GHA workflow that builds and publishes the opencv wheels must be merged and run first so the GitHub release exists for the Dockerfile to download from. ✅ Merged and release published.

## Test plan

- [x] Merge and run the workflow in Unstructured-IO#4335 to create the wheel release
- [x] Local smoke test: PaddleOCR OCR on a real document image with substituted wheel (wolfi-base arm64)
- [ ] Build the main `Dockerfile` and verify opencv imports work without `.libs`
- [ ] Confirm no ffmpeg-related CVEs remain in a container scan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Medium risk because it changes a core transitive native dependency in the Docker image (OpenCV), which could affect OCR/inference behavior or break builds if the wheel/arch tag mismatch occurs, despite hash verification.
>
> **Overview**
> **Hardens the Docker image against ffmpeg CVEs** by replacing all PyPI `opencv-*` packages after `uv sync` with a downloaded, hash-verified `opencv-contrib-python-headless` wheel built with `WITH_FFMPEG=OFF`.
>
> The Docker build now selects the wheel by architecture (`x86_64`/`aarch64`), uninstalls any existing OpenCV variants, installs the pinned wheel version, and bumps the library version/changelog to `0.22.22`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a58f842. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Summary
- `workflow_dispatch` GHA workflow that builds `opencv-python-headless` from source with `WITH_FFMPEG=OFF` on amd64 and arm64

Test plan
- … `opencv-python-headless==4.12.0.88`)
- … `.libs` directory (no bundled ffmpeg)

🤖 Generated with Claude Code
Note
Low Risk
Low risk: changes are confined to GitHub Actions workflows, with the main impact being additional CI/release automation and a simple tag-based guard on PyPI publishing.
Overview
Adds a new `workflow_dispatch` GitHub Actions workflow to build `opencv-contrib-python-headless` wheels from source for `amd64` and `arm64` with `WITH_FFMPEG=OFF`, validate the resulting `cv2` install has no bundled `.libs` and includes key contrib modules, and publish the wheels via a GitHub Release.

Updates the existing release workflow to skip tags prefixed with `opencv-`, preventing these wheel-only releases from triggering the normal PyPI publish job.

Reviewed by Cursor Bugbot for commit f03df11. Bugbot is set up for automated code reviews on this repo.
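The post-install validation described in the overview (no bundled `.libs`, key contrib modules present) could look roughly like this. It is a sketch under assumptions: wheels repaired by tools such as auditwheel typically place bundled shared libraries in a `<name>.libs` directory next to the package, the module list is taken from the commit message in this PR, and the helper names are hypothetical.

```python
from pathlib import Path

# Contrib modules named in this PR's validation notes.
CONTRIB_MODULES = ("ximgproc", "aruco", "xfeatures2d", "text", "bgsegm", "dnn_superres")


def bundled_lib_dirs(site_packages: str) -> list[str]:
    """List opencv*.libs directories, where bundled shared libraries would live."""
    root = Path(site_packages)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("opencv*.libs") if p.is_dir())


def missing_contrib_modules(cv2_module: object, expected: tuple[str, ...] = CONTRIB_MODULES) -> list[str]:
    """Return the expected contrib submodules that cv2_module does not expose."""
    return [name for name in expected if not hasattr(cv2_module, name)]
```

With a real install, passing the imported `cv2` module and its site-packages directory should yield two empty lists when the wheel is as described.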