Commit 03b57e0

fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security] (#569)
## Summary

Mirrors [Unstructured-IO/unstructured#4336](https://github.com/Unstructured-IO/unstructured/pull/4336) in this repo so the `quay.io/unstructured-io/unstructured-api` image no longer ships the 14 ffmpeg 5.1.x CVEs bundled in PyPI `opencv-python` wheels.

After `uv sync`, the Dockerfile now:

- Downloads the architecture-specific `opencv-contrib-python-headless` wheel (built with `WITH_FFMPEG=OFF` + `ENABLE_CONTRIB=1` + `ENABLE_HEADLESS=1`) from the upstream `Unstructured-IO/unstructured` GitHub release (`opencv-4.12.0.88`)
- SHA-256-verifies it against the hashes published by the upstream `build-opencv-wheels.yml` workflow
- Uninstalls any installed PyPI opencv variants and installs the verified wheel with `--no-deps`

The contrib-headless variant is a strict superset of the `cv2` API exposed by `opencv-python`, `opencv-python-headless`, and `opencv-contrib-python`, so a single wheel transparently replaces whichever variant is present.

## One deviation from upstream

Upstream uninstalls all four opencv variants in a single `uv pip uninstall …` call because their image pulls all four transitively (via `unstructured-paddleocr`). Our `uv.lock` currently only resolves `opencv-python`, so a single combined uninstall would fail on the three that aren't installed. Replaced with a per-package loop using `|| true`: same end state, and robust if transitive deps change.

## Version / Changelog

- Bumps service version `0.1.3` → `0.1.4`
- `CHANGELOG.md` entry under `0.1.4` → Security
- No `uv lock` changes needed; the lockfile still resolves `opencv-python 4.13.0.92`, and we overlay the 4.12.0.88 contrib-headless wheel only at image build time (upstream 4.13.0.92 has no sdist on PyPI, which is why the build-from-source workflow is pinned to 4.12.0.88).
## Test plan

- [ ] `make docker-build` succeeds on `amd64` and `arm64`; the opencv replacement step resolves the architecture-specific wheel and the SHA-256 check passes
- [ ] `docker run … python -c "import cv2; print(cv2.__version__)"` prints `4.12.0.88` inside the built image
- [ ] `make docker-test` passes against the rebuilt image
- [ ] Container scan of the rebuilt image no longer flags the 14 ffmpeg CVEs called out by upstream PR #4336

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Medium risk because it changes a core binary dependency (`opencv`) at image build time via an external wheel download and forced uninstall/reinstall, which could impact image build reliability or runtime CV2 behavior across architectures.
>
> **Overview**
> Updates the Docker build to **remove vulnerable ffmpeg-bundled PyPI OpenCV wheels** by downloading an arch-specific, SHA-256-verified `opencv-contrib-python-headless` wheel built with `WITH_FFMPEG=OFF`, uninstalling any installed OpenCV variants, and reinstalling the verified wheel.
>
> Bumps the service version to `0.1.4` and adds a `CHANGELOG.md` security entry documenting the OpenCV/ffmpeg CVE mitigation.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 7e23afc. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
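The version check in the test plan can be extended to also confirm that the installed build has no ffmpeg backend. The following is a minimal sketch, not part of the commit: the helper names are hypothetical, and it assumes that `cv2.getBuildInformation()` reports ffmpeg on a line of the form `FFMPEG: YES ...` when the backend is compiled in (a missing line or `NO` is treated as disabled). The version comparison accepts a prefix match, on the assumption that some builds report only the three-part library version.

```python
import re

# Assumed layout of the Video I/O section in cv2.getBuildInformation().
_FFMPEG_LINE = re.compile(r"^[ \t]*FFMPEG:[ \t]*(\S+)", re.MULTILINE)


def ffmpeg_enabled(build_info: str) -> bool:
    """Return True only if the build information reports FFMPEG as YES."""
    match = _FFMPEG_LINE.search(build_info)
    return bool(match) and match.group(1).upper().startswith("YES")


def version_matches(reported: str, expected: str) -> bool:
    """Accept an exact match or a reported version that is a dotted prefix."""
    return expected == reported or expected.startswith(reported + ".")


def check_installed_cv2(expected_version: str = "4.12.0.88") -> None:
    """Raise RuntimeError if the installed cv2 is not the expected build."""
    import cv2  # imported lazily: only present inside the built image

    if not version_matches(cv2.__version__, expected_version):
        raise RuntimeError(f"unexpected cv2 version: {cv2.__version__}")
    if ffmpeg_enabled(cv2.getBuildInformation()):
        raise RuntimeError("cv2 was built with ffmpeg enabled")
```

Calling `check_installed_cv2()` inside the built image exercises both checks; the two pure helpers can be tested without OpenCV installed.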
1 parent e7e87ec commit 03b57e0

3 files changed

Lines changed: 47 additions & 1 deletion

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+## 0.1.4
+
+### Security
+
+- **Replace PyPI opencv wheels with ffmpeg-free builds in Docker image**: After `uv sync`, the Dockerfile now substitutes the installed PyPI opencv-python variant with a source-built `opencv-contrib-python-headless` wheel compiled with `WITH_FFMPEG=OFF`, eliminating 14 bundled ffmpeg CVEs. The contrib-headless variant is a strict superset of the cv2 API (core + contrib modules, no GUI) and can transparently replace `opencv-python`, `opencv-python-headless`, or `opencv-contrib-python`. Wheel is downloaded from the upstream `Unstructured-IO/unstructured` release and hash-verified. Mirrors [unstructured#4336](https://github.com/Unstructured-IO/unstructured/pull/4336).
+
 ## 0.1.3
 
 ### Security
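The changelog entry above says a single contrib-headless wheel stands in for whichever PyPI variant was installed. One way to confirm that end state inside the image is to list which opencv distributions the environment still reports. This is a minimal sketch using the standard-library `importlib.metadata`; the function names are hypothetical and not part of the commit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# The four distribution names that the Dockerfile loop uninstalls.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants() -> Dict[str, str]:
    """Map each installed opencv distribution name to its version."""
    found: Dict[str, str] = {}
    for name in OPENCV_VARIANTS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue  # this variant is not installed, which is fine
    return found


def only_contrib_headless(found: Dict[str, str]) -> bool:
    """True if contrib-headless is the sole opencv distribution present."""
    return set(found) == {"opencv-contrib-python-headless"}
```

After the replacement step, `only_contrib_headless(installed_opencv_variants())` would be expected to return `True` in the built image.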

Dockerfile

Lines changed: 40 additions & 0 deletions
@@ -75,6 +75,46 @@ RUN ${PYTHON} -c "from unstructured.nlp.tokenize import _load_spacy_model; _load
 ${PYTHON} -c "from unstructured.partition.model_init import initialize; initialize()" && \
 ${PYTHON} -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
 
+# Replace PyPI opencv wheels (which bundle vulnerable ffmpeg 5.1.x with 14 CVEs)
+# with a source-built opencv-contrib-python-headless wheel compiled with
+# WITH_FFMPEG=OFF + ENABLE_CONTRIB=1 + ENABLE_HEADLESS=1.
+#
+# The contrib-headless variant is a strict superset of the cv2 API exposed by
+# opencv-python, opencv-python-headless, and opencv-contrib-python, so a
+# single wheel can replace any of them. Because the wheel's metadata name
+# only matches opencv-contrib-python-headless, any other variant has to be
+# uninstalled first - `uv pip install --reinstall-package` would silently
+# no-op for the non-matching names. We uninstall each variant individually
+# with `|| true` to tolerate variants that aren't present (our lockfile
+# currently only resolves opencv-python, but this stays robust if transitive
+# deps change).
+#
+# See: https://github.com/opencv/opencv-python/issues/1212
+#
+# Note: uv.lock resolves opencv packages to 4.13.0.92, but our wheel is pinned
+# to 4.12.0.88 because 4.13.0.92 has no sdist on PyPI — the upstream
+# Unstructured-IO/unstructured GHA workflow (build-opencv-wheels.yml)
+# compiles from source and requires an sdist. Bump this when a newer version
+# publishes an sdist.
+ARG OPENCV_WHEEL_TAG=opencv-4.12.0.88
+ARG OPENCV_WHEEL_VERSION=4.12.0.88
+# SHA-256 hashes of the wheels published in the upstream
+# Unstructured-IO/unstructured release. Update these when bumping
+# OPENCV_WHEEL_VERSION.
+ARG OPENCV_SHA256_aarch64=498fbb787dbfe7d6bc853ddad4ea1154e8fbefbfafd05aafb417f576e27850d5
+ARG OPENCV_SHA256_x86_64=50545ffc1efabf06cd70894b65a7fbca56786f560f452bf67a42c1bbd7a85961
+RUN ARCH=$(uname -m) && \
+WHEEL="opencv_contrib_python_headless-${OPENCV_WHEEL_VERSION}-cp312-cp312-linux_${ARCH}.whl" && \
+wget -q -O /tmp/"${WHEEL}" \
+"https://github.com/Unstructured-IO/unstructured/releases/download/${OPENCV_WHEEL_TAG}/${WHEEL}" && \
+EXPECTED=$(eval echo "\$OPENCV_SHA256_${ARCH}") && \
+echo "${EXPECTED} /tmp/${WHEEL}" | sha256sum -c - && \
+for pkg in opencv-python opencv-python-headless opencv-contrib-python opencv-contrib-python-headless; do \
+uv pip uninstall "$pkg" 2>/dev/null || true; \
+done && \
+uv pip install --no-deps /tmp/"${WHEEL}" && \
+rm /tmp/"${WHEEL}"
+
 COPY --chown=${NB_USER}:${NB_USER} CHANGELOG.md CHANGELOG.md
 COPY --chown=${NB_USER}:${NB_USER} logger_config.yaml logger_config.yaml
 COPY --chown=${NB_USER}:${NB_USER} prepline_${PIPELINE_PACKAGE}/ prepline_${PIPELINE_PACKAGE}/
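The replacement step in the Dockerfile hunk above resolves an architecture-specific wheel name, builds the release URL, and checks the downloaded file against a pinned SHA-256. The same naming and verification logic can be sketched in Python as follows. The constants are copied from the `ARG` lines in this commit; the function names are hypothetical, and the download itself is left out. Unlike the shell `eval` lookup, this sketch raises explicitly when an architecture has no pinned hash.

```python
import hashlib
import platform
from pathlib import Path

# Values copied from the Dockerfile ARGs in this commit.
OPENCV_WHEEL_TAG = "opencv-4.12.0.88"
OPENCV_WHEEL_VERSION = "4.12.0.88"
OPENCV_SHA256 = {
    "aarch64": "498fbb787dbfe7d6bc853ddad4ea1154e8fbefbfafd05aafb417f576e27850d5",
    "x86_64": "50545ffc1efabf06cd70894b65a7fbca56786f560f452bf67a42c1bbd7a85961",
}
RELEASE_BASE = "https://github.com/Unstructured-IO/unstructured/releases/download"


def wheel_name(arch: str) -> str:
    """Build the wheel filename for an architecture as reported by `uname -m`."""
    return (
        f"opencv_contrib_python_headless-{OPENCV_WHEEL_VERSION}"
        f"-cp312-cp312-linux_{arch}.whl"
    )


def wheel_url(arch: str) -> str:
    """Build the download URL under the upstream release tag."""
    return f"{RELEASE_BASE}/{OPENCV_WHEEL_TAG}/{wheel_name(arch)}"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, arch: str) -> None:
    """Raise ValueError if the arch has no pinned hash or the hash differs."""
    expected = OPENCV_SHA256.get(arch)
    if expected is None:
        raise ValueError(f"no pinned hash for architecture {arch!r}")
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError(f"hash mismatch for {path}: got {actual}")


if __name__ == "__main__":
    # platform.machine() matches `uname -m` on Linux (e.g. x86_64, aarch64).
    print(wheel_url(platform.machine()))
```

Keeping the hashes in a dictionary keyed by architecture mirrors the per-architecture `ARG` names, so bumping `OPENCV_WHEEL_VERSION` means updating the version, the tag, and both hashes together.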
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.1.3" # pragma: no cover
+__version__ = "0.1.4" # pragma: no cover
