Commit 7e23afc
lawrence-u10d and claude committed
fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security]
Mirrors Unstructured-IO/unstructured#4336.

After uv sync, the Dockerfile now downloads a source-built opencv-contrib-python-headless wheel (WITH_FFMPEG=OFF) from the upstream release, hash-verifies it, and substitutes it for the PyPI opencv variant installed from uv.lock. This eliminates the 14 bundled ffmpeg 5.1.x CVEs shipped in PyPI opencv wheels.

Bumps service version 0.1.3 -> 0.1.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent e7e87ec commit 7e23afc

3 files changed

Lines changed: 47 additions & 1 deletion

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+## 0.1.4
+
+### Security
+
+- **Replace PyPI opencv wheels with ffmpeg-free builds in Docker image**: After `uv sync`, the Dockerfile now substitutes the installed PyPI opencv-python variant with a source-built `opencv-contrib-python-headless` wheel compiled with `WITH_FFMPEG=OFF`, eliminating 14 bundled ffmpeg CVEs. The contrib-headless variant is a strict superset of the cv2 API (core + contrib modules, no GUI) and can transparently replace `opencv-python`, `opencv-python-headless`, or `opencv-contrib-python`. Wheel is downloaded from the upstream `Unstructured-IO/unstructured` release and hash-verified. Mirrors [unstructured#4336](https://github.com/Unstructured-IO/unstructured/pull/4336).
+
 ## 0.1.3
 
 ### Security
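
The changelog entry above claims two properties of the resulting image: only the contrib-headless distribution remains installed, and the `cv2` build has no ffmpeg backend. A minimal sketch of a smoke test for both follows. The helper names (`installed_opencv_variants`, `ffmpeg_enabled`, `check_image`) are hypothetical and not part of this commit. The parser assumes the usual `FFMPEG: YES` / `FFMPEG: NO` line in the output of `cv2.getBuildInformation()`, and treats a missing line as "disabled", since a build configured with `WITH_FFMPEG=OFF` may omit the line entirely.

```python
from importlib import metadata

# The four PyPI distributions that all provide the `cv2` module.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants() -> dict:
    """Return {distribution name: version} for each opencv variant present."""
    found = {}
    for name in OPENCV_VARIANTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this variant is not installed
    return found


def ffmpeg_enabled(build_info: str) -> bool:
    """Report whether the build-information text shows ffmpeg as enabled.

    Assumption: the text contains a line such as 'FFMPEG:   YES' or
    'FFMPEG:   NO'. If no such line exists, ffmpeg is treated as disabled.
    """
    for line in build_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "FFMPEG":
            return value.strip().upper().startswith("YES")
    return False


def check_image() -> None:
    """Hypothetical smoke test intended to run inside the built image."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    variants = installed_opencv_variants()
    assert set(variants) == {"opencv-contrib-python-headless"}, variants
    assert not ffmpeg_enabled(cv2.getBuildInformation())
```

Calling `check_image()` from a `RUN ${PYTHON} -c ...` step after the wheel swap would fail the build if a PyPI variant crept back in through a transitive dependency; whether that check belongs in the Dockerfile or in CI is a design choice this commit does not make.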

Dockerfile

Lines changed: 40 additions & 0 deletions
@@ -75,6 +75,46 @@ RUN ${PYTHON} -c "from unstructured.nlp.tokenize import _load_spacy_model; _load
   ${PYTHON} -c "from unstructured.partition.model_init import initialize; initialize()" && \
   ${PYTHON} -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
 
+# Replace PyPI opencv wheels (which bundle vulnerable ffmpeg 5.1.x with 14 CVEs)
+# with a source-built opencv-contrib-python-headless wheel compiled with
+# WITH_FFMPEG=OFF + ENABLE_CONTRIB=1 + ENABLE_HEADLESS=1.
+#
+# The contrib-headless variant is a strict superset of the cv2 API exposed by
+# opencv-python, opencv-python-headless, and opencv-contrib-python, so a
+# single wheel can replace any of them. Because the wheel's metadata name
+# only matches opencv-contrib-python-headless, any other variant has to be
+# uninstalled first - `uv pip install --reinstall-package` would silently
+# no-op for the non-matching names. We uninstall each variant individually
+# with `|| true` to tolerate variants that aren't present (our lockfile
+# currently only resolves opencv-python, but this stays robust if transitive
+# deps change).
+#
+# See: https://github.com/opencv/opencv-python/issues/1212
+#
+# Note: uv.lock resolves opencv packages to 4.13.0.92, but our wheel is pinned
+# to 4.12.0.88 because 4.13.0.92 has no sdist on PyPI — the upstream
+# Unstructured-IO/unstructured GHA workflow (build-opencv-wheels.yml)
+# compiles from source and requires an sdist. Bump this when a newer version
+# publishes an sdist.
+ARG OPENCV_WHEEL_TAG=opencv-4.12.0.88
+ARG OPENCV_WHEEL_VERSION=4.12.0.88
+# SHA-256 hashes of the wheels published in the upstream
+# Unstructured-IO/unstructured release. Update these when bumping
+# OPENCV_WHEEL_VERSION.
+ARG OPENCV_SHA256_aarch64=498fbb787dbfe7d6bc853ddad4ea1154e8fbefbfafd05aafb417f576e27850d5
+ARG OPENCV_SHA256_x86_64=50545ffc1efabf06cd70894b65a7fbca56786f560f452bf67a42c1bbd7a85961
+RUN ARCH=$(uname -m) && \
+  WHEEL="opencv_contrib_python_headless-${OPENCV_WHEEL_VERSION}-cp312-cp312-linux_${ARCH}.whl" && \
+  wget -q -O /tmp/"${WHEEL}" \
+    "https://github.com/Unstructured-IO/unstructured/releases/download/${OPENCV_WHEEL_TAG}/${WHEEL}" && \
+  EXPECTED=$(eval echo "\$OPENCV_SHA256_${ARCH}") && \
+  echo "${EXPECTED}  /tmp/${WHEEL}" | sha256sum -c - && \
+  for pkg in opencv-python opencv-python-headless opencv-contrib-python opencv-contrib-python-headless; do \
+    uv pip uninstall "$pkg" 2>/dev/null || true; \
+  done && \
+  uv pip install --no-deps /tmp/"${WHEEL}" && \
+  rm /tmp/"${WHEEL}"
+
 COPY --chown=${NB_USER}:${NB_USER} CHANGELOG.md CHANGELOG.md
 COPY --chown=${NB_USER}:${NB_USER} logger_config.yaml logger_config.yaml
 COPY --chown=${NB_USER}:${NB_USER} prepline_${PIPELINE_PACKAGE}/ prepline_${PIPELINE_PACKAGE}/
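
The `RUN` step in the Dockerfile hunk above builds the wheel file name from the version and `uname -m`, looks up the pinned SHA-256 for that architecture, and refuses to install on a mismatch. The same logic can be sketched in Python, for example to check a locally downloaded wheel before updating the `ARG` hashes. This is an illustration only: `wheel_filename`, `sha256_of`, and `verify_wheel` are hypothetical names, and the commit itself performs the check with `sha256sum -c`.

```python
import hashlib
import platform
from pathlib import Path
from typing import Optional


def wheel_filename(version: str, arch: str, py_tag: str = "cp312") -> str:
    """Build the wheel file name using the same pattern as the Dockerfile."""
    return (
        f"opencv_contrib_python_headless-{version}"
        f"-{py_tag}-{py_tag}-linux_{arch}.whl"
    )


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_by_arch: dict,
                 arch: Optional[str] = None) -> None:
    """Raise ValueError unless the file matches the hash pinned for `arch`.

    `arch` defaults to platform.machine(), which on Linux normally matches
    the `uname -m` value used in the Dockerfile (x86_64, aarch64).
    """
    arch = arch or platform.machine()
    if arch not in expected_by_arch:
        raise ValueError(f"no pinned hash for architecture {arch!r}")
    actual = sha256_of(path)
    expected = expected_by_arch[arch].lower()
    if actual != expected:
        raise ValueError(
            f"hash mismatch for {path.name}: got {actual}, expected {expected}"
        )
```

One difference from the shell version: an architecture without a pinned hash is reported explicitly here, whereas in the Dockerfile an unknown `ARCH` yields an empty `EXPECTED` and the failure surfaces at the download or `sha256sum -c` step.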
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.1.3" # pragma: no cover
+__version__ = "0.1.4" # pragma: no cover
