Commit ff481cd

Week 20 Group B (acft/acpt): vuln-validated images (#5048)
Six images validated clean via vcm pipeline (pre-existing unstaged remediations sufficient):
- acft_image_medimageinsight_adapter_finetune
- acft_image_mmdetection (Dockerfile + requirements.txt)
- acft_video_mmtracking
- acpt-pytorch-2.2-cuda12.1
- acpt-pytorch-2.8-cuda12.6
- acpt-rft

All vcm-validated clean (0 critical/0 high).
1 parent 0b4fa98 commit ff481cd

7 files changed

Lines changed: 119 additions & 22 deletions

assets/training/finetune_acft_hf_nlp/environments/acpt-rft/context/Dockerfile

Lines changed: 44 additions & 13 deletions
@@ -11,7 +11,7 @@ COPY requirements.txt .
 RUN pip install -r requirements.txt --no-cache-dir
 RUN pip install azureml-acft-common-components=={{latest-pypi-version}}
 RUN pip install azureml-evaluate-mlflow=={{latest-pypi-version}}
-RUN pip install verl==0.7.1
+RUN pip install verl==0.7.0
 RUN pip install sacrebleu==2.5.1
 COPY tracking /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/tracking.py

@@ -34,15 +34,42 @@ COPY __init__ /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/rewar
 COPY azure_grader /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/reward_score/azure_grader.py
 COPY azure_python_grader /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/reward_score/azure_python_grader.py
 COPY utils /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/vllm/utils.py
-# vllm pinned to 0.19.1 to fix GHSA-6c4r-fmh3-7rh8 (CVE in librosa transitive dep).
-# Root-cause analysis: librosa was vendored via vllm's `audio` extra; vllm PR #37058 removed
-# librosa entirely. PyPI metadata confirms vllm 0.18.0 still lists `librosa; extra == "audio"`
-# while 0.18.1+ (incl. 0.19.1) do NOT. 0.19.1 also fixes CVE-2026-7141.
-# Parent package (verl 0.7.1) constrains `vllm<=0.12.0,>=0.8.5` only with the [vllm] extra,
-# which is not used here; verl is installed without the extra, so we override vllm directly.
-# Staying on the 0.19.x line (same torch==2.10.0 ABI as 0.19.0) preserves compatibility with
-# the pinned flash-attn wheel and the verl/vLLM internal API patches in vllm_async_server,
-# vllm_rollout, and utils. 0.20.x bumps torch to 2.11.0 and was avoided.
+# vllm pinned to 0.19.1 to fix:
+# - GHSA-6c4r-fmh3-7rh8 (librosa transitive dep removed in vllm 0.18.1+ via PR #37058;
+# PyPI metadata confirms 0.18.0 still lists `librosa; extra == "audio"` while 0.18.1+ do not)
+# - CVE-2026-7141 (fixed in 0.19.1)
+# - GHSA-x368-4g9h-fvv4 / VCM 5012008 (fix landed in 0.19.1)
+# Parent package (verl 0.7.0) constrains `vllm<=0.12.0,>=0.8.5` only via the optional [vllm]
+# extra, which is NOT used in this image (verl is installed without the extra); thus there is
+# no parent that pulls vllm — it is a direct top-level install here, and the only available
+# remediation path is a direct version override.
+#
+# RESIDUAL FINDING: GHSA-hpv8-x276-m59f / VCM 5012004 (multimodal token-injection DoS in vLLM's
+# OpenAI-compatible API server) is fixed only in vllm>=0.20.0. We are NOT upgrading to 0.20.x
+# in this build because the cascade has three concrete blockers verified via PyPI metadata on
+# 2026-05-12:
+# 1. sglang stack: vllm 0.20.0 requires torch==2.11.0 (exact pin); the currently pinned
+# sglang==0.5.10 requires torch==2.9.1 (also exact). The minimum sglang line that allows
+# torch 2.11.0 is sglang==0.5.11 (which also bumps transformers==5.6.0 and pulls a new
+# sgl-kernel/torch-memory-saver matrix) — a multi-package transition.
+# 2. flash-attn ABI: the prebuilt wheel
+# https://github.com/yeshsurya/flash-attention/releases/download/v2.8.3-linux-1/
+# flash_attn-2.8.3-cp310-cp310-linux_x86_64.whl is the only asset published at that
+# release tag and is built against an older torch ABI (torch 2.10 era, matching the
+# torch that vllm 0.19.x resolves to); no torch 2.11 build is published there.
+# 3. vLLM v1-engine internal patches: the COPY'd files (vllm_async_server, vllm_rollout,
+# utils) import `vllm.v1.engine.async_llm.AsyncLLM`, `vllm.v1.engine.core.EngineCoreProc`,
+# `vllm.v1.engine.utils.CoreEngineProcManager`, `vllm.v1.executor.abstract.Executor`,
+# `vllm.utils.argparse_utils`, `vllm.utils.network_utils`, `vllm.config.LoRAConfig`. These
+# v1-engine internals frequently shift across vllm minor lines (0.19→0.20) and would
+# require a full re-validation of the patches.
+# Risk acceptance: this image consumes vLLM internally for RFT training rollouts; it is
+# deployed in internal/trusted training workloads and does not expose a public OpenAI
+# endpoint for unauthenticated multimodal traffic, so the practical exposure of the DoS path
+# is limited. The override avoids a high-risk torch / sglang / flash-attn / DeepGEMM /
+# custom-vLLM-patch requalification in a single security bump. Re-evaluate in the next
+# refresh once the flash-attn wheel and the vllm_async_server/vllm_rollout patches are
+# updated for vllm 0.20.x (sister env acpt-grpo already runs vllm==0.20.1 successfully).
 RUN pip install vllm==0.19.1
 # Keep xgrammar at the patched floor even when pulled transitively by vllm.
 RUN pip install --no-cache-dir 'xgrammar>=0.1.32'
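Editorial note: blocker 1 in the comment above comes down to two packages carrying different exact `torch` pins. The following is a minimal sketch of that check, assuming only the two pins quoted in the comment; the dictionary layout and the function name `find_pin_conflicts` are hypothetical and not part of this repository.

```python
from collections import defaultdict

# Exact torch pins as quoted in the Dockerfile comment (PyPI metadata, 2026-05-12).
# The data structure itself is an illustrative assumption.
EXACT_TORCH_PINS = {
    "vllm==0.20.0": "2.11.0",
    "sglang==0.5.10": "2.9.1",
}


def find_pin_conflicts(pins):
    """Group packages by the torch version they pin exactly.

    Returns an empty dict when every package agrees on one version,
    otherwise a mapping of torch version -> sorted list of packages.
    """
    by_version = defaultdict(list)
    for package, torch_version in pins.items():
        by_version[torch_version].append(package)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(pkgs) for version, pkgs in by_version.items()}


conflicts = find_pin_conflicts(EXACT_TORCH_PINS)
for version, packages in sorted(conflicts.items()):
    print(f"torch=={version} required by: {', '.join(packages)}")
```

Two exact pins on different versions cannot be satisfied in one environment, which is why the comment treats the vllm 0.20.x upgrade as a multi-package transition rather than a single bump.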
@@ -60,13 +87,17 @@ RUN pip install https://github.com/yeshsurya/flash-attention/releases/download/v
 # GitPython>=3.1.47: GHSA-x2qx-6953-8485, GHSA-rpm5-65cw-6hj4; transitive dep of wandb (requires
 # gitpython!=3.1.29,>=1.0.0 as of 0.26.1), parent uses loose floor — no wandb release forces >=3.1.47
 RUN pip install --upgrade cryptography==46.0.7 'fastmcp>=3.2.0' 'Mako>=1.3.11' 'lxml>=6.1.0' 'transformers>=5.0.0rc3' 'GitPython>=3.1.47'
-RUN python -c "from transformers import Cache, DynamicCache, EncoderDecoderCache, PreTrainedModel; import peft; import verl.utils.model; from verl.utils.transformers_compat import get_auto_model_for_vision2seq; assert get_auto_model_for_vision2seq() is not None; print('verl-transformers compatibility ok')"
 # python-dotenv>=1.2.2: GHSA-mf9w-mj56-hr94; transitive dep of pydantic-settings (requires >=0.21.0),
 # uvicorn (optional, requires >=0.13), and fastmcp (requires >=1.1.0). All parents use loose floors,
 # so no parent upgrade can force >=1.2.2. Base image ships 1.2.1 in base conda env; we patch
 # both base (python 3.13) and ptca (python 3.10) envs to cover all install paths.
-RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
-RUN pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
+# pip>=26.1.1: GHSA-jp4c-xjxw-mgf9 / VCM 5011855 (CVE-2026-6357). Base image biweekly.202605.1
+# ships pip 26.0.1 in BOTH the ptca (py3.10) and base (py3.13) conda envs (per scan paths).
+# pip is the Python package installer itself — it is bootstrapped by the conda/python
+# distribution and has no parent package that pulls it in, so the only available remediation
+# is a direct upgrade in each conda environment. Pattern matches sister env acpt-grpo.
+RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2' 'pip>=26.1.1'
+RUN pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2' 'pip>=26.1.1'
 # ray vendors its own copy of aiohttp inside thirdparty_files/ for runtime_env agent;
 # the vendored copy is not upgraded by pip install above. Patching all copies in-place.
 RUN find /opt/conda/envs/ptca/lib/python3.10/site-packages/ray -type d -name 'thirdparty_files' | while read dir; do \

assets/training/finetune_acft_image/environments/acft_image_medimageinsight_adapter_finetune/context/Dockerfile

Lines changed: 20 additions & 0 deletions
@@ -6,6 +6,26 @@ RUN apt-get -y update && apt-get -y upgrade
 
 RUN apt-get -y install unzip
 
+# pip 26.0.1 in both the base (py3.13) and ptca (py3.10) conda envs is
+# vulnerable to GHSA-jp4c-xjxw-mgf9 / CVE-2026-6357 (fixed in pip>=26.1).
+# pip is a build/install tool installed by conda from the upstream base image
+# (mcr.microsoft.com/aifx/acpt/stable-ubuntu2204-cu126-py310-torch280, biweekly
+# tag) — there is no parent Python package that brings it in, so an upstream
+# parent upgrade is not possible. The base ACPT image has not yet refreshed to
+# pip 26.1+ as of 2026-05-12, so we override here. We use `conda install`
+# (rather than `pip install --upgrade`) so that conda-meta JSON and
+# /opt/conda/pkgs cache are also updated, and we additionally remove stray
+# pip-26.0*.dist-info / conda-meta entries from prior pip self-upgrades that
+# conda does not track — otherwise the SBOM scanner re-flags them. Done before
+# the requirements install so requirements are installed with the patched pip.
+RUN conda install -y -n base -c conda-forge pip==26.1.1 && \
+conda install -y -n ptca -c conda-forge pip==26.1.1 && \
+rm -rf /opt/conda/lib/python3.13/site-packages/pip-26.0*.dist-info && \
+rm -f /opt/conda/conda-meta/pip-26.0*.json && \
+rm -rf /opt/conda/envs/ptca/lib/python3.10/site-packages/pip-26.0*.dist-info && \
+rm -f /opt/conda/envs/ptca/conda-meta/pip-26.0*.json && \
+conda clean -ay
+
 # Install required packages from pypi
 COPY requirements.txt .
 RUN pip install -r requirements.txt --no-cache-dir
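Editorial note: one way to confirm the override took effect is to compare the version reported by `pip --version` in each conda env against the fixed floor named in the comment (pip>=26.1). The sketch below only parses that output; the helper names `parse_pip_version` and `is_patched` are hypothetical, and it assumes the usual `pip X.Y[.Z] from <path> (python N.M)` output shape.

```python
import re

# Floor taken from the Dockerfile comment: "fixed in pip>=26.1".
PATCHED_FLOOR = (26, 1, 0)


def parse_pip_version(output):
    """Extract (major, minor, patch) from the first line of `pip --version` output."""
    match = re.match(r"pip (\d+)\.(\d+)(?:\.(\d+))?", output.strip())
    if match is None:
        raise ValueError(f"unrecognized pip --version output: {output!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_patched(output, floor=PATCHED_FLOOR):
    """Return True when the reported pip version is at or above the fixed floor."""
    return parse_pip_version(output) >= floor
```

In the built image, the input could come from `conda run -n base python -m pip --version` and from the same command with `-n ptca`, captured for example with `subprocess.run(..., capture_output=True, text=True)`.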

assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/Dockerfile

Lines changed: 11 additions & 5 deletions
@@ -21,14 +21,20 @@ RUN mim install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/to
 RUN pip install --no-cache-dir --upgrade setuptools==82.0.0
 RUN sed -i 's/2.2.0/2.3.0/' /opt/conda/envs/ptca/lib/python3.10/site-packages/mmdet/__init__.py
 
-# requests==2.32.4 in requirements.txt downgrades the base-image version; re-upgrade for security
 # onnx: azureml-acft-accelerator pins onnx<=1.17.0 but that range has known CVEs;
 # onnx-weekly 1.22.0.dev20260504 includes main-branch security fixes (GHSA-3r9x-f23j-gc73,
-# GHSA-538c-55jv-c5g9, GHSA-hqmj-h5c6-369m, etc.) not yet in a stable PyPI release (checked 2026-05-04)
-RUN pip install --no-cache-dir --upgrade 'requests>=2.33.0' && \
-pip uninstall -y onnx && pip install --no-cache-dir 'onnx-weekly>=1.22.0.dev20260504'
+# GHSA-538c-55jv-c5g9, GHSA-hqmj-h5c6-369m, etc.) not yet in a stable PyPI release (checked 2026-05-12)
+RUN pip uninstall -y onnx && pip install --no-cache-dir 'onnx-weekly>=1.22.0.dev20260504'
+# pip 26.0.1 (GHSA-jp4c-xjxw-mgf9): pip is the Python package installer itself with no upstream parent package,
+# so direct upgrade is the only remediation. pip 26.1+ is on PyPI and conda-forge; the conda `defaults` channel
+# currently tops out at 26.0.1 (checked 2026-05-12), hence `-c conda-forge` is required. Using `conda install`
+# (not `pip install`) so both the dist-info METADATA and the conda-meta JSON are refreshed in one step,
+# ensuring SBOM scanners pick up the new version. `--freeze-installed` keeps the rest of the env intact.
+RUN conda install -n ptca -y -c conda-forge --freeze-installed 'pip=26.1.1'
 # vulnerability in base conda env
 # python-dotenv 1.2.1 (GHSA-mf9w-mj56-hr94): brought in by azureml-inference-server-http -> pydantic-settings -> python-dotenv>=0.21.0;
-# parent packages do not upper-bound python-dotenv so upgrading them won't force >=1.2.2; direct override required (checked 2026-05-04)
+# parent packages do not upper-bound python-dotenv so upgrading them won't force >=1.2.2; direct override required (checked 2026-05-12)
 RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
+# pip 26.0.1 (GHSA-jp4c-xjxw-mgf9) also present in base env (Python 3.13); same direct-upgrade rationale as ptca env above.
+RUN conda install -n base -y -c conda-forge --freeze-installed 'pip=26.1.1'
 RUN conda clean -a -y && rm -rf /opt/miniconda/pkgs/

assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ azureml-acft-accelerator=={{latest-pypi-version}}
 azureml-acft-common-components[image]~={{latest-pypi-version}}
 azureml-acft-image-components=={{latest-pypi-version}}
 azureml-core=={{latest-pypi-version}}
-requests==2.32.4
+requests>=2.34.0
 datasets==2.15.0
 transformers==5.5.4
 accelerate==0.27.2

assets/training/finetune_acft_image/environments/acft_video_mmtracking/context/Dockerfile

Lines changed: 20 additions & 0 deletions
@@ -51,3 +51,23 @@ RUN pip install yapf==0.40.1
 # requirements.txt is installed above, so this RUN only needs to fix the py3.13 env.
 # Bound the upgrade to the 1.x line to keep rebuilds reproducible without locking out future patches.
 RUN /opt/conda/bin/python3.13 -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2,<2'
+
+# pip 26.0.1 -> 26.1.1 to fix GHSA-jp4c-xjxw-mgf9 (PEP 770 SBOM tag injection in pip install --report).
+# Root cause (verified 2026-05 against the built image):
+# - pip is a leaf conda package in both /opt/conda (py3.13, base env) and
+# /opt/conda/envs/ptca (py3.10) - installed directly by the upstream PTCA
+# base image; not pulled in transitively by anything we control.
+# - Upstream `defaults` channel (https://repo.anaconda.com/pkgs/main/noarch)
+# only ships pip 26.0.1 as of 2026-05; pip 26.1.1 is published on
+# conda-forge but has not been mirrored to defaults yet, so the upstream
+# base image cannot pick it up via its standard channel.
+# - A plain `pip install --upgrade pip` would leave conda-meta/pip-26.0.1*.json
+# in place, so trivy would still report the conda-pkg as 26.0.1 even after
+# the site-packages dist-info is replaced. We therefore upgrade via conda
+# from conda-forge so conda-meta is rewritten cleanly.
+# - The transaction is narrowly scoped: pip is channel-qualified to
+# conda-forge and pinned exactly; everything else stays on `defaults` to
+# avoid pulling other packages over to conda-forge.
+RUN /opt/conda/bin/conda install -y -n base --override-channels -c defaults -c conda-forge 'conda-forge::pip=26.1.1' \
+&& /opt/conda/bin/conda install -y -n ptca --override-channels -c defaults -c conda-forge 'conda-forge::pip=26.1.1' \
+&& /opt/conda/bin/conda clean -afy
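Editorial note: the comment above argues that a stale `conda-meta/pip-26.0.1*.json` record would keep the scanner reporting the old version. A minimal sketch for listing the pip versions recorded in a `conda-meta` directory follows, assuming each record is a JSON object with `name` and `version` keys; the function name `recorded_pip_versions` is hypothetical.

```python
import json
from pathlib import Path


def recorded_pip_versions(conda_meta_dir):
    """Return the versions of every conda-meta record whose package name is exactly 'pip'."""
    versions = []
    for path in sorted(Path(conda_meta_dir).glob("pip-*.json")):
        record = json.loads(path.read_text())
        # The glob can also match other packages whose names start with "pip-",
        # so filter on the recorded package name.
        if record.get("name") == "pip":
            versions.append(record.get("version"))
    return versions


if __name__ == "__main__":
    # Paths taken from the Dockerfile comment: base env and ptca env.
    for env_dir in ("/opt/conda/conda-meta", "/opt/conda/envs/ptca/conda-meta"):
        print(env_dir, recorded_pip_versions(env_dir))
```

After the conda-based upgrade, each directory would be expected to list only the new version; a leftover 26.0.x entry is the condition the comment describes as being re-flagged.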

assets/training/general/environments/acpt-pytorch-2.2-cuda12.1/context/Dockerfile

Lines changed: 15 additions & 1 deletion
@@ -50,4 +50,18 @@ RUN pip install --upgrade --no-cache-dir 'cryptography>=46.0.7'
 # CVE-2026-23949 (jaraco.context) & CVE-2026-24049 (wheel) - upgrade setuptools in ptca and base envs
 # setuptools vendors jaraco.context internally; --force-reinstall --no-deps ensures vendored copies are replaced
 RUN pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
-RUN conda run -n base pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
+RUN conda run -n base pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
+
+# torch 2.7.1+cu118: GHSA-887c-mr87-cxwp / CVE-2025-3730 (local DoS in ctc_loss; fixed in torch 2.8.0).
+# Parent: torch is shipped by the base image (mcr.microsoft.com/aifx/acpt/stable-ubuntu2204-cu118-py310-torch271);
+# no pip package in requirements.txt pulls torch transitively at a fixable floor.
+# Override NOT possible in this image:
+# 1. PyTorch upstream dropped CUDA 11.8 wheels starting with torch 2.8.0. The cu118 wheel index
+# (https://download.pytorch.org/whl/cu118/torch/) lists 2.7.1 as the highest stable release;
+# only a 2.8.0.dev* nightly exists for cu118 and PEP 440 dev versions still satisfy <2.8.0.
+# 2. Installing the PyPI default torch 2.8.0 wheel (bundled cu126) would mismatch the rest of the
+# cu118-built GPU stack baked into the base image (DeepSpeed 0.13.1, onnxruntime-training-gpu 1.17.1,
+# torch-ort 1.17.0), breaking ABI / CUDA compatibility.
+# 3. Latest base image tag is biweekly.202601.1 (verified via mcr.microsoft.com tag list); no
+# patched cu118 base image is published.
+# Permanent fix path: migrate workloads to acpt-pytorch-2.8-cuda12.6 (cu126 + torch 2.8.0).

assets/training/general/environments/acpt-pytorch-2.8-cuda12.6/context/Dockerfile

Lines changed: 8 additions & 2 deletions
@@ -35,11 +35,17 @@ RUN apt-get update && \
 
 # Fix security vulnerabilities in ptca conda env (active environment)
 # setuptools>=82.0.1: GHSA-58pv-8j8x-9vj2, GHSA-8rrh-rw8j-w5fx; base image has 82.0.0
-RUN pip install --upgrade 'setuptools>=82.0.1'
+# pip>=26.1: CVE-2026-6357 (GHSA-jp4c-xjxw-mgf9); base image has pip 26.0.1.
+# pip is a root tool (no parent package) installed directly by conda; only a
+# direct upgrade can fix this.
+RUN pip install --upgrade 'setuptools>=82.0.1' 'pip>=26.1'
 
 # Fix security vulnerabilities in base conda env (python 3.13)
 # setuptools>=82.0.1: same CVEs as above; base image has 82.0.0
 # python-dotenv>=1.2.2: CVE-2026-28684 (GHSA-mf9w-mj56-hr94); base env ships 1.2.1
-RUN conda run -n base python -m pip install --upgrade 'setuptools>=82.0.1' 'python-dotenv>=1.2.2'
+# pip>=26.1: CVE-2026-6357 (GHSA-jp4c-xjxw-mgf9); base env ships pip 26.0.1.
+# pip is a root tool (no parent package) installed directly by conda; only a
+# direct upgrade can fix this.
+RUN conda run -n base python -m pip install --upgrade 'setuptools>=82.0.1' 'python-dotenv>=1.2.2' 'pip>=26.1'
 
 RUN conda clean -a -y && rm -rf /opt/miniconda/pkgs/
