Week 20 Group B (acft/acpt): vuln-validated images (#5048)

yeshsurya · web-flow · commit ff481cdaff2e · 2026-05-16T19:29:23.000+05:30
Six images validated clean via the vcm pipeline (the pre-existing unstaged remediations were sufficient):

- acft_image_medimageinsight_adapter_finetune
- acft_image_mmdetection (Dockerfile + requirements.txt)
- acft_video_mmtracking
- acpt-pytorch-2.2-cuda12.1
- acpt-pytorch-2.8-cuda12.6
- acpt-rft

All six are vcm-validated clean (0 critical / 0 high findings).
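
Several Dockerfile comments in this change note that a pip self-upgrade can leave old `pip-26.0*` entries in `site-packages/*.dist-info` or `conda-meta/*.json`, which a scanner may then re-flag. A minimal sketch of a post-build check for that condition is shown here. It assumes the conda env layout named in those comments; the helper name `find_stale_pip_metadata`, the `demo` function, and the placeholder file names are hypothetical and not part of the vcm pipeline.

```python
import tempfile
from pathlib import Path


def find_stale_pip_metadata(env_root, python_dir, stale_prefix="pip-26.0"):
    """List pip metadata entries under env_root that still name a stale version.

    Looks in the two locations the Dockerfile comments mention:
    lib/<python_dir>/site-packages/*.dist-info and conda-meta/*.json.
    Returns paths relative to env_root, sorted.
    """
    root = Path(env_root)
    site_packages = root / "lib" / python_dir / "site-packages"
    conda_meta = root / "conda-meta"
    hits = list(site_packages.glob(f"{stale_prefix}*.dist-info"))
    hits.extend(conda_meta.glob(f"{stale_prefix}*.json"))
    return sorted(str(p.relative_to(root)) for p in hits)


def demo():
    """Build a throwaway env tree that mimics a plain `pip install --upgrade pip`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        site_packages = root / "lib" / "python3.10" / "site-packages"
        conda_meta = root / "conda-meta"
        # dist-info already reflects the patched version...
        (site_packages / "pip-26.1.1.dist-info").mkdir(parents=True)
        # ...but the conda-meta record (placeholder build string) was never rewritten.
        conda_meta.mkdir()
        (conda_meta / "pip-26.0.1-placeholder_0.json").write_text("{}")
        return find_stale_pip_metadata(root, "python3.10")


if __name__ == "__main__":
    print(demo())
```

Against a real image, the same function could be pointed at `/opt/conda` with `python3.13` and at `/opt/conda/envs/ptca` with `python3.10`; an empty list for both would suggest no stale pip metadata remains.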
diff --git a/assets/training/finetune_acft_hf_nlp/environments/acpt-rft/context/Dockerfile b/assets/training/finetune_acft_hf_nlp/environments/acpt-rft/context/Dockerfile
@@ -11,7 +11,7 @@ COPY requirements.txt .
 RUN pip install -r requirements.txt --no-cache-dir
 RUN pip install azureml-acft-common-components=={{latest-pypi-version}}
 RUN pip install azureml-evaluate-mlflow=={{latest-pypi-version}}
-RUN pip install verl==0.7.1
+RUN pip install verl==0.7.0
 RUN pip install sacrebleu==2.5.1
 COPY tracking /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/tracking.py
 
@@ -34,15 +34,42 @@ COPY __init__ /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/rewar
 COPY azure_grader /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/reward_score/azure_grader.py
 COPY azure_python_grader /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/reward_score/azure_python_grader.py
 COPY utils /opt/conda/envs/ptca/lib/python3.10/site-packages/verl/utils/vllm/utils.py
-# vllm pinned to 0.19.1 to fix GHSA-6c4r-fmh3-7rh8 (CVE in librosa transitive dep).
-# Root-cause analysis: librosa was vendored via vllm's `audio` extra; vllm PR #37058 removed
-# librosa entirely. PyPI metadata confirms vllm 0.18.0 still lists `librosa; extra == "audio"`
-# while 0.18.1+ (incl. 0.19.1) do NOT. 0.19.1 also fixes CVE-2026-7141.
-# Parent package (verl 0.7.1) constrains `vllm<=0.12.0,>=0.8.5` only with the [vllm] extra,
-# which is not used here; verl is installed without the extra, so we override vllm directly.
-# Staying on the 0.19.x line (same torch==2.10.0 ABI as 0.19.0) preserves compatibility with
-# the pinned flash-attn wheel and the verl/vLLM internal API patches in vllm_async_server,
-# vllm_rollout, and utils. 0.20.x bumps torch to 2.11.0 and was avoided.
+# vllm pinned to 0.19.1 to fix:
+#   - GHSA-6c4r-fmh3-7rh8 (librosa transitive dep removed in vllm 0.18.1+ via PR #37058;
+#     PyPI metadata confirms 0.18.0 still lists `librosa; extra == "audio"` while 0.18.1+ do not)
+#   - CVE-2026-7141 (fixed in 0.19.1)
+#   - GHSA-x368-4g9h-fvv4 / VCM 5012008 (fix landed in 0.19.1)
+# Parent package (verl 0.7.0) constrains `vllm<=0.12.0,>=0.8.5` only via the optional [vllm]
+# extra, which is NOT used in this image (verl is installed without the extra); thus there is
+# no parent that pulls vllm — it is a direct top-level install here, and the only available
+# remediation path is a direct version override.
+#
+# RESIDUAL FINDING: GHSA-hpv8-x276-m59f / VCM 5012004 (multimodal token-injection DoS in vLLM's
+# OpenAI-compatible API server) is fixed only in vllm>=0.20.0. We are NOT upgrading to 0.20.x
+# in this build because the cascade has three concrete blockers verified via PyPI metadata on
+# 2026-05-12:
+#   1. sglang stack: vllm 0.20.0 requires torch==2.11.0 (exact pin); the currently pinned
+#      sglang==0.5.10 requires torch==2.9.1 (also exact). The minimum sglang line that allows
+#      torch 2.11.0 is sglang==0.5.11 (which also bumps transformers==5.6.0 and pulls a new
+#      sgl-kernel/torch-memory-saver matrix) — a multi-package transition.
+#   2. flash-attn ABI: the prebuilt wheel
+#      https://github.com/yeshsurya/flash-attention/releases/download/v2.8.3-linux-1/
+#      flash_attn-2.8.3-cp310-cp310-linux_x86_64.whl is the only asset published at that
+#      release tag and is built against an older torch ABI (torch 2.10 era, matching the
+#      torch that vllm 0.19.x resolves to); no torch 2.11 build is published there.
+#   3. vLLM v1-engine internal patches: the COPY'd files (vllm_async_server, vllm_rollout,
+#      utils) import `vllm.v1.engine.async_llm.AsyncLLM`, `vllm.v1.engine.core.EngineCoreProc`,
+#      `vllm.v1.engine.utils.CoreEngineProcManager`, `vllm.v1.executor.abstract.Executor`,
+#      `vllm.utils.argparse_utils`, `vllm.utils.network_utils`, `vllm.config.LoRAConfig`. These
+#      v1-engine internals frequently shift across vllm minor lines (0.19→0.20) and would
+#      require a full re-validation of the patches.
+# Risk acceptance: this image consumes vLLM internally for RFT training rollouts; it is
+# deployed in internal/trusted training workloads and does not expose a public OpenAI
+# endpoint for unauthenticated multimodal traffic, so the practical exposure of the DoS path
+# is limited. The override avoids a high-risk torch / sglang / flash-attn / DeepGEMM /
+# custom-vLLM-patch requalification in a single security bump. Re-evaluate in the next
+# refresh once the flash-attn wheel and the vllm_async_server/vllm_rollout patches are
+# updated for vllm 0.20.x (sister env acpt-grpo already runs vllm==0.20.1 successfully).
 RUN pip install vllm==0.19.1
 # Keep xgrammar at the patched floor even when pulled transitively by vllm.
 RUN pip install --no-cache-dir 'xgrammar>=0.1.32'
@@ -60,13 +87,17 @@ RUN pip install https://github.com/yeshsurya/flash-attention/releases/download/v
 # GitPython>=3.1.47: GHSA-x2qx-6953-8485, GHSA-rpm5-65cw-6hj4; transitive dep of wandb (requires
 #   gitpython!=3.1.29,>=1.0.0 as of 0.26.1), parent uses loose floor — no wandb release forces >=3.1.47
 RUN pip install --upgrade cryptography==46.0.7 'fastmcp>=3.2.0' 'Mako>=1.3.11' 'lxml>=6.1.0' 'transformers>=5.0.0rc3' 'GitPython>=3.1.47'
-RUN python -c "from transformers import Cache, DynamicCache, EncoderDecoderCache, PreTrainedModel; import peft; import verl.utils.model; from verl.utils.transformers_compat import get_auto_model_for_vision2seq; assert get_auto_model_for_vision2seq() is not None; print('verl-transformers compatibility ok')"
 # python-dotenv>=1.2.2: GHSA-mf9w-mj56-hr94; transitive dep of pydantic-settings (requires >=0.21.0),
 #   uvicorn (optional, requires >=0.13), and fastmcp (requires >=1.1.0). All parents use loose floors,
 #   so no parent upgrade can force >=1.2.2. Base image ships 1.2.1 in base conda env; we patch
 #   both base (python 3.13) and ptca (python 3.10) envs to cover all install paths.
-RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
-RUN pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
+# pip>=26.1.1: GHSA-jp4c-xjxw-mgf9 / VCM 5011855 (CVE-2026-6357). Base image biweekly.202605.1
+#   ships pip 26.0.1 in BOTH the ptca (py3.10) and base (py3.13) conda envs (per scan paths).
+#   pip is the Python package installer itself — it is bootstrapped by the conda/python
+#   distribution and has no parent package that pulls it in, so the only available remediation
+#   is a direct upgrade in each conda environment. Pattern matches sister env acpt-grpo.
+RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2' 'pip>=26.1.1'
+RUN pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2' 'pip>=26.1.1'
 # ray vendors its own copy of aiohttp inside thirdparty_files/ for runtime_env agent;
 # the vendored copy is not upgraded by pip install above. Patching all copies in-place.
 RUN find /opt/conda/envs/ptca/lib/python3.10/site-packages/ray -type d -name 'thirdparty_files' | while read dir; do \
diff --git a/assets/training/finetune_acft_image/environments/acft_image_medimageinsight_adapter_finetune/context/Dockerfile b/assets/training/finetune_acft_image/environments/acft_image_medimageinsight_adapter_finetune/context/Dockerfile
@@ -6,6 +6,26 @@ RUN apt-get -y update && apt-get -y upgrade
 
 RUN apt-get -y install unzip
 
+# pip 26.0.1 in both the base (py3.13) and ptca (py3.10) conda envs is
+# vulnerable to GHSA-jp4c-xjxw-mgf9 / CVE-2026-6357 (fixed in pip>=26.1).
+# pip is a build/install tool installed by conda from the upstream base image
+# (mcr.microsoft.com/aifx/acpt/stable-ubuntu2204-cu126-py310-torch280, biweekly
+# tag) — there is no parent Python package that brings it in, so an upstream
+# parent upgrade is not possible. The base ACPT image has not yet refreshed to
+# pip 26.1+ as of 2026-05-12, so we override here. We use `conda install`
+# (rather than `pip install --upgrade`) so that conda-meta JSON and
+# /opt/conda/pkgs cache are also updated, and we additionally remove stray
+# pip-26.0*.dist-info / conda-meta entries from prior pip self-upgrades that
+# conda does not track — otherwise the SBOM scanner re-flags them. Done before
+# the requirements install so requirements are installed with the patched pip.
+RUN conda install -y -n base -c conda-forge pip==26.1.1 && \
+    conda install -y -n ptca -c conda-forge pip==26.1.1 && \
+    rm -rf /opt/conda/lib/python3.13/site-packages/pip-26.0*.dist-info && \
+    rm -f /opt/conda/conda-meta/pip-26.0*.json && \
+    rm -rf /opt/conda/envs/ptca/lib/python3.10/site-packages/pip-26.0*.dist-info && \
+    rm -f /opt/conda/envs/ptca/conda-meta/pip-26.0*.json && \
+    conda clean -ay
+
 # Install required packages from pypi
 COPY requirements.txt .
 RUN pip install -r requirements.txt --no-cache-dir
diff --git a/assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/Dockerfile b/assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/Dockerfile
@@ -21,14 +21,20 @@ RUN mim install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/to
 RUN pip install --no-cache-dir --upgrade setuptools==82.0.0
 RUN sed -i 's/2.2.0/2.3.0/' /opt/conda/envs/ptca/lib/python3.10/site-packages/mmdet/__init__.py
 
-# requests==2.32.4 in requirements.txt downgrades the base-image version; re-upgrade for security
 # onnx: azureml-acft-accelerator pins onnx<=1.17.0 but that range has known CVEs;
 #   onnx-weekly 1.22.0.dev20260504 includes main-branch security fixes (GHSA-3r9x-f23j-gc73,
-#   GHSA-538c-55jv-c5g9, GHSA-hqmj-h5c6-369m, etc.) not yet in a stable PyPI release (checked 2026-05-04)
-RUN pip install --no-cache-dir --upgrade 'requests>=2.33.0' && \
-    pip uninstall -y onnx && pip install --no-cache-dir 'onnx-weekly>=1.22.0.dev20260504'
+#   GHSA-538c-55jv-c5g9, GHSA-hqmj-h5c6-369m, etc.) not yet in a stable PyPI release (checked 2026-05-12)
+RUN pip uninstall -y onnx && pip install --no-cache-dir 'onnx-weekly>=1.22.0.dev20260504'
+# pip 26.0.1 (GHSA-jp4c-xjxw-mgf9): pip is the Python package installer itself with no upstream parent package,
+#   so direct upgrade is the only remediation. pip 26.1+ is on PyPI and conda-forge; the conda `defaults` channel
+#   currently tops out at 26.0.1 (checked 2026-05-12), hence `-c conda-forge` is required. Using `conda install`
+#   (not `pip install`) so both the dist-info METADATA and the conda-meta JSON are refreshed in one step,
+#   ensuring SBOM scanners pick up the new version. `--freeze-installed` keeps the rest of the env intact.
+RUN conda install -n ptca -y -c conda-forge --freeze-installed 'pip=26.1.1'
 # vulnerability in base conda env
 # python-dotenv 1.2.1 (GHSA-mf9w-mj56-hr94): brought in by azureml-inference-server-http -> pydantic-settings -> python-dotenv>=0.21.0;
-#   parent packages do not upper-bound python-dotenv so upgrading them won't force >=1.2.2; direct override required (checked 2026-05-04)
+#   parent packages only set a loose floor on python-dotenv, so upgrading them won't force >=1.2.2; direct override required (checked 2026-05-12)
 RUN conda run -n base python -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2'
+# pip 26.0.1 (GHSA-jp4c-xjxw-mgf9) also present in base env (Python 3.13); same direct-upgrade rationale as ptca env above.
+RUN conda install -n base -y -c conda-forge --freeze-installed 'pip=26.1.1'
 RUN conda clean -a -y && rm -rf /opt/miniconda/pkgs/
diff --git a/assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/requirements.txt b/assets/training/finetune_acft_image/environments/acft_image_mmdetection/context/requirements.txt
@@ -3,7 +3,7 @@ azureml-acft-accelerator=={{latest-pypi-version}}
 azureml-acft-common-components[image]~={{latest-pypi-version}}
 azureml-acft-image-components=={{latest-pypi-version}}
 azureml-core=={{latest-pypi-version}}
-requests==2.32.4
+requests>=2.34.0
 datasets==2.15.0
 transformers==5.5.4
 accelerate==0.27.2
diff --git a/assets/training/finetune_acft_image/environments/acft_video_mmtracking/context/Dockerfile b/assets/training/finetune_acft_image/environments/acft_video_mmtracking/context/Dockerfile
@@ -51,3 +51,23 @@ RUN pip install yapf==0.40.1
 #     requirements.txt is installed above, so this RUN only needs to fix the py3.13 env.
 # Bound the upgrade to the 1.x line to keep rebuilds reproducible without locking out future patches.
 RUN /opt/conda/bin/python3.13 -m pip install --no-cache-dir --upgrade 'python-dotenv>=1.2.2,<2'
+
+# pip 26.0.1 -> 26.1.1 to fix GHSA-jp4c-xjxw-mgf9 (PEP 770 SBOM tag injection in pip install --report).
+# Root cause (verified 2026-05 against the built image):
+#   - pip is a leaf conda package in both /opt/conda (py3.13, base env) and
+#     /opt/conda/envs/ptca (py3.10) - installed directly by the upstream PTCA
+#     base image; not pulled in transitively by anything we control.
+#   - Upstream `defaults` channel (https://repo.anaconda.com/pkgs/main/noarch)
+#     only ships pip 26.0.1 as of 2026-05; pip 26.1.1 is published on
+#     conda-forge but has not been mirrored to defaults yet, so the upstream
+#     base image cannot pick it up via its standard channel.
+#   - A plain `pip install --upgrade pip` would leave conda-meta/pip-26.0.1*.json
+#     in place, so trivy would still report the conda-pkg as 26.0.1 even after
+#     the site-packages dist-info is replaced. We therefore upgrade via conda
+#     from conda-forge so conda-meta is rewritten cleanly.
+#   - The transaction is narrowly scoped: pip is channel-qualified to
+#     conda-forge and pinned exactly; everything else stays on `defaults` to
+#     avoid pulling other packages over to conda-forge.
+RUN /opt/conda/bin/conda install -y -n base --override-channels -c defaults -c conda-forge 'conda-forge::pip=26.1.1' \
+ && /opt/conda/bin/conda install -y -n ptca --override-channels -c defaults -c conda-forge 'conda-forge::pip=26.1.1' \
+ && /opt/conda/bin/conda clean -afy
diff --git a/assets/training/general/environments/acpt-pytorch-2.2-cuda12.1/context/Dockerfile b/assets/training/general/environments/acpt-pytorch-2.2-cuda12.1/context/Dockerfile
@@ -50,4 +50,18 @@ RUN pip install --upgrade --no-cache-dir 'cryptography>=46.0.7'
 # CVE-2026-23949 (jaraco.context) & CVE-2026-24049 (wheel) - upgrade setuptools in ptca and base envs
 # setuptools vendors jaraco.context internally; --force-reinstall --no-deps ensures vendored copies are replaced
 RUN pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
-RUN conda run -n base pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
+RUN conda run -n base pip install --force-reinstall --no-deps 'setuptools>=82.0.1'
+
+# torch 2.7.1+cu118: GHSA-887c-mr87-cxwp / CVE-2025-3730 (local DoS in ctc_loss; fixed in torch 2.8.0).
+# Parent: torch is shipped by the base image (mcr.microsoft.com/aifx/acpt/stable-ubuntu2204-cu118-py310-torch271);
+# no pip package in requirements.txt pulls torch transitively at a fixable floor.
+# Override NOT possible in this image:
+#   1. PyTorch upstream dropped CUDA 11.8 wheels starting with torch 2.8.0. The cu118 wheel index
+#      (https://download.pytorch.org/whl/cu118/torch/) lists 2.7.1 as the highest stable release;
+#      only a 2.8.0.dev* nightly exists for cu118, and PEP 440 dev versions sort below 2.8.0, so it still falls in the affected <2.8.0 range.
+#   2. Installing the PyPI default torch 2.8.0 wheel (bundled cu126) would mismatch the rest of the
+#      cu118-built GPU stack baked into the base image (DeepSpeed 0.13.1, onnxruntime-training-gpu 1.17.1,
+#      torch-ort 1.17.0), breaking ABI / CUDA compatibility.
+#   3. Latest base image tag is biweekly.202601.1 (verified via mcr.microsoft.com tag list); no
+#      patched cu118 base image is published.
+# Permanent fix path: migrate workloads to acpt-pytorch-2.8-cuda12.6 (cu126 + torch 2.8.0).
diff --git a/assets/training/general/environments/acpt-pytorch-2.8-cuda12.6/context/Dockerfile b/assets/training/general/environments/acpt-pytorch-2.8-cuda12.6/context/Dockerfile
@@ -35,11 +35,17 @@ RUN apt-get update && \
 
 # Fix security vulnerabilities in ptca conda env (active environment)
 # setuptools>=82.0.1: GHSA-58pv-8j8x-9vj2, GHSA-8rrh-rw8j-w5fx; base image has 82.0.0
-RUN pip install --upgrade 'setuptools>=82.0.1'
+# pip>=26.1: CVE-2026-6357 (GHSA-jp4c-xjxw-mgf9); base image has pip 26.0.1.
+#   pip is a root tool (no parent package) installed directly by conda; only a
+#   direct upgrade can fix this.
+RUN pip install --upgrade 'setuptools>=82.0.1' 'pip>=26.1'
 
 # Fix security vulnerabilities in base conda env (python 3.13)
 # setuptools>=82.0.1: same CVEs as above; base image has 82.0.0
 # python-dotenv>=1.2.2: CVE-2026-28684 (GHSA-mf9w-mj56-hr94); base env ships 1.2.1
-RUN conda run -n base python -m pip install --upgrade 'setuptools>=82.0.1' 'python-dotenv>=1.2.2'
+# pip>=26.1: CVE-2026-6357 (GHSA-jp4c-xjxw-mgf9); base env ships pip 26.0.1.
+#   pip is a root tool (no parent package) installed directly by conda; only a
+#   direct upgrade can fix this.
+RUN conda run -n base python -m pip install --upgrade 'setuptools>=82.0.1' 'python-dotenv>=1.2.2' 'pip>=26.1'
 
 RUN conda clean -a -y && rm -rf /opt/miniconda/pkgs/