Update dependency vllm to v0.19.0 [SECURITY] #54
Open
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
vllm: 0.10.1 → 0.19.0
vllm API endpoints vulnerable to Denial of Service Attacks
CVE-2025-48956 / GHSA-rxc4-3w6r-4v47
More information
Details
Summary
A Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user.
Details
The vulnerability leverages the abuse of HTTP headers. By setting a header such as X-Forwarded-For to a very large value like ("A" * 5_800_000_000), the server's HTTP parser or application logic may attempt to load the entire request into memory, overwhelming system resources.
Impact
What kind of vulnerability is it? Who is impacted?
Type of vulnerability: Denial of Service (DoS)
Resolution
Upgrade to a version of vLLM that includes appropriate HTTP limits by default, or use a proxy in front of vLLM that provides protection against this issue.
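As a rough illustration of the kind of HTTP limit the resolution refers to, the sketch below rejects a request whose headers exceed a byte budget. The function name and the 64 KiB budget are assumptions chosen for illustration; they are not vLLM's or any particular server's defaults.

```python
# Hypothetical guard; the 64 KiB budget is an illustrative assumption.
MAX_HEADER_BYTES = 64 * 1024


def headers_within_limit(headers: dict[str, str], limit: int = MAX_HEADER_BYTES) -> bool:
    """Return True if the combined size of header names and values fits the budget."""
    total = 0
    for name, value in headers.items():
        total += len(name.encode("latin-1", "replace"))
        total += len(value.encode("latin-1", "replace"))
        if total > limit:
            return False  # stop early instead of summing an enormous payload
    return True
```

A check like this only helps if it runs before the whole request is buffered, which is why the limit normally belongs in the HTTP server or a fronting proxy rather than in application code.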
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM has remote code execution vulnerability in the tool call parser for Qwen3-Coder
CVE-2025-9141 / GHSA-79j6-g2m3-jgfw
More information
Details
Summary
An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call.
Details
vLLM's Qwen3 Coder tool parser contains a code execution path that uses Python's eval() function to parse tool call parameters. This occurs during the parameter conversion process when the parser attempts to handle unknown data types.
This code path is reached when:
- the --enable-auto-tool-choice flag is set
- the --tool-call-parser qwen3_coder flag is set
Impact
Remote Code Execution via Python's eval() function.
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
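To illustrate the eval() risk described in the Qwen3 Coder advisory, here is a minimal sketch of a safer alternative. It assumes the parameter values only ever need to be Python literals. parse_tool_argument is a hypothetical helper, not vLLM's parser or its actual fix.

```python
import ast


def parse_tool_argument(raw: str):
    """Parse a model-supplied argument as a Python literal, never as code."""
    try:
        # literal_eval only accepts literals (numbers, strings, lists, dicts, ...);
        # calls such as __import__('os').system(...) are rejected.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # Anything that is not a plain literal is kept as an opaque string.
        return raw
```

With eval(), the second input in the usage below would run a shell command; here it simply comes back as text.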
vLLM is vulnerable to timing attack at bearer auth
CVE-2025-59425 / GHSA-wr9h-g72x-mwhm
More information
Details
Summary
The API key support in vLLM performed validation using a method that was vulnerable to a timing attack. This could potentially allow an attacker to discover a valid API key using an approach more efficient than brute force.
Details
https://github.com/vllm-project/vllm/blob/4b946d693e0af15740e9ca9c0e059d5f333b1083/vllm/entrypoints/openai/api_server.py#L1270-L1274
API key validation used a string comparison that will take longer the more characters the provided API key gets correct. Data analysis across many attempts can allow an attacker to determine when it finds the next correct character in the key sequence.
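A constant-time comparison removes the signal this attack depends on. The sketch below uses Python's standard hmac.compare_digest; api_key_matches is a hypothetical helper name, and this is not necessarily how vLLM implemented its fix.

```python
import hmac


def api_key_matches(provided: str, expected: str) -> bool:
    """Compare two API keys in time that does not depend on where they first differ."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

An ordinary == comparison on strings may return as soon as it finds the first mismatching character, which is what makes per-character timing measurable.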
Impact
Deployments relying on vLLM's built-in API key validation are vulnerable to authentication bypass using this technique.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server
CVE-2025-61620 / GHSA-6fvq-23cw-5628
More information
Details
Summary
A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the chat_template and chat_template_kwargs parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.
Details
When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In hf/transformer, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template parameter that lets users specify that template. In addition, the server accepts a chat_template_kwargs parameter to pass extra keyword arguments to the rendering function.
Because Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.
Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for apply_hf_chat_template and then updates that dictionary with the user-supplied chat_template_kwargs via dict.update. Since dict.update can overwrite existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that will be used by apply_hf_chat_template.
Impact
If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests.
Fixes
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
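The dict.update override described in the Jinja template advisory can be reproduced in a few lines. The sketch below shows the unsafe merge and one possible guard. build_template_kwargs and RESERVED_KEYS are hypothetical names, and the guard is an illustration rather than the fix vLLM shipped.

```python
RESERVED_KEYS = {"chat_template"}  # server-controlled keys; illustrative assumption


def build_template_kwargs(server_template: str, user_kwargs: dict) -> dict:
    """Merge user kwargs without letting them replace server-controlled keys."""
    kwargs = {"chat_template": server_template}
    safe = {k: v for k, v in user_kwargs.items() if k not in RESERVED_KEYS}
    kwargs.update(safe)
    return kwargs


# The unsafe pattern: a plain update lets the user-supplied key win.
unsafe = {"chat_template": "trusted"}
unsafe.update({"chat_template": "attacker-controlled"})
```

Filtering before the merge keeps the server's template in place even when the request smuggles a chat_template key inside chat_template_kwargs.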
vLLM is vulnerable to Server-Side Request Forgery (SSRF) through MediaConnector class
CVE-2025-6242 / GHSA-3f6c-7fw2-ppm4
More information
Details
Summary
A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target hosts. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources.
This vulnerability is particularly critical in containerized environments like llm-d, where a compromised vLLM pod could be used to scan the internal network, interact with other pods, and potentially cause denial of service or access sensitive data. For example, an attacker could make the vLLM pod send malicious requests to an internal llm-d management endpoint, leading to system instability by falsely reporting metrics like the KV cache state.
Vulnerability Details
The core of the vulnerability lies in the MediaConnector.load_from_url method and its asynchronous counterpart. These methods accept a URL string to fetch media content (images, audio, video).
https://github.com/vllm-project/vllm/blob/119f683949dfed10df769fe63b2676d7f1eb644e/vllm/multimodal/utils.py#L97-L113
The function directly processes URLs with http, https, and file schemes. An attacker can supply a URL pointing to an internal IP address or a localhost endpoint. The vLLM server will then initiate a connection to this internal resource.
- Example request: {"image_url": "http://127.0.0.1:8080/internal_api"}. The vLLM server will send a GET request to this internal endpoint.
- The _load_file_url method attempts to restrict file access to a subdirectory defined by --allowed-local-media-path. While this is a good security measure for local file access, it does not prevent network-based SSRF attacks.
Impact in llm-d Environments
The risk is significantly amplified in orchestrated environments such as llm-d, where multiple pods communicate over an internal network.
- Denial of Service (DoS): An attacker could target internal management endpoints of other services within the llm-d cluster. For instance, if a monitoring or metrics service is exposed internally, an attacker could send malformed requests to it. A specific example is an attacker causing the vLLM pod to call an internal API that reports a false KV cache utilization, potentially triggering incorrect scaling decisions or even a system shutdown.
- Internal Network Reconnaissance: Attackers can use the vulnerability to scan the internal network for open ports and services by providing URLs like http://10.0.0.X:PORT and observing the server's response time or error messages.
- Interaction with Internal Services: Any unsecured internal service becomes a potential target. This could include databases, internal APIs, or other model pods that might not have robust authentication, as they are not expected to be directly exposed.
Delegating this security responsibility to an upper-level orchestrator like llm-d is problematic. The orchestrator cannot easily distinguish between legitimate requests initiated by the vLLM engine for its own purposes and malicious requests originating from user input, thus complicating traffic filtering rules and increasing management overhead.
Fix
See the --allowed-media-domains option discussed here: https://docs.vllm.ai/en/latest/usage/security.html#4-restrict-domains-access-for-media-urls
Severity
CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:L/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
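As a complement to a domain allowlist, a server can refuse media URLs whose host is plainly internal. The sketch below is a minimal standard-library check. is_unsafe_media_host is a hypothetical helper and does not reflect vLLM's implementation. It only inspects literal hosts, so a DNS name that resolves to an internal address would still need a check after resolution.

```python
import ipaddress
from urllib.parse import urlsplit


def is_unsafe_media_host(url: str) -> bool:
    """Return True when the URL has no host or points at a loopback/private/link-local literal."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no network host at all (for example a file: URL)
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; not decidable from the string alone
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

The example request from the SSRF advisory, http://127.0.0.1:8080/internal_api, is rejected by this check, while an ordinary public hostname passes through to whatever allowlist logic follows.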
vLLM vulnerable to DoS with incorrect shape of multimodal embedding inputs
CVE-2025-62372 / GHSA-pmqf-x6x8-p7qw
More information
Details
Summary
Users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page).
The issue has existed ever since we added support for image embedding inputs, i.e. #6613 (released in v0.5.5).
Details
Using image embeddings as an example:
- inputs_embeds (mismatched shape)
- get_input_embeddings (validation fails).
This happens because we only validate ndim of the tensor, but not the full shape, in the input processor (via MultiModalDataParser).
Impact
Mitigation
Set --limit-mm-per-prompt to 0 for all non-text modalities to ban multimodal inputs, which includes multimodal embedding inputs. However, the model would then only accept text, defeating the purpose of using a multi-modal model.
Resolution
Severity
CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
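The gap in the multimodal embedding advisory is the difference between checking only the number of dimensions and checking the full shape. A minimal sketch of the stricter check follows. It assumes a 2-D (num_tokens, hidden_size) layout purely for illustration; validate_embedding_shape is a hypothetical helper, not vLLM's validator.

```python
def validate_embedding_shape(shape: tuple[int, ...], hidden_size: int) -> None:
    """Reject embeddings unless they are 2-D with the model's hidden size last."""
    if len(shape) != 2:
        # This is the ndim check that already existed.
        raise ValueError(f"expected 2 dimensions, got {len(shape)}")
    if shape[-1] != hidden_size:
        # This is the part an ndim-only check misses.
        raise ValueError(f"expected hidden size {hidden_size}, got {shape[-1]}")
```

Raising a ValueError at input-processing time lets the API layer return a client error instead of letting the mismatch reach the engine.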
vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted chat_template_kwargs
CVE-2025-62426 / GHSA-69j4-grxj-j64p
More information
Details
Summary
The /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests.
Details
In serving_engine.py, the chat_template_kwargs are unpacked into kwargs passed to chat_utils.py apply_hf_chat_template with no validation on the keys or values in that chat_template_kwargs dict. This means they can be used to override optional parameters in the apply_hf_chat_template method, such as tokenize, changing its default from False to True.
https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/openai/serving_engine.py#L809-L814
https://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/chat_utils.py#L1602-L1610
Both serving_chat.py and serving_tokenization.py call into this _preprocess_chat method of serving_engine.py and they both pass in chat_template_kwargs.
So, a chat_template_kwargs like {"tokenize": True} makes tokenization happen as part of applying the chat template, even though that is not expected. Tokenization is a blocking operation, and with sufficiently large input can block the API server's event loop, which blocks handling of all other requests until this tokenization is complete.
This optional tokenize parameter to apply_hf_chat_template does not appear to be used, so one option would be to just hard-code that to always be False instead of allowing it to be optionally overridden by callers. A better option may be to not pass chat_template_kwargs as unpacked kwargs but instead as a dict, and only unpack them after the logic in apply_hf_chat_template that resolves the kwargs against the chat template.
Impact
Any authenticated user can cause a denial of service to a vLLM server with Chat Completion or Tokenize requests.
Fix
https://github.com/vllm-project/vllm/pull/27205
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM vulnerable to remote code execution via transformers_utils/get_config
CVE-2025-66448 / GHSA-8fr4-5q9j-m8gm
More information
Details
Summary
vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves that mapping with get_class_from_dynamic_module(...) and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the auto_map string. Crucially, this happens even when the caller explicitly sets trust_remote_code=False in vllm.transformers_utils.config.get_config. In practice, an attacker can publish a benign-looking frontend repo whose config.json points via auto_map to a separate malicious backend repo; loading the frontend will silently run the backend’s code on the victim host.
Details
The vulnerable code resolves and instantiates classes from auto_map entries without checking whether those entries point to a different repo or whether remote code execution is allowed. get_class_from_dynamic_module(...) is capable of fetching and importing code from the Hugging Face repo specified in the mapping. trust_remote_code is not enforced for this code path. As a result, a frontend repo can redirect the loader to any backend repo and cause code execution, bypassing the trust_remote_code guard.
Impact
This is a critical vulnerability because it breaks the documented trust_remote_code safety boundary in a core model-loading utility. The vulnerable code lives in a common loading path, so any application, service, CI job, or developer machine that uses vllm’s transformer utilities to load configs can be affected. The attack requires only two repos and no user interaction beyond loading the frontend model. A successful exploit can execute arbitrary commands on the host.
Fixes
Severity
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
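The missing guard in the get_config advisory can be sketched as a gate that runs before any auto_map entry is resolved. Both function names below are hypothetical. The "repo_id--module.Class" form for cross-repo references is an assumption about how such entries are written, not something stated in the advisory.

```python
def check_auto_map(auto_map: dict, trust_remote_code: bool) -> None:
    """Raise if auto_map would load dynamic code while remote code is not trusted."""
    if auto_map and not trust_remote_code:
        raise PermissionError(
            f"config declares auto_map entries ({sorted(auto_map)}) "
            "but trust_remote_code is False"
        )


def references_other_repo(class_reference: str) -> bool:
    """True if the reference uses the assumed 'repo_id--module.Class' cross-repo form."""
    return "--" in class_reference
```

Calling check_auto_map before any dynamic-module lookup means a config with auto_map fails loudly when the user has not opted in, rather than silently importing code from the referenced repository.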
vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
CVE-2026-22773 / GHSA-grg2-63fw-f2qr
More information
Details
Summary
Users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error, leading to complete server termination.
Details
The vulnerability is triggered when the image processor encounters a 1x1 pixel image with shape (1, 1, 3) in HWC (Height, Width, Channel) format. Due to the ambiguous dimensions, the processor incorrectly assumes the image is in CHW (Channel, Height, Width) format with shape (3, H, W). This misinterpretation causes an incorrect calculation of the number of image patches, resulting in a fatal tensor split operation failure.
Crash location: vllm/model_executor/models/idefics3.py line 672.
The split() call fails because the computed num_patches value (17) does not match the actual tensor dimension (9).
This unhandled exception terminates the EngineCore process, crashing the server.
Affected Models
Any model using the Idefics3 architecture. The vulnerability was tested with HuggingFaceTB/SmolVLM-Instruct.
Impact
Denial of service by crashing the engine
Mitigation
Validating the input:
Managing the exception:
Fixes
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
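The layout ambiguity behind the Idefics3 crash can be shown with a simplified heuristic. The function below is a hypothetical illustration, not the image processor's actual logic. It guesses the layout of a 3-D shape from which axis holds a plausible channel count, and reports when both readings are possible.

```python
def infer_layout(shape: tuple[int, int, int]) -> str:
    """Guess the layout of a 3-D image shape; a simplified, hypothetical heuristic."""
    channel_counts = (1, 3)
    first_ok = shape[0] in channel_counts
    last_ok = shape[2] in channel_counts
    if first_ok and last_ok:
        return "ambiguous"  # e.g. a 1x1 RGB image stored as (1, 1, 3)
    if first_ok:
        return "CHW"
    if last_ok:
        return "HWC"
    return "unknown"
```

A server-side validator could reject "ambiguous" shapes with a client error, or require an explicit layout, instead of letting a guess reach the patch-count computation.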
vLLM affected by RCE via auto_map dynamic module loading during model initialization
CVE-2026-22807 / GHSA-2pc9-4j83-qjmr
More information
Details
Summary
vLLM loads Hugging Face auto_map dynamic modules during model resolution without gating on trust_remote_code, allowing attacker-controlled Python code in a model repo/path to execute at server startup.
Impact
An attacker who can influence the model repo/path (local directory or remote Hugging Face repo) can achieve arbitrary code execution on the vLLM host during model load.
This happens before any request handling and does not require API access.
Affected Versions
All versions where vllm/model_executor/models/registry.py resolves auto_map entries with try_get_class_from_dynamic_module without checking trust_remote_code (at least current main).
Details
During model resolution, vLLM unconditionally iterates auto_map entries from the model config and calls try_get_class_from_dynamic_module, which delegates to Transformers’ get_class_from_dynamic_module and executes the module code.
This occurs even when trust_remote_code is false, allowing a malicious model repo to embed code in a referenced module and have it executed during initialization.
Relevant code
- vllm/model_executor/models/registry.py:856 (auto_map resolution)
- vllm/transformers_utils/dynamic_module.py:13 (delegates to get_class_from_dynamic_module, which executes code)
Fixes
Credits
Reported by bugbunny.ai
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM vulnerable to Server-Side Request Forgery (SSRF) through MediaConnector
CVE-2026-24779 / GHSA-qh4c-xf7m-gxfc
More information
Details
Summary
A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods obtain and process media from URLs provided by users, using different Python parsing libraries when restricting the target host. These two parsing libraries have different interpretations of backslashes, which allows the host name restriction to be bypassed. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources.
This vulnerability is particularly critical in containerized environments like llm-d, where a compromised vLLM pod could be used to scan the internal network, interact with other pods, and potentially cause Denial of Service or access sensitive data. For example, an attacker could make the vLLM pod send malicious requests to an internal llm-d management endpoint, leading to system instability by falsely reporting metrics like the KV cache state.
Details
The core of the vulnerability lies in the MediaConnector.load_from_url method and its asynchronous counterpart. These methods accept a URL string to fetch media content (images, audio, video).
The URL validation uses the urlparse function from Python's urllib module, while the request is made using the request function from Python's requests module. The requests module's underlying URL parsing is implemented using the parse_url function from Python's urllib3. These two parsing functions follow different URL specifications; one is implemented according to the RFC 3986 specification, and the other is implemented according to the WHATWG Living Standard. There is a difference in how the two functions handle backslashes (\) in URLs, which allows the hostname restriction to be bypassed.
Fix
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:L
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
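The parser disagreement in the backslash advisory can be observed from the standard-library side alone. In the sketch below, the crafted URL, the allowed.example host and the port are made up for illustration, and has_parser_ambiguity is a hypothetical pre-check rather than vLLM's fix. urlsplit treats everything before the @ as credentials, whereas a parser that treats the backslash like a slash would, per the advisory, see 127.0.0.1 as the host.

```python
from urllib.parse import urlsplit


def has_parser_ambiguity(url: str) -> bool:
    """Flag URLs whose host may be read differently by different parsers."""
    parts = urlsplit(url)
    # Backslashes and embedded credentials are the ingredients of the bypass.
    return "\\" in url or parts.username is not None or parts.password is not None


# Hypothetical crafted URL: a backslash followed by '@' and an allowlisted host.
crafted = "http://127.0.0.1:6666\\@allowed.example/"
seen_by_urlsplit = urlsplit(crafted).hostname  # the host an allowlist check would compare
```

Rejecting such URLs outright avoids having to reason about which specification the downstream HTTP client follows.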
vLLM has RCE In Video Processing
CVE-2026-22778 / GHSA-4r2x-xpjr-7cvv
More information
Details
Summary
A chain of vulnerabilities in vLLM allows Remote Code Execution (RCE):
Result: Send a malicious video URL to vLLM Completions or Invocations for a video model -> Execute arbitrary commands on the server
A completely default vLLM instance, installed directly from pip or Docker, has no authentication, so no privileges ("None") are required. Even with the non-default api-key configuration enabled, this exploit is feasible through the invocations route, which allows the payload to execute pre-auth.
An example heap target is provided; other heap targets can also be exploited to achieve RCE. The leak allows for a simple ASLR bypass. Leak + heap overflow achieves RCE on versions prior to 0.14.1.
Deployments not serving a video model are not affected.
1. Vulnerability Overview
1.1 The Bug: JPEG2000 cdef Box Heap Overflow
The JPEG2000 decoder used by OpenCV (cv2) honors a cdef box that can remap color channels. When Y (luma) is mapped into the U (chroma) plane buffer, the decoder writes a large Y plane into the smaller U buffer, causing a heap overflow.
Root Cause
- cdef allows channel remapping (e.g., Y→U, U→Y).
- Y plane size: W×H; U plane size: (W/2)×(H/2).
- Overflow: W×H - (W/2×H/2) = 0.75 × W × H bytes.
Example (150×64)
1.2 Malicious cdef Box
Key control: Asoc=2 for channel 0 forces Y data into the U buffer, triggering the overflow.
Vulnerable Code Chain
1) Entry: vLLM accepts a remote video_url and downloads raw bytes
vLLM’s OpenAI-compatible API supports a video_url content part:
Source: src/vllm/entrypoints/chat_utils.py.
When the URL is HTTP(S), vLLM downloads it as raw bytes and passes the bytes into the modality loader:
Source: src/vllm/multimodal/utils.py (MediaConnector.load_from_url).
2) Decode: vLLM uses OpenCV (cv2) VideoCapture on an in-memory byte stream
The default video backend is OpenCV, and it constructs cv2.VideoCapture over a BytesIO buffer containing the downloaded bytes:
Source: src/vllm/multimodal/video.py (OpenCVVideoBackend.load_bytes).
The backend is selected from OpenCV’s stream-buffered backends registry:
Source: src/vllm/multimodal/video.py (OpenCVVideoBackend.get_cv2_video_api).
Implication: vLLM is delegating container parsing + codec decode to OpenCV’s Video I/O stack (which, in typical builds, is backed by FFmpeg for MOV/MP4 and codecs like JPEG2000).
3) The actual overflow: Y (full-res) written into U (quarter-res)
When the decoder honors the remap and writes Y into the U-plane buffer, it writes too many bytes:
Concrete example tried (150×64):
This is a heap buffer overflow into whatever allocations follow the U-plane buffer in the decoder’s heap layout (structures, metadata, other buffers, etc.). The exact victims depend on build + runtime allocator layout.
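The size of the overflow follows directly from the plane dimensions given in the root cause. A small worked calculation, assuming one byte per sample as the advisory's byte counts imply; plane_overflow_bytes is a hypothetical helper name.

```python
def plane_overflow_bytes(width: int, height: int) -> int:
    """Bytes written past a (W/2)x(H/2) U plane when a WxH Y plane is stored in it."""
    y_plane = width * height                 # full-resolution luma plane
    u_plane = (width // 2) * (height // 2)   # quarter-resolution chroma plane
    return y_plane - u_plane


# For the 150x64 example: 9600 - 2400 = 7200 bytes, i.e. 0.75 * W * H.
overflow_150x64 = plane_overflow_bytes(150, 64)
```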
The Exploit Chain
Vuln 1: PIL BytesIO Address Leak (ASLR Bypass)
When you send an invalid image to vLLM's multimodal endpoint, PIL throws an error like:
vLLM returns this error to the client, leaking a heap address. This address is ~10.33 GB before libc in memory. With this leak, we reduce ASLR from 4 billion guesses to ~8 guesses.
Vuln 2: JPEG2000 cdef Heap Overflow (RCE)
vLLM uses OpenCV (cv2) to decode videos. OpenCV bundles FFmpeg 5.1.x, which has a heap overflow in the JPEG2000 decoder. Because OpenCV is used for video decoding, a video built from JPEG2000 frames will reach the vulnerable code:
How the overflow works:
- cdef box that remaps color channels
- AVBuffer structure containing a free() function pointer. This could be any function pointer or other targets.
- free = system() and opaque = "command string"
- system("our command") executes
vLLM Attack Surface
Affected Endpoints
Both multimodal endpoints are vulnerable:
Request Flow
Versions Affected
Fixes
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
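The first link in the video processing chain is an error message that echoes an object address back to the client. One generic way to avoid that class of leak is to scrub hexadecimal addresses from messages before they leave the server. The sketch below is a hypothetical illustration, not the fix vLLM applied, and the sample message in the usage note is an assumed shape for such an error.

```python
import re

# Matches hexadecimal literals such as the address in a default object repr.
_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def redact_addresses(message: str) -> str:
    """Replace hexadecimal object addresses in an error message with a placeholder."""
    return _ADDRESS.sub("0x<redacted>", message)
```

For example, a message of the form "cannot identify image file <_io.BytesIO object at 0x7f3a2c1b4d60>" would be returned with the address replaced by 0x<redacted>.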
vLLM has Hardcoded Trust Override in Model Files Enables RCE Despite Explicit User Opt-Out
CVE-2026-27893 / GHSA-7972-pg2x-xr59
More information
Details
Summary
Two model implementation files hardcode trust_remote_code=True when loading sub-components, bypassing the user's explicit --trust-remote-code=False security opt-out. This enables remote code execution via malicious model repositories even when the user has explicitly disabled remote code trust.
Details
Affected files (latest main branch):
- vllm/model_executor/models/nemotron_vl.py:430
Both pass a hardcoded trust_remote_code=True to HuggingFace API calls, overriding the user's global --trust-remote-code=False setting.
Relation to prior CVEs:
Impact
Remote code execution. An attacker can craft a malicious model repository that executes arbitrary Python code when loaded by vLLM, even when the user has explicitly set --trust-remote-code=False. This undermines the security guarantee that trust_remote_code=False is intended to provide.
Remediation: Replace hardcoded trust_remote_code=True with self.config.model_config.trust_remote_code in both files. Raise a clear error if the model component requires remote code but the user hasn't opted in.
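The remediation amounts to forwarding the user's choice and failing loudly when it is insufficient. A minimal sketch of that decision, with a hypothetical helper name and error text:

```python
def resolve_trust_remote_code(user_opted_in: bool, component_needs_remote_code: bool) -> bool:
    """Return the flag to forward to the loader, never upgrading the user's choice."""
    if component_needs_remote_code and not user_opted_in:
        raise RuntimeError(
            "this model component requires remote code; "
            "pass --trust-remote-code to allow it"
        )
    return user_opted_in
```

The value returned here would be passed wherever the affected files currently pass the literal True.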
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM: Unauthenticated OOM Denial of Service via Unbounded n Parameter in OpenAI API Server
CVE-2026-34756 / GHSA-3mwp-wvh9-7528
More information
Details
Summary
A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue.
Details
The root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:
- In vllm/entrypoints/openai/chat_completion/protocol.py, the n parameter is defined simply as an integer without any pydantic.Field constraints for an upper bound.
- When the API request is converted to internal SamplingParams in vllm/sampling_params.py, the _verify_args method only checks the lower bound (self.n < 1), entirely omitting an upper bounds check.
- When the malicious request reaches the core engine (vllm/v1/engine/async_llm.py), the engine attempts to fan out the request n times to generate identical independent sequences within a synchronous loop.
Because Python's asyncio runs on a single thread and event loop, this monolithic for-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via copy(request), driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS OOM-killer terminates the vLLM process.
Impact
Vulnerability Type: Resource Exhaustion / Denial of Service
Impacted Parties:
vllm.entrypoints.openai.api_server), which happens to be the primary entrypoint for OpenAI-compatible setups.
Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.
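The missing check is a one-line upper bound next to the existing lower bound. The sketch below uses plain Python so it stays self-contained. The cap of 128 and the name validate_n are illustrative assumptions, not values or code taken from vLLM.

```python
MAX_N = 128  # illustrative cap; an assumption, not a vLLM default


def validate_n(n: int, max_n: int = MAX_N) -> int:
    """Accept n only within [1, max_n], adding an upper bound to the lower-bound check."""
    if n < 1:
        raise ValueError(f"n must be at least 1, got {n}")
    if n > max_n:
        raise ValueError(f"n must be at most {max_n}, got {n}")
    return n
```

Rejecting the request at parse time means the fan-out loop never sees an oversized n, so neither the event loop nor the allocator is put under load.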
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
vLLM: Denial of Service via Unbounded Frame Count in video/jpeg Base64 Processing
CVE-2026-34755 / GHSA-pq5c-rjhq-qp7p
More information
Details
Summary
The VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py:51-62 splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path at line 47-48, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM.
Details
Vulnerable code
The load_bytes() path (line 47-48) properly delegates to a video loader that respects self.num_frames (default 32). The load_base64("video/jpeg", ...) path bypasses this limit entirely: data.split(",") produces an unbounded list and every frame is decoded into a numpy array.
video/jpeg is part of vLLM's public API
video/jpeg is a vLLM-specific MIME type, not IANA-registered. However it is part of the public API surface:
- encode_video_url() at vllm/multimodal/utils.py:96-108 generates data:video/jpeg;base64,... URLs
- tests/entrypoints/openai/test_video.py:62 and tests/entrypoints/test_chat_utils.py:153 both use this format
Memory amplification
Each JPEG frame decodes to a full numpy array. For 640x480 RGB images, each frame is ~921 KB decoded. 5000 frames = ~4.6 GB. np.stack() then creates an additional copy. The compressed JPEG payload is small (~100 KB for 5000 frames) but decompresses to gigabytes.
Data flow