[https://nvbugs/5911304][fix] Add URL validation and request hardening for media input loading #12748
Conversation
Force-pushed 9ecc0e4 to 8da414a
📝 Walkthrough
Added security validation for remote URL fetching in `tensorrt_llm/inputs/utils.py`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Calling Code
    participant Validator as URL Validator
    participant Resolver as IP Resolver
    participant HTTPClient as HTTP Client
    participant RemoteServer as Remote Server
    Client->>Validator: Request with URL
    Validator->>Validator: Check scheme (http/https only)
    alt Invalid scheme
        Validator-->>Client: Raise ValueError
    end
    Validator->>Resolver: Resolve URL to IP address
    alt Resolution fails
        Resolver-->>Validator: Error
        Validator-->>Client: Raise error
    end
    Resolver-->>Validator: Return IP address
    Validator->>Validator: Validate IP (not private/loopback/reserved)
    alt Invalid IP range
        Validator-->>Client: Raise error
    end
    Validator->>HTTPClient: Make safe request
    HTTPClient->>RemoteServer: HTTP GET request
    RemoteServer-->>HTTPClient: Response
    HTTPClient->>HTTPClient: Check response size ≤ 200 MB
    alt Oversized response
        HTTPClient-->>Client: Raise error
    end
    HTTPClient->>HTTPClient: Validate redirect target (≤5 redirects)
    alt Redirect limit exceeded
        HTTPClient-->>Client: Raise error
    end
    HTTPClient-->>Validator: Valid response
    Validator-->>Client: Return content
```
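The validation steps in the diagram can be sketched with Python's standard library. This is an illustrative sketch, not the PR's actual implementation; the name `validate_url` and the exact set of checks are assumptions:

```python
import ipaddress
import socket
from urllib.parse import urlparse

_ALLOWED_SCHEMES = {"http", "https"}


def validate_url(url: str) -> None:
    """Reject URLs that do not point to a public HTTP(S) resource (sketch)."""
    parsed = urlparse(url)
    if parsed.scheme not in _ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    try:
        # Resolve every address the hostname maps to (A and AAAA records).
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 80,
                                   proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"could not resolve host {parsed.hostname!r}") from exc
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # is_global is False for private, loopback, link-local, and
        # reserved ranges, covering all the cases in the diagram.
        if not ip.is_global:
            raise ValueError(f"URL resolves to a disallowed address: {ip}")
```

Checking every resolved address matters because a hostname with both a public and a private A record would otherwise slip through.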
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
🧹 Nitpick comments (1)
tensorrt_llm/inputs/utils.py (1)
107-131: SSRF validation is solid, with a known DNS rebinding caveat.

The implementation correctly validates all resolved IPs against private/loopback/link-local/reserved ranges. However, there's an inherent TOCTOU (time-of-check to time-of-use) window: DNS could resolve to a different IP between validation and the actual HTTP request due to TTL expiration or DNS rebinding attacks.
This is a known limitation of application-layer SSRF defenses. For defense-in-depth, consider documenting this limitation or adding a comment noting that network-level controls (firewall rules blocking egress to private ranges) provide the strongest protection.
📝 Suggested documentation comment
```diff
 def _validate_url(url: str) -> None:
     """Validate that *url* points to a public, non-internal HTTP(S) resource.

     Raises ``ValueError`` for URLs that target private, loopback, or
     link-local addresses, or that use a scheme other than http / https.
+
+    Note: This provides application-layer SSRF protection but cannot fully
+    prevent DNS rebinding attacks. Network-level egress controls blocking
+    traffic to internal IP ranges provide additional defense-in-depth.
     """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/inputs/utils.py` around lines 107 - 131, The SSRF validation in _validate_url correctly checks resolved IPs but has a TOCTOU/DNS-rebinding gap between DNS resolution and actual HTTP use; update the code by adding a clear comment above the _validate_url function explaining this limitation (DNS can change after validation), and mention recommended defense-in-depth measures (e.g., enforce network-level egress controls/firewall rules to block private ranges, or perform request-time validation/connection-level checks) so future maintainers know this is not a complete protection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 134-162: The redirect loop in _safe_request_get can leak
connections because intermediate resp objects aren’t closed when stream=True;
before reassigning resp inside the for loop (just prior to calling
requests.get(redirect_url,...)), call resp.close() for the previous response
when stream is True (ensure resp is not None), validate redirect_url with
_validate_url as already done, then proceed to request the redirect; leave the
final resp open to return so callers can consume/close it.
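The fix described above amounts to a manual redirect loop that validates each hop and closes each intermediate response before following it. A rough, library-agnostic sketch follows; the `fetch` and `validate` callables stand in for `requests.get(..., stream=True, allow_redirects=False)` and `_validate_url`, and the function name is hypothetical:

```python
from urllib.parse import urljoin

_MAX_REDIRECTS = 5
_REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, validate):
    """Fetch *url*, following at most _MAX_REDIRECTS validated redirects.

    *fetch(url)* returns an object with .status_code, .headers, .close();
    *validate(url)* raises on disallowed URLs. Intermediate responses are
    closed so streamed connections are not leaked; the final response is
    left open for the caller to consume and close.
    """
    validate(url)
    resp = fetch(url)
    for _ in range(_MAX_REDIRECTS):
        if resp.status_code not in _REDIRECT_CODES:
            return resp
        location = resp.headers.get("Location")
        if not location:
            return resp
        next_url = urljoin(url, location)  # resolve relative Location headers
        validate(next_url)
        resp.close()  # release the previous (possibly streamed) connection
        url = next_url
        resp = fetch(url)
    if resp.status_code in _REDIRECT_CODES:
        resp.close()
        raise RuntimeError("too many redirects")
    return resp
```

Parameterizing on `fetch` also makes the redirect logic unit-testable without a network, which addresses the reviewer's request for tests of the new code paths.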
- Around line 165-179: The function _safe_aiohttp_get currently lets aiohttp
follow redirects automatically which allows intermediate redirect targets to be
requested before validation; change it to perform manual redirect handling: call
session.get(..., allow_redirects=False), validate the initial URL with
_validate_url, then if response is a 3xx redirect, read the Location header,
resolve it to an absolute URL (preserving response.url base), call _validate_url
on that new URL, and repeat up to _MAX_REDIRECTS; only when you receive a
non-redirect (2xx) response proceed to read response.content (enforcing
_MAX_RESPONSE_BYTES and response.raise_for_status()). Ensure you reference and
update _safe_aiohttp_get, use _validate_url, and respect _MAX_REDIRECTS and
_MAX_RESPONSE_BYTES while handling relative Location headers correctly.
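Independent of the redirect handling, the 200 MB body cap has to be enforced while streaming, since a Content-Length header can be absent or wrong. A hedged sketch of the accumulation logic; the constant name mirrors the one referenced in the review, but the function itself is illustrative:

```python
_MAX_RESPONSE_BYTES = 200 * 1024 * 1024  # 200 MB cap from the review


def read_capped(chunks, limit=_MAX_RESPONSE_BYTES):
    """Accumulate an iterable of byte chunks, failing fast once *limit* is exceeded."""
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            # Abort mid-stream rather than buffering an oversized body.
            raise ValueError(f"response body exceeds {limit} bytes")
        parts.append(chunk)
    return b"".join(parts)
```

With requests this would iterate `resp.iter_content(chunk_size=...)`; with aiohttp, `response.content.iter_chunked(...)`.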
---
Nitpick comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 107-131: The SSRF validation in _validate_url correctly checks
resolved IPs but has a TOCTOU/DNS-rebinding gap between DNS resolution and
actual HTTP use; update the code by adding a clear comment above the
_validate_url function explaining this limitation (DNS can change after
validation), and mention recommended defense-in-depth measures (e.g., enforce
network-level egress controls/firewall rules to block private ranges, or perform
request-time validation/connection-level checks) so future maintainers know this
is not a complete protection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4ab187e6-c8fc-471e-ae15-41ce2936cea9
📒 Files selected for processing (1)
tensorrt_llm/inputs/utils.py
/bot run --disable-fail-fast

PR_Github #41789 [ run ] triggered by Bot. Commit:

PR_Github #41789 [ run ] completed with state
@2ez4bz @yechank-nvidia @Wanli-Jiang I saw your recent work in this file and figured you might be good reviewers for this PR. Could you please take a look? Thank you!
2ez4bz left a comment:
Can we add tests for the new code? 🙏
```python
    allow_redirects=False,
)
for _ in range(_MAX_REDIRECTS):
    if resp.status_code not in (301, 302, 303, 307, 308):
```
Aren't 307 and 308 the only safe ones to redirect? I think the others could technically have method changes between GET / POST.
You are right, 307/308 are the only codes that guarantee method preservation. However, since we exclusively make GET requests here, a method change is never a concern: 301/302/303 all redirect GET→GET in practice, and excluding them would break a large number of legitimate endpoints that commonly use 301/302. The critical security property we're enforcing is validating the redirect destination via _validate_url() regardless of which redirect code is used.
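For reference, the method-preservation semantics the two reviewers are discussing come from RFC 9110; this table is general HTTP background, not code from the PR:

```python
# Per RFC 9110: how each redirect status treats a non-GET request method.
REDIRECT_METHOD_SEMANTICS = {
    301: "client MAY change POST to GET (historic behavior)",
    302: "client MAY change POST to GET (historic behavior)",
    303: "client MUST switch to GET",
    307: "method preserved",
    308: "method preserved",
}


def method_preserved(status: int) -> bool:
    """True only for the redirect codes that never rewrite the method."""
    return REDIRECT_METHOD_SEMANTICS.get(status, "") == "method preserved"
```

For a GET-only client the distinction is moot, which is why all five codes are accepted in the loop above.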
```python
timeout = aiohttp.ClientTimeout(total=timeout_sec)
current_url = url
async with aiohttp.ClientSession(timeout=timeout) as session:
    for _ in range(_MAX_REDIRECTS + 1):
```
Maybe stupid question, why the +1 here?
The +1 is needed so we can follow exactly _MAX_REDIRECTS hops and still fetch the final destination: with _MAX_REDIRECTS redirects permitted, the URL is fetched up to _MAX_REDIRECTS + 1 times.
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
… loading Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
…uest_get Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Force-pushed 1a5a0ea to f534614
/bot run
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.