Skip to content

[https://nvbugs/5911304][fix] Add URL validation and request hardening for media input loading#12748

Open
yibinl-nvidia wants to merge 4 commits intoNVIDIA:mainfrom
yibinl-nvidia:fix/inputs-url-validation-5911304
Open

[https://nvbugs/5911304][fix] Add URL validation and request hardening for media input loading#12748
yibinl-nvidia wants to merge 4 commits intoNVIDIA:mainfrom
yibinl-nvidia:fix/inputs-url-validation-5911304

Conversation

@yibinl-nvidia
Copy link
Copy Markdown
Collaborator

@yibinl-nvidia yibinl-nvidia commented Apr 3, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced security for remote content loading by validating URLs and blocking requests to private/reserved IP addresses
    • Added response size limits and redirect handling controls for remote file fetches
    • Restricted remote content loading to HTTP/HTTPS protocols only

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@yibinl-nvidia yibinl-nvidia force-pushed the fix/inputs-url-validation-5911304 branch from 9ecc0e4 to 8da414a Compare April 3, 2026 21:36
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

Added security validation for remote URL fetching in tensorrt_llm/inputs/utils.py. Introduced helper functions to enforce HTTP/HTTPS-only URLs, validate IP addresses against private/reserved ranges, limit response sizes to 200 MB, and restrict redirects to 5. Updated media loading and content encoding functions to use these safe wrappers and reject unsupported schemes.

Changes

Cohort / File(s) Summary
Security validation for remote URL fetching
tensorrt_llm/inputs/utils.py
Added internal helper functions _validate_url, _safe_request_get, and _safe_aiohttp_get to enforce HTTP/HTTPS-only URLs, validate IP addresses, limit responses to 200 MB, and restrict redirects to 5. Updated load_image, async_load_image, load_video, async_load_video, load_audio, async_load_audio, and encode_base64_content_from_url to use these wrappers. Removed implicit empty-scheme handling in load_video.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Calling Code
    participant Validator as URL Validator
    participant Resolver as IP Resolver
    participant HTTPClient as HTTP Client
    participant RemoteServer as Remote Server

    Client->>Validator: Request with URL
    Validator->>Validator: Check scheme (http/https only)
    alt Invalid scheme
        Validator-->>Client: Raise ValueError
    end
    Validator->>Resolver: Resolve URL to IP address
    alt Resolution fails
        Resolver-->>Validator: Error
        Validator-->>Client: Raise error
    end
    Resolver-->>Validator: Return IP address
    Validator->>Validator: Validate IP (not private/loopback/reserved)
    alt Invalid IP range
        Validator-->>Client: Raise error
    end
    Validator->>HTTPClient: Make safe request
    HTTPClient->>RemoteServer: HTTP GET request
    RemoteServer-->>HTTPClient: Response
    HTTPClient->>HTTPClient: Check response size ≤ 200 MB
    alt Oversized response
        HTTPClient-->>Client: Raise error
    end
    HTTPClient->>HTTPClient: Validate redirect target (≤5 redirects)
    alt Redirect limit exceeded
        HTTPClient-->>Client: Raise error
    end
    HTTPClient-->>Validator: Valid response
    Validator-->>Client: Return content
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is almost entirely template boilerplate with no actual content filled in; the Description, Test Coverage, and PR Checklist sections are blank, leaving reviewers without explanation of the issue, solution, or test coverage. Fill in the Description section explaining the security issue and the solution, add the Test Coverage section listing relevant tests, and complete the PR Checklist items.
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding URL validation and request hardening for media input loading, directly matching the changeset's primary modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tensorrt_llm/inputs/utils.py (1)

107-131: SSRF validation is solid, with a known DNS rebinding caveat.

The implementation correctly validates all resolved IPs against private/loopback/link-local/reserved ranges. However, there's an inherent TOCTOU (time-of-check to time-of-use) window: DNS could resolve to a different IP between validation and the actual HTTP request due to TTL expiration or DNS rebinding attacks.

This is a known limitation of application-layer SSRF defenses. For defense-in-depth, consider documenting this limitation or adding a comment noting that network-level controls (firewall rules blocking egress to private ranges) provide the strongest protection.

📝 Suggested documentation comment
 def _validate_url(url: str) -> None:
     """Validate that *url* points to a public, non-internal HTTP(S) resource.

     Raises ``ValueError`` for URLs that target private, loopback, or
     link-local addresses, or that use a scheme other than http / https.
+
+    Note: This provides application-layer SSRF protection but cannot fully
+    prevent DNS rebinding attacks. Network-level egress controls blocking
+    traffic to internal IP ranges provide additional defense-in-depth.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/inputs/utils.py` around lines 107 - 131, The SSRF validation in
_validate_url correctly checks resolved IPs but has a TOCTOU/DNS-rebinding gap
between DNS resolution and actual HTTP use; update the code by adding a clear
comment above the _validate_url function explaining this limitation (DNS can
change after validation), and mention recommended defense-in-depth measures
(e.g., enforce network-level egress controls/firewall rules to block private
ranges, or perform request-time validation/connection-level checks) so future
maintainers know this is not a complete protection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 134-162: The redirect loop in _safe_request_get can leak
connections because intermediate resp objects aren’t closed when stream=True;
before reassigning resp inside the for loop (just prior to calling
requests.get(redirect_url,...)), call resp.close() for the previous response
when stream is True (ensure resp is not None), validate redirect_url with
_validate_url as already done, then proceed to request the redirect; leave the
final resp open to return so callers can consume/close it.
- Around line 165-179: The function _safe_aiohttp_get currently lets aiohttp
follow redirects automatically which allows intermediate redirect targets to be
requested before validation; change it to perform manual redirect handling: call
session.get(..., allow_redirects=False), validate the initial URL with
_validate_url, then if response is a 3xx redirect, read the Location header,
resolve it to an absolute URL (preserving response.url base), call _validate_url
on that new URL, and repeat up to _MAX_REDIRECTS; only when you receive a
non-redirect (2xx) response proceed to read response.content (enforcing
_MAX_RESPONSE_BYTES and response.raise_for_status()). Ensure you reference and
update _safe_aiohttp_get, use _validate_url, and respect _MAX_REDIRECTS and
_MAX_RESPONSE_BYTES while handling relative Location headers correctly.

---

Nitpick comments:
In `@tensorrt_llm/inputs/utils.py`:
- Around line 107-131: The SSRF validation in _validate_url correctly checks
resolved IPs but has a TOCTOU/DNS-rebinding gap between DNS resolution and
actual HTTP use; update the code by adding a clear comment above the
_validate_url function explaining this limitation (DNS can change after
validation), and mention recommended defense-in-depth measures (e.g., enforce
network-level egress controls/firewall rules to block private ranges, or perform
request-time validation/connection-level checks) so future maintainers know this
is not a complete protection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4ab187e6-c8fc-471e-ae15-41ce2936cea9

📥 Commits

Reviewing files that changed from the base of the PR and between 7ee9e8b and 9ecc0e4.

📒 Files selected for processing (1)
  • tensorrt_llm/inputs/utils.py

@yibinl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41789 [ run ] triggered by Bot. Commit: 1a5a0ea Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41789 [ run ] completed with state SUCCESS. Commit: 1a5a0ea
/LLM/main/L0_MergeRequest_PR pipeline #32683 completed with status: 'SUCCESS'

CI Report

Link to invocation

@yibinl-nvidia
Copy link
Copy Markdown
Collaborator Author

@2ez4bz @yechank-nvidia @Wanli-Jiang I saw your recent work in this file and figured you might be good reviewer of this PR, could you please take a look? Thank you!

Copy link
Copy Markdown
Collaborator

@2ez4bz 2ez4bz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add tests for the new code? 🙏

allow_redirects=False,
)
for _ in range(_MAX_REDIRECTS):
if resp.status_code not in (301, 302, 303, 307, 308):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't 307 and 308 the only safe ones to redirect? I think the others could technically have method changes between GET / POST.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, 307/308 are the only codes that guarantee method preservation. However, since we exclusively make GET requests here, a method change is never a concern. 301/302/303 all redirect GET→GET in practice, and excluding them would break a large number of legitimate endpoints that commonly uses 301/302. The critical security property we're enforcing is validating the redirect destination via _validate_url() regardless of which redirect code is used.

timeout = aiohttp.ClientTimeout(total=timeout_sec)
current_url = url
async with aiohttp.ClientSession(timeout=timeout) as session:
for _ in range(_MAX_REDIRECTS + 1):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe stupid question, why the +1 here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 is needed so we can follow exactly _MAX_REDIRECTS hops and still fetch the final destination. As there are _MAX_REDIRECTS number of redirects permitted, we need _MAX_REDIRECTS+1 times to fetch the URL.

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
… loading

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
…uest_get

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
@yibinl-nvidia yibinl-nvidia force-pushed the fix/inputs-url-validation-5911304 branch from 1a5a0ea to f534614 Compare April 10, 2026 21:26
@yibinl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants