Update dependency langchain-text-splitters to v1 [SECURITY] #220
Open
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
langchain-text-splitters: ==0.3.8 → ==1.1.2
LangChain Text Splitters is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing
CVE-2025-6985 / GHSA-m42m-m8cr-8m58
More information
Details
The HTMLSectionSplitter class in langchain-text-splitters is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT.
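The hardening measures named above can be sketched as follows. This is a minimal illustration, assuming lxml is installed; the helper name build_hardened_transform is hypothetical, and this is not presented as the patch that shipped in langchain-text-splitters.

```python
from lxml import etree


def build_hardened_transform(xslt_text: str) -> etree.XSLT:
    """Compile an XSLT stylesheet with entity resolution and XSLT I/O disabled.

    Hypothetical helper for illustration only.
    """
    # Refuse to expand entities, load DTDs, or touch the network while parsing.
    parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
    xslt_root = etree.fromstring(xslt_text.encode("utf-8"), parser)

    # DENY_ALL blocks file and network reads/writes from inside the stylesheet,
    # which is what stops document() from fetching arbitrary URIs.
    return etree.XSLT(xslt_root, access_control=etree.XSLTAccessControl.DENY_ALL)


STYLESHEET = """\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="/root/item"/>
  </xsl:template>
</xsl:stylesheet>
"""

transform = build_hardened_transform(STYLESHEET)
result = str(transform(etree.fromstring("<root><item>hello</item></root>")))
```

Ordinary transforms that perform no I/O keep working under these settings; only stylesheets that try to reach files or URLs are affected.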
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
LangChain Text Splitters: HTMLHeaderTextSplitter.split_text_from_url SSRF Redirect Bypass
CVE-2026-41481 / GHSA-fv5p-p927-qmxr
More information
Details
Summary
HTMLHeaderTextSplitter.split_text_from_url() validated the initial URL using validate_safe_url() but then performed the fetch with requests.get() with redirects enabled (the default). Because redirect targets were not revalidated, a URL pointing to an attacker-controlled server could redirect to internal, localhost, or cloud metadata endpoints, bypassing SSRF protections.
The response body is parsed and returned as Document objects to the calling application code. Whether this constitutes a data exfiltration path depends on the application: if it exposes Document contents (or derivatives) back to the requester who supplied the URL, sensitive data from internal endpoints could be leaked. Applications that store or process Documents internally without returning raw content to the requester are not directly exposed to data exfiltration through this issue.
Affected versions
langchain-text-splitters < 1.1.2
Patched versions
langchain-text-splitters >= 1.1.2 (requires langchain-core >= 1.2.31)
Affected code
File: libs/text-splitters/langchain_text_splitters/html.py, split_text_from_url()
The vulnerable pattern validated the URL once, then fetched with redirects enabled.
Attack scenario
1. An application passes user-supplied URLs to split_text_from_url(), relying on its built-in validate_safe_url() check to block requests to internal networks.
2. The attacker supplies a URL that passes validate_safe_url() (public hostname, public IP).
3. The attacker's server responds with a 302 redirect to an internal endpoint (e.g., an unauthenticated internal admin API, or a cloud instance metadata service that does not require request headers, such as AWS IMDSv1).
4. requests.get() follows the redirect automatically. The redirect target is not revalidated.
5. The response body is parsed and returned as Document objects to the application.
Notes:
- split_text_from_url() included validate_safe_url() specifically to be safe with untrusted URLs; the redirect loophole defeated that guarantee.
- Metadata services that require specific request headers are not reachable through this bug because the attacker does not control request headers. AWS IMDSv1, which requires no headers, is reachable.
- Data exfiltration requires the application to return Document contents (or derivatives) to the party that supplied the URL. The SSRF itself, forcing the server to issue a request to an internal endpoint, does not require this.
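The redirect loophole can be illustrated with a standard-library sketch that revalidates every hop instead of only the first URL. All names here (assert_public_url, fetch_revalidating, the Fetcher callable) are hypothetical; assert_public_url is a simplified stand-in for validate_safe_url(), not its actual implementation, and the fetcher is injected so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher performs ONE request without following redirects and returns
# (status_code, Location header or None, body).
Fetcher = Callable[[str], Tuple[int, Optional[str], str]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def assert_public_url(url: str) -> None:
    """Raise ValueError unless the URL targets only globally routable addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    try:
        # IP literals are checked directly, with no DNS lookup.
        addresses = [ipaddress.ip_address(parts.hostname)]
    except ValueError:
        # Hostnames are resolved and every returned address is checked.
        infos = socket.getaddrinfo(parts.hostname, parts.port or 443)
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    for ip in addresses:
        if not ip.is_global:
            raise ValueError(f"blocked non-public address {ip} for {url!r}")


def fetch_revalidating(url: str, fetch_once: Fetcher, max_redirects: int = 5) -> str:
    """Follow redirects manually, validating the target of every hop."""
    for _ in range(max_redirects + 1):
        assert_public_url(url)  # checked on the first URL AND on each redirect target
        status, location, body = fetch_once(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)
            continue
        return body
    raise ValueError("too many redirects")


def attacker_server(url: str) -> Tuple[int, Optional[str], str]:
    """Simulated responses: a public host that redirects to a metadata address."""
    if url == "http://93.184.216.34/start":
        return (302, "http://169.254.169.254/latest/meta-data/", "")
    return (200, None, "internal secret")


try:
    fetch_revalidating("http://93.184.216.34/start", attacker_server)
    blocked = False
except ValueError:
    blocked = True
```

With a real HTTP client, the fetcher could wrap requests.get(url, allow_redirects=False). This sketch does not pin the connection to the validated IP, so it remains exposed to DNS changes between validation and connection, a gap the fix described below closes.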
Fix
The fix replaces requests.get() with an SSRF-safe httpx transport (SSRFSafeSyncTransport from langchain-core) that validates DNS results and pins connections to validated IPs on every request, including redirect targets, eliminating redirect-based bypasses.
Additionally, split_text_from_url() has been deprecated. Users should fetch HTML content themselves and pass it to split_text() directly.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
Configuration
📅 Schedule: (in timezone Europe/Oslo)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.