feat: cap fetched body size in LinkContentFetcher #11228
Open
etairl wants to merge 2 commits into deepset-ai:main from
Both the sync and async fetch paths read entire response bodies into memory via response.text / response.content with no upper bound. A remote that returns or streams an unexpectedly large body therefore forces a proportional Python allocation, which can pressure or exhaust the worker's memory.
Add a max_response_size constructor parameter (default 10 MiB) and switch both fetch paths to httpx.Client.stream / httpx.AsyncClient.stream so the connection is torn down as soon as the cap is reached. The captured bytes are stashed back on the response object so existing content handlers (text and binary) keep working unchanged.
Pass max_response_size=None to restore the previous unbounded behavior.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
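The capped streaming read described in this commit message could look roughly like the sketch below. It is not the PR's code: `read_capped`, `fetch_capped`, and `ResponseTooLarge` are hypothetical names, the sketch assumes a body of exactly `max_size` bytes is still accepted, and it skips the step of stashing the bytes back on the response object. The only httpx calls used are `Client.stream`, `Response.raise_for_status`, `Response.headers`, and `Response.iter_bytes`.

```python
from typing import Iterable, Optional


class ResponseTooLarge(Exception):
    """Hypothetical error for this sketch; the PR surfaces an httpx.RequestError."""


def read_capped(
    chunks: Iterable[bytes],
    max_size: Optional[int],
    content_length: Optional[str] = None,
) -> bytes:
    """Join chunks into one bytes object without ever buffering more than max_size."""
    if max_size is None:
        # No cap requested: read everything, as before the change.
        return b"".join(chunks)

    # Early exit on a declared size, before any chunk is pulled from the stream.
    # Assumption: the header is only trusted when it is a plain decimal number.
    if content_length is not None and content_length.isdigit():
        if int(content_length) > max_size:
            raise ResponseTooLarge(
                f"Content-Length {content_length} exceeds cap of {max_size} bytes"
            )

    buffer = bytearray()
    for chunk in chunks:
        # Check before appending so the buffer never grows past the cap.
        if len(buffer) + len(chunk) > max_size:
            raise ResponseTooLarge(f"body exceeds cap of {max_size} bytes")
        buffer.extend(chunk)
    return bytes(buffer)


def fetch_capped(client, url: str, max_size: Optional[int]) -> bytes:
    """Stream a GET through an httpx.Client and apply the cap while reading."""
    # Leaving the `with` block (normally or via an exception) closes the
    # response, so an over-cap body stops being downloaded.
    with client.stream("GET", url) as response:
        response.raise_for_status()
        return read_capped(
            response.iter_bytes(),
            max_size,
            response.headers.get("Content-Length"),
        )
```

Keeping the size check in a function that only sees an iterable of chunks means it can be exercised without a network connection, and the same logic could back an async variant.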
@etairl is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
Member
See #11226 (review)
Release-note linter rejects single backticks for inline code.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- LinkContentFetcher previously read full response bodies into memory via response.text / response.content with no upper bound. A remote that returns an unexpectedly large body therefore caused a proportional Python allocation.
- Add a max_response_size constructor parameter (default 10 MiB).
- Switch both fetch paths to httpx.Client.stream / AsyncClient.stream and abort the request when:
  - Content-Length exceeds the cap, or
  - the streamed body reaches the cap.
- Stash the captured bytes back on the httpx.Response object so existing handlers (response.text, response.content) keep working with no further changes.
- Pass max_response_size=None to restore the previous unbounded behavior.
Test plan
- LinkContentFetcher unit tests against the streamed sync and async paths.
- Body larger than max_response_size: request is aborted with an httpx.RequestError instead of materializing the body in memory.
- max_response_size=None: large bodies are read as before.
🤖 Generated with Claude Code
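One way the over-cap and uncapped cases from the test plan might be exercised for the async path, without a network, is sketched below. It is an illustration only: `read_capped_async`, `fake_body`, `over_cap_is_rejected`, and `ResponseTooLarge` are hypothetical names, `fake_body` stands in for what `response.aiter_bytes()` would yield, and the PR itself reports the failure as an httpx.RequestError rather than a custom exception.

```python
import asyncio
from typing import AsyncIterator, Optional


class ResponseTooLarge(Exception):
    """Hypothetical error for this sketch; the PR surfaces an httpx.RequestError."""


async def read_capped_async(
    chunks: AsyncIterator[bytes], max_size: Optional[int]
) -> bytes:
    """Collect async chunks, stopping as soon as the next one would pass max_size."""
    buffer = bytearray()
    async for chunk in chunks:
        # max_size=None disables the check, matching the unbounded behavior.
        if max_size is not None and len(buffer) + len(chunk) > max_size:
            raise ResponseTooLarge(f"body exceeds cap of {max_size} bytes")
        buffer.extend(chunk)
    return bytes(buffer)


async def fake_body(total: int, chunk_size: int = 1024) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streamed body: yields `total` zero bytes in chunks."""
    sent = 0
    while sent < total:
        size = min(chunk_size, total - sent)
        yield b"\0" * size
        sent += size


def over_cap_is_rejected(total: int, max_size: int) -> bool:
    """Return True if reading `total` bytes under `max_size` raises ResponseTooLarge."""
    try:
        asyncio.run(read_capped_async(fake_body(total), max_size))
    except ResponseTooLarge:
        return True
    return False
```

With these helpers, a 5000-byte fake body under a 4096-byte cap should be rejected, while the same body with `max_size=None` should come back whole.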