Skip to content

reduce memory#2521

Merged
dogancanbakir merged 1 commit into
devfrom
reduce-response-memory-footprint
Jun 25, 2026
Merged

reduce memory#2521
dogancanbakir merged 1 commit into
devfrom
reduce-response-memory-footprint

Conversation

@Mzack9999

@Mzack9999 Mzack9999 commented Jun 24, 2026

Copy link
Copy Markdown
Member

Reduces httpx memory usage by removing redundant response body copies and lowering the default response read size to 50mb

Summary by CodeRabbit

  • Bug Fixes

    • Updated response body handling to keep raw and decoded content available more reliably.
    • Improved word and line counting so results stay consistent across different body formats and whitespace.
  • Documentation

    • Updated CLI help text to show the new default response body size limits.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

The PR lowers the documented and computed default response body size to 50mb, changes HTTPX.Do to preserve undecoded raw bytes while decoding response data, and switches word and line metrics to byte-based counting. New tests cover body handling, metric equivalence, and allocation behavior.

Changes

HTTPX response body handling

Layer / File(s) Summary
Default response size
common/httpx/option.go, README.md
DefaultMaxResponseBodySize now initializes to 50mb, and the CLI docs show the matching response save/read defaults.
Raw body retention and byte metrics
common/httpx/httpx.go
HTTPX.Do keeps the undecoded body for RawData and computes Words and Lines from byte counts or sanitized text when HTML stripping is enabled.
Local body behavior tests
common/httpx/response_memory_test.go
Local httptest cases cover raw versus decoded body data, empty bodies, GBK decoding, backing-array sharing, and the non-zero read-size default.
Metric equivalence and allocations
common/httpx/response_memory_test.go
The remaining tests compare byte-based word and line counts to legacy splitting across whitespace edge cases and assert the counting path allocates zero memory.

Sequence Diagram(s)

sequenceDiagram
  participant HTTPXDo as HTTPX.Do
  participant DecodeData
  participant Response
  HTTPXDo->>DecodeData: decode respbody into Data
  DecodeData-->>HTTPXDo: decoded Data
  HTTPXDo->>Response: assign RawData from rawbody
  alt VHostStripHTML enabled
    HTTPXDo->>Response: count Words and Lines from sanitized respbody
  else VHostStripHTML disabled
    HTTPXDo->>Response: count Words and Lines from respbody bytes
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through bytes by lantern glow,
kept raw carrot-crunch for the show.
Words and lines go boing in tune,
under a quiet README moon,
and tests thump softly: hop, hop, ho!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is related to the change, but it is too vague to explain that response body copies and default read size were reduced. Use a more specific title like "Reduce httpx response memory usage" or mention the lowered default response read size.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch reduce-response-memory-footprint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
common/httpx/httpx.go (1)

334-338: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Use strings.Count in the VHostStripHTML branch too.

Lines 334-338 still allocate separator slices via strings.Split. Since respbodystr is already materialized there, strings.Count(respbodystr, " ")+1 and strings.Count(strings.TrimSpace(respbodystr), "\n")+1 preserve the current semantics without the extra allocation.

♻️ Proposed change
 	if h.Options.VHostStripHTML {
 		respbodystr := h.htmlPolicy.Sanitize(string(respbody))
 		if respbodystr != "" {
-			resp.Words = len(strings.Split(respbodystr, " "))
-			resp.Lines = len(strings.Split(strings.TrimSpace(respbodystr), "\n"))
+			resp.Words = strings.Count(respbodystr, " ") + 1
+			resp.Lines = strings.Count(strings.TrimSpace(respbodystr), "\n") + 1
 		}
 	} else if len(respbody) > 0 {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@common/httpx/httpx.go` around lines 334 - 338, The VHostStripHTML branch in
httpx’s response counting still uses strings.Split and allocates unnecessary
slices; update the resp.Words and resp.Lines calculations in the
h.Options.VHostStripHTML block to use strings.Count on respbodystr (with the
same TrimSpace handling for lines) so it preserves the existing semantics
without extra allocations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@common/httpx/httpx.go`:
- Around line 334-338: The VHostStripHTML branch in httpx’s response counting
still uses strings.Split and allocates unnecessary slices; update the resp.Words
and resp.Lines calculations in the h.Options.VHostStripHTML block to use
strings.Count on respbodystr (with the same TrimSpace handling for lines) so it
preserves the existing semantics without extra allocations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9f5b139f-9ed8-4a0e-a6c8-0ff7111ef1b7

📥 Commits

Reviewing files that changed from the base of the PR and between 64e6442 and d2cb859.

📒 Files selected for processing (4)
  • README.md
  • common/httpx/httpx.go
  • common/httpx/option.go
  • common/httpx/response_memory_test.go

@Mzack9999

Copy link
Copy Markdown
Member Author

@coderabbitai create an issue to track this PR

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo="projectdiscovery/httpx"
pr_number="2521"
requester="Mzack9999"
pr_url="https://github.com/projectdiscovery/httpx/pull/${pr_number}"

comment_url="$(gh api "repos/${repo}/issues/${pr_number}/comments" --paginate \
  --jq '[.[] | select(.user.login == "'"${requester}"'" and (.body | contains("create an issue to track this PR")))][-1].html_url // empty')"

tmp_body="$(mktemp)"
cat > "$tmp_body" <<EOF
## Summary
Track the work proposed in PR #${pr_number} to reduce \`httpx\` memory usage by:
- removing redundant response body copies
- lowering the default response read size to **50 MB**
- preserving raw and decoded response-body behavior
- adding regression coverage for response-memory behavior and body metrics

## Rationale
PR #${pr_number} aims to reduce peak memory consumption in response handling without regressing decoding behavior, raw body access, or body-derived metrics.

## Affected areas
- \`common/httpx/httpx.go\`
- \`common/httpx/option.go\`
- \`common/httpx/response_memory_test.go\`
- \`README.md\`

## Acceptance criteria
- The default response read size is updated to **50 MB** and documentation reflects the new default.
- Redundant response body copies are removed from the response handling path.
- Raw response bytes remain available where required, including when decoded data differs from raw bytes.
- Existing or added tests cover raw body handling, decoded body handling, and word/line metric behavior.
- The change is tracked against PR #${pr_number} for review and follow-up.

## Backlinks
- Pull request: ${pr_url}
EOF

if [[ -n "${comment_url}" ]]; then
  printf -- "- Requesting comment: %s\n" "${comment_url}" >> "$tmp_body"
fi

issue_url="$(gh issue create \
  --repo "$repo" \
  --title "Track PR #${pr_number}: reduce httpx response memory footprint" \
  --body-file "$tmp_body" \
  --assignee "$requester")"

echo "$issue_url"

Length of output: 2100



@Mzack9999 Created the tracking issue: #2522#2522

It links back to PR #2521 and includes the follow-up summary, affected areas, and acceptance criteria.

@Mzack9999 Mzack9999 linked an issue Jun 24, 2026 that may be closed by this pull request
@Mzack9999 Mzack9999 requested a review from dogancanbakir June 24, 2026 22:20
@dogancanbakir dogancanbakir merged commit 63fb40a into dev Jun 25, 2026
15 checks passed
@dogancanbakir dogancanbakir deleted the reduce-response-memory-footprint branch June 25, 2026 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Track PR #2521: reduce httpx response memory footprint

2 participants