Skip to content

fix: buffer multi-segment HTTP POST/PUT bodies in fetch network adapter#1567

Open
muralidhar-challa wants to merge 3 commits into
copy:masterfrom
muralidhar-challa:master
Open

fix: buffer multi-segment HTTP POST/PUT bodies in fetch network adapter#1567
muralidhar-challa wants to merge 3 commits into
copy:masterfrom
muralidhar-challa:master

Conversation

@muralidhar-challa
Copy link
Copy Markdown

When a POST or PUT request body spans multiple TCP segments (~1460 bytes each), only the first segment's body bytes were used — subsequent segments were silently dropped because this.read was cleared after the first \r\n\r\n delimiter match, and no more headers trigger re-processing.

This caused "Invalid JSON" errors for any request body larger than approximately one MTU.

Fix:

  • On the first segment: if Content-Length is present and the body is incomplete, store the partial body in this._xb with a deferred callback instead of firing fetch() immediately.
  • On subsequent segments: accumulate incoming data into this._xb.buf. Once Content-Length is satisfied, decode and invoke the deferred fetch.
  • Extract the fetch+response handling into _dispatch_fetch() to avoid duplicating it in both the normal and deferred code paths.

When a POST or PUT request body spans multiple TCP segments (~1460 bytes
each), only the first segment's body bytes were used — subsequent segments
were silently dropped because this.read was cleared after the first
\r\n\r\n delimiter match, and no more headers trigger re-processing.

This caused "Invalid JSON" errors for any request body larger than
approximately one MTU.

Fix:
- On the first segment: if Content-Length is present and the body is
  incomplete, store the partial body in this._xb with a deferred callback
  instead of firing fetch() immediately.
- On subsequent segments: accumulate incoming data into this._xb.buf.
  Once Content-Length is satisfied, decode and invoke the deferred
  fetch.
- Extract the fetch+response handling into _dispatch_fetch() to avoid
  duplicating it in both the normal and deferred code paths.
@copy
Copy link
Copy Markdown
Owner

copy commented May 24, 2026

Could I get a review from one of @basicer @chschnell, @SuperMaxusa or @ProgrammerIn-wonderland

@ProgrammerIn-wonderland
Copy link
Copy Markdown
Contributor

ProgrammerIn-wonderland commented May 24, 2026

This seems like it'd break for binary post data but I'm unsure if that ever worked, want to see @basicer's review tbh

Comment thread src/browser/fetch_network.js Outdated
const content_length = parseInt(req_headers.get("content-length") || "0", 10);
const body_bytes = (data instanceof Uint8Array) ? data : new TextEncoder().encode(data);
if(content_length > 0 && body_bytes.length < content_length) {
const fetch_url = this.net.cors_proxy ? this.net.cors_proxy + encodeURIComponent(target.href) : target.href;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this repeated in FetchNetworkAdapter.prototype.fetch (see line 230)? Same on line 169.

@chschnell
Copy link
Copy Markdown
Contributor

chschnell commented May 24, 2026

To be precise:

  • v86 buffers chunks (segments) in this.read until that buffer contains separator \r\n\r\n.
  • It then splits this.read at the separator into two parts: header and body.
  • The header part is parsed into a fetch-compatible Headers object, and the body part is later passed to fetch().

That's fine for common HTTP GET requests (they have no body), but in your case of a HTTP POST or PUT request with a body larger than the first fragment received in the last chunk with the separator, the body's remaining chunks are dropped and not passed to fetch(), and that's bug number 1.

One strategy to solve this would be to buffer the request's entire body before calling fetch(), another to forward body chunks asynchronously after calling fetch() (better, consider AJAX).

We can pass a ReadableStream object as the body to fetch() instead of the String we're currently passing (which is bug number 2, this should really be some binary type like ArrayBuffer, TypedArray or DataView instead). A ReadableStream should allow us to send body chunks asynchronously after calling fetch(), I believe (not 100% sure).

EDIT: The specification states: Requests whose body is a ReadableStream object cannot be deferred, which makes me believe that ReadableStream doesn't help.

In any case, in order to detect that we have processed the body's last chunk we need to evaluate the request's Content-Length HTTP header field. In case of Chunked transfer encoding there is no such field in the request headers, the body is encoded as a stream of chunks that terminates with an empty chunk.

I skimmed through the PR, my suggestions:

  1. you MUST buffer the body's chunks in binary form (search for \r\n\r\n in the binary chunk, and text-decode only the header part after splitting)
  2. consider Chunked transfer encoding EDIT: not sure how to handle it myself, ignore it but please leave a comment in the code
  3. provide a ReadableStream to fetch() EDIT: see above

While at it, if you don't mind please refactor all the new TextEncoder() and new TextDecoder() calls with a single, module-scoped instance for each at the top of the module.

@SuperMaxusa
Copy link
Copy Markdown
Contributor

EDIT: The specification states: Requests whose body is a ReadableStream object cannot be deferred, which makes me believe that ReadableStream doesn't help.

IIRC this also requires duplex: "half" in the fetch() options, which is not supported in some browsers.

We can pass a ReadableStream object as the body to fetch() instead of the String we're currently passing (which is bug number 2, this should really be some binary type like ArrayBuffer, TypedArray or DataView instead).

I would suggest using here Blob with the type from the Content-Type header. From my tests, it works with a simple formData, plain text and application/octet-stream data.

Addresses reviewer feedback on PR copy#1567:

- Buffer incoming TCP data as raw bytes (Uint8Array) instead of
  text-decoding everything.  Search for \r\n\r\n in binary,
  text-decode only the header portion.  This prevents corruption of
  binary POST bodies and ensures the first \r\n\r\n in headers
  (not in body) is used as the separator.

- Replace ad-hoc new TextEncoder()/new TextDecoder() with module-scoped
  singletons (textEncoder, textDecoder).

- Refactor _dispatch_fetch into a standalone dispatch_fetch(conn, ...)
  helper that takes the connection explicitly, avoiding incorrect
   binding between the connection and adapter objects.

- Add comment noting that Transfer-Encoding: chunked is unsupported.

- Add unit tests (tests/devices/fetch_network_post.js):
  * Small POST body in one segment
  * Large POST body split across 3 segments
  * Binary body containing embedded CRLFCRLF bytes
  * GET request (no body)
  * Headers split across 2 segments
@muralidhar-challa
Copy link
Copy Markdown
Author

Thanks all for the review. I've updated the PR to address the feedback:

@chschnell:

  • Body chunks are now buffered in binary form — this._raw is a Uint8Array, \r\n\r\n is searched for via indexOfBytes() on raw bytes, and only the header portion is text-decoded. The body stays as Uint8Array and is passed directly to fetch().
  • Added a comment noting Transfer-Encoding: chunked is unsupported.
  • Replaced all new TextEncoder() / new TextDecoder() calls with module-scoped textEncoder / textDecoder singletons.
  • ReadableStream — skipped per your edit.

@SuperMaxusa:

  • I kept Uint8Array for the body rather than wrapping in Blob — the original Content-Type header from the guest is already forwarded in req_headers, so the Blob's type would be redundant. Is there a use case where Blob makes a difference vs Uint8Array? Happy to add if so.

@ProgrammerIn-wonderland:

  • The binary-safe parsing fixes the binary POST concern. Added a test case with a body containing embedded \r\n\r\n bytes to verify the separator detection works correctly.

Added 5 unit tests in tests/devices/fetch_network_post.js covering:

  • Small POST (one segment)
  • Large POST split across 3 segments
  • Binary body with embedded CRLFCRLF
  • GET request (no body)
  • Headers split across 2 segments

@chschnell
Copy link
Copy Markdown
Contributor

chschnell commented May 25, 2026

While reading the code I had an idea that would improve overall readability of fetch_network.js. It's about the existing code, not your PR. I would submit another PR about this after yours, but maybe you just want to wrap it into yours.

Currently, we have a function on_data_http() with this bound to a TCPConnection in the middle of the FetchNetworkAdapter code.

I suggest to change the event binding of the function from

this.bus.register("tcp-connection", (conn) => {
    if(conn.sport === 80) {
        conn.on("data", on_data_http);
        conn.accept();
    }
}, this);

to

this.bus.register("tcp-connection", (conn) => {
    if(conn.sport === 80) {
        conn.on("data", function(data) { this.net.on_data_http(this, data); });
        conn.accept();
    }
}, this);

and change the prototypes

async function on_data_http(data)
function dispatch_fetch(conn, fetch_url, opts)

to

FetchNetworkAdapter.prototype.on_data_http = async function(conn, data)
FetchNetworkAdapter.prototype.dispatch_fetch = function(conn, fetch_url, opts)

That is, turn these functions into methods and pass the TCPConnection object conn as an argument. This would make the code in fetch_network.js more readable and straight-forward:

  • this always refers to a FetchNetworkAdapter object
  • eliminates conn.net.foobar, simplyfies to this.foobar (NOTE: typeof(TCPConnection.net) === FetchNetworkAdapter)
  • all modifications to TCPConnection objects conn are explicit

A bit of care needs to be taken when refactoring the two methods, but it looks a lot clearer (in my editior).

@SuperMaxusa
Copy link
Copy Markdown
Contributor

I kept Uint8Array for the body rather than wrapping in Blob — the original Content-Type header from the guest is already forwarded in req_headers, so the Blob's type would be redundant.

Ah, I see that, sorry. I thought it would be safer to use Blob for the request body. Since there is no difference between TypedArray and Blob, let's leave Uint8Array.

Added 5 unit tests in tests/devices/fetch_network_post.js

Can you add these tests to tests/devices/fetch_network.js? It runs the emulator with buildroot-bzimage68.bin (luasocket, curl and socat) and a test server.

Add a POST handler to the test server that echoes the received body
length, and a test case that POSTs 3000 bytes (> ~1460 byte MTU) from
the buildroot guest via curl + dd. Verifies the full body reaches the
server after the binary-safe buffering fix.
@muralidhar-challa
Copy link
Copy Markdown
Author

@chschnell — good suggestion on moving on_data_http/dispatch_fetch to prototype methods with conn passed explicitly. Makes the code clearer and eliminates the conn.net. indirection. I'll keep it out of this PR to avoid scope creep (you mentioned you'd do a follow-up), but happy to include it here if you'd prefer.

@SuperMaxusa — agreed on Blob vs Uint8Array — sticking with Uint8Array. RE tests: added the integration test to tests/devices/fetch_network.js. It POSTs a 3000-byte body (>1 MTU) from the buildroot guest using dd | curl --data-binary and asserts the server receives all 3000 bytes. Ran the full suite — all 16 tests pass including this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants