use req.setEncoding('utf8') instead of per-chunk Buffer.toString()

pcarleton · pcarleton · commit 65afa3db47fb · 2026-05-14T19:22:09.000+01:00
Per-chunk .toString() corrupts a multi-byte UTF-8 character that straddles a
TCP chunk boundary (e.g. the bytes 0xC3 and 0xBC arriving in separate chunks
decode to two U+FFFD replacement chars instead of 'ü'). The custom-headers
scenario sends '日本語'/'naïve' in the body, so this is reachable in
principle. setEncoding('utf8') makes 'data' emit strings, with boundary
handling done by Node's StringDecoder.

Fixed in both places: BaseHttpScenario.handleRequest (request body) and
sendRawRequest (response body).
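
Illustration only, not part of this change: a minimal sketch of the boundary
split described above, assuming the two bytes of 'ü' arrive as separate
chunks. The helper names decodePerChunk and decodeStreaming are hypothetical.
StringDecoder is used directly here because, per the message above, it is
what handles the boundary once setEncoding('utf8') is set.

```typescript
import { StringDecoder } from 'node:string_decoder';

// 'ü' is U+00FC, which UTF-8 encodes as the two bytes 0xC3 0xBC.
// Simulate a chunk boundary that falls between those two bytes.
const chunks: Buffer[] = [Buffer.from([0xc3]), Buffer.from([0xbc])];

// Old behaviour: each chunk is decoded in isolation, so neither half of the
// sequence is valid on its own and each becomes U+FFFD.
function decodePerChunk(parts: Buffer[]): string {
  let out = '';
  for (const part of parts) {
    out += part.toString('utf8');
  }
  return out;
}

// New behaviour (sketched): StringDecoder holds back an incomplete trailing
// sequence and completes it when the next chunk is written.
function decodeStreaming(parts: Buffer[]): string {
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (const part of parts) {
    out += decoder.write(part);
  }
  // Flush anything still buffered at end of input.
  return out + decoder.end();
}

const perChunk = decodePerChunk(chunks);
const streamed = decodeStreaming(chunks);

console.log(perChunk === '\uFFFD\uFFFD'); // true: two replacement chars
console.log(streamed === 'ü');            // true: character reassembled
```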
diff --git a/src/scenarios/client/http-base.ts b/src/scenarios/client/http-base.ts
@@ -88,9 +88,12 @@ export abstract class BaseHttpScenario implements Scenario {
       return;
     }
 
+    // Decode the stream as UTF-8 so multi-byte characters that straddle a
+    // chunk boundary aren't corrupted by per-chunk Buffer.toString().
+    req.setEncoding('utf8');
     let body = '';
     req.on('data', (chunk) => {
-      body += chunk.toString();
+      body += chunk;
     });
     req.on('end', () => {
       try {
diff --git a/src/scenarios/server/http-standard-headers.ts b/src/scenarios/server/http-standard-headers.ts
@@ -82,9 +82,10 @@ async function sendRawRequest(
         }
       },
       (res) => {
+        res.setEncoding('utf8');
         let data = '';
         res.on('data', (chunk) => {
-          data += chunk.toString();
+          data += chunk;
         });
         res.on('end', () => {
           let responseBody: any;