Open-Tech-Foundation
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 14 additions & 1 deletion
diff --git a/bench/README.md b/bench/README.md
Lines changed: 45 additions & 42 deletions
diff --git a/bench/rps.sh b/bench/rps.sh
Lines changed: 61 additions & 38 deletions
diff --git a/crates/default-providers/src/system_http.rs b/crates/default-providers/src/system_http.rs
Lines changed: 33 additions & 10 deletions
diff --git a/crates/providers/src/lib.rs b/crates/providers/src/lib.rs
Lines changed: 14 additions & 8 deletions
@@ -8,6 +8,19 @@ pre-`0.1.0` and the public API is unstable.
 
 ### Changed
 
+- **`runtime:http` throughput — per-request cost cut ~30% (≈35k → ≈49k req/s,
+  hello-world plaintext).** Four changes to the request path, all invisible to JS:
+  the accept loop **batches** — one `http_next_request` crossing now drains many
+  already-queued requests (`HttpServerProvider::next_request` → `next_requests(id,
+  max)`, an embedder-visible trait change); request metadata crosses as a
+  **structured array** instead of a per-request JSON string built in Rust and
+  `JSON.parse`d in JS; the response body is read **synchronously** from the
+  `Response` (no `await arrayBuffer()` round-trip) via an internal `_parts()`
+  accessor; and a server `Request` reuses the **host-validated URL** instead of
+  re-parsing it (internal `__serverRequest`, gated by a closure-private symbol so
+  the public `Request` constructor's eager validation is unchanged). Measured
+  with an external load generator (`oha`); see `bench/README.md`.
+
 - **Driven loop now wakes on readiness, not on a fixed interval.** The standalone
   `Driver` injects a real `Waker` (`Runtime::set_async_waker` / `Engine::set_async_waker`)
   into the engine's async-op polling, and a newly-dispatched op wakes the loop
@@ -33,7 +46,7 @@ pre-`0.1.0` and the public API is unstable.
   `port: 0` picks an ephemeral one), `finished`, and `stop()`. Backed by a new
   injectable `HttpServerProvider` (vetted **hyper** 1.x, `SystemHttpServer`;
   each connection served on its own task, requests handed to the single-threaded
-  isolate one at a time) and gated on `Capability::NetListen` (like `runtime:net`
+  isolate in batches) and gated on `Capability::NetListen` (like `runtime:net`
   `listen`). Request/response bodies are buffered; TLS is not supported yet. New
   `examples/modules/http.mjs` and `runtime-http.d.ts`.
 
 
@@ -148,66 +148,69 @@ rss         |        40 |        29 |        53 |        19
 esrun a single thread does both jobs — useful for the warm request/response path,
 but not a server-throughput number. For that, `bench/rps.sh` runs a hello-world
 server per runtime (`scripts/helloserver.js`, plaintext `"Hello, World!"` on
-:3000) and points an **external** load generator ([autocannon]) at it — the
-classic plaintext req/s shape.
+:3000) and points an **external** load generator at it — the classic plaintext
+req/s shape.
+
+The generator is [oha] (or [bombardier]) — **not** autocannon: Bun's own
+`bench/express` README notes autocannon's node:http client can't push a fast
+server hard enough to measure it, and indeed autocannon capped *every* runtime at
+~35–40k here, hiding the real spread. Following Bun's setup, we send
+`-H "Accept-Encoding: identity"` (so Deno doesn't gzip the body) and a fixed
+request count.
 
 ```sh
 cargo build --release -p es-runtime-cli
-bench/rps.sh                       # autocannon -c 100, one connection/req
-CONN=250 PIPELINE=20 bench/rps.sh  # higher concurrency + HTTP pipelining
+cargo install oha                        # or: go install github.com/codesenberg/bombardier@latest
+bench/rps.sh                             # oha -c 100 -n 500000
+CONN=250 REQUESTS=1000000 bench/rps.sh   # heavier load
 ```
 
-Needs `autocannon` (used via `bunx`/`npx` if not installed globally). Indicative
-numbers on one Linux x86-64 box:
+Indicative numbers on one Linux x86-64 box (12 cores):
 
 ```
-# bench/rps.sh           (-c 100 -p 1)        # CONN=250 PIPELINE=20
-runtime |      req/sec                        runtime |      req/sec
---------+------------                         --------+------------
-node    |      32,924                         deno    |     125,715
-bun     |      35,644                         node    |      54,884
-deno    |      35,822                         esrun   |      35,156
-esrun   |      36,641                         bun     |      19,226
+# bare server (runtime:http)            # through Hono (framework)
+runtime |      req/sec                  runtime |      req/sec
+--------+------------                   --------+------------
+deno    |      85,070                   deno    |      71,531
+bun     |      82,615                   bun     |      62,894
+esrun   |      49,537                   esrun   |      47,722
+node    |      29,558                   node    |      28,217
 ```
 
-At ordinary concurrency (one in-flight request per connection) all four sit
-around ~35k req/s — esrun is at parity, marginally highest here. Under heavy
-HTTP pipelining the spread reflects each server's I/O model; esrun holds ~35k,
-which is its **single-thread ceiling** — one V8 isolate on a current-thread tokio
-runtime, by design (it's an embeddable runtime, not a multi-core web server). The
-earlier "2× slower" reading came from the in-process `http` workload, where esrun
-pays for the client and the server on the same thread; measured server-to-client
-it isn't there.
+esrun beats Node comfortably and reaches ~60% of Bun/Deno on the bare server and
+~67–76% through Hono. **All three (esrun, Bun, Deno) saturate ~one core** under this
+load — so this is not a core-count gap but a per-request one: esrun's
+JS↔Rust op boundary costs more per request than Bun's/Deno's tightly-integrated
+native path. The `runtime:http` request path was tuned for it (batched accept
+draining many requests per op crossing; structured request metadata instead of a
+per-request JSON round-trip; a synchronous response-body path; and reusing the
+host-validated URL instead of re-parsing it), which roughly cut the per-request
+cost from ~29µs to ~20µs (≈35k → ≈49k req/s). A single V8 isolate on a
+current-thread tokio runtime is the remaining ceiling — by design (it's an
+embeddable runtime, not a multi-core web server).
 
 ### Through a framework (Hono)
 
-The same shape served through [Hono] — a real, third-party web framework —
-instead of each runtime's bare server. This is the framework counterpart to the
-Bun framework charts: it shows esrun runs **unmodified npm ESM packages** off
-`node_modules`, not just its own server. Hono is Web-standard
-(`app.fetch(request) -> Response`), so it plugs straight into `runtime:http`,
-`Bun.serve`, and `Deno.serve`; Node uses Hono's `@hono/node-server` adapter.
+The right-hand column above is the same shape served through [Hono] — a real,
+third-party web framework — instead of each runtime's bare server. It shows esrun
+runs **unmodified npm ESM packages** off `node_modules`, not just its own server.
+Hono is Web-standard (`app.fetch(request) -> Response`), so it plugs straight into
+`runtime:http`, `Bun.serve`, and `Deno.serve`; Node uses Hono's `@hono/node-server`
+adapter.
 
 ```sh
 cd bench && bun install               # hono + @hono/node-server
-SERVER=scripts/hono.js bench/rps.sh   # -c 100 -p 1
-```
-
-```
-runtime |      req/sec
---------+------------
-node    |      33,358
-bun     |      39,686
-deno    |      40,150
-esrun   |      40,220
+SERVER=scripts/hono.js bench/rps.sh
 ```
 
-esrun is again at parity (marginally highest), and the framework layer costs all
-four about the same as the bare server — Express, by contrast, cannot run on
-esrun at all (it is CommonJS and needs `node:http`'s `(req, res)` API; esrun is
-ESM-only and rejects `node:` builtins).
+The framework narrows the gap (esrun is within ~25% of Bun here), because
+`runtime:http` is already esrun's native path while Bun/Deno pay Hono's per-request
+cost on top of their fast servers. Express, by contrast, cannot run on esrun at
+all (it is CommonJS and needs `node:http`'s `(req, res)` API; esrun is ESM-only
+and rejects `node:` builtins).
 
-[autocannon]: https://github.com/mcollina/autocannon
+[oha]: https://github.com/hatoo/oha
+[bombardier]: https://github.com/codesenberg/bombardier
 [Hono]: https://hono.dev
 
 ## Caveats
 
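Review note on the README figures: the ~29µs and ~20µs per-request costs follow from the throughput numbers only if the server is pinned at about one saturated core, so that cost is roughly the reciprocal of req/s. The sketch below redoes that arithmetic and the relative-throughput ratios from the table. The function names are illustrative and do not exist in the repository.

```rust
/// Wall-clock cost per request in microseconds, assuming the server sits at
/// roughly one saturated core so that cost is about 1 / throughput.
fn per_request_micros(req_per_sec: f64) -> f64 {
    1_000_000.0 / req_per_sec
}

/// Throughput of `a` expressed as a fraction of `b`.
fn relative_throughput(a: f64, b: f64) -> f64 {
    a / b
}
```

Feeding in 35,000 and 49,000 req/s gives about 28.6µs and 20.4µs, and the table rows give esrun at about 0.60 of Bun on the bare server and about 0.76 of Bun (0.67 of Deno) through Hono.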
@@ -1,40 +1,46 @@
 #!/usr/bin/env bash
 #
 # HTTP requests/sec benchmark: a hello-world server per runtime, driven by an
-# external load generator (autocannon) — the classic "req/s" plaintext shape
-# (à la the Bun/TechEmpower charts). This is the *right* way to measure server
-# throughput: a separate client hammers the server over a real socket, so the
-# number reflects the server alone (unlike bench/run.sh's in-process `http`
-# workload, where the same single thread runs both the client fetch and the
-# server). Each runtime runs scripts/helloserver.js with its own native server.
+# external load generator — the classic "req/s" plaintext shape (à la the
+# Bun/TechEmpower charts). A separate client hammers the server over a real
+# socket, so the number reflects the server alone (unlike bench/run.sh's
+# in-process `http` workload, where one thread runs both client and server).
+# Each runtime runs $SERVER (scripts/helloserver.js by default) with its own
+# native server.
 #
-# Needs `autocannon` (used via `bunx autocannon`, or a global install). If
-# neither is available the script explains and exits.
+# Load generator: `oha` (preferred) or `bombardier` — NOT autocannon. Bun's own
+# bench/express README warns autocannon's node:http client can't push a fast
+# server hard enough to measure it, so we follow their setup: oha/bombardier
+# plus `-H "Accept-Encoding: identity"` (stops Deno gzipping the response) and a
+# fixed request count. Install: `cargo install oha`, or
+# `go install github.com/codesenberg/bombardier@latest`.
 #
-# Usage:  bench/rps.sh                        (auto-detects installed runtimes)
-#         CONN=250 PIPELINE=20 bench/rps.sh   (higher load / HTTP pipelining)
-#         DURATION=10 bench/rps.sh
-#         SERVER=scripts/hono.js bench/rps.sh (serve through the Hono framework;
-#                                              run `bun add hono @hono/node-server` first)
+# Usage:  bench/rps.sh                         (auto-detects installed runtimes)
+#         CONN=250 bench/rps.sh                (higher concurrency)
+#         REQUESTS=1000000 bench/rps.sh        (more requests per runtime)
+#         SERVER=scripts/hono.js bench/rps.sh  (serve through the Hono framework;
+#                                               run `bun install` in bench/ first)
 set -uo pipefail
 cd "$(dirname "$0")"
 
 ESRUN="${ESRUN:-../target/release/esrun}"
 SERVER="${SERVER:-scripts/helloserver.js}"  # the hello-world server to run
-PORT=3000   # the server scripts bind this fixed port
+PORT=3000           # the server scripts bind this fixed port
 CONN="${CONN:-100}"
-PIPELINE="${PIPELINE:-1}"
-DURATION="${DURATION:-10}"
+REQUESTS="${REQUESTS:-500000}"
 
-# Resolve autocannon: a global binary, else `bunx autocannon`.
-if command -v autocannon >/dev/null 2>&1; then
-  AC=(autocannon)
-elif command -v bunx >/dev/null 2>&1; then
-  AC=(bunx autocannon)
-elif command -v npx >/dev/null 2>&1; then
-  AC=(npx --yes autocannon)
+# Resolve the load generator: prefer oha, then bombardier (also check the usual
+# cargo/go install dirs even if they aren't on PATH). Sets TOOL + LOADER array.
+OHA="$(command -v oha 2>/dev/null || true)"; [ -z "$OHA" ] && [ -x "$HOME/.cargo/bin/oha" ] && OHA="$HOME/.cargo/bin/oha"
+BOMB="$(command -v bombardier 2>/dev/null || true)"; for d in "$HOME/go/bin" "$HOME/.local/bin"; do [ -z "$BOMB" ] && [ -x "$d/bombardier" ] && BOMB="$d/bombardier"; done
+if [ -n "$OHA" ]; then
+  TOOL="oha"
+elif [ -n "$BOMB" ]; then
+  TOOL="bombardier"
 else
-  echo "rps.sh needs autocannon (install it, or have bunx/npx available)." >&2
+  echo "rps.sh needs a load generator. Install one:" >&2
+  echo "  cargo install oha     # preferred" >&2
+  echo "  go install github.com/codesenberg/bombardier@latest" >&2
   exit 1
 fi
 
@@ -56,32 +62,49 @@ SERVER_PID=""
 cleanup() { [ -n "$SERVER_PID" ] && kill "$SERVER_PID" 2>/dev/null; }
 trap cleanup EXIT
 
-# Pulls req/s + latency out of autocannon's JSON for one runtime.
+URL="http://127.0.0.1:$PORT/"
+HDR="Accept-Encoding: identity"
+OUT="$(mktemp)"
+trap 'cleanup; rm -f "$OUT"' EXIT
+
+# Runs the load generator against the already-running server, writes JSON to
+# $OUT, then prints "<req/s> <avg-latency-ms>" parsed from it.
+load() {
+  if [ "$TOOL" = "oha" ]; then
+    "$OHA" -n "$REQUESTS" -c "$CONN" --no-tui --output-format json -H "$HDR" "$URL" >"$OUT" 2>/dev/null
+    python3 -c "
+import json
+d=json.load(open('$OUT'))['summary']
+print(f\"{d['requestsPerSec']:.0f} {d['average']*1000:.2f}\")" 2>/dev/null || echo "ERR ERR"
+  else
+    "$BOMB" -c "$CONN" -n "$REQUESTS" -H "$HDR" -o json -p result "$URL" >"$OUT" 2>/dev/null
+    python3 -c "
+import json
+d=json.load(open('$OUT'))['result']
+print(f\"{d['rps']['mean']:.0f} {d['latency']['mean']/1000:.2f}\")" 2>/dev/null || echo "ERR ERR"
+  fi
+}
+
+# Boots one runtime's server, waits for the port, loads it, tears it down.
 measure() {
-  local cmd="$1" j
+  local cmd="$1"
   $cmd "$SERVER" >/dev/null 2>&1 &
   SERVER_PID=$!
-  # Wait for the port to accept connections (up to ~5s).
   for _ in $(seq 50); do
     (echo > "/dev/tcp/127.0.0.1/$PORT") 2>/dev/null && break
     sleep 0.1
   done
-  j=$("${AC[@]}" -c "$CONN" -p "$PIPELINE" -d "$DURATION" -j "http://127.0.0.1:$PORT/" 2>/dev/null)
+  load
   kill "$SERVER_PID" 2>/dev/null; wait "$SERVER_PID" 2>/dev/null; SERVER_PID=""
-  python3 -c "
-import json,sys
-d=json.loads(sys.argv[1])
-print(f\"{d['requests']['average']:.0f} {d['latency']['average']} {d['latency']['p99']}\")
-" "$j" 2>/dev/null || echo "ERR ERR ERR"
 }
 
 echo "HTTP requests/sec — hello-world plaintext (\"Hello, World!\")"
 echo "server: $SERVER"
-echo "load: autocannon -c $CONN -p $PIPELINE -d ${DURATION}s on 127.0.0.1:$PORT"
+echo "load: $TOOL -c $CONN -n $REQUESTS -H \"$HDR\" $URL"
 echo
-printf "%-7s | %12s | %11s | %11s\n" "runtime" "req/sec" "avg lat" "p99 lat"
-printf -- "--------+--------------+-------------+------------\n"
+printf "%-7s | %12s | %11s\n" "runtime" "req/sec" "avg lat"
+printf -- "--------+--------------+------------\n"
 for r in "${ORDER[@]}"; do
-  read -r rps avg p99 <<<"$(measure "${CMD[$r]}")"
-  printf "%-7s | %12s | %9s ms | %8s ms\n" "$r" "$rps" "$avg" "$p99"
+  read -r rps avg <<<"$(measure "${CMD[$r]}")"
+  printf "%-7s | %12s | %9s ms\n" "$r" "$rps" "$avg"
 done
@@ -147,7 +147,10 @@ impl HttpServerProvider for SystemHttpServer {
                 .map_err(err)?;
             let local = listener.local_addr().ok();
             let authority = local.map(|a| a.to_string()).unwrap_or_default();
-            let (tx, rx) = mpsc::channel::<Pending>(64);
+            // Roomy buffer so many connections can have a request queued for the
+            // consumer to drain in one batch (see `next_requests`), rather than
+            // stalling on backpressure between crossings.
+            let (tx, rx) = mpsc::channel::<Pending>(1024);
 
             let acceptor = tokio::spawn(async move {
                 while let Ok((stream, _peer)) = listener.accept().await {
@@ -194,42 +197,62 @@ impl HttpServerProvider for SystemHttpServer {
         })
     }
 
-    fn next_request(
+    fn next_requests(
         &self,
         id: u64,
-    ) -> BoxFuture<Result<Option<(u64, HttpServerRequest)>, ProviderError>> {
+        max: usize,
+    ) -> BoxFuture<Result<Vec<(u64, HttpServerRequest)>, ProviderError>> {
         let this = self.clone();
         Box::pin(async move {
             // Take the receiver out so no lock is held across the await, then
             // reinsert to keep serving (mirrors SystemNet::accept). The shutdown
             // signal lives in a side map `close` can still reach meanwhile.
             let mut rx = match this.requests.lock().unwrap().remove(&id) {
                 Some(rx) => rx,
-                None => return Ok(None), // closed
+                None => return Ok(Vec::new()), // closed
             };
             let shutdown = this
                 .controls
                 .lock()
                 .unwrap()
                 .get(&id)
                 .map(|c| c.shutdown.clone());
-            let got = match shutdown {
+            // Await the first request (parking until one arrives or close fires)…
+            let first = match shutdown {
                 Some(notify) => tokio::select! {
                     biased;
                     () = notify.notified() => None, // close() asked us to stop
                     r = rx.recv() => r,
                 },
                 None => rx.recv().await,
             };
+            let mut batch = Vec::new();
+            if let Some(pending) = first {
+                batch.push(pending);
+                // …then drain whatever else is already queued, without parking,
+                // up to `max` — this is the amortization: one await, many
+                // requests handed to the single-threaded consumer per crossing.
+                while batch.len() < max {
+                    match rx.try_recv() {
+                        Ok(pending) => batch.push(pending),
+                        Err(_) => break, // empty (or disconnected) — stop draining
+                    }
+                }
+            }
             this.requests.lock().unwrap().insert(id, rx);
-            match got {
-                Some((req, sender)) => {
+
+            // Assign a request id to each and stash its response sender. (Empty
+            // batch ⇒ closed/shutting down.)
+            let mut out = Vec::with_capacity(batch.len());
+            if !batch.is_empty() {
+                let mut pending = this.pending.lock().unwrap();
+                for (req, sender) in batch {
                     let rid = this.id();
-                    this.pending.lock().unwrap().insert(rid, sender);
-                    Ok(Some((rid, req)))
+                    pending.insert(rid, sender);
+                    out.push((rid, req));
                 }
-                None => Ok(None), // closed, or all connections gone
             }
+            Ok(out)
         })
     }
 
 
@@ -422,10 +422,11 @@ pub struct HttpServerResponse {
 /// The implementation owns the listener and every accepted connection, parsing
 /// requests and writing responses; the runtime only supplies the response for
 /// each request. The flow is a handoff: [`serve`](Self::serve) binds and starts
-/// accepting, [`next_request`](Self::next_request) pulls the next parsed request
-/// (with an opaque id), and [`respond`](Self::respond) completes that id. This
-/// lets a multi-threaded HTTP backend feed the single-threaded JS isolate one
-/// request at a time. `serve` is capability-checked on `Capability::NetListen`
+/// accepting, [`next_requests`](Self::next_requests) drains a batch of parsed
+/// requests (each with an opaque id), and [`respond`](Self::respond) completes
+/// an id. This lets a multi-threaded HTTP backend feed the single-threaded JS
+/// isolate, amortizing the crossing over a batch. `serve` is capability-checked
+/// on `Capability::NetListen`
 /// (like `runtime:net` `listen`) before this is ever called; an embedder that
 /// installs no `HttpServerProvider` has no `runtime:http` access at all.
 pub trait HttpServerProvider: Send + Sync {
@@ -434,12 +435,17 @@ pub trait HttpServerProvider: Send + Sync {
     fn serve(&self, host: String, port: u16)
     -> BoxFuture<Result<(u64, SocketInfo), ProviderError>>;
 
-    /// Waits for the next inbound request on server `id`; resolves to a new
-    /// (request id, request), or `None` once the server is closed.
-    fn next_request(
+    /// Waits for the first inbound request on server `id`, then drains any
+    /// others already queued (up to `max` in total) so the single-threaded
+    /// consumer can amortize the per-request crossing over a whole batch.
+    /// Resolves to one or more `(request id, request)` pairs, or an **empty**
+    /// vec once the server is closed. The caller picks `max`; at least one
+    /// request is awaited before returning.
+    fn next_requests(
         &self,
         id: u64,
-    ) -> BoxFuture<Result<Option<(u64, HttpServerRequest)>, ProviderError>>;
+        max: usize,
+    ) -> BoxFuture<Result<Vec<(u64, HttpServerRequest)>, ProviderError>>;
 
     /// Completes request `request_id` by sending `response` to its client
     /// (idempotent; a stale/unknown id is ignored).
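Review note on the batching contract: the "await one, then drain without parking" shape of `next_requests` can be tried in isolation. This is a minimal sketch using the blocking `std::sync::mpsc` channel instead of the tokio channel in the diff, so it leaves out the shutdown `select!` and the request-id bookkeeping. `next_batch` and `batch_sizes` are hypothetical names, not part of the crate.

```rust
use std::sync::mpsc::{self, Receiver};

/// Blocks for the first item, then drains whatever is already queued without
/// blocking, returning at most `max` items (but always at least the first).
/// An empty vec means every sender is gone, i.e. the channel is closed.
fn next_batch<T>(rx: &Receiver<T>, max: usize) -> Vec<T> {
    let mut batch = Vec::new();
    // Park until one item arrives; `Err` means the channel is closed.
    let Ok(first) = rx.recv() else { return batch };
    batch.push(first);
    // Opportunistic drain: stop as soon as the queue is empty or disconnected.
    while batch.len() < max {
        match rx.try_recv() {
            Ok(item) => batch.push(item),
            Err(_) => break,
        }
    }
    batch
}

/// Hypothetical demo: queue `n` items, close the channel, and report the
/// size of each batch the consumer sees.
fn batch_sizes(n: usize, max: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for i in 0..n {
        tx.send(i).expect("receiver is alive");
    }
    drop(tx); // closing the channel lets the final `next_batch` return empty
    let mut sizes = Vec::new();
    loop {
        let batch = next_batch(&rx, max);
        if batch.is_empty() {
            break;
        }
        sizes.push(batch.len());
    }
    sizes
}
```

With ten queued items and `max = 4`, the consumer crosses three times instead of ten. In this sketch a `max` of 0 still returns the awaited item, which matches the loop condition in `SystemHttpServer`.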