Commit efcd445

perf(http): cut runtime:http per-request cost ~30% (≈35k→≈49k req/s)
Investigating why esrun trailed Bun/Deno on HTTP throughput showed it was not a core-count gap (all three saturate ~one core under load) but a per-request one: esrun's JS↔Rust op boundary cost ~29µs/request vs ~11µs for the native runtimes. Four changes to the request path, measured by ablation, bring it to ~20µs:

- Batch the accept loop: HttpServerProvider::next_request -> next_requests(id, max) drains all already-queued requests in one op crossing, amortizing the dispatch + promise-resolve + microtask checkpoint over the batch.
- Cross request metadata as a structured array, not a per-request JSON string built char-by-char in Rust and JSON.parse'd in JS.
- Read the response body synchronously from the Response (Response._parts) instead of the async arrayBuffer() round-trip.
- Reuse the host-validated absolute URL for the server Request instead of re-parsing it (internal __serverRequest, gated by a closure-private symbol so the public Request constructor's eager validation is unchanged).

Also switch bench/rps.sh from autocannon to oha/bombardier: Bun's own bench/express README warns autocannon's node:http client can't push a fast server hard enough to measure it, and indeed it had capped every runtime at ~35-40k, hiding the real spread (Bun/Deno ~83-85k, esrun ~49k, Node ~30k). README + home-page req/s chart updated to the real oha numbers.
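The per-request figures in the message follow from the throughput numbers once you accept its premise that each server saturates roughly one core: the cost per request is then just the reciprocal of requests per second. A minimal sketch of that arithmetic, where `micros_per_request` is a hypothetical helper name and the rounded throughputs are taken from the message above:

```rust
/// Microseconds of single-core time per request at a given throughput.
/// Assumption: the server saturates exactly one core, as the commit message reports.
fn micros_per_request(req_per_sec: f64) -> f64 {
    1_000_000.0 / req_per_sec
}

fn main() {
    // Rounded throughputs quoted in the commit message.
    let samples = [
        ("esrun before", 35_000.0),
        ("esrun after", 49_000.0),
        ("bun/deno", 85_000.0),
    ];
    for (label, rps) in samples {
        println!("{label:<13} {rps:>7.0} req/s -> {:.1} µs/request", micros_per_request(rps));
    }
}
```

This lands at about 28.6µs, 20.4µs and 11.8µs, in line with the ~29µs, ~20µs and ~11µs quoted above.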
1 parent 6cce0ec commit efcd445

9 files changed

Lines changed: 267 additions & 160 deletions

File tree

CHANGELOG.md

Lines changed: 14 additions & 1 deletion
@@ -8,6 +8,19 @@ pre-`0.1.0` and the public API is unstable.

 ### Changed

+- **`runtime:http` throughput — per-request cost cut ~30% (≈35k → ≈49k req/s,
+  hello-world plaintext).** Four changes to the request path, all under the hood:
+  the accept loop **batches** — one `http_next_request` crossing now drains many
+  already-queued requests (`HttpServerProvider::next_request` → `next_requests(id,
+  max)`, an embedder-visible trait change); request metadata crosses as a
+  **structured array** instead of a per-request JSON string built in Rust and
+  `JSON.parse`d in JS; the response body is read **synchronously** from the
+  `Response` (no `await arrayBuffer()` round-trip) via an internal `_parts()`
+  accessor; and a server `Request` reuses the **host-validated URL** instead of
+  re-parsing it (internal `__serverRequest`, gated by a closure-private symbol so
+  the public `Request` constructor's eager validation is unchanged). Measured
+  with an external load generator (`oha`); see `bench/README.md`.
+
 - **Driven loop now wakes on readiness, not on a fixed interval.** The standalone
   `Driver` injects a real `Waker` (`Runtime::set_async_waker` / `Engine::set_async_waker`)
   into the engine's async-op polling, and a newly-dispatched op wakes the loop
@@ -33,7 +46,7 @@ pre-`0.1.0` and the public API is unstable.
   `port: 0` picks an ephemeral one), `finished`, and `stop()`. Backed by a new
   injectable `HttpServerProvider` (vetted **hyper** 1.x, `SystemHttpServer`;
   each connection served on its own task, requests handed to the single-threaded
-  isolate one at a time) and gated on `Capability::NetListen` (like `runtime:net`
+  isolate in batches) and gated on `Capability::NetListen` (like `runtime:net`
   `listen`). Request/response bodies are buffered; TLS is not supported yet. New
   `examples/modules/http.mjs` and `runtime-http.d.ts`.

bench/README.md

Lines changed: 45 additions & 42 deletions
@@ -148,66 +148,69 @@ rss | 40 | 29 | 53 | 19
 esrun a single thread does both jobs — useful for the warm request/response path,
 but not a server-throughput number. For that, `bench/rps.sh` runs a hello-world
 server per runtime (`scripts/helloserver.js`, plaintext `"Hello, World!"` on
-:3000) and points an **external** load generator ([autocannon]) at it — the
-classic plaintext req/s shape.
+:3000) and points an **external** load generator at it — the classic plaintext
+req/s shape.
+
+The generator is [oha] (or [bombardier]) — **not** autocannon: Bun's own
+`bench/express` README notes autocannon's node:http client can't push a fast
+server hard enough to measure it, and indeed autocannon capped *every* runtime at
+~35–40k here, hiding the real spread. Following Bun's setup, we send
+`-H "Accept-Encoding: identity"` (so Deno doesn't gzip the body) and a fixed
+request count.

 ```sh
 cargo build --release -p es-runtime-cli
-bench/rps.sh                          # autocannon -c 100, one connection/req
-CONN=250 PIPELINE=20 bench/rps.sh     # higher concurrency + HTTP pipelining
+cargo install oha                          # or: go install github.com/codesenberg/bombardier@latest
+bench/rps.sh                               # oha -c 100 -n 500000
+CONN=250 REQUESTS=1000000 bench/rps.sh     # heavier load
 ```

-Needs `autocannon` (used via `bunx`/`npx` if not installed globally). Indicative
-numbers on one Linux x86-64 box:
+Indicative numbers on one Linux x86-64 box (12 cores):

 ```
-# bench/rps.sh (-c 100 -p 1)      # CONN=250 PIPELINE=20
-runtime |     req/sec             runtime |     req/sec
---------+------------             --------+------------
-node    |      32,924             deno    |     125,715
-bun     |      35,644             node    |      54,884
-deno    |      35,822             esrun   |      35,156
-esrun   |      36,641             bun     |      19,226
+# bare server (runtime:http)      # through Hono (framework)
+runtime |     req/sec             runtime |     req/sec
+--------+------------             --------+------------
+deno    |      85,070             deno    |      71,531
+bun     |      82,615             bun     |      62,894
+esrun   |      49,537             esrun   |      47,722
+node    |      29,558             node    |      28,217
 ```

-At ordinary concurrency (one in-flight request per connection) all four sit
-around ~35k req/s — esrun is at parity, marginally highest here. Under heavy
-HTTP pipelining the spread reflects each server's I/O model; esrun holds ~35k,
-which is its **single-thread ceiling** — one V8 isolate on a current-thread tokio
-runtime, by design (it's an embeddable runtime, not a multi-core web server). The
-earlier "2× slower" reading came from the in-process `http` workload, where esrun
-pays for the client and the server on the same thread; measured server-to-client
-it isn't there.
+esrun beats Node comfortably and reaches ~60% of Bun/Deno on the bare server,
+~75% through Hono. **All three (esrun, Bun, Deno) saturate ~one core** under this
+load — so this is not a core-count gap but a per-request one: esrun's
+JS↔Rust op boundary costs more per request than Bun's/Deno's tightly-integrated
+native path. The `runtime:http` request path was tuned for it (batched accept
+draining many requests per op crossing; structured request metadata instead of a
+per-request JSON round-trip; a synchronous response-body path; and reusing the
+host-validated URL instead of re-parsing it), which roughly cut the per-request
+cost from ~29µs to ~20µs (≈35k → ≈49k req/s). A single V8 isolate on a
+current-thread tokio runtime is the remaining ceiling — by design (it's an
+embeddable runtime, not a multi-core web server).

 ### Through a framework (Hono)

-The same shape served through [Hono] — a real, third-party web framework —
-instead of each runtime's bare server. This is the framework counterpart to the
-Bun framework charts: it shows esrun runs **unmodified npm ESM packages** off
-`node_modules`, not just its own server. Hono is Web-standard
-(`app.fetch(request) -> Response`), so it plugs straight into `runtime:http`,
-`Bun.serve`, and `Deno.serve`; Node uses Hono's `@hono/node-server` adapter.
+The right-hand column above is the same shape served through [Hono] — a real,
+third-party web framework — instead of each runtime's bare server. It shows esrun
+runs **unmodified npm ESM packages** off `node_modules`, not just its own server.
+Hono is Web-standard (`app.fetch(request) -> Response`), so it plugs straight into
+`runtime:http`, `Bun.serve`, and `Deno.serve`; Node uses Hono's `@hono/node-server`
+adapter.

 ```sh
 cd bench && bun install              # hono + @hono/node-server
-SERVER=scripts/hono.js bench/rps.sh  # -c 100 -p 1
-```
-
-```
-runtime |     req/sec
---------+------------
-node    |      33,358
-bun     |      39,686
-deno    |      40,150
-esrun   |      40,220
+SERVER=scripts/hono.js bench/rps.sh
 ```

-esrun is again at parity (marginally highest), and the framework layer costs all
-four about the same as the bare server — Express, by contrast, cannot run on
-esrun at all (it is CommonJS and needs `node:http`'s `(req, res)` API; esrun is
-ESM-only and rejects `node:` builtins).
+The framework narrows the gap (esrun is within ~25% of Bun here), because
+`runtime:http` is already esrun's native path while Bun/Deno pay Hono's adapter
+cost on top of their fast servers. Express, by contrast, cannot run on esrun at
+all (it is CommonJS and needs `node:http`'s `(req, res)` API; esrun is ESM-only
+and rejects `node:` builtins).

-[autocannon]: https://github.com/mcollina/autocannon
+[oha]: https://github.com/hatoo/oha
+[bombardier]: https://github.com/codesenberg/bombardier
 [Hono]: https://hono.dev

 ## Caveats
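
The relative figures in the new README text ("~60% of Bun/Deno", "~75% through Hono") can be re-derived from the table in this diff. The sketch below does only that; `percent_of` is a hypothetical helper, and the numbers are copied from the added table rows.

```rust
/// Throughput `a` expressed as a percentage of throughput `b`.
fn percent_of(a: f64, b: f64) -> f64 {
    100.0 * a / b
}

fn main() {
    // (runtime, bare req/s, Hono req/s), copied from the README table in this diff.
    let rows = [
        ("deno", 85_070.0, 71_531.0),
        ("bun", 82_615.0, 62_894.0),
        ("node", 29_558.0, 28_217.0),
    ];
    let (esrun_bare, esrun_hono) = (49_537.0, 47_722.0);
    for (name, bare, hono) in rows {
        println!(
            "esrun vs {name:<4}: bare {:.1}%, hono {:.1}%",
            percent_of(esrun_bare, bare),
            percent_of(esrun_hono, hono)
        );
    }
}
```

On these numbers esrun is at about 60% of Bun and 58% of Deno on the bare server. Through Hono it is at about 76% of Bun but about 67% of Deno, so the "~75%" figure is the Bun comparison.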

bench/rps.sh

Lines changed: 61 additions & 38 deletions
@@ -1,40 +1,46 @@
 #!/usr/bin/env bash
 #
 # HTTP requests/sec benchmark: a hello-world server per runtime, driven by an
-# external load generator (autocannon) — the classic "req/s" plaintext shape
-# (à la the Bun/TechEmpower charts). This is the *right* way to measure server
-# throughput: a separate client hammers the server over a real socket, so the
-# number reflects the server alone (unlike bench/run.sh's in-process `http`
-# workload, where the same single thread runs both the client fetch and the
-# server). Each runtime runs scripts/helloserver.js with its own native server.
+# external load generator — the classic "req/s" plaintext shape (à la the
+# Bun/TechEmpower charts). A separate client hammers the server over a real
+# socket, so the number reflects the server alone (unlike bench/run.sh's
+# in-process `http` workload, where one thread runs both client and server).
+# Each runtime runs $SERVER (scripts/helloserver.js by default) with its own
+# native server.
 #
-# Needs `autocannon` (used via `bunx autocannon`, or a global install). If
-# neither is available the script explains and exits.
+# Load generator: `oha` (preferred) or `bombardier` — NOT autocannon. Bun's own
+# bench/express README warns autocannon's node:http client can't push a fast
+# server hard enough to measure it, so we follow their setup: oha/bombardier
+# plus `-H "Accept-Encoding: identity"` (stops Deno gzipping the response) and a
+# fixed request count. Install: `cargo install oha`, or
+# `go install github.com/codesenberg/bombardier@latest`.
 #
-# Usage: bench/rps.sh                          (auto-detects installed runtimes)
-#        CONN=250 PIPELINE=20 bench/rps.sh     (higher load / HTTP pipelining)
-#        DURATION=10 bench/rps.sh
-#        SERVER=scripts/hono.js bench/rps.sh   (serve through the Hono framework;
-#                                               run `bun add hono @hono/node-server` first)
+# Usage: bench/rps.sh                          (auto-detects installed runtimes)
+#        CONN=250 bench/rps.sh                 (higher concurrency)
+#        REQUESTS=1000000 bench/rps.sh         (more requests per runtime)
+#        SERVER=scripts/hono.js bench/rps.sh   (serve through the Hono framework;
+#                                               run `bun install` in bench/ first)
 set -uo pipefail
 cd "$(dirname "$0")"

 ESRUN="${ESRUN:-../target/release/esrun}"
 SERVER="${SERVER:-scripts/helloserver.js}" # the hello-world server to run
-PORT=3000  # the server scripts bind this fixed port
+PORT=3000                                  # the server scripts bind this fixed port
 CONN="${CONN:-100}"
-PIPELINE="${PIPELINE:-1}"
-DURATION="${DURATION:-10}"
+REQUESTS="${REQUESTS:-500000}"

-# Resolve autocannon: a global binary, else `bunx autocannon`.
-if command -v autocannon >/dev/null 2>&1; then
-  AC=(autocannon)
-elif command -v bunx >/dev/null 2>&1; then
-  AC=(bunx autocannon)
-elif command -v npx >/dev/null 2>&1; then
-  AC=(npx --yes autocannon)
+# Resolve the load generator: prefer oha, then bombardier (also check the usual
+# cargo/go install dirs even if they aren't on PATH). Sets TOOL + LOADER array.
+OHA="$(command -v oha 2>/dev/null || true)"; [ -z "$OHA" ] && [ -x "$HOME/.cargo/bin/oha" ] && OHA="$HOME/.cargo/bin/oha"
+BOMB="$(command -v bombardier 2>/dev/null || true)"; [ -z "$BOMB" ] && [ -x "$HOME/.local/bin/bombardier" ] && BOMB="$HOME/.local/bin/bombardier"
+if [ -n "$OHA" ]; then
+  TOOL="oha"
+elif [ -n "$BOMB" ]; then
+  TOOL="bombardier"
 else
-  echo "rps.sh needs autocannon (install it, or have bunx/npx available)." >&2
+  echo "rps.sh needs a load generator. Install one:" >&2
+  echo "  cargo install oha            # preferred" >&2
+  echo "  go install github.com/codesenberg/bombardier@latest" >&2
   exit 1
 fi

@@ -56,32 +62,49 @@ SERVER_PID=""
 cleanup() { [ -n "$SERVER_PID" ] && kill "$SERVER_PID" 2>/dev/null; }
 trap cleanup EXIT

-# Pulls req/s + latency out of autocannon's JSON for one runtime.
+URL="http://127.0.0.1:$PORT/"
+HDR="Accept-Encoding: identity"
+OUT="$(mktemp)"
+trap 'cleanup; rm -f "$OUT"' EXIT
+
+# Runs the load generator against the already-running server, writes JSON to
+# $OUT, then prints "<req/s> <avg-latency-ms>" parsed from it.
+load() {
+  if [ "$TOOL" = "oha" ]; then
+    "$OHA" -n "$REQUESTS" -c "$CONN" --no-tui --output-format json -H "$HDR" "$URL" >"$OUT" 2>/dev/null
+    python3 -c "
+import json
+d=json.load(open('$OUT'))['summary']
+print(f\"{d['requestsPerSec']:.0f} {d['average']*1000:.2f}\")" 2>/dev/null || echo "ERR ERR"
+  else
+    "$BOMB" -c "$CONN" -n "$REQUESTS" -H "$HDR" -o json -p result "$URL" >"$OUT" 2>/dev/null
+    python3 -c "
+import json
+d=json.load(open('$OUT'))['result']
+print(f\"{d['rps']['mean']:.0f} {d['latency']['mean']/1000:.2f}\")" 2>/dev/null || echo "ERR ERR"
+  fi
+}
+
+# Boots one runtime's server, waits for the port, loads it, tears it down.
 measure() {
-  local cmd="$1" j
+  local cmd="$1"
   $cmd "$SERVER" >/dev/null 2>&1 &
   SERVER_PID=$!
-  # Wait for the port to accept connections (up to ~5s).
   for _ in $(seq 50); do
     (echo > "/dev/tcp/127.0.0.1/$PORT") 2>/dev/null && break
     sleep 0.1
   done
-  j=$("${AC[@]}" -c "$CONN" -p "$PIPELINE" -d "$DURATION" -j "http://127.0.0.1:$PORT/" 2>/dev/null)
+  load
   kill "$SERVER_PID" 2>/dev/null; wait "$SERVER_PID" 2>/dev/null; SERVER_PID=""
-  python3 -c "
-import json,sys
-d=json.loads(sys.argv[1])
-print(f\"{d['requests']['average']:.0f} {d['latency']['average']} {d['latency']['p99']}\")
-" "$j" 2>/dev/null || echo "ERR ERR ERR"
 }

 echo "HTTP requests/sec — hello-world plaintext (\"Hello, World!\")"
 echo "server: $SERVER"
-echo "load: autocannon -c $CONN -p $PIPELINE -d ${DURATION}s on 127.0.0.1:$PORT"
+echo "load: $TOOL -c $CONN -n $REQUESTS -H \"$HDR\" $URL"
 echo
-printf "%-7s | %12s | %11s | %11s\n" "runtime" "req/sec" "avg lat" "p99 lat"
-printf -- "--------+--------------+-------------+------------\n"
+printf "%-7s | %12s | %11s\n" "runtime" "req/sec" "avg lat"
+printf -- "--------+--------------+------------\n"
 for r in "${ORDER[@]}"; do
-  read -r rps avg p99 <<<"$(measure "${CMD[$r]}")"
-  printf "%-7s | %12s | %9s ms | %8s ms\n" "$r" "$rps" "$avg" "$p99"
+  read -r rps avg <<<"$(measure "${CMD[$r]}")"
+  printf "%-7s | %12s | %9s ms\n" "$r" "$rps" "$avg"
 done

crates/default-providers/src/system_http.rs

Lines changed: 33 additions & 10 deletions
@@ -147,7 +147,10 @@ impl HttpServerProvider for SystemHttpServer {
                 .map_err(err)?;
             let local = listener.local_addr().ok();
             let authority = local.map(|a| a.to_string()).unwrap_or_default();
-            let (tx, rx) = mpsc::channel::<Pending>(64);
+            // Roomy buffer so many connections can have a request queued for the
+            // consumer to drain in one batch (see `next_requests`), rather than
+            // stalling on backpressure between crossings.
+            let (tx, rx) = mpsc::channel::<Pending>(1024);

             let acceptor = tokio::spawn(async move {
                 while let Ok((stream, _peer)) = listener.accept().await {
@@ -194,42 +197,62 @@ impl HttpServerProvider for SystemHttpServer {
         })
     }

-    fn next_request(
+    fn next_requests(
         &self,
         id: u64,
-    ) -> BoxFuture<Result<Option<(u64, HttpServerRequest)>, ProviderError>> {
+        max: usize,
+    ) -> BoxFuture<Result<Vec<(u64, HttpServerRequest)>, ProviderError>> {
         let this = self.clone();
         Box::pin(async move {
             // Take the receiver out so no lock is held across the await, then
             // reinsert to keep serving (mirrors SystemNet::accept). The shutdown
             // signal lives in a side map `close` can still reach meanwhile.
             let mut rx = match this.requests.lock().unwrap().remove(&id) {
                 Some(rx) => rx,
-                None => return Ok(None), // closed
+                None => return Ok(Vec::new()), // closed
             };
             let shutdown = this
                 .controls
                 .lock()
                 .unwrap()
                 .get(&id)
                 .map(|c| c.shutdown.clone());
-            let got = match shutdown {
+            // Await the first request (parking until one arrives or close fires)…
+            let first = match shutdown {
                 Some(notify) => tokio::select! {
                     biased;
                     () = notify.notified() => None, // close() asked us to stop
                     r = rx.recv() => r,
                 },
                 None => rx.recv().await,
             };
+            let mut batch = Vec::new();
+            if let Some(pending) = first {
+                batch.push(pending);
+                // …then drain whatever else is already queued, without parking,
+                // up to `max` — this is the amortization: one await, many
+                // requests handed to the single-threaded consumer per crossing.
+                while batch.len() < max {
+                    match rx.try_recv() {
+                        Ok(pending) => batch.push(pending),
+                        Err(_) => break, // empty (or disconnected) — stop draining
+                    }
+                }
+            }
             this.requests.lock().unwrap().insert(id, rx);
-            match got {
-                Some((req, sender)) => {
+
+            // Assign a request id to each and stash its response sender. (Empty
+            // batch ⇒ closed/shutting down.)
+            let mut out = Vec::with_capacity(batch.len());
+            if !batch.is_empty() {
+                let mut pending = this.pending.lock().unwrap();
+                for (req, sender) in batch {
                     let rid = this.id();
-                    this.pending.lock().unwrap().insert(rid, sender);
-                    Ok(Some((rid, req)))
+                    pending.insert(rid, sender);
+                    out.push((rid, req));
                 }
-                None => Ok(None), // closed, or all connections gone
             }
+            Ok(out)
         })
     }
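
The core of the change above is an "await one, then drain without parking" loop. The sketch below isolates that pattern in a self-contained form. It is an illustration rather than the code from this commit: it swaps tokio's async channel for the blocking `std::sync::mpsc` channel so it runs without dependencies, it leaves out the shutdown signal, and `drain_batch` is a hypothetical name.

```rust
use std::sync::mpsc::{self, Receiver};

/// Block for the first item, then drain whatever is already queued without
/// blocking, up to `max` items. An empty Vec means the channel is closed.
/// Like the diff above, at least one item is returned even if `max` is 0.
fn drain_batch<T>(rx: &Receiver<T>, max: usize) -> Vec<T> {
    let mut batch = Vec::new();
    // Park until one item arrives; Err means every sender is gone.
    if let Ok(first) = rx.recv() {
        batch.push(first);
        while batch.len() < max {
            match rx.try_recv() {
                Ok(item) => batch.push(item),
                Err(_) => break, // empty or disconnected: stop draining
            }
        }
    }
    batch
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for n in 0..5 {
        tx.send(n).unwrap();
    }
    drop(tx); // close the channel so the last call returns an empty batch
    let first = drain_batch(&rx, 3);
    let second = drain_batch(&rx, 3);
    let third = drain_batch(&rx, 3);
    println!("{first:?} {second:?} {third:?}"); // [0, 1, 2] [3, 4] []
}
```

Five queued items with a cap of three are delivered in two calls instead of five, and the empty batch at the end plays the role that `None` played before this change.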

crates/providers/src/lib.rs

Lines changed: 14 additions & 8 deletions
@@ -422,10 +422,11 @@ pub struct HttpServerResponse {
 /// The implementation owns the listener and every accepted connection, parsing
 /// requests and writing responses; the runtime only supplies the response for
 /// each request. The flow is a handoff: [`serve`](Self::serve) binds and starts
-/// accepting, [`next_request`](Self::next_request) pulls the next parsed request
-/// (with an opaque id), and [`respond`](Self::respond) completes that id. This
-/// lets a multi-threaded HTTP backend feed the single-threaded JS isolate one
-/// request at a time. `serve` is capability-checked on `Capability::NetListen`
+/// accepting, [`next_requests`](Self::next_requests) drains a batch of parsed
+/// requests (each with an opaque id), and [`respond`](Self::respond) completes
+/// an id. This lets a multi-threaded HTTP backend feed the single-threaded JS
+/// isolate, amortizing the crossing over a batch. `serve` is capability-checked
+/// on `Capability::NetListen`
 /// (like `runtime:net` `listen`) before this is ever called; an embedder that
 /// installs no `HttpServerProvider` has no `runtime:http` access at all.
 pub trait HttpServerProvider: Send + Sync {
@@ -434,12 +435,17 @@ pub trait HttpServerProvider: Send + Sync {
     fn serve(&self, host: String, port: u16)
         -> BoxFuture<Result<(u64, SocketInfo), ProviderError>>;

-    /// Waits for the next inbound request on server `id`; resolves to a new
-    /// (request id, request), or `None` once the server is closed.
-    fn next_request(
+    /// Waits for inbound requests on server `id`, then drains any others already
+    /// queued (up to `max`) so the single-threaded consumer can amortize the
+    /// per-request crossing over a whole batch. Resolves to one-or-more
+    /// `(request id, request)` pairs, or an **empty** vec once the server is
+    /// closed. `max` bounds the batch (caller picks the cap); at least one
+    /// request is awaited before returning.
+    fn next_requests(
         &self,
         id: u64,
-    ) -> BoxFuture<Result<Option<(u64, HttpServerRequest)>, ProviderError>>;
+        max: usize,
+    ) -> BoxFuture<Result<Vec<(u64, HttpServerRequest)>, ProviderError>>;

     /// Completes request `request_id` by sending `response` to its client
     /// (idempotent; a stale/unknown id is ignored).
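
For embedders adapting to the trait change, the caller-side contract moves from "loop until `None`" to "loop until an empty batch". The sketch below shows that loop shape and counts crossings to make the amortization visible. Everything in it is hypothetical: `BatchSource`, `Canned` and `serve_all` are synchronous stand-ins invented for illustration, not part of the `providers` crate, and the real method returns a `BoxFuture`.

```rust
/// Hypothetical, synchronous stand-in for the batched contract: one or more
/// (request id, payload) pairs per call, or an empty Vec once closed.
trait BatchSource {
    fn next_batch(&mut self, cap: usize) -> Vec<(u64, String)>;
}

/// A canned source holding a fixed queue of requests.
struct Canned {
    queue: Vec<(u64, String)>,
}

impl Canned {
    fn with_requests(n: u64) -> Self {
        Canned { queue: (0..n).map(|id| (id, format!("GET /{id}"))).collect() }
    }
}

impl BatchSource for Canned {
    fn next_batch(&mut self, cap: usize) -> Vec<(u64, String)> {
        // Hand out at least one request when any is queued, at most `cap`.
        let n = self.queue.len().min(cap.max(1));
        self.queue.drain(..n).collect()
    }
}

/// Drives the source until it reports closed; returns (handled, crossings).
fn serve_all(src: &mut impl BatchSource, cap: usize) -> (usize, usize) {
    let (mut handled, mut crossings) = (0, 0);
    loop {
        let batch = src.next_batch(cap);
        crossings += 1;
        if batch.is_empty() {
            break; // empty batch means the server is closed
        }
        handled += batch.len();
    }
    (handled, crossings)
}

fn main() {
    let one_at_a_time = serve_all(&mut Canned::with_requests(10), 1);
    let batched = serve_all(&mut Canned::with_requests(10), 4);
    println!("{one_at_a_time:?} {batched:?}"); // (10, 11) (10, 4)
}
```

With ten queued requests, a cap of 1 takes eleven crossings (ten requests plus the closing empty batch), while a cap of 4 takes four. That reduction in crossings is the saving the batched signature is after.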
