Commit 1fd3aab

fair-bench: adapt to upstream API changes after master merge
Master changed several APIs and removed files that fair-bench depended on:

- Connectors are now factories taking a ConnectorRuntimeConfig (clockworklabs#4647)
- runOne now requires runtimeConfig: RunnerRuntimeConfig
- The SpacetimeDB connector exposes .call('seed', ...) (no more .reducer)
- templates/keynote-2/src/helpers.ts was deleted; poolMaxFromEnv lives on getSharedRuntimeDefaults().poolMax now
- USE_SPACETIME_METRICS_ENDPOINT is no longer read anywhere
- The Rust client (clockworklabs#4753), warmup (clockworklabs#4757), and confirmedReads default (clockworklabs#4682) asymmetries documented in FAIR-BENCHMARK.md were already fixed upstream

Updates:

- fair-bench.ts: use parseBenchOptions, pass runtimeConfig to runOne, factory(config) instead of factory(), unified .call('seed', ...) seeding path for both Spacetime and RPC connectors
- postgres-storedproc-rpc-server.ts: import getSharedRuntimeDefaults from ../config.ts instead of the deleted helpers.ts
- FAIR-BENCHMARK.md: rewritten to credit master for the asymmetries it has resolved and focus on what this PR still adds (sequential pipelining, read_committed, synchronous_commit=on, stored procedure variant)
1 parent b400bc0 commit 1fd3aab

3 files changed

Lines changed: 240 additions & 188 deletions

Lines changed: 98 additions & 54 deletions
@@ -1,50 +1,83 @@
11
# Fair Benchmark: SpacetimeDB vs Competitors
22

3-
This is an alternative benchmark configuration that levels the playing field between SpacetimeDB and traditional database stacks. The original benchmark (`demo.ts`) has several asymmetries that compound to inflate SpacetimeDB's advantage far beyond what the architecture alone provides.
4-
5-
## What This Changes
6-
7-
| Factor | Original Benchmark | Fair Benchmark |
8-
|--------|-------------------|----------------|
9-
| **SpacetimeDB client** | Custom Rust client with 16,384 in-flight ops | Same TypeScript client as everyone else |
10-
| **TPS counting** | Server-side Prometheus metrics (fire-and-forget) | Client-side round-trip counting for ALL systems |
11-
| **Durability** | `confirmedReads=false` (no durability guarantee) | `confirmedReads=true` (durable commits, like Postgres fsync) |
12-
| **Concurrency model** | 16,384 in-flight for SpacetimeDB vs 8 for competitors | Sequential (non-pipelined) for all systems |
13-
| **Postgres isolation** | `serializable` (non-default, worst-case for contention) | `read_committed` (Postgres actual default) |
14-
| **Postgres sync commit** | `synchronous_commit=off` | `synchronous_commit=on` (matches SpacetimeDB confirmed reads) |
15-
| **Postgres transfer** | 5 ORM round-trips via Drizzle | Also tested with stored procedure (single DB call) |
16-
| **Warmup** | 5s warmup for Rust client only | No warmup for any system (equal cold start) |
3+
This is an alternative benchmark configuration that levels the playing field
4+
between SpacetimeDB and traditional database stacks. It runs alongside the
5+
standard `demo`/`bench` commands without modifying their behaviour.
6+
7+
## What's Already Fair in `master`
8+
9+
When this fork was first opened (Feb 2026), the standard benchmark had
10+
several asymmetries between SpacetimeDB and competitors. Most have since been
11+
addressed upstream:
12+
13+
| Asymmetry | Status on master |
14+
|-----------|------------------|
15+
| SpacetimeDB used a hand-tuned Rust client; competitors used TypeScript | **Fixed (#4753):** Rust client removed; everyone uses the TypeScript client. |
16+
| `STDB_CONFIRMED_READS` defaulted to `false` | **Fixed (#4682):** confirmed reads is now the default. |
17+
| 5s warmup applied only to the SpacetimeDB Rust client | **Fixed (#4757):** warmup removed everywhere. |
18+
| TypeScript client was missing some optimizations the Rust client had | **Fixed (#4494):** TS client brought to parity. |
19+
| Compression unspecified | **Fixed (#4743):** compression mode is now an explicit knob. |
20+
21+
What this PR additionally enforces:
22+
23+
| Factor | Standard `bench` | Fair Benchmark |
24+
|--------|------------------|----------------|
25+
| **Pipelining** | SpacetimeDB pipelines up to `maxInflightPerWorker` (128) per connection; HTTP RPC connectors do not opt in to pipelining | **Sequential** for all systems (`BENCH_PIPELINED=0`) so per-connection concurrency is identical |
26+
| **Postgres isolation** | `serializable` (forced via the standard `docker-compose.yml`) | `read_committed` (Postgres' actual default) — `SELECT … FOR UPDATE` already provides row locking |
27+
| **Postgres `synchronous_commit`** | `off` | `on` (matches SpacetimeDB confirmed-reads durability) |
28+
| **Postgres single-call transfer** | 5 ORM round-trips via Drizzle | Adds `postgres_storedproc_rpc`: a single `SELECT do_transfer(...)` PL/pgSQL call |
29+
| **Confirmed reads** | Default (true) | Explicitly forced to `true` (belt-and-suspenders) |
30+
31+
These remaining differences matter because the **architectural** advantage
32+
of SpacetimeDB (colocated compute + storage, no network hop for data access)
33+
is what we want to isolate. Sequential mode and a single-call stored
34+
procedure remove the most obvious confounds between "platform architecture"
35+
and "client/protocol/ORM choices."
1736

1837
## Why These Changes Matter
1938

20-
### 1. Same Client Language (TypeScript for All)
21-
22-
The original benchmark uses a hand-tuned **Rust client** for SpacetimeDB that sends 16,384 concurrent operations per connection via binary WebSocket, while all competitors use a TypeScript client with HTTP/JSON and 8 in-flight operations. This alone is a ~2000x difference in concurrency per connection.
23-
24-
The README justifies this by saying "we were bottlenecked on our test TypeScript client" — but then no competitor gets the same optimization. A fair comparison uses the same client for all.
39+
### 1. Sequential Operations (No Pipelining)
2540

26-
### 2. Confirmed Reads (Durable Commits)
41+
SpacetimeDB's TypeScript connector advertises `maxInflightPerWorker = 128`,
42+
which the runner picks up to pipeline 128 in-flight reducer calls per
43+
connection. RPC connectors (Postgres, CockroachDB, SQLite) leave this unset,
44+
so their per-connection concurrency is effectively 1. Forcing
45+
`BENCH_PIPELINED=0` makes both sides sequential per connection, isolating
46+
the per-call latency comparison.
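
A rough way to see how much the pipelining setting alone can move the numbers is Little's law: per-connection throughput is bounded by the number of operations in flight divided by per-operation latency. The sketch below is not part of fair-bench. `throughputPerConnection` and the 2 ms latency are hypothetical, and the model assumes latency stays constant as in-flight depth grows, which a real server will not honor near saturation.

```typescript
// Illustrative model only; not taken from the fair-bench runner.
// Little's law: throughput = operations in flight / latency per operation.
function throughputPerConnection(inflight: number, latencyMs: number): number {
  if (!Number.isInteger(inflight) || inflight < 1) {
    throw new RangeError("inflight must be a positive integer");
  }
  if (!(latencyMs > 0)) {
    throw new RangeError("latencyMs must be positive");
  }
  return (inflight * 1000) / latencyMs; // operations per second
}

const assumedLatencyMs = 2; // hypothetical round-trip latency, not a measurement

// Pipelined: up to 128 reducer calls in flight (maxInflightPerWorker).
const pipelinedOpsPerSec = throughputPerConnection(128, assumedLatencyMs);
// Sequential: one call in flight per connection, as with BENCH_PIPELINED=0.
const sequentialOpsPerSec = throughputPerConnection(1, assumedLatencyMs);

console.log({ pipelinedOpsPerSec, sequentialOpsPerSec });
```

Under these assumptions the pipelined connection has a 128x higher ceiling at identical per-call latency, which is why sequential mode is needed before per-call latency can be compared across systems.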
2747

28-
The original benchmark defaults `STDB_CONFIRMED_READS` to `false`, meaning SpacetimeDB doesn't wait for durable commits before reporting success. Meanwhile Postgres runs with `fsync=on`. This is comparing "maybe durable" vs "definitely durable" — not a fair durability comparison.
48+
### 2. Postgres Isolation Level
2949

30-
### 3. Client-Side TPS Counting
50+
The standard `docker-compose.yml` sets
51+
`default_transaction_isolation=serializable` for Postgres. That is **not**
52+
Postgres' default (`read_committed`). Under the Zipf contention workload,
53+
serializable causes large transaction-abort/retry storms that
54+
disproportionately hurt Postgres. The benchmark already uses
55+
`SELECT … FOR UPDATE` for row-level locking, so serializable is unnecessary
56+
to get correct results.
3157

32-
The original `demo.ts` sets `USE_SPACETIME_METRICS_ENDPOINT=1`, which counts committed transactions **on the server** via Prometheus. Combined with the fire-and-forget Rust client, this counts transactions that completed server-side but whose acknowledgments may not have reached the client. All other systems count only after the full round-trip completes.
58+
### 3. Stored Procedure (`postgres_storedproc_rpc`)
3359

34-
### 4. Postgres Isolation Level
60+
A SpacetimeDB reducer is a **single atomic call** that runs inside the
61+
database. The standard Postgres comparison uses Drizzle ORM, which sends
62+
roughly:
3563

36-
The original forces `default_transaction_isolation=serializable` on Postgres, which is **not** the default (`read_committed`). Under the Zipf contention workload, serializable causes massive transaction aborts and retries, dramatically hurting Postgres performance. The benchmark already uses `SELECT ... FOR UPDATE` for row-level locking, making serializable unnecessary.
37-
38-
### 5. Stored Procedure vs ORM
39-
40-
SpacetimeDB's reducer executes as a single atomic operation inside the database. The original Postgres benchmark uses Drizzle ORM which requires:
4164
- `BEGIN`
42-
- `SELECT ... FOR UPDATE` (fetch both accounts)
65+
- `SELECT FOR UPDATE` (fetch both accounts)
4366
- `UPDATE` (debit)
4467
- `UPDATE` (credit)
4568
- `COMMIT`
4669

47-
That's 5 round-trips between the Node.js process and Postgres. A stored procedure (`do_transfer()`) does the same work in a single call — which is the fair equivalent of SpacetimeDB's reducer model.
70+
That is ~5 round-trips between Node and Postgres per transfer. A PL/pgSQL
71+
stored procedure (`do_transfer`) does the same work in a single round-trip
72+
— architecturally the same shape as a reducer. Comparing
73+
`postgres_storedproc_rpc` against `spacetimedb` cleanly isolates the
74+
"platform architecture" gap from the "ORM round-trip overhead" gap.
75+
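
The round-trip argument above can be put into a back-of-the-envelope model: per-transfer latency is roughly the number of round-trips times the network round-trip time, plus the work done inside Postgres. This is only a sketch. `latencyMicros`, the 200 µs round-trip time, and the 100 µs server time are hypothetical values chosen for illustration, and the model ignores the HTTP hop in front of the RPC server, which both variants share.

```typescript
// Illustrative model only; the numbers are assumptions, not measurements.
interface TransferPath {
  name: string;
  roundTrips: number; // Node <-> Postgres round-trips per transfer
}

// Latency = round-trips * network RTT + time spent executing inside Postgres.
function latencyMicros(
  path: TransferPath,
  rttMicros: number,
  serverWorkMicros: number,
): number {
  return path.roundTrips * rttMicros + serverWorkMicros;
}

// BEGIN, SELECT FOR UPDATE, UPDATE, UPDATE, COMMIT.
const ormPath: TransferPath = { name: "ORM (5 statements)", roundTrips: 5 };
// A single SELECT do_transfer(...) call.
const storedProcPath: TransferPath = { name: "stored procedure", roundTrips: 1 };

const rttMicros = 200; // hypothetical Node <-> Postgres round-trip time
const serverWorkMicros = 100; // hypothetical execution time inside Postgres

const ormLatency = latencyMicros(ormPath, rttMicros, serverWorkMicros);
const storedProcLatency = latencyMicros(storedProcPath, rttMicros, serverWorkMicros);

console.log({ ormLatency, storedProcLatency });
```

With these made-up inputs the ORM path costs 1100 µs per transfer and the stored procedure 300 µs, even though the database does the same work in both cases. The lower the real round-trip time, the smaller this gap becomes.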
76+
### 4. `synchronous_commit=on`
77+
78+
SpacetimeDB with confirmed reads waits for durable acknowledgement before
79+
returning. Postgres should match: `synchronous_commit=on` (the default;
80+
the standard compose file overrides it to `off` for raw throughput).
4881

4982
## Running the Fair Benchmark
5083

@@ -54,33 +87,32 @@ That's 5 round-trips between the Node.js process and Postgres. A stored procedur
5487
# Install dependencies
5588
pnpm install
5689

57-
# Start services
90+
# Start services with fair config (Postgres tuned to defaults; stored
91+
# procedure RPC server included)
5892
docker compose -f docker-compose-fair.yml up -d
5993

60-
# Start SpacetimeDB
94+
# If running SpacetimeDB locally instead of in Docker
6195
spacetime start
62-
63-
# Publish the SpacetimeDB module
6496
spacetime publish --server local test-1 --module-path ./spacetimedb
6597
```
6698

6799
### Run
68100

69101
```bash
70102
# Default: SpacetimeDB vs Postgres (ORM) vs Postgres (stored proc)
71-
npm run fair-bench
103+
pnpm run fair-bench
72104

73-
# With options
74-
npm run fair-bench -- --seconds 10 --concurrency 50 --alpha 0.5
105+
# With options (these mirror the standard bench CLI; --skip-prep is fair-bench-only)
106+
pnpm run fair-bench -- --seconds 10 --concurrency 50 --alpha 0.5
75107

76108
# High contention
77-
npm run fair-bench -- --alpha 1.5
109+
pnpm run fair-bench -- --alpha 1.5
78110

79111
# Include more systems
80-
npm run fair-bench -- --systems spacetimedb,postgres_rpc,postgres_storedproc_rpc,sqlite_rpc
112+
pnpm run fair-bench -- --systems spacetimedb,postgres_rpc,postgres_storedproc_rpc,sqlite_rpc
81113

82114
# Skip seeding (if already seeded)
83-
npm run fair-bench -- --skip-prep
115+
pnpm run fair-bench -- --skip-prep
84116
```
85117

86118
### Start the Stored Procedure RPC Server (non-Docker)
@@ -90,22 +122,34 @@ npm run fair-bench -- --skip-prep
90122
PG_STOREDPROC_RPC_PORT=4105 npx tsx src/rpc-servers/postgres-storedproc-rpc-server.ts
91123
```
92124

93-
## Expected Results
125+
## Postgres Rust Client (Apples-to-Apples Binary Protocol)
126+
127+
Now that `master` has removed the SpacetimeDB Rust client, the original
128+
"Rust binary protocol vs Node.js HTTP+JSON" gap is no longer present in the
129+
standard benchmark — both sides use TypeScript. The
130+
`postgres-rust-client/` directory in this PR is therefore a **standalone
131+
reference**: it lets you measure how much of any remaining gap is due to
132+
Node.js client overhead vs. the database itself, by driving Postgres with a
133+
Rust binary client.
94134

95-
With a leveled playing field, SpacetimeDB's genuine architectural advantage (colocated compute+storage, no network hop for data access) should still show a meaningful speedup — likely **2-5x** rather than the claimed **14x**. The remaining advantage is real and architectural:
135+
```bash
136+
cd postgres-rust-client
137+
cargo run --release -- --seconds 10 --concurrency 50 --alpha 0.5
138+
```
139+
140+
## What the Numbers Should Show
141+
142+
With the playing field leveled, SpacetimeDB's genuine architectural
143+
advantage (colocated compute+storage, no network hop for data access)
144+
should still show a meaningful speedup. The remaining gap reflects:
96145

97146
- Zero-copy in-process data access vs TCP round-trips
98-
- Rust execution vs Node.js JavaScript
147+
- Rust execution vs Node.js JavaScript on the server side
99148
- Binary BSATN protocol vs JSON serialization
100149

101-
The factors that are **not** architectural and were removed:
102-
- Custom Rust client vs shared TypeScript client
103-
- 16,384 vs 8 in-flight operations per connection
104-
- Server-side vs client-side TPS counting
105-
- Unequal durability guarantees
106-
- Non-default Postgres isolation level penalizing competitors
107-
- ORM overhead (5 round-trips) vs single reducer call
108-
109-
## Detailed Asymmetry Analysis
150+
Confounds that are **not** architectural and are normalized away here:
110151

111-
For a comprehensive breakdown of every asymmetry in the original benchmark, see the table in the PR description.
152+
- Per-connection pipelining differences
153+
- Postgres being run at non-default isolation
154+
- ORM overhead (5 round-trips) vs single-call reducer
155+
- `synchronous_commit=off` skewing Postgres' durability story
