We did it with a repeatable process, not a single clever trick:

- add server-only fast paths behind a build-time `isServer` flag
- avoid `delete` in performance-sensitive code
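The second bullet deserves a sketch. In V8, `delete` forces a hidden-class (shape) transition that deoptimizes later property access, while overwriting with `undefined` keeps the shape stable. The `Ctx` type and function names below are illustrative, not TanStack Router code:

```typescript
// Illustrative sketch, not TanStack Router code: why `delete` hurts in
// hot paths. V8 tracks an object's shape (hidden class); `delete`
// transitions the object to a slower representation, while assigning
// `undefined` leaves the shape unchanged.
interface Ctx {
  a?: number
  b?: number
}

// slow in hot paths: removing the property changes the object's shape
function clearWithDelete(ctx: Ctx): Ctx {
  delete ctx.a
  return ctx
}

// fast: the key stays, only the value changes
function clearWithUndefined(ctx: Ctx): Ctx {
  ctx.a = undefined
  return ctx
}
```

Note the observable difference: after `clearWithUndefined` the key is still enumerable (`'a' in ctx` stays `true`), so this swap is only safe where nothing depends on `in` or `Object.keys` semantics.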
The changes span over 20 PRs; we highlight the highest-impact patterns below.
## What we optimized (and what we did not)
This is transferable: isolate the subsystem you want to improve, and benchmark that.
### Load generation with `autocannon`
We used [`autocannon`](https://github.com/mcollina/autocannon) to generate a 30s sustained load. We tracked:
To record a CPU profile of the server under load, we use [`@platformatic/flame`](https://github.com/platformatic/flame) to start the server:

```sh
flame run ./dist/server.mjs
```

The resulting flamegraph can be read with a tool like [Speedscope](https://www.speedscope.app/):
- Focus on **self time** first. That is where the CPU is actually spent, not just where time is waiting on children.
- Fix one hotspot, re-run, and re-profile.
### Reproducing these benchmarks
Our benchmarks were stable enough to produce very similar results on a range of setups. However, here are the exact environment details we used to run most of the benchmarks:
- Node.js: v24.12.0
- Hardware: MacBook Pro M3 Max
- OS: macOS 15.7
The exact benchmark code is available in [our repository](https://github.com/TanStack/router/tree/main/e2e/react-start/flamegraph-bench).
## Finding 1: `URL` is expensive in server hot paths
In our SSR profiles, `URL` construction/parsing showed up as significant self-time.
Use cheap predicates first, then fall back to heavyweight parsing only when needed.
- If a value is clearly internal (eg starts with `/` but not `//`, or starts with `.`), don't try to parse it as an absolute URL.
- If a feature is only needed in edge cases (eg rewrite logic), keep it off the default path.
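A cheap predicate along these lines can be sketched as follows. This is an illustrative version built only from the checks listed above, not TanStack Router's actual implementation, which may handle more cases:

```typescript
// Illustrative sketch of a cheap "is this clearly internal?" predicate.
// Not TanStack Router's actual implementation.
function isSafeInternal(to: string): boolean {
  // "/path" is internal, but "//host/path" is protocol-relative (external)
  if (to.startsWith('/')) return !to.startsWith('//')
  // "./x" and "../x" are relative paths, hence internal
  if (to.startsWith('.')) return true
  // everything else (e.g. "https://...", "mailto:...") takes the slow path
  return false
}
```

False negatives are acceptable here: a value that fails the check merely falls through to `new URL(...)`, so correctness is preserved.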
### What we changed
```ts
// Before: always parse
const url = new URL(to, base)

// After: check first, parse only if needed
if (isSafeInternal(to)) {
  // fast path: internal navigation, no parsing needed
} else {
  const url = new URL(to, base)
  // ...external URL handling
}
```
The `isSafeInternal` check can be orders of magnitude cheaper than constructing a `URL` object[^url-cost] as long as we're ok with some false negatives in a few cases.
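A quick way to sanity-check that gap is a rough micro-benchmark. This sketch inlines a hypothetical `isSafeInternal`; absolute numbers vary by machine and engine, so treat it as a smoke test, not a measurement:

```typescript
// Rough micro-benchmark sketch (illustrative): compare a cheap string
// predicate against constructing a WHATWG URL object in a tight loop.
import { performance } from 'node:perf_hooks'

// hypothetical predicate, see earlier in the article for the idea
function isSafeInternal(to: string): boolean {
  if (to.startsWith('/')) return !to.startsWith('//')
  return to.startsWith('.')
}

const N = 100_000
const base = 'https://example.com'

let t = performance.now()
for (let i = 0; i < N; i++) isSafeInternal('/posts/123')
const cheap = performance.now() - t

t = performance.now()
for (let i = 0; i < N; i++) new URL('/posts/123', base)
const expensive = performance.now() - t

console.log({ cheap, expensive })
```

On a typical Node.js install the predicate loop finishes far faster than the `URL` loop, which is the effect the flamegraphs surfaced at scale.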
Like every PR in this series, this one was validated by profiling the impacted method before and after the change. For example, we can see below that the `buildLocation` method went from being one of the major bottlenecks of a navigation to a very small part of the overall cost:
## Finding 3: server-only fast paths are worth it (when gated correctly)
If you can guard a branch with a **build-time constant** like `isServer`, you can:
- keep the general algorithm for correctness and edge cases
- allow bundlers to delete the server-only branch from client builds
In TanStack Start, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with fallback). Modern bundlers like Vite, Rollup, and esbuild perform dead code elimination (DCE)[^dce], removing unreachable branches when the condition is a compile-time constant.
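A minimal sketch of such gating, with illustrative names (the runtime fallback shown stands in for the build-time replacement described above):

```typescript
// Illustrative sketch: a branch gated on a constant that bundlers can
// inline at build time. When `isServer` is replaced with a literal
// (e.g. via Vite's `define`), DCE drops the unused branch from the
// client bundle entirely.
const isServer: boolean =
  typeof (globalThis as { window?: unknown }).window === 'undefined'

export function joinPath(segments: string[]): string {
  if (isServer) {
    // server-only fast path: plain string concatenation
    return '/' + segments.join('/')
  }
  // general path kept for the client; produces the same result here
  return segments.reduce((acc, s) => acc + '/' + s, '')
}
```

The key property is that both branches are behaviorally equivalent for shared inputs, so deleting either one from a build can never change observable results.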
### The transferable pattern
Benchmark: placeholder text, should link to Matteo's article.
The "before" numbers show a server under severe stress: 25% of requests failed (likely timeouts), and p90/p95 hit the 10s timeout ceiling. After the optimizations, the server handles the same load comfortably with sub-30ms tail latency and zero failures.
To be clear: TanStack Start was not broken before these changes. Under normal traffic, SSR worked fine. These numbers reflect behavior under _sustained heavy load_—the kind you see during traffic spikes or load testing. The optimizations ensure the server degrades gracefully instead of falling over.
### Event-loop utilization

The following graphs show event-loop utilization against throughput for each feature-focused endpoint, before and after the optimizations. Lower utilization at the same req/s means more headroom; higher req/s at the same utilization means more capacity.
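For context, event-loop utilization can be sampled in Node.js via `perf_hooks` (this shows the general API, not necessarily the exact harness behind these graphs):

```typescript
// Sample event-loop utilization (ELU) in Node.js. ELU is the fraction of
// time the event loop was active rather than idle since the baseline.
import { performance } from 'node:perf_hooks'

const baseline = performance.eventLoopUtilization()

// Simulate synchronous work that keeps the event loop busy for ~50ms
const t0 = Date.now()
while (Date.now() - t0 < 50) {
  // busy-wait
}

// Passing the baseline yields the utilization over just that window
const elu = performance.eventLoopUtilization(baseline)
console.log(`utilization: ${(elu.utilization * 100).toFixed(1)}%`)
```

Sampling this at intervals while `autocannon` drives load is enough to produce utilization-vs-throughput curves like the ones below.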
#### links-100

![PLACEHOLDER]()

#### layouts-26-with-params

![PLACEHOLDER]()

#### empty (baseline)

![PLACEHOLDER]()