Commit 6e23941 ("progress"), parent 3864209
6 files changed (+47 −40 lines)

src/blog/tanstack-start-ssr-performance-600-percent.md

---
published: 2026-02-01
authors:
- Manuel Schiller
- Florian Pellet
title: 'From 3000ms to 14ms: CPU profiling of TanStack Start SSR under heavy load'
# title: 'Profile, Fix, Repeat: 2x SSR Throughput in 20 PRs'
# title: '99.5% Latency Reduction in 20 PRs'
# title: '231x Latency Drop: SSR Flamegraphs under heavy load'
# title: '343x Faster Latency p95: Profiling SSR Hot Paths in TanStack Start'
---

## Executive summary

We improved TanStack Start's SSR performance dramatically. Under sustained load (100 concurrent connections, 30 seconds):

- **Throughput**: 477 req/s → 1,041 req/s (**2.2x**)
- **Average latency**: 3,171ms → 14ms (**231x faster**)

We did it with a repeatable process, not a single clever trick:

- add server-only fast paths behind a build-time `isServer` flag
- avoid `delete` in performance-sensitive code

The changes span over 20 PRs; we highlight the highest-impact patterns below.
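
The `delete` point deserves a concrete illustration. The snippet below is our own illustrative example, not TanStack code: deleting a property changes an object's shape (hidden class) in V8, which can deoptimize hot code, whereas overwriting with `undefined` keeps the shape stable:

```typescript
interface Entry {
  id: number
  cached?: string
}

function clearCachedSlow(e: Entry): void {
  // `delete` changes the object's shape; V8 may fall back to slower
  // dictionary-mode property access for objects touched this way
  delete e.cached
}

function clearCachedFast(e: Entry): void {
  // overwriting keeps the shape stable; hot code stays optimized
  e.cached = undefined
}
```

The trade-off: with the fast variant the key remains present (`'cached' in e` stays `true`), so code that iterates keys must tolerate `undefined` values.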

## What we optimized (and what we did not)

This is transferable: isolate the subsystem you want to improve, and benchmark that.

### Load generation with `autocannon`

We used [`autocannon`](https://github.com/mcollina/autocannon) to generate a 30s sustained load. We tracked:

- req/s
- latency distribution (avg, p95, p99)

```sh
autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
```

### CPU profiling with `@platformatic/flame`

To record a CPU profile of the server under load, we use [`@platformatic/flame`](https://github.com/platformatic/flame) to start the server:

```sh
flame run ./dist/server.mjs
```

The resulting flamegraph can be read with a tool like [Speedscope](https://www.speedscope.app/):

- Focus on **self time** first. That is where the CPU is actually spent, not just where time is waiting on children.
- Fix one hotspot, re-run, and re-profile.

Placeholders you should replace with real screenshots:

### Reproducing these benchmarks

Our benchmarks were stable enough to produce very similar results on a range of setups. However, here are the exact environment details we used to run most of the benchmarks:

- Node.js: v24.12.0
- Hardware: MacBook Pro M3 Max
- OS: macOS 15.7

The exact benchmark code is available in [our repository](https://github.com/TanStack/router/tree/main/e2e/react-start/flamegraph-bench).

## Finding 1: `URL` is expensive in server hot paths

In our SSR profiles, `URL` construction/parsing showed up as significant self-time.

Use cheap predicates first, then fall back to heavyweight parsing only when needed.

- If a value is clearly internal (e.g. starts with `/` but not `//`, or starts with `.`), don't try to parse it as an absolute URL.
- If a feature is only needed in edge cases (e.g. rewrite logic), keep it off the default path.

### What we changed

```typescript
// Before: always parse
const url = new URL(to, base)

// After: check first, parse only if needed
if (isSafeInternal(to)) {
  // fast path: internal navigation, no parsing needed
} else {
  const url = new URL(to, base)
  // ...external URL handling
}
```

The `isSafeInternal` check can be orders of magnitude cheaper than constructing a `URL` object[^url-cost], as long as we're ok with some false negatives in a few cases.

See: [#6442](https://github.com/TanStack/router/pull/6442), [#6447](https://github.com/TanStack/router/pull/6447), [#6516](https://github.com/TanStack/router/pull/6516)
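
For illustration, a cheap predicate along these lines might look like the following sketch. This is our hypothetical reconstruction, not the code from the linked PRs:

```typescript
// Hypothetical sketch of a cheap internal-URL predicate.
// False negatives are acceptable: anything not recognized here simply
// falls through to the slower `new URL(...)` path.
function isSafeInternal(to: string): boolean {
  // "/foo" is internal, but "//host/foo" is protocol-relative (external)
  if (to.startsWith('/')) return !to.startsWith('//')
  // "./foo" and "../foo" are relative paths, hence internal
  if (to.startsWith('.')) return true
  // everything else (absolute URLs, "mailto:", ...) needs real parsing
  return false
}
```

A couple of `startsWith` checks avoid allocating a `URL` object on every navigation; only ambiguous values pay the full parsing cost.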

### How we proved it internally

Like every PR in this series, this one was proven by profiling the impacted method before and after the change. For example, we can see below that the `buildLocation` method went from being one of the major bottlenecks of a navigation to being a very small part of the overall cost:

|        |     |
| ------ | --- |
| Before | ![CPU profiling of buildLocation before the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/before-build-location.png) |
| After  | ![CPU profiling of buildLocation after the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/after-build-location.png) |

## Finding 2: SSR does not need reactivity

170164

171165
```typescript
172166
// Before: same code path for client and server
173-
store.subscribe(() => {
174-
/* ... */
175-
}) // overhead on server
176-
const next = replaceEqualDeep(prev, value) // unnecessary structural sharing
167+
function useRouterState() {
168+
return useStore(router, { ... }) // unnecessary subscription on the server
169+
}
177170

178171
// After: server gets a simple snapshot
179-
if (isServer) {
180-
return computeSnapshot() // no subscriptions, no structural sharing
172+
function useRouterState() {
173+
if (isServer) return router.store // no subscriptions on the server
174+
return useStore(router, { ... }) // regular behavior on the client
181175
}
182176
```
183177

184-
See: [#6497](https://github.com/TanStack/router/pull/6497), [#6502](https://github.com/TanStack/router/pull/6502)
178+
See: [#6497](https://github.com/TanStack/router/pull/6497), [#6482](https://github.com/TanStack/router/pull/6482)
185179

186180
## Finding 3: server-only fast paths are worth it (when gated correctly)
187181

@@ -195,7 +189,7 @@ If you can guard a branch with a **build-time constant** like `isServer`, you ca
195189
- keep the general algorithm for correctness and edge cases
196190
- allow bundlers to delete the server-only branch from client builds
197191

198-
In TanStack Router, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with fallback). Modern bundlers like Vite, Rollup, and esbuild perform dead code elimination (DCE)[^dce], removing unreachable branches when the condition is a compile-time constant.
192+
In TanStack Start, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with fallback). Modern bundlers like Vite, Rollup, and esbuild perform dead code elimination (DCE)[^dce], removing unreachable branches when the condition is a compile-time constant.
199193

200194
### The transferable pattern
201195


Benchmark: placeholder text, should link to Matteo's article.

The "before" numbers show a server under severe stress: 25% of requests failed (likely timeouts), and p90/p95 hit the 10s timeout ceiling. After the optimizations, the server handles the same load comfortably with sub-30ms tail latency and zero failures.

To be clear: TanStack Start was not broken before these changes. Under normal traffic, SSR worked fine. These numbers reflect behavior under _sustained heavy load_—the kind you see during traffic spikes or load testing. The optimizations ensure the server degrades gracefully instead of falling over.

### Event-loop utilization

The following graphs show event-loop utilization against throughput for each feature-focused endpoint, before and after the optimizations. Lower utilization at the same req/s means more headroom; higher req/s at the same utilization means more capacity.
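
Event-loop utilization is cheap to sample from inside the process. A minimal sketch using Node's built-in `perf_hooks` API (the busy-wait stands in for rendering work; this is not how the graphs below were generated):

```typescript
import { performance } from 'node:perf_hooks'

// Baseline sample, then keep the event loop busy for ~50ms.
const baseline = performance.eventLoopUtilization()

const end = Date.now() + 50
while (Date.now() < end) {
  // busy-wait: stand-in for synchronous SSR rendering work
}

// Passing the baseline yields the delta since that sample; a utilization
// near 1 means the loop had almost no idle time in that window.
const elu = performance.eventLoopUtilization(baseline)
console.log(`utilization: ${(elu.utilization * 100).toFixed(1)}%`)
```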

#### links-100

![Event-loop utilization vs throughput for links-100, before and after](/blog-assets/tanstack-start-ssr-performance-600-percent/links-after.png)

#### layouts-26-with-params

![Event-loop utilization vs throughput for nested routes, before and after](/blog-assets/tanstack-start-ssr-performance-600-percent/nested-after.png)

#### empty (baseline)

![Event-loop utilization vs throughput for minimal route, before and after](/blog-assets/tanstack-start-ssr-performance-600-percent/nothing-after.png)

### Flamegraph evidence slots