Commit 59e7b29

committed
update
1 parent c1254b4 commit 59e7b29


1 file changed

+125
-37
lines changed


src/blog/tanstack-start-ssr-performance-600-percent.md

Lines changed: 125 additions & 37 deletions
@@ -1,14 +1,20 @@
11
---
2-
title: '600% faster SSR: profiling and eliminating server hot paths in TanStack Router'
3-
published: 2026-02-04
4-
authors:
5-
- Manuel Schiller
6-
- Florian Pellet
2+
id: ssr-performance-600-percent
3+
title: "From 3000ms to 14ms: profiling hot paths and eliminating bottlenecks in TanStack Start"
74
---
85

96
## Executive summary
107

11-
We improved TanStack Router’s SSR request throughput by about **600%** (placeholder: **~16k → ~96k requests in 30s**). We did it with a repeatable process, not a single clever trick:
8+
We improved TanStack Router's SSR performance dramatically. Under sustained load:
9+
10+
- **Throughput**: 477 req/s → 1,041 req/s (**2.2x**)
11+
- **Average latency**: 3,171ms → 14ms (**231x faster**)
12+
- **p95 latency**: 10,001ms (timeout) → 29ms (**343x faster**)
13+
- **Success rate**: 75% → 100% (the server stopped failing under load)
14+
15+
For SSR-heavy deployments, this translates directly into lower hosting costs, the ability to absorb traffic spikes without scaling out, and the elimination of user-facing errors.
16+
17+
We did it with a repeatable process, not a single clever trick:
1218

1319
- **Measure under load**, not in microbenchmarks.
1420
- Use CPU profiling to find the highest-impact work.
@@ -18,13 +24,13 @@ We improved TanStack Router’s SSR request throughput by about **600%** (placeh
1824
- add server-only fast paths behind a build-time `isServer` flag
1925
- avoid `delete` in performance-sensitive code
2026

21-
This article focuses on methodology and mechanisms you can reuse in any SSR framework.
27+
The changes span ~20 PRs; we highlight the highest-impact patterns below. This article focuses on methodology and mechanisms you can reuse in any SSR framework.
2228

2329
## What we optimized (and what we did not)
2430

2531
This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.
2632

27-
We are not claiming that any single line of code is the reason. Every change was validated by:
33+
We are not claiming that any single line of code is "the" reason. This work spanned over 20 PRs, with more still to come, and every change was validated by:
2834

2935
- a stable load test
3036
- a CPU profile (flamegraph)
@@ -34,7 +40,7 @@ We are not claiming that any single line of code is “the” reason. Every chan
3440

3541
### Why feature-focused endpoints
3642

37-
We did not benchmark a representative app page. We used endpoints that exaggerate a feature so the profile is unambiguous:
43+
We did not benchmark "a representative app page". We used endpoints that exaggerate a feature so the profile is unambiguous:
3844

3945
- **`links-100`**: renders ~100 links to stress link rendering and location building.
4046
- **`layouts-26-with-params`**: deep nesting + params to stress matching and path/param work.
@@ -72,6 +78,27 @@ Placeholders you should replace with real screenshots:
7278
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
7379
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`
7480

81+
### Reproducing these benchmarks
82+
83+
**Environment:**
84+
85+
Our benchmarks were stable enough to produce very similar results across a range of setups. For reference, these are the exact environment details we used:
86+
- Node.js: v24.12.0
87+
- Hardware: MacBook Pro M3
88+
- OS: macOS 15.7
89+
90+
**Running the benchmark:**
91+
92+
For fast iteration, we set up a single `pnpm bench` command that would concurrently:
93+
- start the built server through `@platformatic/flame` to profile it
94+
```sh
95+
flame run ./dist/server.mjs
96+
```
97+
- run `autocannon` to stress the server by firing many requests at it
98+
```sh
99+
autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
100+
```
101+
75102
## Finding 1: `URL` is expensive in server hot paths
76103

77104
### The mechanism
@@ -82,9 +109,26 @@ In our SSR profiles, `URL` construction/parsing showed up as significant self-ti
82109

83110
Use cheap predicates first, then fall back to heavyweight parsing only when needed.
84111

85-
- If a value is clearly internal (eg starts with `/`, `.`, `..`), dont try to parse it as an absolute URL.
112+
- If a value is clearly internal (e.g. it starts with `/`, `.`, or `..`), don't try to parse it as an absolute URL.
86113
- If a feature is only needed in edge cases (e.g. rewrite logic), keep it off the default path.
87114

115+
### What we changed
116+
117+
```typescript
118+
// Before: always parse
119+
const url = new URL(to, base)
120+
121+
// After: check first, parse only if needed
122+
if (isAbsoluteUrl(to)) {
123+
const url = new URL(to, base)
124+
// ...external URL handling
125+
} else {
126+
// fast path: internal navigation, no parsing needed
127+
}
128+
```
129+
130+
See: [#6442](https://github.com/TanStack/router/pull/6442), [#6447](https://github.com/TanStack/router/pull/6447), [#6516](https://github.com/TanStack/router/pull/6516)
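
The `isAbsoluteUrl` predicate in the snippet above is where the savings come from, so it has to be cheap. Below is a minimal sketch. `isAbsoluteUrl` and `classifyHref` are hypothetical helpers written for illustration, not TanStack Router's actual code. The sketch assumes hrefs starting with `/` or `.` are internal, with one exception: protocol-relative `//host/...` URLs can point at another origin, so they go down the slow path.

```typescript
// Hypothetical sketch, not TanStack Router's implementation.
// Cheap predicate: look at the first characters to decide whether full URL parsing is needed.
function isAbsoluteUrl(href: string): boolean {
  const first = href.charCodeAt(0)
  if (first === 47 /* '/' */) {
    // "//host/path" is protocol-relative and may target another origin,
    // so send it to the slow path; "/path" is clearly internal.
    return href.charCodeAt(1) === 47
  }
  if (first === 46 /* '.' */) return false // "./x" or "../x": relative, internal
  // RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
  return /^[a-zA-Z][a-zA-Z\d+\-.]*:/.test(href)
}

// Only allocate URL objects when the cheap check says we must.
function classifyHref(to: string, base: string): 'internal' | 'external' {
  if (!isAbsoluteUrl(to)) return 'internal' // fast path: no parsing at all
  const target = new URL(to, base) // slow path: full WHATWG URL parsing
  return target.origin === new URL(base).origin ? 'internal' : 'external'
}
```

Only hrefs that carry a scheme, or start with `//`, ever allocate a `URL`. Every other href is answered with a couple of `charCodeAt` calls. Resolving `./` and `../` against the current location is out of scope for this sketch.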
131+
88132
### How we proved it internally
89133

90134
This claim should be backed by your flamegraphs and measurements, not by opinion.
@@ -96,10 +140,10 @@ This claim should be backed by your flamegraphs and measurements, not by opinion
96140

97141
### The mechanism
98142

99-
SSR renders once per request. There is no ongoing UI to reactively update, so on the server:
143+
SSR renders once per request.[^ssr-streaming] There is no ongoing UI to reactively update, so on the server:
100144

101145
- store subscriptions add overhead but provide no benefit
102-
- structural sharing (replace-equal) reduces re-renders, but SSR does not re-render
146+
- structural sharing[^structural-sharing] (replace-equal) reduces re-renders, but SSR does not re-render
103147
- batching reactive notifications is irrelevant if nothing is subscribed
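
To make the cost of structural sharing concrete, here is a simplified replace-equal function for JSON-like data. It is an illustrative stand-in, not the `replaceEqualDeep` that TanStack libraries ship. It walks both values and hands back the previous reference wherever nothing changed. On the client, that preserved identity lets subscribers skip re-rendering. On the server, every value is rendered exactly once, so the full walk buys nothing.

```typescript
// JSON-like values only; a simplified stand-in for a replace-equal helper.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Returns `prev` itself when `next` is deeply equal to it. Otherwise returns
// `next`'s contents, reusing every unchanged subtree of `prev` by reference.
function replaceEqualDeep(prev: Json, next: Json): Json {
  if (prev === next) return prev

  if (Array.isArray(prev) && Array.isArray(next)) {
    const prevArr: Json[] = prev
    const out = next.map((item, i) =>
      i < prevArr.length ? replaceEqualDeep(prevArr[i], item) : item,
    )
    const unchanged =
      prevArr.length === next.length && out.every((item, i) => item === prevArr[i])
    return unchanged ? prevArr : out
  }

  if (isPlainObject(prev) && isPlainObject(next)) {
    const nextKeys = Object.keys(next)
    let unchanged = nextKeys.length === Object.keys(prev).length
    const out: { [key: string]: Json } = {}
    for (const key of nextKeys) {
      const shared = Object.prototype.hasOwnProperty.call(prev, key)
      out[key] = shared ? replaceEqualDeep(prev[key], next[key]) : next[key]
      if (!shared || out[key] !== prev[key]) unchanged = false
    }
    return unchanged ? prev : out
  }

  return next // different kinds of value, or changed primitives: take the new one
}
```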
104148

105149
### The transferable pattern
@@ -109,7 +153,22 @@ If you have a runtime that supports both client reactivity and SSR, separate the
109153
- on the server: compute a snapshot and return it
110154
- on the client: subscribe and use structural sharing to reduce render churn
111155

112-
This is the difference between “server = a function” and “client = a reactive system”.
156+
This is the difference between "server = a function" and "client = a reactive system".
157+
158+
### What we changed
159+
160+
```typescript
161+
// Before: same code path for client and server
162+
store.subscribe(() => { /* ... */ }) // overhead on server
163+
const next = replaceEqualDeep(prev, value) // unnecessary structural sharing
164+
165+
// After: server gets a simple snapshot
166+
if (isServer) {
167+
return computeSnapshot() // no subscriptions, no structural sharing
168+
}
169+
```
170+
171+
See: [#6497](https://github.com/TanStack/router/pull/6497), [#6502](https://github.com/TanStack/router/pull/6502)
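
Below is a minimal sketch of that split. `MiniStore`, `selectOnServer`, and `selectOnClient` are hypothetical names invented for illustration, not TanStack Store's API. An `Object.is` comparison stands in for full structural sharing. The server helper reads the state once and returns, so only the client helper pays for a subscription.

```typescript
type Listener = () => void

// Hypothetical minimal store for illustration (not TanStack Store's API).
class MiniStore<T> {
  private state: T
  private readonly listeners = new Set<Listener>()

  constructor(initial: T) {
    this.state = initial
  }

  getState(): T {
    return this.state
  }

  setState(next: T): void {
    this.state = next
    this.listeners.forEach((listener) => listener())
  }

  subscribe(listener: Listener): () => void {
    this.listeners.add(listener)
    return () => {
      this.listeners.delete(listener)
    }
  }

  get listenerCount(): number {
    return this.listeners.size
  }
}

// Server: one render per request, so a plain snapshot is all that is needed.
// No listener is registered and nothing has to be cleaned up afterwards.
function selectOnServer<T, S>(store: MiniStore<T>, select: (state: T) => S): S {
  return select(store.getState())
}

// Client: subscribe, and notify only when the selected value actually changes.
// Object.is is a shallow stand-in for structural sharing / replace-equal.
function selectOnClient<T, S>(
  store: MiniStore<T>,
  select: (state: T) => S,
  onChange: (value: S) => void,
): { value: S; unsubscribe: () => void } {
  let current = select(store.getState())
  const unsubscribe = store.subscribe(() => {
    const next = select(store.getState())
    if (!Object.is(next, current)) {
      current = next
      onChange(next)
    }
  })
  return { value: current, unsubscribe }
}
```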
113172

114173
## Finding 3: server-only fast paths are worth it (when gated correctly)
115174

@@ -132,40 +191,67 @@ Write two implementations:
132191
- **fast path** for the common case
133192
- **general path** for correctness
134193

135-
And gate them behind a build-time constant so you don’t ship server-only logic to clients.
194+
And gate them behind a build-time constant so you don't ship server-only logic to clients.
195+
196+
### What we changed
197+
198+
```typescript
199+
// isServer is resolved at build time:
200+
// - Vite/bundler replaces it with `true` (server) or `false` (client)
201+
// - Dead code elimination removes the unused branch
202+
203+
if (isServer) {
204+
// server-only fast path (removed from client bundle)
205+
return fastServerPath(input)
206+
}
207+
// general algorithm (used on client, fallback on server in dev)
208+
return generalPath(input)
209+
```
210+
211+
See: [#4648](https://github.com/TanStack/router/pull/4648), [#6505](https://github.com/TanStack/router/pull/6505), [#6506](https://github.com/TanStack/router/pull/6506)
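
Here is the same shape as a self-contained sketch, applied to a hypothetical `interpolatePath` helper (not TanStack Router's real function). Because this runs outside a bundler, `isServer` is approximated with a runtime check. In a real build, it would be a constant that the bundler replaces, so dead-code elimination can drop the unused branch. For this pattern, the property worth testing is equivalence: for every input, the fast path must return exactly what the general path returns.

```typescript
type Params = Record<string, string>

// Stand-in for a build-time constant. A bundler would replace it with a literal
// `true`/`false` and drop the dead branch; here it is evaluated at runtime.
const isServer = !('document' in globalThis)

// General path: split into segments and substitute every `$name` segment.
function interpolatePathGeneral(path: string, params: Params): string {
  return path
    .split('/')
    .map((segment) =>
      segment.startsWith('$') ? encodeURIComponent(params[segment.slice(1)] ?? '') : segment,
    )
    .join('/')
}

// Fast path: a path without `$` has nothing to substitute, so return it
// as-is without splitting or allocating; anything else defers to the general path.
function interpolatePathFast(path: string, params: Params): string {
  return path.includes('$') ? interpolatePathGeneral(path, params) : path
}

function interpolatePath(path: string, params: Params): string {
  if (isServer) {
    // server-only fast path (would be removed from a client bundle)
    return interpolatePathFast(path, params)
  }
  return interpolatePathGeneral(path, params)
}
```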
136212

137213
## Finding 4: `delete` can be expensive
138214

139215
### The mechanism
140216

141-
Modern engines optimize property access using object shapes (e.g. V8 HiddenClasses / JSC Structures) and inline caches. `delete` changes an objects shape and can force a slower internal representation (e.g. dictionary/slow properties), which can disable or degrade those optimizations and deopt optimized code.[^v8-fast-properties][^webkit-delete-ic]
217+
Modern engines optimize property access using object "shapes" (e.g. V8 HiddenClasses / JSC Structures) and inline caches. `delete` changes an object's shape and can force a slower internal representation (e.g. dictionary/slow properties), which can disable or degrade those optimizations and deopt optimized code.[^v8-fast-properties][^webkit-delete-ic]
142218

143219
### The transferable pattern
144220

145-
Avoid `delete` in hot paths. Prefer patterns that dont mutate object shapes in-place:
221+
Avoid `delete` in hot paths. Prefer patterns that don't mutate object shapes in place:
146222

147223
- set a property to `undefined` (when semantics allow)
148-
- create a new object without the key (object rest destructuring) when you need a key removed shape
224+
- create a new object without the key (object rest destructuring) when you need a "key removed" shape
149225

150-
## Results (placeholders)
226+
### What we changed
151227

152-
Replace the placeholders below with your final measurements and keep the raw `autocannon` output in your internal notes.
228+
```typescript
229+
// Before: mutates shape
230+
delete linkProps.activeProps
231+
delete linkProps.inactiveProps
232+
233+
// After: create new object without keys
234+
const { activeProps, inactiveProps, ...rest } = linkProps
235+
return rest
236+
```
153237

154-
### Throughput (30s runs)
238+
See: [#6456](https://github.com/TanStack/router/pull/6456), [#6515](https://github.com/TanStack/router/pull/6515)
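
Here is a runnable version of the "after" pattern, using a hypothetical `LinkProps` shape and `toAnchorProps` helper (not TanStack Router's real types). The second helper shows why setting a property to `undefined` only works "when semantics allow": the key is still present, so `in` checks, `Object.keys`, and object spread all still see it.

```typescript
// Hypothetical props shape for illustration.
interface LinkProps {
  to: string
  className?: string
  activeProps?: Record<string, unknown>
  inactiveProps?: Record<string, unknown>
}

type AnchorProps = Omit<LinkProps, 'activeProps' | 'inactiveProps'>

// Rest destructuring builds a new object without the two keys. The input is
// never mutated, so its shape stays stable and no `delete` is needed.
function toAnchorProps(linkProps: LinkProps): AnchorProps {
  const { activeProps, inactiveProps, ...rest } = linkProps
  return rest
}

// Alternative when consumers treat `undefined` as "absent": keep the keys but
// blank them out. The resulting object still has both keys.
function clearStateProps(linkProps: LinkProps): LinkProps {
  return { ...linkProps, activeProps: undefined, inactiveProps: undefined }
}
```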
155239

156-
| Endpoint | Before req/30s | After req/30s | Change |
157-
| ---------------------- | -------------: | ------------: | ------: |
158-
| links-100 | **TBD** | **TBD** | **TBD** |
159-
| layouts-26-with-params | **TBD** | **TBD** | **TBD** |
240+
## Results
160241

161-
### Latency distribution
242+
Benchmark: placeholder text, should link to Matteo's article.
162243

163-
| Endpoint | Variant | Avg | p95 | p99 |
164-
| ---------------------- | ------- | ------: | ------: | ------: |
165-
| links-100 | before | **TBD** | **TBD** | **TBD** |
166-
| links-100 | after | **TBD** | **TBD** | **TBD** |
167-
| layouts-26-with-params | before | **TBD** | **TBD** | **TBD** |
168-
| layouts-26-with-params | after | **TBD** | **TBD** | **TBD** |
244+
### Summary
245+
246+
| Metric | Before | After | Improvement |
247+
| ------------ | --------: | ---------: | ------------ |
248+
| Success Rate | 75.52% | 100% | +32% |
249+
| Throughput | 477 req/s | 1041 req/s | +118% (2.2x) |
250+
| Avg Response | 3,171ms | 13.7ms | 231x faster |
251+
| p(90) | 10,001ms | 23.0ms | 435x faster |
252+
| p(95) | 10,001ms | 29.1ms | 343x faster |
253+
254+
The "before" numbers show a server under severe stress: 25% of requests failed (likely timeouts), and p90/p95 hit the 10s timeout ceiling. After the optimizations, the server handles the same load comfortably with sub-30ms tail latency and zero failures.
169255

170256
### Flamegraph evidence slots
171257

@@ -182,13 +268,15 @@ There were many other improvements (client and server) not covered here. SSR per
182268

183269
## Fill-in checklist before publishing
184270

185-
- [ ] Replace throughput placeholders (req/30s) with final numbers.
186-
- [ ] Replace latency placeholders (avg/p95/p99) with final numbers.
187-
- [ ] Insert flamegraph screenshots and annotate the before hotspots and after removal.
271+
- [x] Replace throughput placeholders with final numbers.
272+
- [x] Replace latency placeholders (avg/p90/p95) with final numbers.
273+
- [ ] Insert flamegraph screenshots and annotate the "before" hotspots and "after" removal.
188274
- [ ] Ensure every external claim has a citation and every internal claim has evidence.
275+
- [ ] Add `layouts-26-with-params` benchmark results (if desired).
189276

190277
## References
191278

192-
[^v8-fast-properties]: V8 team, “Fast properties in V8” `https://v8.dev/blog/fast-properties`
193-
194-
[^webkit-delete-ic]: WebKit, “A Tour of Inline Caching with Delete” `https://webkit.org/blog/10298/inline-caching-delete/`
279+
[^v8-fast-properties]: V8 team, "Fast properties in V8" `https://v8.dev/blog/fast-properties`
280+
[^webkit-delete-ic]: WebKit, "A Tour of Inline Caching with Delete" `https://webkit.org/blog/10298/inline-caching-delete/`
281+
[^structural-sharing]: Structural sharing is a technique used by immutable-data and state libraries (Immer, React Query, TanStack Store): unchanged portions of a data structure are reused by reference to minimize allocation and enable cheap equality checks.
282+
[^ssr-streaming]: With streaming SSR and Suspense, the server may render multiple chunks, but each chunk is still a single-pass render with no reactive updates.
