Commit 7c34c02 (1 parent: 943c7ed)
feat(blog): Start SSR performance article

1 file changed: +193 additions, 0 deletions

---
title: "600% faster SSR: profiling and eliminating server hot paths in TanStack Router"
published: 2026-02-04
authors:
  - Manuel Schiller
  - Florian Pellet
---

## Executive summary

We improved TanStack Router’s SSR request throughput by about **600%** (placeholder: **~16k → ~96k requests in 30s**). We did it with a repeatable process, not a single clever trick:

- **Measure under load**, not in microbenchmarks.
- Use CPU profiling to find the highest-impact work.
- Remove entire categories of cost from the server hot path:
  - avoid `URL` construction/parsing when it is not required
  - avoid reactivity work during SSR (subscriptions, structural sharing, batching)
  - add server-only fast paths behind a build-time `isServer` flag
  - avoid `delete` in performance-sensitive code

This article focuses on methodology and mechanisms you can reuse in any SSR framework.

## What we optimized (and what we did not)

This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.

We are not claiming that any single line of code is “the” reason. Every change was validated by:

- a stable load test
- a CPU profile (flamegraph)
- a before/after comparison on the same benchmark endpoint

## Methodology: feature-focused endpoints + flamegraphs

### Why feature-focused endpoints

We did not benchmark “a representative app page”. We used endpoints that exaggerate a single feature so the profile is unambiguous:

- **`links-100`**: renders ~100 links to stress link rendering and location building.
- **`layouts-26-with-params`**: deep nesting + params to stress matching and path/param work.
- **`empty`**: a minimal route to establish a baseline for framework overhead.

This approach is transferable: isolate the subsystem you want to improve, and benchmark that.
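
As an illustration (the function name and markup here are hypothetical, not the actual benchmark code), a feature-focused endpoint can be as small as a handler that renders the one thing under test many times:

```typescript
// Hypothetical sketch of a links-100-style benchmark endpoint:
// render ~100 links so link/location work dominates the profile.
function renderLinks100(): string {
  const links: string[] = [];
  for (let i = 0; i < 100; i++) {
    // Vary the path so per-link work cannot be collapsed into a
    // single cached lookup.
    links.push(`<a href="/items/${i}?tab=details">Item ${i}</a>`);
  }
  return `<main>${links.join("\n")}</main>`;
}
```

Because the endpoint does almost nothing else, any hotspot in its profile is attributable to the feature under test.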

### Load generation with `autocannon`

We used `autocannon` to generate a 30s sustained load. We tracked:

- req/s
- latency distribution (avg, p95, p99)

Example command (adjust concurrency and route):

```bash
autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
```

### CPU profiling with `@platformatic/flame`

While the server handled load, we recorded CPU profiles using `@platformatic/flame`.

How we read the flamegraph:

- Focus on **self time** first. That is where the CPU is actually spent, not just where time is waiting on children.
- Fix one hotspot, then re-run the load test and re-profile.
- Prefer changes that remove work in the steady state over changes that merely shift it elsewhere.

Placeholders you should replace with real screenshots:

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

## Finding 1: `URL` is expensive in server hot paths

### The mechanism

In our SSR profiles, `URL` construction/parsing showed up as significant self time in the hot path on link-heavy endpoints. The cost comes from real work (parsing and normalization) plus object allocation. Done once, it does not matter. Done per link, per request, it dominates.

### The transferable pattern

Use cheap predicates first, and fall back to heavyweight parsing only when needed.

- If a value is clearly internal (e.g. it starts with `/`, `.`, or `..`), don’t try to parse it as an absolute URL.
- If a feature is only needed in edge cases (e.g. rewrite logic), keep it off the default path.

### How we proved it internally

This claim should be backed by your flamegraphs and measurements, not by opinion.

- `<!-- EVIDENCE: flamegraph shows URL construction/parsing as top self-time hotspot before -->`
- `<!-- EVIDENCE: same hotspot reduced/removed after -->`

## Finding 2: SSR does not need reactivity

### The mechanism

SSR renders once per request. There is no ongoing UI to update reactively, so on the server:

- store subscriptions add overhead but provide no benefit
- structural sharing (replace-equal) reduces re-renders, but SSR never re-renders
- batching reactive notifications is irrelevant if nothing is subscribed

### The transferable pattern

If your runtime supports both client reactivity and SSR, separate the two modes:

- on the server: compute a snapshot and return it
- on the client: subscribe and use structural sharing to reduce render churn

This is the difference between “the server is a function” and “the client is a reactive system”.

## Finding 3: server-only fast paths are worth it (when gated correctly)

### The mechanism

Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different.

If you can guard a branch with a **build-time constant** like `isServer`, you can:

- add server-only fast paths for common cases
- keep the general algorithm for correctness and edge cases
- let bundlers delete the server-only branch from client builds

In TanStack Router, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with a fallback), so dead code elimination can remove entire blocks.

### The transferable pattern

Write two implementations:

- a **fast path** for the common case
- a **general path** for correctness

Then gate them behind a build-time constant so you don’t ship server-only logic to clients.
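
A sketch of the gated pair (the path-joining helpers are hypothetical examples, and `IS_SERVER` stands in for whatever constant your bundler substitutes at build time, e.g. via `define`):

```typescript
// Dev/test fallback; in real builds a bundler would replace this
// with a literal `true` (server) or `false` (client).
const IS_SERVER: boolean = typeof window === "undefined";

// General path: tolerates stray slashes on either side.
function joinPathsGeneral(base: string, segment: string): string {
  return `${base.replace(/\/+$/, "")}/${segment.replace(/^\/+/, "")}`;
}

// Fast path: assumes already-normalized inputs, which the server
// can guarantee because it controls how paths are produced.
function joinPathsFast(base: string, segment: string): string {
  return base + "/" + segment;
}

function joinPaths(base: string, segment: string): string {
  // With a true build-time constant, the dead branch disappears
  // from the client bundle entirely.
  if (IS_SERVER) return joinPathsFast(base, segment);
  return joinPathsGeneral(base, segment);
}
```

The fast path is only safe because the gate restricts it to an environment whose inputs you control; the general path remains the source of truth for correctness.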

## Finding 4: `delete` can be expensive

### The mechanism

Modern engines optimize property access using object “shapes” (e.g. V8 hidden classes / JSC Structures) and inline caches. `delete` changes an object’s shape and can force a slower internal representation (e.g. dictionary-mode / slow properties), which can disable or degrade those optimizations and deoptimize already-optimized code.[^v8-fast-properties][^webkit-delete-ic]

### The transferable pattern

Avoid `delete` in hot paths. Prefer patterns that don’t mutate object shapes in place:

- set the property to `undefined` (when semantics allow)
- create a new object without the key (object rest destructuring) when you need a true “key removed” shape
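
Both alternatives in one sketch (the `Ctx` shape is made up for illustration):

```typescript
interface Ctx {
  path: string;
  cache?: { hit: boolean };
}

const ctx: Ctx = { path: "/users", cache: { hit: true } };

// Option 1: overwrite with `undefined`. The key still exists
// (`"cache" in ctx` stays true), but the object keeps its shape,
// so shape-based optimizations remain valid.
ctx.cache = undefined;

// Option 2: rest destructuring builds a fresh object without the
// key. No existing object's shape is mutated.
const { cache: _dropped, ...withoutCache } = ctx;
```

Option 1 is the cheap in-place choice when `undefined` is an acceptable sentinel; option 2 costs an allocation but yields a genuinely key-free object.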

## Results (placeholders)

Replace the placeholders below with your final measurements and keep the raw `autocannon` output in your internal notes.

### Throughput (30s runs)

| Endpoint               | Before req/30s | After req/30s |  Change |
| ---------------------- | -------------: | ------------: | ------: |
| links-100              |        **TBD** |       **TBD** | **TBD** |
| layouts-26-with-params |        **TBD** |       **TBD** | **TBD** |

### Latency distribution

| Endpoint               | Variant |     Avg |     p95 |     p99 |
| ---------------------- | ------- | ------: | ------: | ------: |
| links-100              | before  | **TBD** | **TBD** | **TBD** |
| links-100              | after   | **TBD** | **TBD** | **TBD** |
| layouts-26-with-params | before  | **TBD** | **TBD** | **TBD** |
| layouts-26-with-params | after   | **TBD** | **TBD** | **TBD** |

### Flamegraph evidence slots

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

## Conclusion

The biggest gains came from removing whole categories of work from the server hot path. The general lesson is simple: throughput improves when you eliminate repeated work, allocations, and unnecessary generality in the steady state.

There were many other improvements (client and server) not covered here. SSR performance work is ongoing.

## Fill-in checklist before publishing

- [ ] Replace throughput placeholders (req/30s) with final numbers.
- [ ] Replace latency placeholders (avg/p95/p99) with final numbers.
- [ ] Insert flamegraph screenshots and annotate the “before” hotspots and the “after” removal.
- [ ] Ensure every external claim has a citation and every internal claim has evidence.

## References

[^v8-fast-properties]: V8 team, “Fast properties in V8”, `https://v8.dev/blog/fast-properties`
[^webkit-delete-ic]: WebKit, “A Tour of Inline Caching with Delete”, `https://webkit.org/blog/10298/inline-caching-delete/`
