---
title: "600% faster SSR: profiling and eliminating server hot paths in TanStack Router"
published: 2026-02-04
authors:
  - Manuel Schiller
  - Florian Pellet
---

## Executive summary

We improved TanStack Router’s SSR request throughput by about **600%** (placeholder: **~16k → ~96k requests in 30s**). We did it with a repeatable process, not a single clever trick:

- **Measure under load**, not in microbenchmarks.
- Use CPU profiling to find the highest-impact work.
- Remove entire categories of cost from the server hot path:
  - avoid `URL` construction/parsing when it is not required
  - avoid reactivity work during SSR (subscriptions, structural sharing, batching)
  - add server-only fast paths behind a build-time `isServer` flag
  - avoid `delete` in performance-sensitive code

This article focuses on methodology and mechanisms you can reuse in any SSR framework.

## What we optimized (and what we did not)

This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.

We are not claiming that any single line of code is “the” reason. Every change was validated by:

- a stable load test
- a CPU profile (flamegraph)
- a before/after comparison on the same benchmark endpoint

## Methodology: feature-focused endpoints + flamegraphs

### Why feature-focused endpoints

We did not benchmark “a representative app page”. We used endpoints that each exaggerate a single feature, so the resulting profile is unambiguous:

- **`links-100`**: renders ~100 links to stress link rendering and location building.
- **`layouts-26-with-params`**: deep nesting plus params to stress matching and path/param work.
- **`empty`**: a minimal route to establish a baseline for framework overhead.

This is transferable: isolate the subsystem you want to improve, and benchmark that.

### Load generation with `autocannon`

We used `autocannon` to generate 30 seconds of sustained load. We tracked:

- req/s
- latency distribution (avg, p95, p99)

Example command (adjust concurrency and route to taste):

```bash
autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
```

### CPU profiling with `@platformatic/flame`

While the server handled load, we recorded CPU profiles using `@platformatic/flame`.

How we read the flamegraph:

- Focus on **self time** first: that is where the CPU is actually doing work, as opposed to inclusive time spent in children.
- Fix one hotspot, re-run the load test, and re-profile.
- Prefer changes that remove work from the steady state, not changes that merely shift it around.

Placeholders to replace with real screenshots:

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

## Finding 1: `URL` is expensive in server hot paths

### The mechanism

In our SSR profiles, `URL` construction/parsing showed up as significant self time in the hot path on link-heavy endpoints. The cost comes from real work (parsing and normalization) plus object allocation. Done once, it does not matter. Done per link, per request, it dominates.

### The transferable pattern

Use cheap predicates first, and fall back to heavyweight parsing only when needed.

- If a value is clearly internal (e.g. it starts with `/`, `.`, or `..`), don’t try to parse it as an absolute URL.
- If a feature is only needed in edge cases (e.g. rewrite logic), keep it off the default path.

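A minimal sketch of the cheap-predicate idea, with hypothetical helper names (this is not TanStack Router’s actual API): classify obviously relative values with string checks before paying for `URL`.

```typescript
// Illustrative sketch: these helpers are hypothetical, not library API.
function isObviouslyInternal(href: string): boolean {
  // Root-relative ("/...") and dot-relative ("./...", "../...") values
  // can never be absolute URLs, so there is nothing to parse.
  return href.startsWith('/') || href.startsWith('.')
}

function resolveHref(href: string, base: string): string {
  if (isObviouslyInternal(href)) {
    // Fast path: no `URL` allocation, no parsing, no normalization.
    return href
  }
  // Slow path: only pay for `URL` when the value might be absolute.
  return new URL(href, base).toString()
}
```

On a link-heavy page this turns N `URL` constructions per request into zero for the common case, while absolute URLs still get full parsing.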
### How we proved it internally

This claim should be backed by your flamegraphs and measurements, not by opinion.

- `<!-- EVIDENCE: flamegraph shows URL construction/parsing as the top self-time hotspot before -->`
- `<!-- EVIDENCE: same hotspot reduced/removed after -->`

## Finding 2: SSR does not need reactivity

### The mechanism

SSR renders once per request. There is no ongoing UI to update reactively, so on the server:

- store subscriptions add overhead but provide no benefit
- structural sharing (replace-equal) exists to reduce re-renders, but SSR never re-renders
- batching reactive notifications is irrelevant when nothing is subscribed

### The transferable pattern

If your runtime supports both client reactivity and SSR, separate the two modes:

- on the server: compute a snapshot and return it
- on the client: subscribe, and use structural sharing to reduce render churn

This is the difference between “server = a function” and “client = a reactive system”.

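That split can be sketched as follows, assuming a tiny hypothetical store interface (not TanStack Router’s real store). `isServer` is modeled at runtime here; a real build would inject it as a compile-time constant.

```typescript
// Minimal sketch under an assumed store shape.
const isServer = typeof window === 'undefined'

interface Store<T> {
  read(): T
  subscribe(listener: () => void): () => void
}

function readForRender<T>(store: Store<T>, onChange: () => void): T {
  if (isServer) {
    // Server: one snapshot, no subscription, no structural sharing,
    // no batching -- nothing will ever re-render.
    return store.read()
  }
  // Client: subscribe so future updates can trigger re-renders.
  // (A real hook would keep the unsubscribe function for cleanup.)
  store.subscribe(onChange)
  return store.read()
}
```

The server branch is a pure function call; all reactive bookkeeping lives exclusively on the client side of the gate.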
## Finding 3: server-only fast paths are worth it (when gated correctly)

### The mechanism

Client code cares about bundle size; server code cares about CPU time per request. Those constraints are different.

If you can guard a branch with a **build-time constant** like `isServer`, you can:

- add server-only fast paths for common cases
- keep the general algorithm for correctness and edge cases
- let bundlers delete the server-only branch from client builds

In TanStack Router, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with a runtime fallback), so dead-code elimination can remove entire blocks.

### The transferable pattern

Write two implementations:

- a **fast path** for the common case
- a **general path** for correctness

Gate them behind a build-time constant so you don’t ship server-only logic to clients.

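As an illustration of the gating pattern (the function names are hypothetical, and `isServer` is approximated at runtime here, whereas a real build injects it as a constant):

```typescript
// Hypothetical example of a build-time-gated fast path. In client builds,
// bundlers replace `isServer` with `false` and drop the whole branch.
const isServer = typeof window === 'undefined'

// General path: correct for every input.
function encodeParamGeneral(value: string): string {
  return encodeURIComponent(value)
}

function encodeParam(value: string): string {
  if (isServer && /^[A-Za-z0-9_.~-]*$/.test(value)) {
    // Server-only fast path: values containing only unreserved characters
    // need no escaping, so skip the encoder entirely. Because the guard is
    // a build-time constant, this costs zero bytes in the client bundle.
    return value
  }
  return encodeParamGeneral(value)
}
```

The fast path is allowed to be narrow; anything it cannot prove safe falls through to the general path, so correctness never depends on it.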
## Finding 4: `delete` can be expensive

### The mechanism

Modern engines optimize property access using object “shapes” (e.g. V8 hidden classes / JSC structures) and inline caches. `delete` changes an object’s shape and can force a slower internal representation (e.g. dictionary-mode properties), which can degrade or disable those optimizations and deoptimize hot code.[^v8-fast-properties][^webkit-delete-ic]

### The transferable pattern

Avoid `delete` in hot paths. Prefer patterns that don’t mutate object shapes in place:

- set the property to `undefined` (when semantics allow)
- create a new object without the key (object rest destructuring) when you need a true “key removed” shape

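Both alternatives can be sketched like this (the `CacheEntry` shape is illustrative):

```typescript
// Illustrative shape; the point is how the key is removed, not the type.
interface CacheEntry {
  value: string
  expiresAt: number | undefined
}

// 1) Overwrite with `undefined`: the object keeps its shape, so inline
//    caches on `expiresAt` accesses stay valid.
function clearExpiry(entry: CacheEntry): void {
  entry.expiresAt = undefined
}

// 2) Rest destructuring: build a fresh object without the key when you
//    truly need it gone (e.g. before JSON serialization).
function withoutExpiry(entry: CacheEntry): { value: string } {
  const { expiresAt: _dropped, ...rest } = entry
  return rest
}
```

Note the semantic difference: after option 1 the key still exists (`'expiresAt' in entry` is `true`), while option 2 produces an object without the key at the cost of an allocation.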
## Results (placeholders)

Replace the placeholders below with final measurements, and keep the raw `autocannon` output in your internal notes.

### Throughput (30s runs)

| Endpoint               | Before req/30s | After req/30s |  Change |
| ---------------------- | -------------: | ------------: | ------: |
| links-100              |        **TBD** |       **TBD** | **TBD** |
| layouts-26-with-params |        **TBD** |       **TBD** | **TBD** |

### Latency distribution

| Endpoint               | Variant |     Avg |     p95 |     p99 |
| ---------------------- | ------- | ------: | ------: | ------: |
| links-100              | before  | **TBD** | **TBD** | **TBD** |
| links-100              | after   | **TBD** | **TBD** | **TBD** |
| layouts-26-with-params | before  | **TBD** | **TBD** | **TBD** |
| layouts-26-with-params | after   | **TBD** | **TBD** | **TBD** |

### Flamegraph evidence slots

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

## Conclusion

The biggest gains came from removing whole categories of work from the server hot path. The general lesson is simple: throughput improves when you eliminate repeated work, allocations, and unnecessary generality from the steady state.

There were many other improvements (client and server) not covered here. SSR performance work is ongoing.

## Fill-in checklist before publishing

- [ ] Replace throughput placeholders (req/30s) with final numbers.
- [ ] Replace latency placeholders (avg/p95/p99) with final numbers.
- [ ] Insert flamegraph screenshots and annotate the “before” hotspots and their “after” removal.
- [ ] Ensure every external claim has a citation and every internal claim has evidence.

## References

[^v8-fast-properties]: V8 team, “Fast properties in V8”, <https://v8.dev/blog/fast-properties>
[^webkit-delete-ic]: WebKit, “A Tour of Inline Caching with Delete”, <https://webkit.org/blog/10298/inline-caching-delete/>