You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs(scaling-dive): add N=3 measured numbers for lever 9 (#7775)
Replace the "score pending" placeholder under lever 9 with the
actual numbers from runs 25957107195/25957108328/25957109418
(perf branch) vs 25954537767/25954538807/25954540108 (develop),
both at authors=100..500:step=50:dwell=8s:warmup=2s.
Result: consistent -1.4% to -5.3% CPU reduction across all 9 steps,
matching profile direction at 2-5% (vs 6% profile-attributed upper
bound). Latency delta sits inside the noise envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: docs/scaling-dive-2026-05.md
+14-2Lines changed: 14 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -228,7 +228,19 @@ The chain is `checkAccess → SessionManager.findAuthorID → getSessionInfo thr
228
228
229
229
**Fix (#7775):** add a non-throwing private `getSessionInfoOrNull` helper, route the two internal callers (`findAuthorID`, `listSessionsWithDBKey`) at it, and keep `exports.getSessionInfo` as a thin wrapper that preserves the throw for HTTP API compatibility (the API translates the thrown `apierror` to `code: 1`). All 32 cases in `tests/backend/specs/api/sessionsAndGroups.ts` pass, including "getSessionInfo of deleted session" which still expects `code: 1`.
230
230
231
-
**Expected impact:**~6% of total process CPU at the cliff. Score pending a dive sweep against the merged branch.
231
+
**Measured impact (N=3 medians, perf branch vs develop, same `authors=100..500:step=50:dwell=8s:warmup=2s` sweep, perf runs 25957107195/25957108328/25957109418 vs develop runs 25954537767/25954538807/25954540108):**
232
+
233
+
| step | dev CPU% | perf CPU% | ΔCPU% | dev p95 | perf p95 |
234
+
|---:|---:|---:|---:|---:|---:|
235
+
| 100 | 4.76 | 4.67 | -1.7% | 38 | 38 |
236
+
| 200 | 15.21 | 14.60 | -4.0% | 37 | 41 |
237
+
| 300 | 30.46 | 29.68 | -2.6% | 45 | 45 |
238
+
| 350 | 41.58 | 39.36 |**-5.3%**| 39 | 74 |
239
+
| 400 | 56.26 | 54.23 | -3.6% | 2275 | 2089 |
240
+
| 450 | 72.33 | 70.49 | -2.5% | 6167 | 5891 |
241
+
| 500 | 88.38 | 87.14 | -1.4% | 11759 | 11391 |
242
+
243
+
**ΔCPU% is consistently negative (-1.4% to -5.3%) across all 9 steps** — the direction matches the profile prediction. The realised magnitude (2-5%) is below the profile-attributed 6% upper bound because some of the log4js cost the profile attributed to the throw path was unrelated startup/info logging. Latency impact is mostly inside the noise envelope; step 350 looks regressive at the median but the raw triples (dev [39,39,122] vs perf [73,74,124]) overlap heavily with one outlier each.
232
244
233
245
### Other CPU hotspots surfaced (not yet acted on)
234
246
@@ -263,7 +275,7 @@ Mechanism: deferred flush gives more packets per WS frame → fewer per-frame sy
263
275
264
276
**Merge in priority order:**
265
277
266
-
1.**[#7775](https://github.com/ether/etherpad/pull/7775)** — SessionManager throw-as-control-flow fix. CPU-profile-identified ~6% process CPU win at the cliff. No public-API behavior change; passes existing API test suite. Mechanical and low-risk.
278
+
1.**[#7775](https://github.com/ether/etherpad/pull/7775)** — SessionManager throw-as-control-flow fix. N=3 measured 2-5% CPU% reduction across the cliff sweep (profile-predicted 6% upper bound). No public-API behavior change; passes existing API test suite. Mechanical and low-risk.
267
279
2.**[#7774](https://github.com/ether/etherpad/pull/7774)** — engine.io socket flush deferral. The only PR in this program with N=3-confirmed measurable perf improvement at the time it was opened (tighter tail at step 300-350). Wire-compatible, server-side only.
268
280
3.**[#7768](https://github.com/ether/etherpad/pull/7768)** — per-socket fan-out serialization + NEW_CHANGES_BATCH. No measurable perf benefit in N=3 testing — recommend merging for the **correctness fix** (the original code was racy under concurrent commits and could lose revisions on emit error). NEW_CHANGES_BATCH framing is dormant at steady-state and fires under server slowness as forward-compat groundwork.
269
281
4.**[#7762](https://github.com/ether/etherpad/pull/7762)** — Prometheus metrics. Already merged; instrument for any further dive.
0 commit comments