Skip to content

Commit d018003

Browse files
JohnMcLearclaude
andauthored
fix: page sessionstorage cleanup to avoid OOM (#7830) (#7831)
* fix: page sessionstorage cleanup to avoid OOM (#7830) SessionStore._cleanup() previously called `findKeys('sessionstorage:*', null)`, materialising every session key into a single array. On decade- old MariaDB installs with millions of sessions this OOMs the node process within ~15 minutes — see #7830. Switch to ueberdb2 6.1.0's findKeysPaged with a 500-key page size, and yield to the event loop between pages so the DB driver can release each page's buffered rows and request handlers can interleave. The break is now driven by `page.length === 0` rather than `page.length < CLEANUP_PAGE_SIZE` so a stubbed/throttled paged source still iterates the full keyspace. Adds a regression test that seeds 50 sessionstorage rows, monkey-patches `DB.findKeysPaged` to use a 4-key page, runs cleanup, and asserts every expired row is removed plus every valid row preserved across page boundaries. Closes #7830 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Qodo review on #7831 Four follow-ups raised by Qodo on the session cleanup paging fix: - DB.ts: fail-fast at init() if any required wrapper method (incl. findKeysPaged) is missing, so a stale ueberdb2 pin surfaces at boot rather than crashing the first cleanup run an hour later. - SessionStore: bound a single _cleanup() run to 10 minutes. Under sustained session creation the keyspace can grow faster than cleanup drains it; without a budget the next scheduled run would never fire. When the budget hits, log a warning and let the next run continue. - SessionStore: log the defensive `page[0] <= after` cursor-stall break. Previously the loop exited silently, leaving expired rows behind with no operator-visible signal of the backend regression. - Tests: the paged-cleanup regression test now removes both expiredSids AND validSids in finally, so a failed assertion doesn't leak rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: note paged session cleanup in CHANGELOG + settings template CHANGELOG.md picks up an entry under 3.1.0 Notable fixes describing the OOM cause, the paged iteration, the 10-minute per-run budget, the cursor-stall logging, and the fail-fast init guard. settings.json.template's sessionCleanup comment adds the page-size, budget, and pointer to #7830 so admins can reason about the new behaviour from the template alone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate lockfile against ueberdb2 6.1.2 Now that ether/ueberDB#983 unblocked the publish workflow (OIDC trusted publishing), ueberdb2 6.1.2 is live on npm and the `^6.1.0` pin in src/package.json resolves cleanly. Resolves the ERR_PNPM_OUTDATED_LOCKFILE that was blocking CI on this PR. 29 SessionStore backend tests still green against the published tarball. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0d40f2d commit d018003

7 files changed

Lines changed: 204 additions & 24 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717

1818
- **Export HTML — ordered-list counter no longer poisoned by a sibling unordered list.** When an ordered-list level was the only consumer of `olItemCounts`, closing *any* list at that depth (including a `<ul>` that happened to share the level) reset the counter to 0. A subsequent unrelated `<ol>` at the same depth then took the "counter exists but is 0" branch and emitted `<ol class="...">` without the `start=` attribute. The reset is now gated on `line.listTypeName === 'number'` so closing an unordered list never touches the ol bookkeeping. Fixes #7786 / #7787 (#7791).
1919
- **Export — bad `:rev` returns a meaningful 500 body, not Express's HTML error page.** A non-numeric `:rev` (e.g. `/p/foo/test1/export/txt`) reached `checkValidRev` which throws `CustomError('rev is not a number', 'apierror')`; the message fell through `.catch(next)` and Express's default renderer returned an HTML 500 page. The route handler now catches the apierror and emits `err.message` as a deterministic `text/plain` 500. As a follow-up, `checkValidRev` runs *before* `res.attachment()` so an invalid rev no longer leaves a `Content-Disposition` header in place (browsers were offering to save the error message as a file), and unrelated export failures (conversion, fs, soffice) are surfaced as text/plain rather than the HTML stack page. Fixes #7788 (#7792).
20+
- **Session cleanup no longer OOMs on huge sessionstorage tables.** Pre-2.7.3 `SessionStore._cleanup()` issued a single unbounded `findKeys('sessionstorage:*', null)` that materialised every key into one JS array; on a decade-old MariaDB install with millions of stale sessions the mysql2 driver retained the rows on the pool connection while the JS array dominated heap, OOMing the process within ~15 minutes of boot. Cleanup now pages the keyspace in 500-key batches via the new `findKeysPaged` API on ueberdb2 6.1.0 (DB-side ranged query on mysql/postgres, JS-side fallback elsewhere), yielding to the event loop between pages. A single run is capped at 10 minutes; the next scheduled run continues. The defensive cursor-stall guard now logs an error rather than silently aborting, and `DB.init()` fails fast if any required wrapper method is missing (a misconfigured ueberdb2 pin surfaces at boot instead of an hour later). Fixes #7830 (#7831).
2021

2122
### Security hardening
2223

pnpm-lock.yaml

Lines changed: 66 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

settings.json.template

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -634,6 +634,12 @@
634634
/*
635635
* Whether to periodically clean up expired and stale sessions from the
636636
* database. Set to false to disable. Default: true.
637+
*
638+
* Cleanup runs hourly and walks the sessionstorage:* keyspace in 500-key
639+
* pages so very large session tables don't pin the keys in memory all at
640+
* once (see https://github.com/ether/etherpad/issues/7830). A single run
641+
* is capped at 10 minutes; if your DB is so large the budget is reached,
642+
* a warning is logged and the next scheduled run continues.
637643
*/
638644
"sessionCleanup": true
639645
},

src/node/db/DB.ts

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,8 +47,13 @@ exports.init = async () => {
4747
stats.gauge(`ueberdb_${metric}`, () => exports.db.metrics[metric]);
4848
}
4949
}
50-
for (const fn of ['get', 'set', 'findKeys', 'getSub', 'setSub', 'remove']) {
50+
for (const fn of ['get', 'set', 'findKeys', 'findKeysPaged', 'getSub', 'setSub', 'remove']) {
5151
const f = exports.db[fn];
52+
if (typeof f !== 'function') {
53+
throw new Error(
54+
`ueberdb2 ${exports.db.constructor.name} is missing required method ${fn}; ` +
55+
'check that ueberdb2 is at the minimum version pinned in package.json');
56+
}
5257
exports[fn] = async (...args:string[]) => await f.call(exports.db, ...args);
5358
Object.setPrototypeOf(exports[fn], Object.getPrototypeOf(f));
5459
Object.defineProperties(exports[fn], Object.getOwnPropertyDescriptors(f));

src/node/db/SessionStore.ts

Lines changed: 67 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,19 @@ const logger = log4js.getLogger('SessionStore');
1212
// How often to run the cleanup of expired/stale sessions.
1313
const CLEANUP_INTERVAL_MS = 60 * 60 * 1000; // 1 hour
1414

15+
// Maximum number of session keys fetched from the database per cleanup
16+
// iteration. Bounded so that instances with very large session keyspaces (see
17+
// https://github.com/ether/etherpad/issues/7830) don't load every key into
18+
// memory at once. Tuned for ~50 KB per page assuming ~100-char keys.
19+
const CLEANUP_PAGE_SIZE = 500;
20+
21+
// Upper bound on a single cleanup run. Under sustained session creation the
22+
// keyspace can grow faster than cleanup processes it; without a budget the
23+
// loop would never reach an empty page and the next scheduled run would never
24+
// fire. When the budget hits, the next scheduled run picks up where this one
25+
// left off (the database state advances each iteration regardless).
26+
const CLEANUP_MAX_RUNTIME_MS = 10 * 60 * 1000; // 10 minutes
27+
1528
class SessionStore extends expressSession.Store {
1629
/**
1730
* @param {?number} [refresh] - How often (in milliseconds) `touch()` will update a session's
@@ -74,37 +87,71 @@ class SessionStore extends expressSession.Store {
7487
* - Sessions with no expiry that contain no data beyond the default cookie are removed.
7588
* These are the empty sessions that accumulate indefinitely (bug #5010) — they have
7689
* `{cookie: {path: "/", _expires: null, ...}}` and nothing else.
90+
*
91+
* Iterates the keyspace in fixed-size pages (CLEANUP_PAGE_SIZE) so a large
92+
* sessionstorage table (#7830) doesn't load every key into memory at once.
7793
*/
7894
async _cleanup() {
79-
const keys = await DB.findKeys('sessionstorage:*', null);
80-
if (!keys || keys.length === 0) return;
8195
const now = Date.now();
96+
const startMs = Date.now();
8297
let removed = 0;
83-
for (const key of keys) {
84-
const sess = await DB.get(key);
85-
if (!sess) {
86-
await DB.remove(key);
87-
removed++;
88-
continue;
98+
let scanned = 0;
99+
let after: string | undefined;
100+
let budgetExhausted = false;
101+
// eslint-disable-next-line no-constant-condition
102+
while (true) {
103+
const page = await DB.findKeysPaged('sessionstorage:*', null, {
104+
limit: CLEANUP_PAGE_SIZE,
105+
...(after != null ? {after} : {}),
106+
});
107+
if (!page || page.length === 0) break;
108+
// Defensive: a buggy backend that returns the cursor key would loop
109+
// forever. `after` is exclusive, so the first key of the next page must
110+
// be strictly greater than the previous cursor. Log so an operator can
111+
// notice partial cleanup caused by a pagination regression.
112+
if (after != null && page[0] <= after) {
113+
logger.error(
114+
`Session cleanup: paged cursor did not advance (after=${after}, ` +
115+
`page[0]=${page[0]}); aborting this run to prevent an infinite loop`);
116+
break;
89117
}
90-
const expires = sess.cookie?.expires;
91-
if (expires) {
92-
// Session has an expiry — remove if expired.
93-
if (new Date(expires).getTime() <= now) {
118+
for (const key of page) {
119+
scanned++;
120+
const sess = await DB.get(key);
121+
if (!sess) {
94122
await DB.remove(key);
95123
removed++;
124+
continue;
96125
}
97-
} else {
98-
// Session has no expiry and no user data beyond the cookie — remove as empty/stale.
99-
const hasData = Object.keys(sess).some((k) => k !== 'cookie');
100-
if (!hasData) {
101-
await DB.remove(key);
102-
removed++;
126+
const expires = sess.cookie?.expires;
127+
if (expires) {
128+
if (new Date(expires).getTime() <= now) {
129+
await DB.remove(key);
130+
removed++;
131+
}
132+
} else {
133+
const hasData = Object.keys(sess).some((k) => k !== 'cookie');
134+
if (!hasData) {
135+
await DB.remove(key);
136+
removed++;
137+
}
103138
}
104139
}
140+
after = page[page.length - 1];
141+
if (Date.now() - startMs > CLEANUP_MAX_RUNTIME_MS) {
142+
budgetExhausted = true;
143+
break;
144+
}
145+
// Yield to the event loop between pages so request handlers can run and
146+
// the DB driver can release the previous page's buffered rows.
147+
await new Promise((resolve) => setImmediate(resolve));
105148
}
106-
if (removed > 0) {
107-
logger.info(`Session cleanup: removed ${removed} expired/stale sessions out of ${keys.length}`);
149+
if (budgetExhausted) {
150+
logger.warn(
151+
`Session cleanup: hit ${CLEANUP_MAX_RUNTIME_MS}ms budget after scanning ` +
152+
`${scanned} keys (${removed} removed); next scheduled run will continue`);
153+
} else if (removed > 0) {
154+
logger.info(`Session cleanup: removed ${removed} expired/stale sessions out of ${scanned}`);
108155
}
109156
}
110157

src/package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@
8787
"surrealdb": "^2.0.3",
8888
"tinycon": "0.6.8",
8989
"tsx": "4.22.3",
90-
"ueberdb2": "^6.0.3",
90+
"ueberdb2": "^6.1.0",
9191
"underscore": "1.13.8",
9292
"undici": "^8.3.0",
9393
"unorm": "1.6.0",

src/tests/backend/specs/SessionStore.ts

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -309,5 +309,62 @@ describe(__filename, function () {
309309
// After shutdown, the timer should be cleared.
310310
assert(ss!._cleanupTimer == null);
311311
});
312+
313+
// Regression for https://github.com/ether/etherpad/issues/7830 — cleanup
314+
// used to load every sessionstorage key into a single array; on huge DBs
315+
// this OOMed. Verifies the paged iteration still hits every key when the
316+
// count exceeds CLEANUP_PAGE_SIZE — we seed a few-row spread and force a
317+
// small page size to keep the test fast.
318+
it('pages across a large sessionstorage keyspace', async function () {
319+
// Tag rows so the assertion ignores anything other tests left behind.
320+
const tag = common.randomString();
321+
const expiredSids: string[] = [];
322+
const validSids: string[] = [];
323+
// Seed 25 expired + 25 valid rows. The default CLEANUP_PAGE_SIZE (500)
324+
// would cover this in one call, so we monkey-patch the constant for
325+
// this test by stubbing DB.findKeysPaged to enforce a small page.
326+
const real = db.findKeysPaged;
327+
let pageCalls = 0;
328+
db.findKeysPaged = async (key: string, notKey: any, opts: any) => {
329+
pageCalls++;
330+
return await real.call(db, key, notKey, {...opts, limit: 4});
331+
};
332+
try {
333+
for (let i = 0; i < 25; i++) {
334+
const sid = `cleanup_paged_exp_${tag}_${String(i).padStart(2, '0')}`;
335+
expiredSids.push(sid);
336+
await db.set(`sessionstorage:${sid}`, {
337+
cookie: {path: '/', expires: new Date(1).toJSON(), httpOnly: true},
338+
});
339+
}
340+
for (let i = 0; i < 25; i++) {
341+
const sid = `cleanup_paged_val_${tag}_${String(i).padStart(2, '0')}`;
342+
validSids.push(sid);
343+
await db.set(`sessionstorage:${sid}`, {
344+
cookie: {
345+
path: '/', expires: new Date(Date.now() + 60000).toJSON(), httpOnly: true,
346+
},
347+
});
348+
}
349+
await ss!._cleanup();
350+
for (const sid of expiredSids) {
351+
assert(await db.get(`sessionstorage:${sid}`) == null, `expired ${sid} not removed`);
352+
}
353+
for (const sid of validSids) {
354+
assert(await db.get(`sessionstorage:${sid}`) != null, `valid ${sid} was wrongly removed`);
355+
}
356+
// page size 4 over 50 rows -> at least 12 paged calls (final page may
357+
// be short). Confirms we actually iterated.
358+
assert(pageCalls >= 12, `expected paged iteration (got ${pageCalls} calls)`);
359+
} finally {
360+
db.findKeysPaged = real;
361+
// Symmetric cleanup — if an assertion threw earlier, expiredSids may
362+
// still be present in the DB. Remove both groups so the test leaves
363+
// no rows behind even on failure.
364+
for (const sid of [...expiredSids, ...validSids]) {
365+
await db.remove(`sessionstorage:${sid}`);
366+
}
367+
}
368+
});
312369
});
313370
});

0 commit comments

Comments
 (0)