Skip to content

Commit 2c65f87

Browse files
committed
CS-11264 follow-up: validate via registry, no per-row cold-mount
Routing every permission row through `reconciler.lookupOrMount` made `_realm-auth` correct for the user's own non-pinned realms but introduced a cold-mount-storm hazard: `fetchUserPermissions` is called with `onlyOwnRealms: false`, so the result includes every `'*'`-readable realm in addition to the user's owned/writable realms. The host login flow and `boxel realm list` both hit `/_realm-auth` and expect a JWT for every accessible realm; on a fresh process that's an unbounded series of sequential `realm.start()` calls against the entire public-readable set. A JWT does not actually require a mounted realm. The token encodes the URL, the user's permissions, the session room, and the realm server URL; the realm itself is only used when the holder later hits a per-realm endpoint, and `findOrMountRealm` / `reconciler.lookupOrMount` handle the on-demand mount there. Switch the handler to: - Validate each accessible URL against `reconciler.knownByUrl` (the in-memory mirror of `realm_registry`), with a single batched probe against `realm_registry` for URLs not yet reflected in this process. Mirrors `multiRealmAuthorization` (CS-11238). - Resolve the session room once per request (it is keyed by matrixUserId in the DB, not by realm). For the create branch, use the realm-server's matrix client to call `createDM`, matching how `_server-session` creates session rooms — the caller has already exchanged an OpenID token via `_server-session` to get the JWT they're presenting here, so the server client is already logged in. - Issue the JWT for every URL that resolved against the registry, with no mount. Mount stays deferred to the next per-realm request. Strengthen the regression test: `testingOnlyEvictRealmFromRealmsList` now also removes the realm from `reconciler.mounted`, so the test proves the handler issues a JWT for a realm absent from both `realms[]` and `mounted` (i.e. would have needed a cold mount under the previous follow-up's behavior). Adds preconditions and a post-call assertion that `reconciler.mounted` is still empty for the URL — the handler did NOT cold-mount.
1 parent 6b84929 commit 2c65f87

3 files changed

Lines changed: 133 additions & 37 deletions

File tree

packages/realm-server/handlers/handle-realm-auth.ts

Lines changed: 76 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,14 @@ import type Koa from 'koa';
22

33
import type { CreateRoutesArgs } from '../routes';
44
import {
5-
SupportedMimeType,
5+
fetchSessionRoom,
66
fetchUserPermissions,
7-
type Realm,
7+
param,
8+
query,
9+
separatedByCommas,
10+
SupportedMimeType,
11+
upsertSessionRoom,
12+
type Expression,
813
} from '@cardstack/runtime-common';
914
import type { RealmServerTokenClaim } from 'utils/jwt';
1015
import { getUserByMatrixUserId } from '@cardstack/billing/billing-queries';
@@ -14,6 +19,7 @@ import * as Sentry from '@sentry/node';
1419

1520
export default function handleRealmAuth({
1621
dbAdapter,
22+
matrixClient,
1723
realmSecretSeed,
1824
reconciler,
1925
serverURL,
@@ -37,32 +43,88 @@ export default function handleRealmAuth({
3743
userId: matrixUserId,
3844
onlyOwnRealms: false,
3945
});
46+
let accessibleRealmUrls = Object.keys(permissionsForAllRealms);
4047

41-
let sessions: { [realm: string]: string } = {};
42-
for (let [realmUrl, permissions] of Object.entries(
43-
permissionsForAllRealms,
44-
)) {
45-
let realm: Realm | undefined;
48+
// Validate each accessible realm URL against the registry WITHOUT
49+
// mounting it. fetchUserPermissions with onlyOwnRealms:false returns
50+
// every '*'-readable realm in addition to the user's own; routing
51+
// each row through reconciler.lookupOrMount would cold-mount every
52+
// public-readable realm on the server on the first post-restart
53+
// _realm-auth call (boxel realm list, host login). Phase 3's lazy-
54+
// mount contract is "mount on first per-realm request" — the per-
55+
// realm JWT itself is fine to issue from registry presence alone;
56+
// the mount happens later, when the holder actually hits a realm
57+
// endpoint and findOrMountRealm/lookupOrMount runs there.
58+
//
59+
// Same lookup order as multiRealmAuthorization (CS-11238): in-memory
60+
// reconciler.knownByUrl first, then a single batched probe against
61+
// realm_registry for any URLs not yet reflected in this process
62+
// (e.g. a freshly-published row from a peer instance between NOTIFY
63+
// and the next reconcile pass).
64+
let registeredUrls = new Set<string>();
65+
let urlsToProbe: string[] = [];
66+
for (let realmUrl of accessibleRealmUrls) {
67+
if (reconciler.knownByUrl.has(realmUrl)) {
68+
registeredUrls.add(realmUrl);
69+
} else {
70+
urlsToProbe.push(realmUrl);
71+
}
72+
}
73+
if (urlsToProbe.length > 0) {
74+
let rows = (await query(dbAdapter, [
75+
'SELECT url FROM realm_registry WHERE url IN (',
76+
...separatedByCommas(urlsToProbe.map((url) => [param(url)])),
77+
')',
78+
] as Expression)) as { url: string }[];
79+
for (let { url } of rows) {
80+
registeredUrls.add(url);
81+
}
82+
}
83+
84+
// Resolve the user's session room ONCE per request. The session room
85+
// is keyed by matrixUserId in the DB, not by realm, so calling
86+
// realm.ensureSessionRoom() per accessible realm (as this handler
87+
// used to) was N redundant DB reads on the fast path and forced us
88+
// to materialize each realm as a `Realm` instance on the cold path.
89+
// For the create branch we use the realm-server's matrix client,
90+
// matching how _server-session creates session rooms — at this
91+
// point in the request lifecycle the caller has already exchanged
92+
// an OpenID token via _server-session, so the server client is
93+
// already logged in. login() is idempotent (cached promise).
94+
let sessionRoom: string | null = await fetchSessionRoom(
95+
dbAdapter,
96+
matrixUserId,
97+
);
98+
if (!sessionRoom) {
4699
try {
47-
realm = await reconciler.lookupOrMount(realmUrl);
100+
await matrixClient.login();
101+
sessionRoom = await matrixClient.createDM(matrixUserId);
102+
await upsertSessionRoom(dbAdapter, matrixUserId, sessionRoom);
48103
} catch (error) {
49104
Sentry.withScope((scope) => {
50-
scope.setExtra('realmUrl', realmUrl);
51105
scope.setExtra('matrixUserId', matrixUserId);
52-
scope.setExtra('permissionsForAllRealms', permissionsForAllRealms);
53106
Sentry.captureException(error);
54107
});
55-
continue;
108+
await sendResponseForError(
109+
ctxt,
110+
500,
111+
'Internal Server Error',
112+
'Failed to ensure session room',
113+
);
114+
return;
56115
}
57-
if (!realm) {
116+
}
117+
118+
let sessions: { [realm: string]: string } = {};
119+
for (let realmUrl of accessibleRealmUrls) {
120+
if (!registeredUrls.has(realmUrl)) {
58121
console.error(
59122
`Permissions found pointing to unknown realm ${realmUrl}`,
60123
);
61124
continue;
62125
}
63-
126+
let permissions = permissionsForAllRealms[realmUrl];
64127
try {
65-
let sessionRoom = await realm.ensureSessionRoom(matrixUserId);
66128
sessions[realmUrl] = createJWT(
67129
{
68130
user: matrixUserId,
@@ -79,10 +141,8 @@ export default function handleRealmAuth({
79141
scope.setExtra('realmUrl', realmUrl);
80142
scope.setExtra('matrixUserId', matrixUserId);
81143
scope.setExtra('permissionsForAllRealms', permissionsForAllRealms);
82-
83144
Sentry.captureException(error);
84145
});
85-
86146
continue;
87147
}
88148
}

packages/realm-server/server.ts

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -537,23 +537,34 @@ export class RealmServer {
537537
return [...this.realms];
538538
}
539539

540+
// Test-only accessor for the reconciler. Exposed so realm-auth-test
541+
// can inspect knownByUrl / mounted as preconditions and assert that
542+
// _realm-auth does not cold-mount during request handling.
543+
get testingOnlyReconciler() {
544+
return this.reconciler;
545+
}
546+
540547
testingOnlyUnmountRealms() {
541548
for (let realm of this.realms) {
542549
this.virtualNetwork.unmount(realm.handle);
543550
}
544551
}
545552

546-
// Simulate the post-restart "this realm isn't in this process's
547-
// realms[] yet" state without tearing down its disk mount, indexer,
548-
// or matrix client. realm-auth-test uses this to prove
549-
// _realm-auth resolves through the reconciler instead of realms[].
550-
// The realm stays in reconciler.mounted so lookupOrMount() returns
551-
// it via the fast path.
553+
// Simulate the post-restart "this realm has a registry row but no
554+
// active Realm instance on this process" state without tearing down
555+
// its disk mount, indexer, or matrix client. realm-auth-test uses
556+
// this to prove _realm-auth issues a JWT for a realm that is absent
557+
// from both realms[] and reconciler.mounted — i.e. one that would
558+
// need a cold lookupOrMount to be materialized. The registry row
559+
// (and reconciler.knownByUrl entry) is left in place; the next
560+
// request that actually needs the realm will lazy-mount it via the
561+
// normal request path.
552562
testingOnlyEvictRealmFromRealmsList(url: string): void {
553563
let idx = this.realms.findIndex((r) => r.url === url);
554564
if (idx !== -1) {
555565
this.realms.splice(idx, 1);
556566
}
567+
this.reconciler.mounted.delete(url);
557568
}
558569

559570
// Test-only accessor for the request-path realm resolver. Exposed so

packages/realm-server/tests/realm-auth-test.ts

Lines changed: 40 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -95,16 +95,22 @@ module(basename(__filename), function () {
9595
});
9696

9797
// CS-11264 regression: after a realm-server restart, a non-pinned
98-
// realm is not in this process's realms[] until something triggers a
99-
// lazy mount via the request path. Pre-fix, _realm-auth iterated
100-
// realms[] directly and silently skipped any realm not yet mounted on
101-
// this instance — so an owner's first post-restart boxel-cli call
102-
// (push/publish/sync) failed with "No realm token available". Post-
103-
// fix, the handler routes through reconciler.lookupOrMount, so any
104-
// realm the reconciler can resolve (already mounted, or registered
105-
// via knownByUrl, or directly readable from realm_registry) yields a
106-
// session token.
107-
test('POST /_realm-auth issues a token for a realm reachable only via the reconciler (post-restart)', async function (assert) {
98+
// realm has a row in realm_registry but no active Realm instance
99+
// on this process until something triggers a lazy mount via the
100+
// request path. Pre-fix, _realm-auth iterated realms[] directly
101+
// and silently skipped any realm not yet mounted on this instance
102+
// — so an owner's first post-restart boxel-cli call
103+
// (push/publish/sync) failed with "No realm token available".
104+
// Post-fix, the handler validates against reconciler.knownByUrl
105+
// (with a registry probe fallback) and issues a JWT without
106+
// mounting; the mount happens later via the normal request path
107+
// when the holder of the JWT actually hits a realm endpoint.
108+
//
109+
// Mounting per row inside _realm-auth was rejected because
110+
// fetchUserPermissions(onlyOwnRealms:false) returns every
111+
// '*'-readable realm, so a single boxel realm list / host login
112+
// would cold-mount the entire accessible set after a restart.
113+
test('POST /_realm-auth issues a token for a realm absent from realms[] and reconciler.mounted (post-restart)', async function (assert) {
108114
sinon
109115
.stub(MatrixClient.prototype, 'createDM')
110116
.resolves('!post-restart-session-room:localhost');
@@ -114,11 +120,26 @@ module(basename(__filename), function () {
114120
});
115121
sinon.stub(MatrixClient.prototype, 'joinRoom').resolves();
116122

117-
// Simulate the post-restart state where the realm is still
118-
// mounted in the reconciler (which knows about it from boot or a
119-
// prior lazy mount on another caller) but absent from this
120-
// process's realms[] snapshot.
123+
// Simulate the post-restart state: registry row + knownByUrl
124+
// entry are present (reconciler boot has reflected the registry),
125+
// but neither realms[] nor reconciler.mounted holds an active
126+
// Realm. The handler must NOT attempt to cold-mount; it must
127+
// issue a JWT from the registry presence alone.
121128
testRealmServer.testingOnlyEvictRealmFromRealmsList(testRealmHref);
129+
assert.false(
130+
testRealmServer.testingOnlyRealms.some(
131+
(r) => r.url === testRealmHref,
132+
),
133+
'precondition: realm is absent from realms[]',
134+
);
135+
assert.false(
136+
testRealmServer.testingOnlyReconciler.mounted.has(testRealmHref),
137+
'precondition: realm is absent from reconciler.mounted',
138+
);
139+
assert.true(
140+
testRealmServer.testingOnlyReconciler.knownByUrl.has(testRealmHref),
141+
'precondition: registry row is still reflected in reconciler.knownByUrl',
142+
);
122143

123144
let response = await request
124145
.post('/_realm-auth')
@@ -136,7 +157,11 @@ module(basename(__filename), function () {
136157
assert.strictEqual(response.status, 200, 'HTTP 200 status');
137158
assert.ok(
138159
response.body[testRealmHref],
139-
'response includes a JWT for the realm even though it was absent from realms[]',
160+
'response includes a JWT for the realm even though it was absent from realms[] and reconciler.mounted',
161+
);
162+
assert.false(
163+
testRealmServer.testingOnlyReconciler.mounted.has(testRealmHref),
164+
'the handler did NOT cold-mount the realm — mount remains deferred to the next per-realm request',
140165
);
141166
});
142167
});

0 commit comments

Comments
 (0)