fix(core): split SEO fetch from loadEntry to avoid D1 result-set column limit on wide collections#1603
Conversation
🦋 Changeset detectedLatest commit: 4b10630 The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
There was a problem hiding this comment.
Approach assessment
This is the right fix for the right problem. The previous LEFT JOIN _emdash_seo appended 5 extra alias columns to a SELECT * on an already-wide per-collection table; on D1 that pushed wide-schemacollections over the per-result-set column limit and surfaced as a silent null entry. Splitting SEO into a small, bounded follow-up query and folding the same alias columns onto the row keeps extractSeo() unchanged and preserves the public entry.data.seo shape. The added regression test exercises a 95-field collection across dialects and covers the three SEO states (present, missing, and on a wide table), which matches the bug report.
I read the loader change, traced SEO_COLUMN_ALIASES through extractSeo(), verified the _emdash_seo migration schema still lines up with the hard-coded columns in the new query, and checked for leftover references to the removed seoSelect helper. The SQL is safe (parameterized values, validated identifiers via getTableName), the missing-table/missing-column fallback is still exercised, and the preview/revision code path continues to read SEO from the content-row aliases as before.
Headline: clean bug fix, but two stale comments now contradict the implementation and should be updated before merge.
Findings
-
[suggestion]
packages/core/src/loader.ts:27-38This JSDoc still describes the pre-PR design: it says SEO is "joined in" via a "LEFT JOIN" at "zero extra query cost." After this change,
loadEntryissues a separate follow-up query and folds SEO onto the row using these aliases. Update the comment so the next maintainer doesn't see the join description and undo the split for "one less round trip."/** * SEO columns fetched from `_emdash_seo` on the single-entry path, mapped to * aliased result keys. SEO lives in a side table, so `loadEntry` fetches it in * a small follow-up query and folds it onto the row using these aliases; the * result is surfaced as a nested `data.seo` object (see extractSeo) rather than * flat fields. Splitting the query keeps D1's per-result-set column count * bounded, since joining in these 5 columns can push wide collections over * D1's limit. * * The `_emdash_` prefix on the aliases guarantees they can never collide with * a content field. Field slugs must match `/^[a-z][a-z0-9_]*$/`, so a user can * legitimately define a `seo_title` field; selecting the joined column under * its bare name would shadow that field in the result set and drop the user's * value. The prefix (illegal as a leading slug char) sidesteps this entirely. */ -
[suggestion]
packages/core/tests/unit/loader-seo.test.ts:17This regression test's description still says the loader "LEFT JOINs"
_emdash_seo. The implementation no longer joins; it uses a separate follow-up query. Update the comment so the test documentation doesn't contradict the fix.* the admin had no effect on rendered pages because the content loader never * surfaced the `_emdash_seo` row. The loader now fetches that row in a * follow-up query and attaches the result to `entry.data.seo`, which * `getSeoMeta()` reads.
Verified on production with pkg.pr.new snapshot 🎯Installed the Test methodology:
Result with the fix (98 columns): All correct. With the prior Bonus finding while stress-testing: D1's column limit applies to
|
Overlapping PRsThis PR modifies files that are also changed by other open PRs:
This may cause merge conflicts or duplicated work. A maintainer will coordinate. |
…mn limit on wide collections The content loader's single-query LEFT JOIN _emdash_seo added 5 alias columns to every result set, which pushed per-collection ec_* tables with ~95+ flat user fields past D1's per-query column limit (~100). The join failed with D1_ERROR: too many columns in result set, the error was wrapped as a generic Failed to load entry, and the call site surfaced a silent null. SEO is now fetched as a separate follow-up query and folded onto the row using the same alias names extractSeo() reads, so the public API is unchanged. The result set width is now bounded regardless of how wide the collection schema gets. One extra round trip per loadEntry, no behavior change at the API boundary. loadCollection was already join-free. Adds a regression test that exercises a 95-user-field collection with and without a SEO row, on both dialects. Closes emdash-cms#1600 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0c1dc1d to
a4bfc3c
Compare
|
Rebased onto All 59 loader tests pass on the rebased branch (10 files, including the 3 new ones in The bot also flagged overlap with #1399. Looked at it: that PR touches |
… fetch Per review feedback on emdash-cms#1603: SEO_COLUMN_ALIASES JSDoc still described the pre-PR LEFT JOIN design, and the loader-seo.test.ts regression-test comment still said the loader joins _emdash_seo. Both now reflect the follow-up query design and explain why we do not join (D1 per-result-set column limit on wide flat-schema collections). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Thanks for the careful read. Both stale comments addressed in 4484337:
Tests still pass (59/59 across 10 loader files), typecheck and lint clean. |
|
I am trying to avoid these additional per-entry queries because they can really balloon the rendering time. Is there any way we can get around this? |
|
Good push back, the extra round trip is the obvious cost and I was hoping someone would call it before this landed. A cleaner direction: keep SEO joined in the single query, but collapse the 5 alias columns into one JSON column so the result set width stays bounded. D1's per-result-set cap is on column count, not bytes, so this stays safe on any schema width, and we drop the second round trip entirely. Sketch (in the same spot in const seoSelect = sql`(
SELECT json_object(
'seo_title', s.seo_title,
'seo_description', s.seo_description,
'seo_image', s.seo_image,
'seo_canonical', s.seo_canonical,
'seo_no_index', s.seo_no_index
)
FROM ${sql.ref("_emdash_seo")} AS s
WHERE s.collection = ${type} AND s.content_id = c.id
LIMIT 1
) AS _emdash_seo_json`;
const result = await sql<Record<string, unknown>>`
SELECT c.*, ${seoSelect}, ${termsSelect}, ${bylinesSelect}
FROM ${sql.ref(tableName)} AS c
WHERE c.deleted_at IS NULL
AND ((c.slug = ${id} AND c.locale = ${locale}) OR c.id = ${id})
LIMIT 1
`.execute(db);
const row = result.rows[0];
if (row?._emdash_seo_json) {
const parsed = JSON.parse(row._emdash_seo_json as string) as Record<string, unknown>;
for (const [col, alias] of Object.entries(SEO_COLUMN_ALIASES)) {
row[alias] = parsed[col] ?? null;
}
delete row._emdash_seo_json;
}This mirrors what Two things worth verifying before I push it up:
Happy to push this onto the same branch and rerun the wide-collection tests if you're good with the direction. |
|
Yeah, a JSON column is probably the best plan. it's what I've used a lot to reduce round-trips. |
… query Per @ascorbic feedback: avoid the extra per-entry round trip that the prior revision of this PR added. SEO is now aggregated in SQL as one JSON column (`_emdash_seo`) and expanded in JS onto the existing SEO_COLUMN_ALIASES, mirroring how `foldedHydrationSelects` already handles bylines and taxonomy terms. Result: one query, one extra column instead of five, and the per-result-set column count stays bounded at any collection schema width. D1's column limit no longer triggers `D1_ERROR: too many columns in result set` on wide flat-schema collections, and we keep the single round trip. Dialect-specific aggregation (`json_object` on SQLite, `json_build_object` on Postgres) mirrors the existing folded-hydration helpers; expansion is a no-op when the SEO row is missing, preserving the `extractSeo()` null behavior the LEFT JOIN miss used to produce. Tests: 63/63 loader tests pass on both dialects, including the three wide-collection regression cases (95+ user columns).
There was a problem hiding this comment.
Re-review summary
The two stale-comment findings from the previous review are resolved: the JSDoc above SEO_COLUMN_ALIASES now accurately describes the folded-JSON-column design, and the loader-seo.test.ts description no longer references a LEFT JOIN.
The implementation itself is the right fix for the reported D1 issue. Folding SEO into a single correlated JSON subquery keeps the result-set width bounded and preserves both entry.data.seo shape and the single round trip. extractSeo() is unchanged, the SQL is safe (parameterized values, validated identifiers, hard-coded JSON keys), and the new regression test covers wide collections plus the three SEO states across both dialects.
One stale piece of documentation remains: the extractSeo() JSDoc still describes SEO as coming from "joined" columns and a "LEFT JOIN miss". That should be updated to match the folded JSON subquery.
Findings
-
[suggestion]
packages/core/src/loader.ts:225-230This JSDoc still describes the pre-PR mechanism: SEO is "joined" from
_emdash_seoand a missing row is a "LEFT JOIN miss". After this change the loader uses a correlated JSON-object subquery (foldedSeoSelect) whose result is expanded onto aliased columns; there is no LEFT JOIN anywhere inloadEntry. Update the comment so future maintainers don't assume the legacy join semantics./** * Build a `data.seo` object from the aliased `_emdash_seo_*` columns on a row. * * Returns `null` when no SEO row exists (the folded JSON column is NULL, so the * `seo_no_index` alias is absent; the column is `NOT NULL DEFAULT 0` whenever a * row is present). Returning null keeps the `seo` key off entries that have * none, so `getSeoMeta()` falls back to its defaults exactly as before. */
What does this PR do?
Fixes a silent
nullentry fromgetEmDashEntry()on Cloudflare D1 when the collection'sec_*table is wide. The loader's single-queryLEFT JOIN _emdash_seoadded 5 alias columns to every result set, which pushed collections with ~95+ flat user fields past D1's per-query column limit (~100). D1 threwD1_ERROR: too many columns in result set, the loader's try/catch wrapped it as a genericFailed to load entryerror, and the call site surfacednull(with the real message hidden inside a non-enumerableError.message, soJSON.stringify(error)looked empty and made the failure mode hard to spot).SEO is now fetched as a separate follow-up query and folded onto the row using the same alias names
extractSeo()reads. Result-set width is bounded regardless of how many fields the collection has.loadEntry.loadCollectionwas already join-free, so this brings the two paths into the same shape.entry.data.seois identical before/after.extractSeo()still returnsnullfor entries with no_emdash_seorow, preserving prior behavior.Closes #1600
Background context (different bug, exposed this one once it was fixed): #1446.
Type of change
Checklist
pnpm typecheckpassespnpm lintpassespnpm testpasses (or targeted tests for my change), all 55 loader tests pass, including the 3 new ones inloader-wide-collection.test.tspnpm formathas been runpackages/core/tests/unit/loader-wide-collection.test.tscovers a 95-user-field collection across both dialects, with and without a SEO row.changeset/fix-d1-result-set-column-limit.md(patch)AI-generated code disclosure
Screenshots / test output
The new tests in
loader-wide-collection.test.ts:loads an entry from a collection with 95+ flat user columnsstill attaches data.seo on wide collections (SEO follow-up query)omits data.seo when no SEO row exists, even on wide collectionsBoth dialects (SQLite + Postgres) covered via
describeEachDialect.Notes for reviewers
The diff in
loader.tsis the body ofloadEntry's table read: drops theLEFT JOIN _emdash_seo+seoSelectcolumns from the main query, then issues a smallSELECT seo_title, seo_description, seo_image, seo_canonical, seo_no_index FROM _emdash_seo WHERE collection = ? AND content_id = ?after the row is found. The retrieved SEO values are copied onto the row under the same_emdash_seo_*keys the join used, soextractSeo()keeps working without changes.The comment above the query has been updated to explain why we don't join (D1 column limit) so the next person who finds it doesn't undo this for "one less round trip".