Skip to content

fix(core): split SEO fetch from loadEntry to avoid D1 result-set column limit on wide collections#1603

Merged
ascorbic merged 5 commits into
emdash-cms:mainfrom
Vallhalen:fix/d1-result-set-column-limit
Jun 30, 2026
Merged

fix(core): split SEO fetch from loadEntry to avoid D1 result-set column limit on wide collections#1603
ascorbic merged 5 commits into
emdash-cms:mainfrom
Vallhalen:fix/d1-result-set-column-limit

Conversation

@Vallhalen

@Vallhalen Vallhalen commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes a silent null entry from getEmDashEntry() on Cloudflare D1 when the collection's ec_* table is wide. The loader's single-query LEFT JOIN _emdash_seo added 5 alias columns to every result set, which pushed collections with ~95+ flat user fields past D1's per-query column limit (~100). D1 threw D1_ERROR: too many columns in result set, the loader's try/catch wrapped it as a generic Failed to load entry error, and the call site surfaced null (with the real message hidden inside a non-enumerable Error.message, so JSON.stringify(error) looked empty and made the failure mode hard to spot).

SEO is now fetched as a separate follow-up query and folded onto the row using the same alias names extractSeo() reads. Result-set width is bounded regardless of how many fields the collection has.

  • One extra round trip per loadEntry. loadCollection was already join-free, so this brings the two paths into the same shape.
  • No public-API change, entry.data.seo is identical before/after.
  • extractSeo() still returns null for entries with no _emdash_seo row, preserving prior behavior.

Closes #1600

Background context (different bug, exposed this one once it was fixed): #1446.

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm test passes (or targeted tests for my change), all 55 loader tests pass, including the 3 new ones in loader-wide-collection.test.ts
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable), new file packages/core/tests/unit/loader-wide-collection.test.ts covers a 95-user-field collection across both dialects, with and without a SEO row
  • User-visible strings in the admin UI are wrapped for translation, N/A, no admin UI changes
  • I have added a changeset, .changeset/fix-d1-result-set-column-limit.md (patch)
  • New features link to an approved Discussion, N/A, this is a bug fix

AI-generated code disclosure

  • This PR includes AI-generated code, model/tool: Claude Opus 4.7 (via Claude Code), implementing the user-proposed fix from the linked issue

Screenshots / test output

$ pnpm test loader

 RUN  v4.1.5 C:/tmp/emdash/packages/core

 Test Files  9 passed (9)
      Tests  55 passed (55)
   Start at  12:23:09
   Duration  2.44s

The new tests in loader-wide-collection.test.ts:

  • loads an entry from a collection with 95+ flat user columns
  • still attaches data.seo on wide collections (SEO follow-up query)
  • omits data.seo when no SEO row exists, even on wide collections

Both dialects (SQLite + Postgres) covered via describeEachDialect.

Notes for reviewers

The diff in loader.ts is the body of loadEntry's table read: drops the LEFT JOIN _emdash_seo + seoSelect columns from the main query, then issues a small SELECT seo_title, seo_description, seo_image, seo_canonical, seo_no_index FROM _emdash_seo WHERE collection = ? AND content_id = ? after the row is found. The retrieved SEO values are copied onto the row under the same _emdash_seo_* keys the join used, so extractSeo() keeps working without changes.

The comment above the query has been updated to explain why we don't join (D1 column limit) so the next person who finds it doesn't undo this for "one less round trip".

@changeset-bot

changeset-bot Bot commented Jun 24, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 4b10630

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 16 packages
Name Type
emdash Patch
@emdash-cms/cloudflare Patch
@emdash-cms/sandbox-workerd Patch
@emdash-cms/fixture-perf-site Patch
@emdash-cms/perf-demo-site Patch
@emdash-cms/cache-demo-site Patch
@emdash-cms/do-demo-site Patch
@emdash-cms/do-solo-demo-site Patch
@emdash-cms/admin Patch
@emdash-cms/auth Patch
@emdash-cms/blocks Patch
@emdash-cms/gutenberg-to-portable-text Patch
@emdash-cms/x402 Patch
create-emdash Patch
@emdash-cms/auth-atproto Patch
@emdash-cms/plugin-embeds Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new

pkg-pr-new Bot commented Jun 24, 2026

Copy link
Copy Markdown

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1603

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1603

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1603

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1603

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1603

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1603

emdash

npm i https://pkg.pr.new/emdash@1603

create-emdash

npm i https://pkg.pr.new/create-emdash@1603

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1603

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1603

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1603

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1603

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1603

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1603

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1603

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1603

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1603

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1603

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1603

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1603

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1603

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1603

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1603

commit: 4b10630

@github-actions github-actions Bot added the query-count changed PR diff modifies query-count snapshot files label Jun 24, 2026

@emdashbot emdashbot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach assessment

This is the right fix for the right problem. The previous LEFT JOIN _emdash_seo appended 5 extra alias columns to a SELECT * on an already-wide per-collection table; on D1 that pushed wide-schemacollections over the per-result-set column limit and surfaced as a silent null entry. Splitting SEO into a small, bounded follow-up query and folding the same alias columns onto the row keeps extractSeo() unchanged and preserves the public entry.data.seo shape. The added regression test exercises a 95-field collection across dialects and covers the three SEO states (present, missing, and on a wide table), which matches the bug report.

I read the loader change, traced SEO_COLUMN_ALIASES through extractSeo(), verified the _emdash_seo migration schema still lines up with the hard-coded columns in the new query, and checked for leftover references to the removed seoSelect helper. The SQL is safe (parameterized values, validated identifiers via getTableName), the missing-table/missing-column fallback is still exercised, and the preview/revision code path continues to read SEO from the content-row aliases as before.

Headline: clean bug fix, but two stale comments now contradict the implementation and should be updated before merge.


Findings

  • [suggestion] packages/core/src/loader.ts:27-38

    This JSDoc still describes the pre-PR design: it says SEO is "joined in" via a "LEFT JOIN" at "zero extra query cost." After this change, loadEntry issues a separate follow-up query and folds SEO onto the row using these aliases. Update the comment so the next maintainer doesn't see the join description and undo the split for "one less round trip."

    /**
     * SEO columns fetched from `_emdash_seo` on the single-entry path, mapped to
     * aliased result keys. SEO lives in a side table, so `loadEntry` fetches it in
     * a small follow-up query and folds it onto the row using these aliases; the
     * result is surfaced as a nested `data.seo` object (see extractSeo) rather than
     * flat fields. Splitting the query keeps D1's per-result-set column count
     * bounded, since joining in these 5 columns can push wide collections over
     * D1's limit.
     *
     * The `_emdash_` prefix on the aliases guarantees they can never collide with
     * a content field. Field slugs must match `/^[a-z][a-z0-9_]*$/`, so a user can
     * legitimately define a `seo_title` field; selecting the joined column under
     * its bare name would shadow that field in the result set and drop the user's
     * value. The prefix (illegal as a leading slug char) sidesteps this entirely.
     */
    
  • [suggestion] packages/core/tests/unit/loader-seo.test.ts:17

    This regression test's description still says the loader "LEFT JOINs" _emdash_seo. The implementation no longer joins; it uses a separate follow-up query. Update the comment so the test documentation doesn't contradict the fix.

     * the admin had no effect on rendered pages because the content loader never
     * surfaced the `_emdash_seo` row. The loader now fetches that row in a
     * follow-up query and attaches the result to `entry.data.seo`, which
     * `getSeoMeta()` reads.
    

@Vallhalen

Vallhalen commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Verified on production with pkg.pr.new snapshot 🎯

Installed the @1603 snapshot on our long-migrated production instance (chcedointernetu.pl) and ran a full stress test to confirm the fix holds end-to-end against Cloudflare D1:

npm i https://pkg.pr.new/emdash@1603 \
      https://pkg.pr.new/@emdash-cms/cloudflare@1603 \
      https://pkg.pr.new/@emdash-cms/admin@1603

Test methodology:

  1. PITR'd D1 to a known-good bookmark.
  2. Installed the PR snapshot, ran build + deploy.
  3. Restored the 8 unused columns we had originally dropped to work around the bug (the workaround documented in the linked issue), bringing ec_pages back to 98 columns, the exact width that previously failed in 0.20+ stock.
  4. Hit homepage and all 9 originally-failing CMS routes.

Result with the fix (98 columns):

=== homepage ===
len: 221002 (full content, 0.16.1 baseline)
Sklepu internetowego: True   (hero typed strings)
Chcesz do internetu : True   (hero headline)
Konrad Krawczuk     : True   (about section)
inaczej?            : True   (sec_dlaczego section)
FAQ                 : True
h2 sections: 11

=== 9 CMS pages ===
200  /uslugi/strony-www
200  /uslugi/sklepy-internetowe
200  /portfolio
200  /blog
200  /o-mnie
200  /konsultacje
200  /uslugi/wdrozenia-ai
200  /uslugi/reklamy
200  /uslugi/optymalizacja

All correct. With the prior LEFT JOIN _emdash_seo shape, the same 98-column table would have produced a 103-column result set and failed with D1_ERROR: too many columns in result set, returning a silent null entry (this was exactly the production behavior before the fix). Cleaned up the stress columns afterwards.

Bonus finding while stress-testing: D1's column limit applies to ALTER TABLE too

When I tried to push past 100 columns to verify the result-set width doesn't matter even at the extreme, D1 refused the ALTER TABLE ADD COLUMN itself:

$ wrangler d1 execute chcedointernetu --remote \
    --command "ALTER TABLE ec_pages ADD COLUMN stress_test_3 TEXT"
ERROR: too many columns on sqlite_altertab_ec_pages: SQLITE_ERROR [code: 7500]

So D1's per-table column ceiling sits around 100 (not the 2000 SQLite default). That means once a collection's ec_* table reaches ~95 user fields, two things happen simultaneously:

  1. The old SELECT c.*, <5 SEO aliases> query crosses the result-set limit and getEmDashEntry starts returning silent nulls (what this PR fixes).
  2. Adding any further field via the admin would fail with too many columns on sqlite_altertab_<table> even before this PR.

This PR keeps loadEntry working right up to D1's hard ceiling. Beyond that, the editor experience will break before the loader does, useful to know for migration guidance, but it doesn't change anything about this fix.

Side effect: +1 round trip per loadEntry

Saw the Query-count snapshot changes workflow flag +8 queries across 8 routes (1 extra per loadEntry for the SEO follow-up). That's the intended trade-off, loadCollection was already SEO-join-free, so this brings the two paths into the same shape. Happy to switch to an adaptive variant (SELECT c.* + LEFT JOIN when narrow, separate query when ≥ N columns) if maintainers prefer to preserve the per-row round-trip count for typical schemas. Just let me know which direction you'd prefer and I can push another commit.

@github-actions

Copy link
Copy Markdown
Contributor

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

@github-actions github-actions Bot added review/awaiting-author Reviewed; waiting on the author to respond needs-rebase and removed review/needs-review No maintainer or bot review yet labels Jun 24, 2026
…mn limit on wide collections

The content loader's single-query LEFT JOIN _emdash_seo added 5 alias
columns to every result set, which pushed per-collection ec_* tables
with ~95+ flat user fields past D1's per-query column limit (~100).
The join failed with D1_ERROR: too many columns in result set, the
error was wrapped as a generic Failed to load entry, and the call
site surfaced a silent null.

SEO is now fetched as a separate follow-up query and folded onto the
row using the same alias names extractSeo() reads, so the public API
is unchanged. The result set width is now bounded regardless of how
wide the collection schema gets.

One extra round trip per loadEntry, no behavior change at the API
boundary. loadCollection was already join-free.

Adds a regression test that exercises a 95-user-field collection
with and without a SEO row, on both dialects.

Closes emdash-cms#1600

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Vallhalen Vallhalen force-pushed the fix/d1-result-set-column-limit branch from 0c1dc1d to a4bfc3c Compare June 26, 2026 09:32
@github-actions github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed query-count changed PR diff modifies query-count snapshot files needs-rebase review/awaiting-author Reviewed; waiting on the author to respond labels Jun 26, 2026
@Vallhalen

Copy link
Copy Markdown
Contributor Author

Rebased onto main after #1619 landed (which folded byline + taxonomy hydration into loadEntry's query). The two changes compose cleanly: folded hydration adds two aggregated JSON columns (_emdash_terms, _emdash_bylines), and this PR drops the five SEO alias columns by moving SEO into a follow-up query. Net result for the single-entry path is c.* + 2 aggregated columns, which stays bounded regardless of how wide ec_* gets.

All 59 loader tests pass on the rebased branch (10 files, including the 3 new ones in loader-wide-collection.test.ts). pnpm typecheck and pnpm lint clean. query-counts.snapshot.{d1,sqlite}.json are unchanged from main (the SEO removal is offset by the folded hydration savings on the routes that hit loadEntry).

The bot also flagged overlap with #1399. Looked at it: that PR touches loader.ts for taxonomy-defs caching (different region of the file), so it should rebase cleanly on top of this one or vice versa. Happy to defer to whichever order the maintainers prefer.

… fetch

Per review feedback on emdash-cms#1603: SEO_COLUMN_ALIASES JSDoc still described the
pre-PR LEFT JOIN design, and the loader-seo.test.ts regression-test comment
still said the loader joins _emdash_seo. Both now reflect the follow-up
query design and explain why we do not join (D1 per-result-set column
limit on wide flat-schema collections).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Vallhalen

Copy link
Copy Markdown
Contributor Author

Thanks for the careful read. Both stale comments addressed in 4484337:

  • SEO_COLUMN_ALIASES JSDoc rewritten to describe the follow-up-query design and explain the D1 column-limit rationale (so the next reader does not undo the split for "one less round trip").
  • loader-seo.test.ts:14-21 comment updated to match: the loader now fetches SEO in a follow-up query, and the dialect-sensitivity note shifted from "the LEFT JOIN SQL" to "the follow-up SEO query."

Tests still pass (59/59 across 10 loader files), typecheck and lint clean.

@ascorbic

Copy link
Copy Markdown
Collaborator

I am trying to avoid these additional per-entry queries because they can really balloon the rendering time. Is there any way we can get around this?

@Vallhalen

Copy link
Copy Markdown
Contributor Author

Good push back, the extra round trip is the obvious cost and I was hoping someone would call it before this landed.

A cleaner direction: keep SEO joined in the single query, but collapse the 5 alias columns into one JSON column so the result set width stays bounded. D1's per-result-set cap is on column count, not bytes, so this stays safe on any schema width, and we drop the second round trip entirely.

Sketch (in the same spot in loader.ts):

const seoSelect = sql`(
  SELECT json_object(
    'seo_title', s.seo_title,
    'seo_description', s.seo_description,
    'seo_image', s.seo_image,
    'seo_canonical', s.seo_canonical,
    'seo_no_index', s.seo_no_index
  )
  FROM ${sql.ref("_emdash_seo")} AS s
  WHERE s.collection = ${type} AND s.content_id = c.id
  LIMIT 1
) AS _emdash_seo_json`;

const result = await sql<Record<string, unknown>>`
  SELECT c.*, ${seoSelect}, ${termsSelect}, ${bylinesSelect}
  FROM ${sql.ref(tableName)} AS c
  WHERE c.deleted_at IS NULL
  AND ((c.slug = ${id} AND c.locale = ${locale}) OR c.id = ${id})
  LIMIT 1
`.execute(db);

const row = result.rows[0];
if (row?._emdash_seo_json) {
  const parsed = JSON.parse(row._emdash_seo_json as string) as Record<string, unknown>;
  for (const [col, alias] of Object.entries(SEO_COLUMN_ALIASES)) {
    row[alias] = parsed[col] ?? null;
  }
  delete row._emdash_seo_json;
}

This mirrors what foldedHydrationSelects already does for bylines and terms (each is a single JSON-aggregated column in the SELECT, then expanded in JS), so the SEO path lines up with the same shape rather than being a special case.

Two things worth verifying before I push it up:

  1. D1 SQLite version on json_object(). D1 has JSON1 enabled (it ships with SQLite >= 3.38), so json_object() and json_extract() are available. Worth confirming on whatever D1 baseline you're targeting.
  2. null row case. When there's no _emdash_seo row for the content, the correlated subquery returns NULL (not an empty object), so row._emdash_seo_json is null and extractSeo() keeps returning null exactly as it does today.

Happy to push this onto the same branch and rerun the wide-collection tests if you're good with the direction.

@ascorbic

Copy link
Copy Markdown
Collaborator

Yeah, a JSON column is probably the best plan. it's what I've used a lot to reduce round-trips.

… query

Per @ascorbic feedback: avoid the extra per-entry round trip that the prior
revision of this PR added. SEO is now aggregated in SQL as one JSON column
(`_emdash_seo`) and expanded in JS onto the existing SEO_COLUMN_ALIASES,
mirroring how `foldedHydrationSelects` already handles bylines and taxonomy
terms.

Result: one query, one extra column instead of five, and the per-result-set
column count stays bounded at any collection schema width. D1's column limit
no longer triggers `D1_ERROR: too many columns in result set` on wide
flat-schema collections, and we keep the single round trip.

Dialect-specific aggregation (`json_object` on SQLite, `json_build_object`
on Postgres) mirrors the existing folded-hydration helpers; expansion is a
no-op when the SEO row is missing, preserving the `extractSeo()` null behavior
the LEFT JOIN miss used to produce.

Tests: 63/63 loader tests pass on both dialects, including the three
wide-collection regression cases (95+ user columns).
@ascorbic ascorbic added bot:review Trigger an emdashbot code review on this PR and removed bot:review Trigger an emdashbot code review on this PR labels Jun 30, 2026

@emdashbot emdashbot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review summary

The two stale-comment findings from the previous review are resolved: the JSDoc above SEO_COLUMN_ALIASES now accurately describes the folded-JSON-column design, and the loader-seo.test.ts description no longer references a LEFT JOIN.

The implementation itself is the right fix for the reported D1 issue. Folding SEO into a single correlated JSON subquery keeps the result-set width bounded and preserves both entry.data.seo shape and the single round trip. extractSeo() is unchanged, the SQL is safe (parameterized values, validated identifiers, hard-coded JSON keys), and the new regression test covers wide collections plus the three SEO states across both dialects.

One stale piece of documentation remains: the extractSeo() JSDoc still describes SEO as coming from "joined" columns and a "LEFT JOIN miss". That should be updated to match the folded JSON subquery.


Findings

  • [suggestion] packages/core/src/loader.ts:225-230

    This JSDoc still describes the pre-PR mechanism: SEO is "joined" from _emdash_seo and a missing row is a "LEFT JOIN miss". After this change the loader uses a correlated JSON-object subquery (foldedSeoSelect) whose result is expanded onto aliased columns; there is no LEFT JOIN anywhere in loadEntry. Update the comment so future maintainers don't assume the legacy join semantics.

    /**
     * Build a `data.seo` object from the aliased `_emdash_seo_*` columns on a row.
     *
     * Returns `null` when no SEO row exists (the folded JSON column is NULL, so the
     * `seo_no_index` alias is absent; the column is `NOT NULL DEFAULT 0` whenever a
     * row is present). Returning null keeps the `seo` key off entries that have
     * none, so `getSeoMeta()` falls back to its defaults exactly as before.
     */
    

@ascorbic ascorbic left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ascorbic ascorbic merged commit 13b87b7 into emdash-cms:main Jun 30, 2026
47 checks passed
@emdashbot emdashbot Bot mentioned this pull request Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core bot:review Trigger an emdashbot code review on this PR cla: signed overlap review/needs-rereview Author pushed changes since the last review size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loader 'SELECT c.*' + SEO LEFT JOIN crosses D1 result-set column limit on wide collection schemas (silent null entry)

2 participants