Commit a4bfc3c

Vallhalen and claude committed
fix(core): split SEO fetch from loadEntry to avoid D1 result-set column limit on wide collections
The content loader's single-query LEFT JOIN _emdash_seo added 5 alias columns to every result set, which pushed per-collection ec_* tables with ~95+ flat user fields past D1's per-query column limit (~100). The join failed with D1_ERROR: too many columns in result set, the error was wrapped as a generic Failed to load entry, and the call site surfaced a silent null.

SEO is now fetched as a separate follow-up query and folded onto the row using the same alias names extractSeo() reads, so the public API is unchanged. The result set width is now bounded regardless of how wide the collection schema gets. One extra round trip per loadEntry, no behavior change at the API boundary. loadCollection was already join-free.

Adds a regression test that exercises a 95-user-field collection with and without a SEO row, on both dialects.

Closes #1600

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent b66f697 commit a4bfc3c

3 files changed

Lines changed: 197 additions & 20 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"emdash": patch
+---
+
+Fixes silent `null` entries on wide-schema collections under Cloudflare D1. The content loader's single-query `LEFT JOIN _emdash_seo` added 5 alias columns to every result set, which pushed collections with ~95+ flat user fields past D1's per-query column limit (~100). The query failed with `D1_ERROR: too many columns in result set`, the error was wrapped as a generic `Failed to load entry`, and the call site surfaced `null`. SEO is now fetched as a separate follow-up query and folded onto the row, keeping the result-set width bounded regardless of how wide the collection schema gets.
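
The failure mode in the changeset has two layers: the driver error is replaced by a generic message, and the call site then turns any failure into `null`. The sketch below illustrates that pattern only. The function names (`runContentQuery`, `loadEntrySketch`, `getEntryOrNull`) and the hard limit of 100 are assumptions for illustration, not taken from the emdash source, and the calls are synchronous for brevity where the real ones are async.

```typescript
type Row = Record<string, unknown>;

// Hypothetical stand-in for the database call. It throws the way the
// changeset says D1 did once the result set got too wide.
function runContentQuery(columnCount: number): Row {
  const ASSUMED_LIMIT = 100; // assumption: the changeset only says "~100"
  if (columnCount > ASSUMED_LIMIT) {
    throw new Error("D1_ERROR: too many columns in result set");
  }
  return { id: "entry-1" };
}

// Loader-style wrapper: the original message is replaced by a generic one.
function loadEntrySketch(columnCount: number): Row {
  try {
    return runContentQuery(columnCount);
  } catch {
    throw new Error("Failed to load entry");
  }
}

// Call-site-style wrapper: any failure becomes null, so the caller cannot
// tell "entry does not exist" from "the query itself failed".
function getEntryOrNull(columnCount: number): Row | null {
  try {
    return loadEntrySketch(columnCount);
  } catch {
    return null;
  }
}
```

Under these assumptions, `getEntryOrNull(114)` returns `null` with no trace of the `D1_ERROR` text, which matches the "silent `null`" symptom described above.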

packages/core/src/loader.ts

Lines changed: 36 additions & 20 deletions
@@ -994,40 +994,39 @@ export function emdashLoader(): LiveLoader<EntryData, EntryFilter, CollectionFil
 // When locale is specified, prefer locale-scoped slug match,
 // but IDs are globally unique so always check id without locale scope.
 //
-// LEFT JOIN _emdash_seo folds per-entry SEO (canonical, noindex,
-// etc.) into this single query at zero extra round-trip cost. The
-// joined columns are surfaced as a nested data.seo object via
-// extractSeo() and excluded from the generic field mapping. SEO is
-// 1:1 with content (PK on collection+content_id), so the join never
-// multiplies rows.
-const seoSelect = sql.join(
-  Object.entries(SEO_COLUMN_ALIASES).map(
-    ([col, alias]) => sql`${sql.ref(`s.${col}`)} AS ${sql.ref(alias)}`,
-  ),
-);
-// Fold byline + taxonomy hydration into the content query (see
-// foldedHydrationSelects), removing the two separate hydration
-// round trips per fetch.
+// Per-entry SEO (canonical, noindex, etc.) is fetched as a
+// follow-up query and folded onto the row via SEO_COLUMN_ALIASES,
+// preserving the data.seo shape that extractSeo() returns.
+//
+// We intentionally do NOT LEFT JOIN _emdash_seo here: that adds
+// 5 alias columns to every result set, which can push wide
+// flat-schema collections (common when porting from WordPress /
+// ACF) past D1's per-result-set column limit (~100). The join
+// failed with `D1_ERROR: too many columns in result set` and
+// surfaced as a silent `null` entry at the call site. A separate
+// SEO query is one extra round trip but is bounded in shape and
+// works at any collection width.
+//
+// Byline + taxonomy hydration stays folded into the content
+// query (see foldedHydrationSelects) because each is a single
+// aggregated JSON column, so they add only two columns to the
+// result set regardless of how many terms/credits an entry has.
 const { terms: termsSelect, bylines: bylinesSelect } = foldedHydrationSelects(
   db,
   type,
   "c",
 );
 const result = locale
   ? await sql<Record<string, unknown>>`
-      SELECT c.*, ${seoSelect}, ${termsSelect}, ${bylinesSelect}
+      SELECT c.*, ${termsSelect}, ${bylinesSelect}
       FROM ${sql.ref(tableName)} AS c
-      LEFT JOIN ${sql.ref("_emdash_seo")} AS s
-        ON s.collection = ${type} AND s.content_id = c.id
       WHERE c.deleted_at IS NULL
       AND ((c.slug = ${id} AND c.locale = ${locale}) OR c.id = ${id})
       LIMIT 1
     `.execute(db)
   : await sql<Record<string, unknown>>`
-      SELECT c.*, ${seoSelect}, ${termsSelect}, ${bylinesSelect}
+      SELECT c.*, ${termsSelect}, ${bylinesSelect}
       FROM ${sql.ref(tableName)} AS c
-      LEFT JOIN ${sql.ref("_emdash_seo")} AS s
-        ON s.collection = ${type} AND s.content_id = c.id
       WHERE c.deleted_at IS NULL
       AND (c.slug = ${id} OR c.id = ${id})
       LIMIT 1
@@ -1038,6 +1037,23 @@ export function emdashLoader(): LiveLoader<EntryData, EntryFilter, CollectionFil
   return undefined;
 }
 
+// Fold SEO onto the row using the same aliases the join used,
+// so extractSeo() reads it transparently. Missing SEO row is
+// expected (LEFT JOIN behavior preserved): extractSeo() returns
+// null when the noIndex column is missing.
+const seoResult = await sql<Record<string, unknown>>`
+  SELECT seo_title, seo_description, seo_image, seo_canonical, seo_no_index
+  FROM ${sql.ref("_emdash_seo")}
+  WHERE collection = ${type} AND content_id = ${row.id}
+  LIMIT 1
+`.execute(db);
+const seoRow = seoResult.rows[0];
+if (seoRow) {
+  for (const [col, alias] of Object.entries(SEO_COLUMN_ALIASES)) {
+    row[alias] = seoRow[col];
+  }
+}
+
 const i18nConfig = virtualConfig?.i18n;
 const i18nEnabled = i18nConfig && i18nConfig.locales.length > 1;
 const entrySlug = rowStr(row, "slug") || rowStr(row, "id");
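
The fold step in the hunk above can be tried in isolation. This is a minimal sketch, not the loader's code: the diff shows the five `_emdash_seo` column names but not the alias values in `SEO_COLUMN_ALIASES`, so the `_seo_*` aliases below are placeholders, and `foldSeoOntoRow` is a hypothetical helper name.

```typescript
type Row = Record<string, unknown>;

// Assumed alias map. The keys are the column names selected in the diff;
// the "_seo_*" values are placeholders, since the real alias values are
// not visible in this commit.
const SKETCH_SEO_ALIASES: Record<string, string> = {
  seo_title: "_seo_title",
  seo_description: "_seo_description",
  seo_image: "_seo_image",
  seo_canonical: "_seo_canonical",
  seo_no_index: "_seo_no_index",
};

// Copy each SEO column onto the content row under its alias. When there is
// no SEO row, the content row is returned unchanged and the alias keys are
// simply absent from it.
function foldSeoOntoRow(row: Row, seoRow: Row | undefined): Row {
  if (!seoRow) return row;
  for (const [col, alias] of Object.entries(SKETCH_SEO_ALIASES)) {
    row[alias] = seoRow[col];
  }
  return row;
}
```

Because the content row ends up with the same keys the join used to produce, whatever reads those aliases downstream does not need to know that two queries ran instead of one.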
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+import { it, expect, beforeEach, afterEach } from "vitest";
+
+import { handleContentCreate } from "../../src/api/index.js";
+import { SchemaRegistry } from "../../src/schema/registry.js";
+import { SeoRepository } from "../../src/database/repositories/seo.js";
+import { emdashLoader } from "../../src/loader.js";
+import { runWithContext } from "../../src/request-context.js";
+import {
+  describeEachDialect,
+  setupForDialect,
+  teardownForDialect,
+  type DialectTestContext,
+} from "../utils/test-db.js";
+
+/**
+ * Regression test for #1600: loadEntry's SELECT shape on wide collections.
+ *
+ * When a per-collection `ec_*` table has many flat scalar columns (common when
+ * porting from WordPress / ACF or other builders where every section is a
+ * top-level field), the previous implementation did:
+ *
+ *     SELECT c.*, <5 SEO alias columns> FROM ec_table c LEFT JOIN _emdash_seo s
+ *
+ * On Cloudflare D1 the per-query result-set column limit (~100) made this
+ * fail with `D1_ERROR: too many columns in result set` for collections
+ * around 95+ user columns. The loader's try/catch wrapped it as a generic
+ * `Failed to load entry` error and the call site returned a silent `null`.
+ *
+ * The fix splits the query: fetch the row from the collection table without
+ * a SEO join, then fetch SEO separately and fold it onto the row using the
+ * same alias names extractSeo() reads. The result set stays bounded in width
+ * regardless of how many fields the collection has.
+ *
+ * Run on both dialects to keep parity with loader-seo.test.ts.
+ */
+describeEachDialect("Loader on wide-schema collections (#1600)", (dialect) => {
+  let ctx: DialectTestContext;
+  let seoRepo: SeoRepository;
+  const COLLECTION = "wide_collection";
+  const USER_FIELD_COUNT = 95;
+
+  beforeEach(async () => {
+    ctx = await setupForDialect(dialect);
+    const registry = new SchemaRegistry(ctx.db);
+
+    // Create a collection with SEO enabled and a large number of flat
+    // scalar fields. 95 user fields + 14 system columns + 5 SEO aliases
+    // would have been ~114 result-set columns under the old LEFT JOIN
+    // shape, well past D1's per-query limit.
+    await registry.createCollection({
+      slug: COLLECTION,
+      label: "Wide Collection",
+      labelSingular: "Wide Entry",
+    });
+    await registry.createField(COLLECTION, {
+      slug: "title",
+      label: "Title",
+      type: "string",
+    });
+    for (let i = 1; i <= USER_FIELD_COUNT; i++) {
+      await registry.createField(COLLECTION, {
+        slug: `field_${i}`,
+        label: `Field ${i}`,
+        type: "string",
+      });
+    }
+    // Enable SEO so extractSeo() has somewhere to read from.
+    await ctx.db
+      .updateTable("_emdash_collections")
+      .set({ has_seo: 1 })
+      .where("slug", "=", COLLECTION)
+      .execute();
+
+    seoRepo = new SeoRepository(ctx.db);
+  });
+
+  afterEach(async () => {
+    await teardownForDialect(ctx);
+  });
+
+  function load(idOrSlug: string) {
+    const loader = emdashLoader();
+    return runWithContext({ db: ctx.db }, () =>
+      loader.loadEntry!({ filter: { type: COLLECTION, id: idOrSlug } }),
+    );
+  }
+
+  it("loads an entry from a collection with 95+ flat user columns", async () => {
+    const data: Record<string, string> = { title: "Wide Entry" };
+    for (let i = 1; i <= USER_FIELD_COUNT; i++) {
+      data[`field_${i}`] = `value-${i}`;
+    }
+    const result = await handleContentCreate(ctx.db, COLLECTION, {
+      data,
+      status: "published",
+    });
+    if (!result.success) throw new Error("Failed to create entry");
+    const slug = result.data!.item.slug!;
+
+    const loaded = await load(slug);
+
+    expect(loaded).toBeDefined();
+    expect((loaded as { data: Record<string, unknown> }).data.title).toBe("Wide Entry");
+    // Spot-check a handful of user fields across the range.
+    const loadedData = (loaded as { data: Record<string, unknown> }).data;
+    expect(loadedData.field_1).toBe("value-1");
+    expect(loadedData.field_50).toBe("value-50");
+    expect(loadedData.field_95).toBe("value-95");
+  });
+
+  it("still attaches data.seo on wide collections (SEO follow-up query)", async () => {
+    const data: Record<string, string> = { title: "Wide With SEO" };
+    for (let i = 1; i <= USER_FIELD_COUNT; i++) {
+      data[`field_${i}`] = `value-${i}`;
+    }
+    const result = await handleContentCreate(ctx.db, COLLECTION, {
+      data,
+      status: "published",
+    });
+    if (!result.success) throw new Error("Failed to create entry");
+    const item = result.data!.item;
+
+    await seoRepo.upsert(COLLECTION, item.id, {
+      noIndex: true,
+      canonical: "https://example.com/wide",
+      title: "Wide SEO Title",
+    });
+
+    const loaded = await load(item.slug!);
+    const loadedData = (loaded as { data: Record<string, unknown> }).data;
+    const seo = loadedData.seo as Record<string, unknown> | undefined;
+
+    expect(seo).toBeDefined();
+    expect(seo!.noIndex).toBe(true);
+    expect(seo!.canonical).toBe("https://example.com/wide");
+    expect(seo!.title).toBe("Wide SEO Title");
+  });
+
+  it("omits data.seo when no SEO row exists, even on wide collections", async () => {
+    const data: Record<string, string> = { title: "No SEO" };
+    for (let i = 1; i <= USER_FIELD_COUNT; i++) {
+      data[`field_${i}`] = `value-${i}`;
+    }
+    const result = await handleContentCreate(ctx.db, COLLECTION, {
+      data,
+      status: "published",
+    });
+    if (!result.success) throw new Error("Failed to create entry");
+    const slug = result.data!.item.slug!;
+
+    const loaded = await load(slug);
+    const loadedData = (loaded as { data: Record<string, unknown> }).data;
+
+    expect(loadedData.seo).toBeUndefined();
+  });
+});
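
All three tests in this file build the same payload inline: a title plus `field_1` through `field_95`, each with a `value-N` string. A small helper along the following lines could factor that out. `buildWideData` is a hypothetical name and is not part of this commit.

```typescript
// Hypothetical helper, not part of the commit: builds the payload the
// tests assemble by hand (a title plus field_1..field_N string values).
function buildWideData(title: string, fieldCount: number): Record<string, string> {
  const data: Record<string, string> = { title };
  for (let i = 1; i <= fieldCount; i++) {
    data[`field_${i}`] = `value-${i}`;
  }
  return data;
}
```

With such a helper, each test body could begin with `const data = buildWideData("Wide Entry", USER_FIELD_COUNT);` and keep the rest of its assertions as they are.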
