
Commit f52f04a

joaoh82 and claude authored
feat(web): docs SEO + LLM-discoverability — llms.txt, Pagefind search, FAQ JSON-LD, docs events (SQLR-36) (#172)
- /llms.txt (curated index) + /llms-full.txt (full docs+blog markdown) + /docs.md (raw-markdown docs), all force-static, referenced from a hand-rolled /robots.txt route (replaces the typed robots.ts)
- Pagefind docs search: postbuild index of .next/server/app scoped via data-pagefind-body, custom dark-themed UI on /docs
- Heading anchors: H2 helper on /docs, rehype-slug + autolink on blog MDX
- Blog articles: on-page ToC (>600 words), tag-ranked related posts
- /docs: Related footer (5 links) + helpful-vote widget
- Landing FAQ: 8 h3/p pairs mirrored into FAQPage JSON-LD
- PostHog custom events: docs-search-query, docs-helpful-vote (no-op without NEXT_PUBLIC_POSTHOG_KEY)
- Fixed the one bare code fence (ASCII diagram -> ```text)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1 parent d660b48 commit f52f04a

22 files changed

Lines changed: 1580 additions & 55 deletions

web/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -15,6 +15,9 @@
 /.next/
 /out/
 
+# pagefind search index — generated by the postbuild script
+/public/pagefind/
+
 # production
 /build
 
web/README.md

Lines changed: 88 additions & 11 deletions
@@ -15,11 +15,13 @@ extracted into its own repository later without rewrites.
 
 ## Pages
 
-- `/` — landing (hero with animated REPL, features, architecture, roadmap, SDK switcher, SQL surface, desktop showcase, blog series, footer)
+- `/` — landing (hero with animated REPL, features, architecture, roadmap, SDK switcher, SQL surface, desktop showcase, blog series, FAQ with `FAQPage` JSON-LD, footer)
 - `/playground` — in-browser SQL playground: the full engine compiled to WebAssembly, with a CodeMirror editor, sample datasets, HNSW vector search, and OPFS session persistence. The WASM bundle is a pinned copy of `sdk/wasm/pkg/` vendored into `public/playground/pkg/`. See [`../examples/wasm-playground/README.md`](../examples/wasm-playground/README.md).
-- `/docs` — Getting Started page (sticky sidebar nav + on-page TOC)
+- `/docs` — Getting Started page (sticky sidebar nav + on-page TOC, Pagefind search box, heading anchor links, related-links footer, helpful-vote widget)
+- `/docs.md` — the same docs content served as raw `text/markdown` for AI crawlers that prefer markdown over rendered HTML
+- `/llms.txt` + `/llms-full.txt` — LLM-discoverability surfaces per the [llms.txt convention](https://llmstxt.org/); see "LLM surfaces" below
 - `/blog` — index of long-form posts pulled from `content/blog/*.mdx`
-- `/blog/[slug]` — per-post detail page (MDX rendered server-side, `Article` JSON-LD, breadcrumb JSON-LD, dynamic OG image, prev/next navigation)
+- `/blog/[slug]` — per-post detail page (MDX rendered server-side, `Article` JSON-LD, breadcrumb JSON-LD, dynamic OG image, on-page ToC, heading anchors, related posts, helpful-vote widget, prev/next navigation)
 - `/blog/tags/[tag]` — tag pages (one per unique frontmatter tag)
 - `/blog/rss.xml` — RSS 2.0 feed
 
@@ -51,14 +53,26 @@ Each public route ships full search/social metadata. The pieces:
 the apple icon are rasterized from the same play-glyph mark used in
 [`src/lib/og.tsx`](src/lib/og.tsx) and `.brand-mark`; regenerate them
 from `icon.svg` (e.g. with `sharp`) if the mark ever changes.
-- **`/sitemap.xml` + `/robots.txt`** — Next 15 metadata routes
-([`src/app/sitemap.ts`](src/app/sitemap.ts),
-[`src/app/robots.ts`](src/app/robots.ts)). Add a route to the `ROUTES`
-list when shipping a new page.
-- **JSON-LD structured data** — `SoftwareApplication` schema on the landing
-page, `BreadcrumbList` on `/docs`, `Blog` on `/blog`, and
+- **`/sitemap.xml`** — Next 15 metadata route
+([`src/app/sitemap.ts`](src/app/sitemap.ts)). Add a route to the
+`STATIC_ROUTES` list when shipping a new page.
+- **`/robots.txt`** — a hand-rolled route handler
+([`src/app/robots.txt/route.ts`](src/app/robots.txt/route.ts)) rather
+than the typed metadata route, so it can reference the sitemap *and* the
+`llms.txt` surfaces (Next's `MetadataRoute.Robots` can't emit comments).
+- **JSON-LD structured data** — `SoftwareApplication` + `FAQPage` schema on
+the landing page, `BreadcrumbList` on `/docs`, `Blog` on `/blog`, and
 `BlogPosting` + `BreadcrumbList` on each `/blog/<slug>`. Validate via
 Google's [Rich Results Test](https://search.google.com/test/rich-results).
+The FAQ lives in [`src/components/faq.tsx`](src/components/faq.tsx) — the
+visible `<h3>`/`<p>` pairs and the JSON-LD are generated from the same
+array, so they can't drift apart.
+- **Heading anchors** — every docs `h2` (via the local `H2` helper in
+[`src/app/docs/page.tsx`](src/app/docs/page.tsx)) and every blog heading
+(via `rehype-slug` + `rehype-autolink-headings` in
+[`src/components/blog-mdx.tsx`](src/components/blog-mdx.tsx)) carries a
+hover-visible `#` link, so sections are shareable and quotable with
+anchors.
 - **Search Console verification** — fill in the placeholder tokens in
 `metadata.verification` ([`src/app/layout.tsx`](src/app/layout.tsx)) once
 Google Search Console + Bing Webmaster Tools issue them.
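
Reviewer note on the FAQ bullet in the hunk above: the "one array feeds both the visible pairs and the JSON-LD" idea can be sketched roughly as follows. This is only an illustration under assumptions. `FaqItem`, `FAQ_ITEMS`, `buildFaqJsonLd`, and `renderFaqHtml` are hypothetical names, the entries are placeholders, and the real `src/components/faq.tsx` (not shown in this commit view) presumably renders JSX rather than strings.

```typescript
// Hypothetical FAQ entry shape; the real one in src/components/faq.tsx may differ.
interface FaqItem {
  question: string;
  answer: string;
}

// Placeholder content for illustration only (not the site's actual FAQ).
const FAQ_ITEMS: FaqItem[] = [
  { question: "Example question one?", answer: "Example answer one." },
  { question: "Example question two?", answer: "Example answer two." },
];

// Builds a schema.org FAQPage object from the same array that drives the markup.
function buildFaqJsonLd(items: FaqItem[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// String stand-in for the visible <h3>/<p> pairs. No HTML escaping is done here
// because the placeholder strings are trusted.
function renderFaqHtml(items: FaqItem[]): string {
  return items
    .map((item) => `<h3>${item.question}</h3><p>${item.answer}</p>`)
    .join("\n");
}

const faqJsonLd = buildFaqJsonLd(FAQ_ITEMS);
const faqHtml = renderFaqHtml(FAQ_ITEMS);
console.log(JSON.stringify(faqJsonLd, null, 2));
```

Because both outputs are derived from `FAQ_ITEMS`, editing an entry changes the page text and the structured data together.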
@@ -72,6 +86,49 @@ export lives in [`seo/keywords.md`](seo/keywords.md). When rewriting a
 page's headline or meta description, update the corresponding entry in
 that sheet so future rewrites stay coordinated.
 
+## Docs search (Pagefind)
+
+`/docs` ships a client-side search box backed by
+[Pagefind](https://pagefind.app/) — static, serverless, no account needed.
+
+- The index is generated by the `postbuild` script (`package.json`):
+`pagefind --site .next/server/app --output-path public/pagefind`. It
+indexes the **prerendered HTML** that `next build` leaves in
+`.next/server/app`, so it runs after every production build (Vercel
+included — npm runs `postbuild` automatically after `build`).
+- Only elements under a `data-pagefind-body` attribute are indexed — the
+docs main column and blog articles opt in; everything else (landing,
+playground, nav, footers) stays out of the index. `data-pagefind-ignore`
+carves out the CTA/related/vote footers.
+- The UI is the custom [`src/components/docs-search.tsx`](src/components/docs-search.tsx)
+(Pagefind's JS API, not its default UI) so it matches the site's dark
+styling. Raw result URLs point at the `.html` files Pagefind saw
+(`/docs.html`) and are mapped back to routes in `cleanUrl`.
+- `public/pagefind/` is gitignored — it's a build artifact. In `next dev`
+there is no index; the box degrades to a hint to run a build.
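
Reviewer note on the `cleanUrl` mapping mentioned in the Pagefind bullets above: the actual helper is not part of this commit view, so the following is only a guess at its shape. `cleanUrlSketch` is a hypothetical name, and the handling of `index.html` and hash fragments is an assumption. The README only confirms the `/docs.html` to `/docs` case.

```typescript
// Hypothetical stand-in for the `cleanUrl` helper the README mentions; the real
// implementation in src/components/docs-search.tsx may behave differently.
function cleanUrlSketch(rawUrl: string): string {
  // Separate any hash fragment so it survives the rewrite.
  const hashIndex = rawUrl.indexOf("#");
  const path = hashIndex === -1 ? rawUrl : rawUrl.slice(0, hashIndex);
  const hash = hashIndex === -1 ? "" : rawUrl.slice(hashIndex);

  let route = path;
  if (route.endsWith("/index.html")) {
    // Assumed case: "/blog/index.html" -> "/blog", "/index.html" -> "/"
    route = route.slice(0, -"/index.html".length) || "/";
  } else if (route.endsWith(".html")) {
    // Case named in the README: "/docs.html" -> "/docs"
    route = route.slice(0, -".html".length);
  }
  return route + hash;
}

console.log(cleanUrlSketch("/docs.html"));
```

URLs that already look like routes pass through unchanged, so applying the function twice is harmless.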
+
+## LLM surfaces
+
+Three build-time-generated plain-text routes make the site legible to AI
+crawlers (and are referenced as comments from `robots.txt`):
+
+- **`/llms.txt`** ([`src/app/llms.txt/route.ts`](src/app/llms.txt/route.ts)) —
+curated index per [llmstxt.org](https://llmstxt.org/): project name +
+tagline, then sections of absolute links with one-line summaries (docs,
+playground, every blog post, GitHub/docs.rs/registries).
+- **`/llms-full.txt`** ([`src/app/llms-full.txt/route.ts`](src/app/llms-full.txt/route.ts)) —
+the full docs markdown + every blog post concatenated as one markdown
+file.
+- **`/docs.md`** ([`src/app/docs.md/route.ts`](src/app/docs.md/route.ts)) —
+the docs page as raw `text/markdown`.
+
+All three are `force-static` (generated at build time) and built by
+[`src/lib/llms.ts`](src/lib/llms.ts). The markdown source for the docs
+content is [`content/docs/getting-started.md`](content/docs/getting-started.md) —
+**it mirrors `src/app/docs/page.tsx`**; when you change a docs section,
+update the matching section there too. Blog content needs no mirroring (the
+MDX sources are the truth).
+
 ## Local development
 
 ```sh
@@ -93,7 +150,8 @@ npm run lint # next lint (ESLint)
 ```
 web/
 ├── content/
-│ └── blog/ # MDX posts (one .mdx file per post; frontmatter at top)
+│ ├── blog/ # MDX posts (one .mdx file per post; frontmatter at top)
+│ └── docs/ # markdown mirror of /docs — feeds /docs.md + /llms-full.txt
 ├── seo/
 │ └── keywords.md # keyword research + per-page primary/secondary registry (SQLR-33)
 ├── src/
@@ -105,10 +163,15 @@ web/
 │ │ ├── docs/page.tsx # /docs
 │ │ ├── blog/ # /blog index, [slug] detail, tags/[tag], rss.xml
 │ │ ├── sitemap.ts # /sitemap.xml — enumerates static + per-post + per-tag URLs
-│ │ └── robots.ts # /robots.txt
+│ │ ├── robots.txt/ # /robots.txt route handler (sitemap + llms.txt refs)
+│ │ ├── llms.txt/ # /llms.txt — curated LLM index (llmstxt.org)
+│ │ ├── llms-full.txt/ # /llms-full.txt — full docs+blog markdown concat
+│ │ └── docs.md/ # /docs.md — docs page as raw markdown
 │ ├── components/ # one .tsx per landing section (hero, features, roadmap, …)
 │ └── lib/
+│ ├── analytics.ts # useTrack() — PostHog capture, no-ops without the key
 │ ├── blog.ts # MDX loader: frontmatter parsing, post enumeration, tag helpers
+│ ├── llms.ts # builders for /llms.txt, /llms-full.txt, /docs.md
 │ ├── og.tsx # shared OpenGraph frame
 │ ├── site.ts # SITE constants (version, repo URL, social links)
 │ └── utils.ts # shadcn cn() helper
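
Reviewer note on the `llms.ts` builders listed in the tree above: a curated `/llms.txt` in the llmstxt.org layout (H1 name, blockquote tagline, H2 sections of `- [title](url): summary` links) could be assembled along these lines. This is a sketch, not the code from `src/lib/llms.ts`. `LlmsLink`, `LlmsSection`, `buildLlmsTxtSketch`, and the example.com URL are all hypothetical.

```typescript
// Hypothetical shapes; the real builder's types are not shown in this commit view.
interface LlmsLink {
  title: string;
  url: string; // absolute URL, as the README describes
  summary: string; // one-line description
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Assembles the llmstxt.org layout: "# name", "> tagline", then "## section" lists.
function buildLlmsTxtSketch(
  name: string,
  tagline: string,
  sections: LlmsSection[],
): string {
  const lines: string[] = [`# ${name}`, "", `> ${tagline}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const link of section.links) {
      lines.push(`- [${link.title}](${link.url}): ${link.summary}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Placeholder data for illustration only.
const llmsTxt = buildLlmsTxtSketch("Example Project", "One-line tagline goes here.", [
  {
    heading: "Docs",
    links: [
      {
        title: "Getting Started",
        url: "https://example.com/docs",
        summary: "Install and first query.",
      },
    ],
  },
]);
console.log(llmsTxt);
```

In a Next.js route handler the string would presumably be returned as a plain-text `Response`, with `export const dynamic = "force-static"` so it is generated at build time, which matches the `force-static` behavior the README describes.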
@@ -277,6 +340,20 @@ PostHog is wired up in two places:
 If the env var is absent the provider is omitted and the middleware
 falls through to `NextResponse.next()`.
 
+Beyond pageviews/autocapture, two custom events (SQLR-36) flow through the
+`useTrack()` hook in [`src/lib/analytics.ts`](src/lib/analytics.ts), which
+no-ops when the key is absent:
+
+- **`docs-search-query`** — fired from the docs search box once typing
+settles (~1.2 s debounce); properties: `query`, `results` (count). Zero-
+result queries are the signal for missing docs.
+- **`docs-helpful-vote`** — fired by the "Was this page helpful?" widget on
+`/docs` and every blog post; properties: `path`, `helpful` (boolean). No
+PII either way.
+
+Worth a weekly skim in PostHog: most-visited docs pages, searches with
+`results = 0`, and pages accumulating `helpful = false` votes.
+
 ### Privacy / compliance follow-ups
 
 The current wiring uses PostHog defaults — autocapture + cookies — and
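
Reviewer note on the custom events in the hunk above: the "no-op without the key" behavior can be illustrated without React or the PostHog SDK. The sketch below assumes a plain factory instead of the real `useTrack()` hook. `createTrackSketch`, `Capture`, and the `phc_placeholder` key are hypothetical, and `capture` stands in for whatever function actually sends the event.

```typescript
type EventProps = Record<string, string | number | boolean>;
type Capture = (event: string, props: EventProps) => void;

// Returns a track function that forwards to `capture` only when a key is configured.
function createTrackSketch(key: string | undefined, capture: Capture) {
  return (event: string, props: EventProps): void => {
    if (!key) return; // no key configured: silently do nothing
    capture(event, props);
  };
}

// In-memory recorder standing in for the analytics client.
const captured: Array<{ event: string; props: EventProps }> = [];
const record: Capture = (event, props) => {
  captured.push({ event, props });
};

// Without a key the call is dropped.
const trackWithoutKey = createTrackSketch(undefined, record);
trackWithoutKey("docs-helpful-vote", { path: "/docs", helpful: true });

// With a (placeholder) key the event is forwarded with its properties.
const trackWithKey = createTrackSketch("phc_placeholder", record);
trackWithKey("docs-search-query", { query: "vector index", results: 0 });

console.log(captured);
```

Keeping the key check inside the returned function means call sites never need their own environment checks.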

web/content/blog/shipping-concurrent-writes-mvcc-v010.mdx

Lines changed: 1 addition & 1 deletion
@@ -229,7 +229,7 @@ distinguished from page frames by the sentinel `page_num = u32::MAX`
 The frame body encodes the commit timestamp plus a record stream of
 the resolved write-set:
 
-```
+```text
 ┌────────┬────────┬─────────────────────────────────────────────────┐
 │ offset │ length │ content │
 ├────────┼────────┼─────────────────────────────────────────────────┤
