Commit 7fd0773

fix(seo): fix robots.txt, add 404 handling, and create llms.txt
- Fix malformed robots.txt: add User-agent/Allow directives, block AI training crawlers (CCBot, Google-Extended, Bytespider, Applebot-Extended, meta-externalagent) while allowing retrieval bots
- Add proper 404 handling: generate 404.html in prerender script for Cloudflare Pages, add top-level catch-all route with NotFound page
- Create /llms.txt with site description and doc structure for LLMs
1 parent 226ea98 commit 7fd0773

5 files changed, with 130 additions and 1 deletion

apps/web/public/llms.txt

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# ooxml.dev
+
+> The OOXML spec, explained by people who actually implemented it.
+
+ooxml.dev is an interactive reference for ECMA-376 (Office Open XML), the standard behind .docx, .xlsx, and .pptx files. It features live previews, real-world implementation gotchas, and AI-powered spec search. Built by the SuperDoc team (https://www.superdoc.dev).
+
+## Main Pages
+
+- [Home](https://ooxml.dev/): Overview and entry point
+- [Docs](https://ooxml.dev/docs/): Interactive OOXML reference documentation
+- [MCP Server](https://ooxml.dev/mcp/): Connect AI assistants to search 18,000+ ECMA-376 specification chunks via MCP
+- [Spec Explorer](https://ooxml.dev/spec/): Search and browse the ECMA-376 specification with semantic search
+
+## Documentation
+
+### Getting Started
+- [Introduction](https://ooxml.dev/docs/): Learn the basics of OOXML and how to use this reference
+
+### WordprocessingML
+- [Paragraphs](https://ooxml.dev/docs/paragraphs/): Text structure and formatting — paragraphs, runs, and text elements (w:p)
+- [Paragraph Borders](https://ooxml.dev/docs/paragraph-borders/): Border properties, between-border groups, nil/none semantics (w:pBdr)
+- [Tables](https://ooxml.dev/docs/tables/): Table structure, properties, and implementation gotchas (w:tbl)
+- [Styles](https://ooxml.dev/docs/styles/): Style definitions, inheritance, and resolution (styles.xml)
+
+### Guides
+- [Creating Documents](https://ooxml.dev/docs/creating-documents/): Step-by-step guide to creating a valid OOXML document from scratch
+- [Common Gotchas](https://ooxml.dev/docs/common-gotchas/): Real-world implementation issues and how to solve them
+
+## MCP Server
+
+The ooxml.dev MCP server provides AI assistants with access to the ECMA-376 specification. Available tools:
+
+- `search_ecma_spec`: Semantic search across 18,000+ specification chunks
+- `get_section`: Retrieve a specific section by ID (e.g., "17.3.2" for paragraph properties)
+- `list_parts`: Browse the specification structure by part (1-4)
+
+## About
+
+ooxml.dev is built and maintained by the SuperDoc team (https://www.superdoc.dev). SuperDoc is a document engine that implements the OOXML specification. The content on ooxml.dev comes from real-world experience building a full OOXML renderer and editor.
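
Because llms.txt is plain Markdown with a fixed shape (an H1 title, a `>` summary, then headed lists of `- [name](url): notes` links), a consumer can recover the page index with a single line-oriented pass. A minimal sketch, assuming only that line shape; `parseLlmsLinks` and `LlmsLink` are hypothetical names, not part of any published llms.txt tooling:

```typescript
// Minimal sketch: collect "- [name](url): notes" entries from an llms.txt file,
// tagged with the nearest preceding "##" or "###" heading.
// LlmsLink and parseLlmsLinks are hypothetical names, not a published API.

interface LlmsLink {
  section: string; // nearest H2/H3 heading above the link ("" if none)
  title: string;
  url: string;
  notes: string; // text after the colon, or "" when absent
}

// Matches e.g. "- [Docs](https://ooxml.dev/docs/): Interactive OOXML reference"
const LINK_LINE = /^-\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;
// Matches "## Section" or "### Subsection", but not the "# Title" line.
const HEADING_LINE = /^#{2,3}\s+(.+)$/;

function parseLlmsLinks(text: string): LlmsLink[] {
  const links: LlmsLink[] = [];
  let section = "";
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    const heading = HEADING_LINE.exec(line);
    if (heading) {
      section = heading[1];
      continue;
    }
    const link = LINK_LINE.exec(line);
    if (link) {
      links.push({ section, title: link[1], url: link[2], notes: link[3] ?? "" });
    }
  }
  return links;
}
```

Bullets without a Markdown link, such as the MCP tool list, are skipped by design, leaving only the linked pages.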

apps/web/public/robots.txt

Lines changed: 23 additions & 0 deletions
@@ -1 +1,24 @@
+# https://ooxml.dev robots.txt
+
+User-agent: *
+Allow: /
+
+# AI training crawlers - blocked
+User-agent: CCBot
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: Bytespider
+Disallow: /
+
+User-agent: Applebot-Extended
+Disallow: /
+
+User-agent: meta-externalagent
+Disallow: /
+
+# For AI-friendly content, see /llms.txt
+
 Sitemap: https://ooxml.dev/sitemap.xml
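
The blanket `Allow: /` does not override the per-bot blocks because, under RFC 9309, a crawler obeys only the group whose `User-agent` matches its product token (case-insensitively) and falls back to `User-agent: *` only when no group matches. Crawlers not named here therefore fall through to the permissive `*` group. A simplified sketch of that selection plus longest-match rule resolution, assuming no `*` or `$` wildcards in paths; `parseRobots` and `isAllowed` are illustrative names, not a library API:

```typescript
// Simplified model of RFC 9309 group selection plus longest-match rules.
// Assumptions: no "*" or "$" wildcards in rule paths; parseRobots and
// isAllowed are illustrative names, not a library API.

type Rule = { allow: boolean; path: string };
type Group = { agents: string[]; rules: Rule[] };

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share a group; one after a rule starts a new group.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      current.rules.push({ allow: key === "allow", path: value });
      lastWasAgent = false;
    }
    // Other records such as Sitemap are not part of any group.
  }
  return groups;
}

function isAllowed(groups: Group[], productToken: string, path: string): boolean {
  const token = productToken.toLowerCase();
  // Use the groups naming this crawler; otherwise fall back to "*".
  let matched = groups.filter((g) => g.agents.includes(token));
  if (matched.length === 0) matched = groups.filter((g) => g.agents.includes("*"));
  // Longest matching path wins; on a tie, Allow wins. An empty path matches nothing.
  let best: Rule | null = null;
  for (const rule of matched.flatMap((g) => g.rules)) {
    if (rule.path === "" || !path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}
```

Under this model, for the robots.txt above, `CCBot` is refused everywhere, while a crawler such as `Googlebot` matches no named group, falls through to `User-agent: *`, and is allowed.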

apps/web/scripts/prerender.ts

Lines changed: 29 additions & 1 deletion
@@ -223,6 +223,29 @@ ${urls.join("\n")}
 </urlset>`;
 }
 
+// --- 404 page ---
+
+function render404Page(): string {
+  const title = "Page Not Found | ooxml.dev";
+  const description = "The page you're looking for doesn't exist.";
+  const content = `<main style="max-width:640px;margin:0 auto;padding:4rem 1rem;text-align:center">
+    <h1 style="font-size:2rem;font-weight:bold;margin-bottom:1rem">404 — Page Not Found</h1>
+    <p style="margin-bottom:2rem;opacity:0.7">The page you're looking for doesn't exist or has been moved.</p>
+    <a href="/" style="color:inherit;text-decoration:underline">Go to homepage</a>
+    <span style="margin:0 0.5rem;opacity:0.5">·</span>
+    <a href="/docs" style="color:inherit;text-decoration:underline">Browse docs</a>
+  </main>`;
+
+  let html = template;
+  html = html.replace(/<title>[^<]*<\/title>/, `<title>${escapeHtml(title)}</title>`);
+  html = html.replace(
+    "</head>",
+    ` <meta name="description" content="${escapeHtml(description)}"/>\n <meta name="robots" content="noindex"/>\n </head>`,
+  );
+  html = html.replace('<div id="root"></div>', `<div id="root">${content}</div>`);
+  return html;
+}
+
 // --- Main ---
 
 const paths = getAllPaths();
@@ -239,9 +262,14 @@ for (const path of paths) {
   console.log(` ✓ ${path}`);
 }
 
+// Generate 404 page (Cloudflare Pages serves this with 404 status)
+const notFoundHtml = render404Page();
+writeFileSync(resolve(DIST, "404.html"), notFoundHtml);
+console.log(` ✓ /404.html`);
+
 // Generate sitemap
 const sitemap = generateSitemap(paths);
 writeFileSync(resolve(DIST, "sitemap.xml"), sitemap);
 console.log(` ✓ /sitemap.xml`);
 
-console.log(`\nPre-rendered ${count} pages + sitemap.`);
+console.log(`\nPre-rendered ${count} pages + 404 + sitemap.`);
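
The comment in the script relies on Cloudflare Pages' not-found handling as we understand its documentation: with a top-level `404.html` in the build output, an unmatched path gets the closest `404.html` (searching up the directory tree) with a 404 status, while a build without a top-level `404.html` is treated as a single-page app and every unknown path receives the root `index.html` with a 200. A side effect is that adding `404.html` switches off that SPA fallback, so only pre-rendered routes are served with a 200. A deliberately simplified model of the lookup; `resolveRequest` and the file-set representation are hypothetical, and trailing-slash redirects and `_redirects` rules are ignored:

```typescript
// Simplified model of how a static host like Cloudflare Pages picks a file.
// Assumptions: `files` holds output paths such as "/docs/tables/index.html";
// trailing-slash redirects and _redirects/_headers rules are ignored.

interface Resolved {
  file: string;
  status: 200 | 404;
}

function resolveRequest(pathname: string, files: ReadonlySet<string>): Resolved {
  const clean = pathname.replace(/\/+$/, ""); // "/docs/tables/" -> "/docs/tables"
  // 1. A pre-rendered page for this route.
  for (const candidate of [`${clean}/index.html`, `${clean}.html`]) {
    if (files.has(candidate)) return { file: candidate, status: 200 };
  }
  // 2. No top-level 404.html: SPA mode, every unknown path gets the root page.
  if (!files.has("/404.html")) return { file: "/index.html", status: 200 };
  // 3. Otherwise serve the closest 404.html, walking up from the route's directory.
  const segments = clean.split("/").filter(Boolean);
  for (let i = segments.length; i >= 0; i--) {
    const dir = segments.slice(0, i).join("/");
    const candidate = dir === "" ? "/404.html" : `/${dir}/404.html`;
    if (files.has(candidate)) return { file: candidate, status: 404 };
  }
  return { file: "/404.html", status: 404 }; // not reached: /404.html exists
}
```

Because the generated `404.html` is built from the same `template`, it presumably also loads the client bundle, so the React router, including the new catch-all route, can take over once the page boots.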

apps/web/src/main.tsx

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,7 @@ import { DocsLayout } from "./pages/docs/Layout";
 import { DocsPage } from "./pages/docs/Page";
 import { Home } from "./pages/Home";
 import { Mcp } from "./pages/Mcp";
+import { NotFound } from "./pages/NotFound";
 import { SpecExplorer } from "./pages/SpecExplorer";
 import "./index.css";
 
@@ -23,6 +24,7 @@ const router = createBrowserRouter([
           { path: "*", element: <DocsPage /> },
         ],
       },
+      { path: "*", element: <NotFound /> },
     ],
   },
 ]);
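
Appending the root-level `{ path: "*" }` does not shadow the other routes, because React Router (v6 and later) ranks matching routes by specificity rather than by declaration order, so a catch-all wins only when nothing more specific matches. One consequence: `/docs/<unknown>` is still claimed by the docs layout's own `*` child and renders `DocsPage`, not `NotFound`. A simplified ranking sketch; this is not React Router's actual algorithm, and the flattened paths for `Home`, `Mcp`, and `SpecExplorer` are assumptions based on the URLs in llms.txt:

```typescript
// Simplified specificity ranking for flattened route patterns.
// Not React Router's implementation; the route paths below are assumptions.

interface FlatRoute {
  pattern: string; // e.g. "/docs/*"
  element: string; // name of the component that would render
}

function matches(pattern: string, pathname: string): boolean {
  const p = pattern.split("/").filter(Boolean);
  const s = pathname.split("/").filter(Boolean);
  if (p[p.length - 1] === "*") {
    // A trailing splat matches any path that starts with the static prefix.
    const prefix = p.slice(0, -1);
    return s.length >= prefix.length && prefix.every((seg, i) => seg === s[i]);
  }
  return p.length === s.length && p.every((seg, i) => seg === s[i]);
}

function score(pattern: string): number {
  // Each static segment adds weight; a splat is penalised.
  return pattern
    .split("/")
    .filter(Boolean)
    .reduce((acc, seg) => acc + (seg === "*" ? -2 : 10), 0);
}

function resolveRoute(routes: FlatRoute[], pathname: string): string | undefined {
  return routes
    .filter((r) => matches(r.pattern, pathname))
    .sort((a, b) => score(b.pattern) - score(a.pattern))[0]?.element;
}

// Hypothetical flattened route table mirroring the shape of main.tsx.
const routes: FlatRoute[] = [
  { pattern: "/", element: "Home" },
  { pattern: "/mcp", element: "Mcp" },
  { pattern: "/spec", element: "SpecExplorer" },
  { pattern: "/docs/*", element: "DocsPage" },
  { pattern: "/*", element: "NotFound" },
];
```

If unknown doc slugs should also show the 404 view, `DocsPage` would have to detect a missing page itself, since the root catch-all never sees those paths.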

apps/web/src/pages/NotFound.tsx

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+import { Link } from "react-router-dom";
+import { Navbar } from "../components/Navbar";
+import { Footer } from "../components/Footer";
+import { useDocumentTitle } from "../hooks/useDocumentTitle";
+
+export function NotFound() {
+  useDocumentTitle("Page Not Found | ooxml.dev");
+
+  return (
+    <div className="min-h-screen bg-[var(--color-bg-primary)]">
+      <Navbar maxWidth />
+
+      <main className="mx-auto max-w-xl px-4 py-24 text-center">
+        <h1 className="mb-4 text-3xl font-bold">404 — Page Not Found</h1>
+        <p className="mb-8 text-[var(--color-text-secondary)]">
+          The page you're looking for doesn't exist or has been moved.
+        </p>
+        <div className="flex justify-center gap-4">
+          <Link
+            to="/"
+            className="rounded-lg bg-[var(--color-accent)] px-5 py-2.5 font-medium text-white transition hover:bg-[var(--color-accent-hover)]"
+          >
+            Go to Homepage
+          </Link>
+          <Link
+            to="/docs"
+            className="rounded-lg border border-[var(--color-border)] px-5 py-2.5 font-medium text-[var(--color-text-primary)] transition hover:bg-[var(--color-bg-secondary)]"
+          >
+            Browse Docs
+          </Link>
+        </div>
+      </main>
+
+      <Footer />
+    </div>
+  );
+}
