fix: make SPA routes crawlable at source (sitemap script + pre-render) #429

Merged
rdmueller merged 4 commits into LLM-Coding:main from raifdmueller:fix/sitemap-clean-urls-root-cause on Apr 13, 2026

@raifdmueller raifdmueller commented Apr 13, 2026

Contributor

Summary

Root-cause fix for the recurring "workflow page is still dynamic" issue. Two separate layers of the problem, both addressed:

Layer 1: The sitemap script was never fixed

Previous fixes (#371, #384) edited the output `sitemap.xml` directly but never touched `scripts/generate-sitemap.js`. Every `npm run build` ran the script and regenerated the sitemap with the broken hash URLs — silently reverting each fix. This is why the bug kept coming back.

Fixed: `scripts/generate-sitemap.js` now produces clean, crawlable URLs matching the History API router and includes all 11 pages (was 3). 141 URLs total.
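
A minimal sketch of what clean-URL sitemap generation could look like. The names `PAGES`, `urlEntry` and `BASE_URL` are taken from the review summary further down; the page entries, their `changefreq`/`priority` values, and the `buildSitemap` helper are illustrative assumptions, not the actual script.

```javascript
// Sketch only: the PAGES entries and buildSitemap() are assumptions for illustration.
const BASE_URL = "https://llm-coding.github.io/Semantic-Anchors";

const PAGES = [
  { path: "/", changefreq: "weekly", priority: "1.0" },
  { path: "/workflow", changefreq: "monthly", priority: "0.8" },
  { path: "/about", changefreq: "monthly", priority: "0.5" },
];

// Render one <url> element for the sitemap.
function urlEntry(loc, lastmod, changefreq, priority) {
  return [
    "  <url>",
    `    <loc>${loc}</loc>`,
    `    <lastmod>${lastmod}</lastmod>`,
    `    <changefreq>${changefreq}</changefreq>`,
    `    <priority>${priority}</priority>`,
    "  </url>",
  ].join("\n");
}

// Build the whole document; URLs are BASE_URL + page.path, with no "#/" prefix.
function buildSitemap(pages, lastmod) {
  const entries = pages.map((page) =>
    urlEntry(BASE_URL + page.path, lastmod, page.changefreq, page.priority)
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const sitemap = buildSitemap(PAGES, "2026-04-13");
console.log(sitemap);
```

Because the URLs are derived from one list in the generator, a rebuild reproduces the clean paths instead of reverting a hand-edited `sitemap.xml`.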

Layer 2: Clean URLs alone don't help non-JS fetchers

Even with `/workflow` as a clean URL, GitHub Pages was serving the empty SPA shell — so pure HTTP fetchers (claude.ai, curl, basic crawlers that don't execute JS) saw nothing.

Fixed: New `scripts/prerender-routes.js` runs after `vite build` and generates `dist/<route>/index.html` for each of 9 doc routes. The pre-rendered AsciiDoc fragment from `render-docs.js` is injected into a static copy of the Vite shell with per-route `<title>`, `<meta name="description">`, and canonical URL.

When GitHub Pages serves `dist/workflow/index.html` for `/workflow`:

  • Crawlers / claude.ai / curl see real content immediately in the initial HTML response
  • Users with JS get the SPA booting on top, which wipes `#app` and re-renders as usual — interactive UX is unchanged
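
The injection step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `scripts/prerender-routes.js`: the shell markup, the regular expressions, the route fields (`title`, `description`) and the `prerenderPage` helper are all hypothetical.

```javascript
// Sketch only: shell markup, regexes and route fields are assumptions.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Return a copy of the shell with per-route head tags and the fragment inside #app.
// Function replacers are used so "$" in the inserted HTML is never treated specially.
function prerenderPage(shellHtml, route, fragmentHtml, baseUrl) {
  const title = escapeHtml(route.title);
  const description = escapeHtml(route.description);
  const canonical = baseUrl + route.path;
  return shellHtml
    .replace(/<title>[^<]*<\/title>/, () => `<title>${title}</title>`)
    .replace(
      /<meta name="description"[^>]*>/,
      () => `<meta name="description" content="${description}">`
    )
    .replace(
      /<link rel="canonical"[^>]*>/,
      () => `<link rel="canonical" href="${canonical}">`
    )
    .replace(/<div id="app"><\/div>/, () => `<div id="app">${fragmentHtml}</div>`);
}

// Hypothetical stand-in for the built Vite shell.
const shell = [
  "<!doctype html>",
  "<html><head>",
  "<title>Semantic Anchors</title>",
  '<meta name="description" content="default">',
  '<link rel="canonical" href="https://llm-coding.github.io/Semantic-Anchors/">',
  '</head><body><div id="app"></div></body></html>',
].join("\n");

const page = prerenderPage(
  shell,
  { path: "/workflow", title: "Workflow", description: "Workflow documentation" },
  "<h1>Workflow</h1>",
  "https://llm-coding.github.io/Semantic-Anchors"
);
console.log(page);
```

In a build script the result would be written to `dist/<route>/index.html`; the SPA then replaces the contents of `#app` on boot, so the static markup only matters for the initial response.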

Changes

| File | Change |
|---|---|
| `scripts/generate-sitemap.js` | Removed `#/` hash prefix, added all 11 pages |
| `scripts/prerender-routes.js` | New: post-build pre-render step |
| `website/package.json` | Chained `node ../scripts/prerender-routes.js` into `build` |
| `website/public/sitemap.xml` | Regenerated (for immediate effect) |

Pre-rendered routes

/about, /workflow, /brownfield, /changelog, /contributing, /agentskill, /rejected-proposals, /all-anchors, /evaluations

Test plan

  • `scripts/generate-sitemap.js` produces clean URLs
  • Build generates `dist/<route>/index.html` for all 9 doc routes
  • `dist/workflow/index.html` contains real workflow content (29 content markers)
  • Title, description, canonical URL are per-route
  • Unit tests pass (89/89)
  • E2E tests pass (will run in CI)
  • After deploy: `curl https://llm-coding.github.io/Semantic-Anchors/workflow` returns pre-rendered content without JS execution
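
One way the "returns pre-rendered content" check could be automated is a small predicate over the HTML that `curl` returns. The helper name `hasPrerenderedContent` and the assumption that the mount point is written exactly as `<div id="app">` are hypothetical; the real shell markup may differ.

```javascript
// Sketch only: assumes the mount point appears as <div id="app" ...> in the HTML.
// Returns true when #app already carries markup in the raw response (no JS executed).
function hasPrerenderedContent(html) {
  const match = html.match(/<div id="app"[^>]*>([\s\S]*?)<\/div>/);
  return match !== null && match[1].trim().length > 0;
}

const emptyShell = '<body><div id="app"></div></body>';
const prerendered = '<body><div id="app"><h1>Workflow</h1></div></body>';

console.log(hasPrerenderedContent(emptyShell)); // false
console.log(hasPrerenderedContent(prerendered)); // true
```

The lazy match stops at the first closing `</div>`, which is enough to distinguish an empty shell from a populated one but is not a general HTML parser.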

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Expanded sitemap with additional pages and clean URL paths instead of hash-based routes for better SEO support.
    • Optimized pre-rendering of static pages to improve site performance and indexability.
  • Refactor

    • Updated the sitemap generator for consistent URL handling and dynamic path construction.

Root-cause fix for the recurring "workflow page is still dynamic" issue.

## The recurring bug

Previous fix attempts (LLM-Coding#371, LLM-Coding#384) edited the output sitemap.xml directly
but never touched scripts/generate-sitemap.js. Every time npm run build
ran the script, it regenerated the sitemap with the broken hash URLs —
silently reverting the fix.

## What changed

### 1. scripts/generate-sitemap.js — the actual root cause

- Removed hash prefix (#/) from all generated URLs — they were not
  crawlable (hash fragments look like the homepage to search engines
  and pure HTTP fetchers like claude.ai).
- Sitemap now covers all 11 router pages (was 3). Newly added: Workflow,
  Brownfield, Contracts, Evaluations, Changelog, AgentSkill, Rejected
  Proposals, All Anchors.
- Matches website/src/utils/router.js History API routes.
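
To illustrate why hash URLs look like the homepage to a server: the fragment is never part of the HTTP request. The sketch below uses the standard `URL` class; `requestTarget` is a hypothetical helper that approximates what a client puts on the request line.

```javascript
// The fragment ("#/workflow") stays in the client; only path and query are requested.
const hashUrl = new URL("https://llm-coding.github.io/Semantic-Anchors/#/workflow");
const cleanUrl = new URL("https://llm-coding.github.io/Semantic-Anchors/workflow");

// Hypothetical helper: path plus query string, as sent to the server.
function requestTarget(url) {
  return url.pathname + url.search;
}

console.log(requestTarget(hashUrl)); // "/Semantic-Anchors/"
console.log(hashUrl.hash); // "#/workflow"
console.log(requestTarget(cleanUrl)); // "/Semantic-Anchors/workflow"
```

Every hash-based sitemap entry therefore resolves to the same request as the site root, while the clean path gives the server a distinct resource to answer with.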

### 2. scripts/prerender-routes.js — the deeper problem

Even with clean URLs, /workflow returned the SPA shell with empty #app.
Crawlers and non-JS fetchers (claude.ai, curl, LLM retrievers) saw
nothing. Only modern JS-executing crawlers (Googlebot) could eventually
index the content.

New post-build step generates per-route static HTML (dist/<route>/index.html)
for all 9 doc routes by injecting the pre-rendered AsciiDoc fragment
from render-docs.js into the Vite shell. Also updates the per-route
<title>, meta description, and canonical URL.

GitHub Pages serves dist/workflow/index.html when /workflow is requested.
The SPA boots on top of the pre-rendered content, wipes #app, and
re-renders normally — so interactive UX is unchanged. Crawlers see real
content immediately.

### 3. website/package.json

Chained the pre-render step into the build command so it can't be
forgotten: `vite build && node ../scripts/prerender-routes.js`.

## Result

- sitemap.xml: 141 crawlable URLs (11 pages + 130 anchors)
- dist/workflow/index.html, dist/brownfield/index.html, dist/about/...
  each contain the real content, title, meta description, canonical URL
- Unit tests pass (89/89)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 13, 2026

Contributor

Warning

Rate limit exceeded

@raifdmueller has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 24 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 40 minutes and 24 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e74595b0-45ed-4a8e-bdb0-a09f682f0529

📥 Commits

Reviewing files that changed from the base of the PR and between 4a9a72d and 9253168.

📒 Files selected for processing (3)
  • scripts/generate-sitemap.js
  • scripts/prerender-routes.js
  • website/public/sitemap.xml

Walkthrough

This pull request updates sitemap generation to emit clean URLs instead of hash fragments, adds a new pre-rendering script that produces static HTML pages per route, and modifies the build pipeline to run pre-rendering after the Vite build.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Sitemap generation: `scripts/generate-sitemap.js`, `website/public/sitemap.xml` | Sitemap generation refactored from hash-based routes to clean paths; a `PAGES` list with metadata was introduced, a `urlEntry()` helper was added for consistent XML rendering, and `BASE_URL + page.path` is used for URL construction. Sitemap XML updated with new routes (`/workflow`, `/brownfield`, `/contracts`, etc.), updated `changefreq`/`priority` values, and a new `lastmod` date. |
| Pre-rendering & build pipeline: `scripts/prerender-routes.js`, `website/package.json` | New pre-rendering script added that generates static HTML pages per route under `website/dist/<route>/index.html`; it reads the Vite shell, updates `<title>`, meta tags, and canonical links, and injects the fragment HTML into the `#app` div. Build script in `package.json` extended with the pre-rendering step. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main changes: fixing SPA routes via a sitemap script and pre-rendering, which matches the detailed summary. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/prerender-routes.js`:
- Around line 35-98: The ROUTES array in scripts/prerender-routes.js is missing
the '/contracts' entry so the prerender step doesn't produce
dist/contracts/index.html; add a matching route object (path: '/contracts',
fragment and metadata matching the existing route in
website/src/utils/router.js) into the ROUTES constant so prerendering generates
the static page, or alternatively remove '/contracts' from the crawlable list in
scripts/generate-sitemap.js to avoid advertising a non-prerendered URL;
reference the ROUTES constant and the '/contracts' path and coordinate with
website/src/utils/router.js and scripts/generate-sitemap.js when making the
change.
- Around line 136-139: The code currently only warns when a configured fragment
is missing (the fs.existsSync(fragmentPath) check) which allows builds to
succeed with incomplete prerendered routes; change this to fail fast by throwing
an Error or returning a rejected promise so the build exits non‑zero: replace
the console.warn/return false with a throw new Error(`Missing fragment for
${route.path}: ${route.fragment} (expected at ${fragmentPath})`) or ensure the
surrounding function (where fragmentPath and route are used) propagates the
error to the process so the build fails. Ensure the thrown error references
fragmentPath, route.path and route.fragment so logs show exact missing fragment
details.

In `@website/package.json`:
- Line 11: The build script ("build": "vite build && node
../scripts/prerender-routes.js") pulls in ../scripts/prerender-routes.js but
your lint/format gates still only target src/; update the project's lint/format
config and npm scripts so the ESLint and Prettier checks include the external
scripts (e.g., add ../scripts/**/*.js to the globs used by "lint", "lint:fix",
"format" and "format:check"), or move/duplicate the prerender script inside the
checked source; ensure the referenced symbols are "build", "lint", "lint:fix",
"format", "format:check" and the external script path
../scripts/prerender-routes.js so the new files are covered by Tier‑1 gates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 43ca92b7-2bc6-4b20-87a1-612811c9e6a6

📥 Commits

Reviewing files that changed from the base of the PR and between debe6e2 and 4a9a72d.

📒 Files selected for processing (4)
  • scripts/generate-sitemap.js
  • scripts/prerender-routes.js
  • website/package.json
  • website/public/sitemap.xml

Comment thread scripts/prerender-routes.js
Comment thread scripts/prerender-routes.js
Comment thread website/package.json
     "dev": "vite",
     "prebuild": "node ../scripts/sync-anchors.js && node ../scripts/render-docs.js",
-    "build": "vite build",
+    "build": "vite build && node ../scripts/prerender-routes.js",

Contributor

⚠️ Potential issue | 🟠 Major

The new build path lies outside your lint/Prettier gates.

With node ../scripts/prerender-routes.js, the production build now depends on additional scripts/**/*.js, but lint, lint:fix, format and format:check still only check src/. Syntax and formatting errors in the new build scripts are therefore not caught by the Tier-1 gates.

Suggestion
-    "lint": "eslint src/",
-    "lint:fix": "eslint src/ --fix",
-    "format": "prettier --write src/",
-    "format:check": "prettier --check src/"
+    "lint": "eslint src/ ../scripts/",
+    "lint:fix": "eslint src/ ../scripts/ --fix",
+    "format": "prettier --write src/ ../scripts/",
+    "format:check": "prettier --check src/ ../scripts/"
As per coding guidelines: `**/*.js`: All projects must implement ESLint configuration and enforce formatting with Prettier to achieve Tier 1 automated gates as documented in Risk Radar assessment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/package.json` at line 11, The build script ("build": "vite build &&
node ../scripts/prerender-routes.js") pulls in ../scripts/prerender-routes.js
but your lint/format gates still only target src/; update the project's
lint/format config and npm scripts so the ESLint and Prettier checks include the
external scripts (e.g., add ../scripts/**/*.js to the globs used by "lint",
"lint:fix", "format" and "format:check"), or move/duplicate the prerender script
inside the checked source; ensure the referenced symbols are "build", "lint",
"lint:fix", "format", "format:check" and the external script path
../scripts/prerender-routes.js so the new files are covered by Tier‑1 gates.

Addresses CodeRabbit pre-merge check for docstring coverage.
1. Drop /contracts from the sitemap.
   /contracts is a fully interactive JS page (localStorage, client-side
   filter UI, async data fetch) — it has no static content worth
   pre-rendering. Advertising it in the sitemap as a crawlable URL would
   return an empty SPA shell to non-JS fetchers (claude.ai, curl), which
   is exactly what this PR aims to avoid. Googlebot can still find it
   via regular navigation from the homepage.

2. Fail fast on missing fragments in prerender-routes.js.
   A console.warn + return false let the build ship with an incomplete
   set of static pages. Now throws a descriptive Error so the build
   exits non-zero if render-docs.js fails to produce a fragment this
   script expects.
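
The fail-fast behaviour in item 2 can be sketched as below. The error message follows the wording suggested in the review; the helper name `requireFragment`, the injected `exists` predicate, and the example fragment name are hypothetical. In a real build script `exists` would typically be `fs.existsSync`.

```javascript
// Sketch only: `exists` is injected so the example runs without touching disk.
function requireFragment(route, fragmentPath, exists) {
  if (!exists(fragmentPath)) {
    // Throwing (instead of console.warn + return false) lets the error propagate;
    // an uncaught error makes the Node process exit non-zero and fails the build.
    throw new Error(
      `Missing fragment for ${route.path}: ${route.fragment} (expected at ${fragmentPath})`
    );
  }
  return fragmentPath;
}

// Hypothetical route entry and path, with a predicate that reports "file not found".
const route = { path: "/workflow", fragment: "workflow.html" };
let failure = null;
try {
  requireFragment(route, "docs-fragments/workflow.html", () => false);
} catch (err) {
  failure = err.message;
}
console.log(failure);
```

The message names the route, the fragment, and the expected location, so a failed build log points directly at what `render-docs.js` did not produce.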