GEO Endpoints Documentation
Generative Engine Optimization (GEO) is the practice of making your website's content discoverable and consumable by AI-powered search engines and large language models (LLMs). While traditional SEO focuses on ranking in conventional search engines like Google, GEO ensures that AI systems -- such as ChatGPT, Claude, Perplexity, and Google AI Overviews -- can find, understand, and accurately represent your content.
GEO sits alongside two related disciplines:
| Discipline | Full Name | Focus |
|---|---|---|
| SEO | Search Engine Optimization | Ranking in traditional search results (Google, Bing) |
| AEO | Answer Engine Optimization | Appearing in featured snippets and direct-answer panels |
| GEO | Generative Engine Optimization | Being discovered and cited by AI/LLM-powered engines |
AI crawlers need a structured way to discover what data a site exposes for machine consumption. Just as robots.txt tells traditional crawlers what they can and cannot access, the /.well-known/ai.txt file serves as the "robots.txt for AI" -- it declares crawler permissions and provides an index of all AI-specific endpoints on the site.
This repository implements the following GEO endpoints across its starter applications:
| # | Endpoint | Purpose |
|---|---|---|
| 1 | /.well-known/ai.txt | AI crawler permissions and endpoint index |
| 2 | /ai/summary.json | Structured site/page summary for LLMs |
| 3 | /ai/faq.json | FAQ data in structured format for answer engines |
| 4 | /ai/service.json | Service/business metadata for AI crawlers |
| 5 | sitemap-llm.xml | LLM-specific sitemap for AI content discovery |
The /.well-known/ai.txt file is the AI equivalent of robots.txt. It serves two primary functions:
- Crawler Permissions -- Explicitly allows or disallows specific AI crawlers from accessing parts of the site.
- Endpoint Discovery -- Lists all AI-specific data endpoints (/ai/*.json) so LLMs know where to fetch structured data.
Specification:
| Property | Value |
|---|---|
| Location | /.well-known/ai.txt |
| Content-Type | text/plain; charset=utf-8 |
| Cache-Control | public, max-age=86400, s-maxage=86400 (24 hours) |
| Security Header | X-Content-Type-Options: nosniff |
| Authentication | None required -- publicly accessible |
| Generation | Dynamic (generated per-request via force-dynamic) |
When a request is made to https://www.example.com/.well-known/ai.txt, the endpoint returns:
# AI Crawler Permissions for https://www.example.com
User-Agent: *
Allow: /
User-Agent: GPTBot
Allow: /
User-Agent: Claude-Web
Allow: /
User-Agent: Anthropic-AI
Allow: /
User-Agent: Google-Extended
Allow: /
User-Agent: CCBot
Allow: /
User-Agent: PerplexityBot
Allow: /
Disallow: /api/editing/
Disallow: /sitecore/
AI-Endpoint: https://www.example.com/ai/summary.json
AI-Endpoint: https://www.example.com/ai/faq.json
AI-Endpoint: https://www.example.com/ai/service.json
Sitemap: https://www.example.com/sitemap-llm.xml
Sitemap: https://www.example.com/sitemap.xml
Last-Modified: 2026-02-06
The site URL (https://www.example.com above) is dynamically resolved from the incoming request headers. The Last-Modified date is auto-generated as the current date in ISO format.
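As a rough illustration of how the content above could be assembled from a resolved site URL, here is a minimal sketch. Only the generateAiTxtContent(siteUrl) name and the emitted directives come from this document; the constant names and the injectable now parameter are assumptions added so the sketch is self-contained and testable.

```typescript
// Values mirror the sample ai.txt output above; the constant names are illustrative.
const AI_CRAWLERS = ["*", "GPTBot", "Claude-Web", "Anthropic-AI", "Google-Extended", "CCBot", "PerplexityBot"];
const DISALLOWED_PATHS = ["/api/editing/", "/sitecore/"];
const AI_ENDPOINTS = ["/ai/summary.json", "/ai/faq.json", "/ai/service.json"];
const SITEMAPS = ["/sitemap-llm.xml", "/sitemap.xml"];

// Builds the plaintext body. `now` is injectable (an assumption) so the date is predictable.
function generateAiTxtContent(siteUrl: string, now: Date = new Date()): string {
  const base = siteUrl.replace(/\/+$/, ""); // strip trailing slashes before joining paths
  const lines: string[] = [`# AI Crawler Permissions for ${base}`];

  for (const agent of AI_CRAWLERS) {
    lines.push(`User-Agent: ${agent}`, "Allow: /");
  }
  for (const path of DISALLOWED_PATHS) {
    lines.push(`Disallow: ${path}`);
  }
  for (const endpoint of AI_ENDPOINTS) {
    lines.push(`AI-Endpoint: ${base}${endpoint}`);
  }
  for (const sitemap of SITEMAPS) {
    lines.push(`Sitemap: ${base}${sitemap}`);
  }
  // ISO date (YYYY-MM-DD), taken from the UTC timestamp.
  lines.push(`Last-Modified: ${now.toISOString().slice(0, 10)}`);

  return lines.join("\n") + "\n";
}
```

Keeping the builder a pure function of the site URL (and the date) makes it easy to unit test without a running Next.js server.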
The endpoint is implemented as a Next.js App Router API route at:
src/app/api/well-known/ai-txt/route.ts
Key aspects of the implementation:
- export const dynamic = 'force-dynamic' -- Prevents Next.js from statically generating this route at build time; the content is generated fresh on each request.
- generateAiTxtContent(siteUrl) -- Pure function that builds the plaintext content, inserting the resolved site URL into all endpoint references.
- resolveSiteUrl(request) -- Determines the site's base URL using the following fallback chain:
  - host or x-forwarded-host request header (combined with x-forwarded-proto or defaulting to https)
  - First entry's hostName from .sitecore/sites.json
  - request.nextUrl.origin as final fallback
- Response headers -- Sets Content-Type, Cache-Control, and X-Content-Type-Options.
- Error handling -- Catches errors and returns a 500 response with a plaintext error message.
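The fallback chain used by resolveSiteUrl can be sketched as a pure function. This is not the starter's actual code: the HeaderBag and SiteEntry types, passing the parsed sites.json entries and the request origin as plain arguments, the precedence of x-forwarded-host over host, and the handling of a "*" host name are all assumptions made for illustration.

```typescript
// Minimal shapes so the sketch runs without Next.js; both are illustrative.
type HeaderBag = Record<string, string | undefined>;
interface SiteEntry {
  hostName?: string;
}

function resolveSiteUrl(headers: HeaderBag, sites: SiteEntry[], requestOrigin: string): string {
  // 1. Host headers. Preferring x-forwarded-host over host is an assumption.
  const host = headers["x-forwarded-host"] || headers["host"];
  if (host) {
    // Proxies may send a comma-separated list; use the first value, else default to https.
    const proto = headers["x-forwarded-proto"]?.split(",")[0].trim() || "https";
    return `${proto}://${host}`;
  }

  // 2. First entry's hostName from .sitecore/sites.json (already parsed by the caller).
  //    Skipping a "*" wildcard is an assumption, since it is not a usable host.
  const configuredHost = sites[0]?.hostName;
  if (configuredHost && configuredHost !== "*") {
    return `https://${configuredHost}`;
  }

  // 3. Final fallback: the origin of the request URL (request.nextUrl.origin in the route).
  return requestOrigin;
}
```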
The Pages Router starter (basic-nextjs-pages-router) implements the same endpoint at:
src/pages/api/well-known/ai-txt.ts
In each starter's next.config.ts (or next.config.js for Pages Router), a rewrite maps the well-known path to the API route:
{
source: '/.well-known/ai.txt',
destination: '/api/well-known/ai-txt',
locale: false,
}
The locale: false setting ensures the rewrite works regardless of any i18n locale prefix (e.g., /en/.well-known/ai.txt is not required).
The /.well-known/ai.txt endpoint is implemented in all 6 starters:
| Starter | Router | Route File | Config File |
|---|---|---|---|
| basic-nextjs | App Router | examples/basic-nextjs/src/app/api/well-known/ai-txt/route.ts | examples/basic-nextjs/next.config.ts |
| kit-nextjs-article-starter | App Router | examples/kit-nextjs-article-starter/src/app/api/well-known/ai-txt/route.ts | examples/kit-nextjs-article-starter/next.config.ts |
| kit-nextjs-location-finder | App Router | examples/kit-nextjs-location-finder/src/app/api/well-known/ai-txt/route.ts | examples/kit-nextjs-location-finder/next.config.ts |
| kit-nextjs-product-listing | App Router | examples/kit-nextjs-product-listing/src/app/api/well-known/ai-txt/route.ts | examples/kit-nextjs-product-listing/next.config.ts |
| kit-nextjs-skate-park | App Router | examples/kit-nextjs-skate-park/src/app/api/well-known/ai-txt/route.ts | examples/kit-nextjs-skate-park/next.config.ts |
| basic-nextjs-pages-router | Pages Router | examples/basic-nextjs-pages-router/src/pages/api/well-known/ai-txt.ts | examples/basic-nextjs-pages-router/next.config.js |
The ai.txt file includes explicit permissions for the following AI crawlers:
| User-Agent | Organization | Purpose |
|---|---|---|
| * (wildcard) | All crawlers | Default allow rule for any unspecified crawler |
| GPTBot | OpenAI | Crawler used by ChatGPT and OpenAI services |
| Claude-Web | Anthropic | Web crawler for Claude AI |
| Anthropic-AI | Anthropic | General Anthropic AI crawler |
| Google-Extended | Google | Crawler used for Google AI features (Gemini, AI Overviews) |
| CCBot | Common Crawl | Open-source web crawler used to build AI training datasets |
| PerplexityBot | Perplexity AI | Crawler for the Perplexity AI search engine |
Disallow rules prevent AI crawlers from accessing sensitive areas:
| Path | Reason |
|---|---|
| /api/editing/ | Sitecore XM Cloud editing API -- internal only, not intended for public consumption |
| /sitecore/ | Sitecore system paths -- administrative and internal routes |
The /ai/summary.json endpoint provides an authoritative, machine-readable summary of the site so that AI systems can quickly understand what the site is about without crawling its entire content. Think of it as a structured "elevator pitch" for LLMs.
The description field is capped at 800 characters to ensure it remains concise and suitable for inclusion in AI context windows.
Specification:
| Property | Value |
|---|---|
| Location | /ai/summary.json |
| Content-Type | application/json; charset=utf-8 |
| Cache-Control | public, max-age=86400 (24 hours) |
| Authentication | None required -- publicly accessible |
| Generation | Dynamic -- 3 kit starters fetch from Experience Edge via GraphQL; skate-park uses hardcoded content |
| Description Limit | 800 characters maximum |
When a request is made to https://www.example.com/ai/summary.json, the endpoint returns:
{
"title": "Solterra & Co.",
"description": "Solterra & Co. is a lifestyle and editorial site showcasing content-driven experiences for modern brands. It features modular components, article and topic listings, hero and promo sections, and rich media. Built with Sitecore XM Cloud and Next.js for performance, personalization, and AI-ready content delivery.",
"lastModified": "2026-02-06T12:00:00.000Z"
}
Each starter returns its own site-specific content:
| Starter | Title | Description Summary |
|---|---|---|
| kit-nextjs-article-starter | "Solterra & Co." | Lifestyle and editorial site for modern brands |
| kit-nextjs-location-finder | "Alaris" | Car brand site with dealer/location finder |
| kit-nextjs-product-listing | "SYNC" | Product-focused site for audio gear companies |
| kit-nextjs-skate-park | "Skate Park" | Demo site showcasing component examples |
The response payload follows the SummaryJsonPayload interface:
export interface SummaryJsonPayload {
title: string; // The site or brand name
description: string; // Short summary, max 800 characters
lastModified: string; // ISO 8601 timestamp of when the response was generated
}
Three kit starters (article-starter, product-listing, location-finder) use a two-file pattern to fetch summary data from Experience Edge:
- Library file (src/lib/summary-from-edge.ts) -- Contains fetchSummaryFromEdge(), which queries Experience Edge via SitecoreClient.getData for a single item at the content path /Data/AI Config/Summary. Returns { title, description } or null if the item is not found.
- Route handler (src/app/api/ai/summary/route.ts) -- Calls fetchSummaryFromEdge(), applies ensureDescriptionLength() to enforce the 800-character limit, and returns the JSON response. If Edge returns nothing, the fields default to empty strings.
The content path is dynamically constructed from the configured site name (NEXT_PUBLIC_DEFAULT_SITE_NAME):
| Starter | Content Path Pattern |
|---|---|
| article-starter | /sitecore/content/solterra/{siteName}/Data/AI Config/Summary |
| product-listing | /sitecore/content/sync/{siteName}/Data/AI Config/Summary |
| location-finder | /sitecore/content/alaris/{siteName}/Data/AI Config/Summary |
The GraphQL query uses the AISummary fragment type to access custom template fields (title and description), both returned as jsonValue scalars.
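To make the path construction and field access more concrete, the sketch below builds the content path and maps a query result to { title, description }. The constant names SUMMARY_DATA_PATH_SUFFIX and SUMMARY_GRAPHQL_TYPE appear in this document, but their values here are inferred from the paths above. The helper names (buildSummaryPath, parseSummary), the query variables, and the { value } shape of jsonValue are assumptions; the actual library file may differ.

```typescript
// Constant names come from this document; the values are inferred, not confirmed.
const SUMMARY_DATA_PATH_SUFFIX = "/Data/AI Config/Summary";
const SUMMARY_GRAPHQL_TYPE = "AISummary";

// contentRoot is the per-starter root segment (solterra, sync, or alaris).
function buildSummaryPath(contentRoot: string, siteName: string): string {
  return `/sitecore/content/${contentRoot}/${siteName}${SUMMARY_DATA_PATH_SUFFIX}`;
}

// Hypothetical query text; the variable names and item() arguments are assumptions.
const SUMMARY_QUERY = `
  query AiSummary($path: String!, $language: String!) {
    item(path: $path, language: $language) {
      ... on ${SUMMARY_GRAPHQL_TYPE} {
        title { jsonValue }
        description { jsonValue }
      }
    }
  }
`;

// Assumed result shape: each custom field wraps its text in jsonValue.value.
interface JsonValueField {
  jsonValue?: { value?: unknown };
}
interface SummaryQueryResult {
  item?: { title?: JsonValueField; description?: JsonValueField } | null;
}

function readJsonValue(field?: JsonValueField): string {
  const value = field?.jsonValue?.value;
  return typeof value === "string" ? value : "";
}

// Returns null when the item is missing, matching the behavior described above.
function parseSummary(data: SummaryQueryResult): { title: string; description: string } | null {
  if (!data.item) return null;
  return {
    title: readJsonValue(data.item.title),
    description: readJsonValue(data.item.description),
  };
}
```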
The skate-park starter still uses the original single-file pattern with hardcoded title and description strings directly in the route handler.
The endpoint is implemented as a Next.js App Router API route at:
src/app/api/ai/summary/route.ts
Key aspects of the implementation:
- MAX_DESCRIPTION_LENGTH = 800 -- Constant defining the maximum allowed description length per the GEO contract.
- ensureDescriptionLength(description, maxLength) -- Utility function that trims whitespace and truncates descriptions exceeding the limit, appending ... to indicate truncation.
- GET() handler -- Calls fetchSummaryFromEdge() (in the 3 refactored starters) and returns a JSON response with the SummaryJsonPayload.
- Response headers -- Sets Content-Type: application/json; charset=utf-8 and Cache-Control: public, max-age=86400.
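A minimal sketch of the truncation helper follows. It assumes the appended ... counts toward the limit, so the returned string never exceeds maxLength; the starters may treat the ellipsis differently.

```typescript
const MAX_DESCRIPTION_LENGTH = 800;
const ELLIPSIS = "...";

// Trims whitespace and truncates to maxLength, reserving room for the ellipsis (an assumption).
function ensureDescriptionLength(
  description: string,
  maxLength: number = MAX_DESCRIPTION_LENGTH,
): string {
  const trimmed = description.trim();
  if (trimmed.length <= maxLength) return trimmed;

  // Degenerate limits leave no room for an ellipsis, so cut hard instead.
  if (maxLength <= ELLIPSIS.length) return trimmed.slice(0, Math.max(0, maxLength));

  return trimmed.slice(0, maxLength - ELLIPSIS.length).trimEnd() + ELLIPSIS;
}
```

For example, ensureDescriptionLength("abcdefghij", 8) keeps the first five characters and appends the ellipsis, giving a result of exactly eight characters.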
In each kit starter's next.config.ts, a rewrite maps the public-facing path to the API route:
{
source: '/ai/summary.json',
destination: '/api/ai/summary',
locale: false,
}
The locale: false setting ensures the path works without any i18n locale prefix.
The /ai/summary.json endpoint is implemented in 4 kit starters only:
| Starter | Route File | Library File | Data Source |
|---|---|---|---|
| kit-nextjs-article-starter | examples/kit-nextjs-article-starter/src/app/api/ai/summary/route.ts | src/lib/summary-from-edge.ts | Experience Edge |
| kit-nextjs-location-finder | examples/kit-nextjs-location-finder/src/app/api/ai/summary/route.ts | src/lib/summary-from-edge.ts | Experience Edge |
| kit-nextjs-product-listing | examples/kit-nextjs-product-listing/src/app/api/ai/summary/route.ts | src/lib/summary-from-edge.ts | Experience Edge |
| kit-nextjs-skate-park | examples/kit-nextjs-skate-park/src/app/api/ai/summary/route.ts | -- | Hardcoded in route |
The /ai/faq.json endpoint serves frequently asked questions in a structured JSON format optimized for AI answer engines and generative AI systems. It enables LLMs to surface accurate, site-specific answers directly in AI-generated responses, supporting both GEO (Generative Engine Optimization) and AEO (Answer Engine Optimization) strategies.
The endpoint enforces a minimum of 3 and maximum of 10 FAQ items per response, ensuring sufficient content for AI consumption without overwhelming context windows.
Specification:
| Property | Value |
|---|---|
| Location | /ai/faq.json (also available at /faq.json via rewrite) |
| Content-Type | application/json |
| Cache-Control | public, max-age=86400 (24 hours) |
| Authentication | None required -- publicly accessible |
| Generation | Dynamic -- 3 kit starters fetch from Experience Edge via GraphQL; skate-park reads from a static data file |
| Item Count | Minimum 3, maximum 10 |
When a request is made to https://www.example.com/ai/faq.json, the endpoint returns a JSON array of question/answer pairs:
[
{
"question": "What is Solterra & Co.?",
"answer": "Solterra & Co. is an inspired brands company focused on green energy and sustainability. We work toward a cleaner, healthier planet through clean, renewable energy and our brands."
},
{
"question": "What are Solterra & Co.'s sustainability priorities?",
"answer": "We focus on four sustainability priorities: Climate, Nature, Plastics, and Livelihoods. Our work includes solar electric programs, wind energy collection, and partnerships that cut costs and carbon footprints."
},
{
"question": "How do I get in touch or stay updated?",
"answer": "Use Get In Touch on the website for general inquiries. Sign up for our newsletter for bite-sized insights on green energy and company updates."
}
]
Each starter returns its own site-specific FAQ content:
| Starter | # of FAQs | Topics |
|---|---|---|
| kit-nextjs-article-starter (Solterra & Co.) | 7 | Brand availability, media kits, accounts, sustainability, careers, contact |
| kit-nextjs-location-finder (Alaris) | 4 | Maintenance scheduling, dealership inventory, international availability, account setup |
| kit-nextjs-product-listing (SYNC) | 7 | VIP access, manufacturing, shipping/returns, warranty, product categories, support, offers |
| kit-nextjs-skate-park (Skate Park) | 6 | About, who can buy, offerings, contact, location, learn more |
Note: If the data source yields fewer than 3 valid items, the endpoint returns an empty array [] rather than incomplete data.
Data sources differ by starter:
- 3 kit starters (article-starter, product-listing, location-finder) -- FAQ items are fetched from Experience Edge via the fetchFaqFromEdge() function in src/lib/faq-from-edge.ts. Content is managed in Sitecore at /Data/AI Config/FAQ using the AIFAQItem template.
- skate-park -- FAQ items are read from a static data file (src/data/faq.json) using a wrapper object with an items array and a lastModified timestamp for maintainers.
Validation rules enforced by the route handler:
- Items must have both question and answer as non-empty strings (items missing either are filtered out).
- A maximum of 10 items are returned (excess items are truncated via .slice(0, MAX_ITEMS)).
- If fewer than 3 valid items remain after filtering, the endpoint returns an empty array [].
- The response must pass JSON lint validation.
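The validation rules above can be sketched as follows. MIN_ITEMS and MAX_ITEMS are named in this document; the FaqItem type, the helper names, and treating whitespace-only strings as empty are assumptions for illustration.

```typescript
interface FaqItem {
  question: string;
  answer: string;
}

const MIN_ITEMS = 3;
const MAX_ITEMS = 10;

// Type guard: both fields must be strings with non-whitespace content (assumption).
function isValidFaqItem(item: unknown): item is FaqItem {
  if (typeof item !== "object" || item === null) return false;
  const { question, answer } = item as Record<string, unknown>;
  return (
    typeof question === "string" &&
    question.trim() !== "" &&
    typeof answer === "string" &&
    answer.trim() !== ""
  );
}

// Filters invalid items, caps the list at MAX_ITEMS, and returns [] below MIN_ITEMS.
function validateFaqItems(raw: unknown[]): FaqItem[] {
  const valid = raw
    .filter(isValidFaqItem)
    .slice(0, MAX_ITEMS)
    .map(({ question, answer }) => ({ question, answer })); // drop any extra fields
  return valid.length >= MIN_ITEMS ? valid : [];
}
```

Because the output is built from plain strings and serialized with JSON.stringify in the route, it is always syntactically valid JSON.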
Three kit starters (article-starter, product-listing, location-finder) use a two-file pattern to fetch FAQ data from Experience Edge:
- Library file (src/lib/faq-from-edge.ts) -- Contains fetchFaqFromEdge(), which queries Experience Edge via SitecoreClient.getData for children of the /Data/AI Config/FAQ item. Uses the AIFAQItem GraphQL fragment type to access question and answer fields (both returned as jsonValue scalars).
- Route handler (src/app/api/ai/faq/route.ts) -- Calls fetchFaqFromEdge(), slices to MAX_ITEMS, and applies the minimum-3-items rule.
The content path is dynamically constructed from the configured site name (NEXT_PUBLIC_DEFAULT_SITE_NAME):
| Starter | Content Path Pattern |
|---|---|
| article-starter | /sitecore/content/solterra/{siteName}/Data/AI Config/FAQ |
| product-listing | /sitecore/content/sync/{siteName}/Data/AI Config/FAQ |
| location-finder | /sitecore/content/alaris/{siteName}/Data/AI Config/FAQ |
This means FAQ content is managed entirely in Sitecore -- no code changes are needed to update questions and answers.
The skate-park starter still uses the original pattern with a static data file (src/data/faq.json) imported by the route handler. The data file uses a wrapper object with an items array.
The route handler is located at:
src/app/api/ai/faq/route.ts
In each kit starter's next.config.ts, a rewrite maps the public-facing path to the API route:
{
source: '/ai/faq.json',
destination: '/api/ai/faq',
locale: false,
}
The locale: false setting ensures the path works without any i18n locale prefix.
The /ai/faq.json endpoint is implemented in 4 kit starters only:
| Starter | Route File | Library / Data File | Data Source |
|---|---|---|---|
| kit-nextjs-article-starter | examples/kit-nextjs-article-starter/src/app/api/ai/faq/route.ts | src/lib/faq-from-edge.ts | Experience Edge |
| kit-nextjs-location-finder | examples/kit-nextjs-location-finder/src/app/api/ai/faq/route.ts | src/lib/faq-from-edge.ts | Experience Edge |
| kit-nextjs-product-listing | examples/kit-nextjs-product-listing/src/app/api/ai/faq/route.ts | src/lib/faq-from-edge.ts | Experience Edge |
| kit-nextjs-skate-park | examples/kit-nextjs-skate-park/src/app/api/ai/faq/route.ts | src/data/faq.json | Static data file |
The /ai/service.json endpoint exposes a structured list of the site's services and capabilities so that AI assistants can accurately reference what the site offers. Rather than marketing copy, this endpoint should reflect the real, functional capabilities of the site -- features a user or developer can actually use.
This endpoint supports GEO by providing AI systems with a categorized inventory of site features, enabling them to give precise, context-aware answers about what the site can do.
Specification:
| Property | Value |
|---|---|
| Location | /ai/service.json |
| Content-Type | application/json |
| Cache-Control | public, max-age=3600, s-maxage=3600, stale-while-revalidate=86400 |
| Authentication | None required -- publicly accessible |
| Generation | ISR (revalidated every 1 hour) -- 3 kit starters fetch from Experience Edge via GraphQL; skate-park and basic-nextjs use hardcoded content |
| Rewrite | /ai/service.json -> /api/ai/service (kit starters); none for basic-nextjs (served directly via App Router directory convention) |
Note: The caching strategy for service.json differs from the other AI endpoints. It uses a 1-hour cache with a 24-hour stale-while-revalidate window (via ISR), rather than the 24-hour max-age used by summary.json and faq.json. This allows service data to refresh more frequently while still serving stale content during revalidation.
When a request is made to https://www.example.com/ai/service.json, the endpoint returns:
{
"services": [
{
"name": "Editorial Content Publishing",
"description": "Publish and manage editorial articles with rich text, images, and multimedia content for lifestyle and brand storytelling.",
"category": "Content Management"
},
{
"name": "Multi-Locale Content Delivery",
"description": "Deliver localized content in multiple languages (English and Canadian English) with automatic locale detection.",
"category": "Localization"
},
{
"name": "XM Cloud Content Integration",
"description": "Seamlessly integrate with Sitecore XM Cloud for headless content management and delivery using the Content SDK.",
"category": "Content Delivery"
}
],
"lastModified": "2026-02-06T12:00:00.000Z"
}
Each starter returns its own site-specific services:
| Starter | # of Services | Distinguishing Categories |
|---|---|---|
| kit-nextjs-article-starter (Solterra & Co.) | 8 | Content Management (3), Localization, Content Delivery, Development, Performance, SEO |
| kit-nextjs-location-finder (Alaris) | 10 | Location Services (4), plus shared categories |
| kit-nextjs-product-listing (SYNC) | 11 | E-Commerce (3), Media, Design, plus shared categories |
| kit-nextjs-skate-park (Skate Park) | 8 | Development (3), plus shared categories |
| basic-nextjs | 7 | Development (2), Content Delivery, plus shared categories |
Shared services across starters with hardcoded content include: Multi-Locale Content Delivery, Component-Based Page Building, Responsive Image Optimization, SEO Metadata Management, and Content Preview and Editing. For the 3 refactored starters, the service list is fully managed in Sitecore.
Three kit starters (article-starter, product-listing, location-finder) use a two-file pattern to fetch service data from Experience Edge:
- Library file (src/lib/service-from-edge.ts) -- Contains fetchServicesFromEdge(), which queries Experience Edge via SitecoreClient.getData for children of the /Data/AI Config/Services item. Uses the AIService GraphQL fragment type.
- Route handler (src/app/api/ai/service/route.ts) -- Calls fetchServicesFromEdge() and returns the JSON response with a lastModified timestamp.
The content path is dynamically constructed from the configured site name (NEXT_PUBLIC_DEFAULT_SITE_NAME):
| Starter | Content Path Pattern |
|---|---|
| article-starter | /sitecore/content/solterra/{siteName}/Data/AI Config/Services |
| product-listing | /sitecore/content/sync/{siteName}/Data/AI Config/Services |
| location-finder | /sitecore/content/alaris/{siteName}/Data/AI Config/Services |
Note: The name field on service items is a built-in Sitecore item property (String! scalar in GraphQL), so it is queried directly without { jsonValue }. The description and category fields are custom template fields returned as jsonValue scalars.
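The difference between the built-in name property and the custom jsonValue fields is easiest to see in a mapping function. The sketch below is illustrative: the type names, the fragment text, and the assumption that jsonValue wraps its text in a value property are not taken from the starters.

```typescript
// Assumed result shapes for a child item of /Data/AI Config/Services.
interface JsonValueField {
  jsonValue?: { value?: unknown };
}
interface EdgeServiceItem {
  name: string; // built-in item property: a plain string
  description?: JsonValueField; // custom template field
  category?: JsonValueField; // custom template field
}
interface Service {
  name: string;
  description: string;
  category: string;
}

// Hypothetical fragment text: name is selected directly, custom fields via { jsonValue }.
const SERVICE_GRAPHQL_TYPE = "AIService";
const SERVICE_FIELDS_FRAGMENT = `
  ... on ${SERVICE_GRAPHQL_TYPE} {
    name
    description { jsonValue }
    category { jsonValue }
  }
`;

function readJsonValue(field?: JsonValueField): string {
  const value = field?.jsonValue?.value;
  return typeof value === "string" ? value : "";
}

// Flattens one Edge item into the shape returned by /ai/service.json.
function toService(item: EdgeServiceItem): Service {
  return {
    name: item.name,
    description: readJsonValue(item.description),
    category: readJsonValue(item.category),
  };
}
```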
The skate-park and basic-nextjs starters still use the original single-file pattern with the services array defined inline in the route handler.
The route handler is located at:
- Kit starters (4): src/app/api/ai/service/route.ts
- basic-nextjs: src/app/ai/service.json/route.ts (original App Router directory convention)
Key aspects of the implementation:
- export const revalidate = 3600 -- Enables ISR with a 1-hour revalidation period. Next.js statically generates the route's response at build time and regenerates it in the background when a request arrives after the 1-hour window has elapsed, unlike the force-dynamic approach used by ai.txt.
- GET() handler -- In the 3 refactored starters, calls fetchServicesFromEdge() and wraps the result in a ServiceResponse. In skate-park and basic-nextjs, uses an inline services array.
- Cache-Control header -- public, max-age=3600, s-maxage=3600, stale-while-revalidate=86400 provides:
  - 1-hour fresh cache for both browser and shared caches
  - 24-hour stale-while-revalidate window so CDNs can serve stale content while fetching fresh data in the background
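The relationship between the revalidation period and the Cache-Control header can be sketched as below. To stay runnable without Next.js, the sketch returns a plain { body, headers } object rather than a NextResponse. The helper names and the STALE_WHILE_REVALIDATE constant are assumptions; in a real route file revalidate would be an exported constant.

```typescript
// In a real route file, `revalidate` would be exported so Next.js can read it.
const revalidate = 3600; // seconds (1 hour)
const STALE_WHILE_REVALIDATE = 86400; // seconds (24 hours); the name is illustrative

interface Service {
  name: string;
  description: string;
  category: string;
}
interface ServiceResponse {
  services: Service[];
  lastModified: string;
}

// max-age covers browsers, s-maxage covers shared caches such as CDNs.
function buildCacheControl(freshSeconds: number, staleSeconds: number): string {
  return `public, max-age=${freshSeconds}, s-maxage=${freshSeconds}, stale-while-revalidate=${staleSeconds}`;
}

// Builds the JSON body and headers that a GET() handler would pass to its response.
function buildServiceResponse(services: Service[], now: Date = new Date()) {
  const payload: ServiceResponse = { services, lastModified: now.toISOString() };
  return {
    body: JSON.stringify(payload),
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": buildCacheControl(revalidate, STALE_WHILE_REVALIDATE),
    },
  };
}
```

Deriving the header from the same revalidate value keeps the CDN freshness window and the ISR period from drifting apart when one of them is changed.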
In each kit starter's next.config.ts, a rewrite maps the public-facing path to the API route:
{
source: '/ai/service.json',
destination: '/api/ai/service',
locale: false,
}
The basic-nextjs starter does not need a rewrite -- the file at src/app/ai/service.json/route.ts directly maps to the /ai/service.json URL via App Router conventions.
The /ai/service.json endpoint is implemented in 5 starters (all App Router starters):
| Starter | Route File | Library File | Data Source |
|---|---|---|---|
| kit-nextjs-article-starter | examples/kit-nextjs-article-starter/src/app/api/ai/service/route.ts | src/lib/service-from-edge.ts | Experience Edge |
| kit-nextjs-location-finder | examples/kit-nextjs-location-finder/src/app/api/ai/service/route.ts | src/lib/service-from-edge.ts | Experience Edge |
| kit-nextjs-product-listing | examples/kit-nextjs-product-listing/src/app/api/ai/service/route.ts | src/lib/service-from-edge.ts | Experience Edge |
| kit-nextjs-skate-park | examples/kit-nextjs-skate-park/src/app/api/ai/service/route.ts | -- | Hardcoded in route |
| basic-nextjs | examples/basic-nextjs/src/app/ai/service.json/route.ts | -- | Hardcoded in route |
Not present in:
| Starter | Reason |
|---|---|
| basic-nextjs-pages-router | Pages Router starter -- no App Router ai/ routes |
The /sitemap-llm.xml endpoint serves a filtered XML sitemap that includes only pages relevant for AI/LLM consumption. Unlike the standard /sitemap.xml (which lists all crawlable pages), this sitemap is curated to contain only high-value content pages -- pages that AI systems should prioritize when building their understanding of the site.
This endpoint is referenced from both ai.txt (via the Sitemap directive) and robots.txt, making it discoverable by AI crawlers through multiple channels.
Specification:
| Property | Value |
|---|---|
| Location | /sitemap-llm.xml |
| Content-Type | application/xml; charset=utf-8 |
| Cache-Control | public, max-age=3600, s-maxage=3600 (1 hour) |
| Authentication | None required -- publicly accessible |
| Generation | Dynamic (force-dynamic in App Router); fetches and filters the standard sitemap per-request |
| XML Schema | Standard Sitemaps protocol (http://www.sitemaps.org/schemas/sitemap/0.9) |
When a request is made to https://www.example.com/sitemap-llm.xml, the endpoint returns:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com</loc>
<lastmod>2026-02-06</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://www.example.com/Articles</loc>
<lastmod>2026-02-06</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Each <url> entry includes:
| Element | Description | Default |
|---|---|---|
| <loc> | Absolute URL of the page | (required) |
| <lastmod> | Date the page was last modified (ISO format) | Today's date |
| <changefreq> | How often the page content changes | weekly |
| <priority> | Relative priority of this page (0.0 to 1.0) | 0.5 |
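As an illustration of how these defaults could be applied when writing the sitemap, the sketch below renders a list of entries to XML. The SitemapUrl type and function names are hypothetical, and the XML escaping step is an addition of this sketch; it assumes loc values are passed in unescaped.

```typescript
interface SitemapUrl {
  loc: string; // required
  lastmod?: string; // defaults to today's date
  changefreq?: string; // defaults to "weekly"
  priority?: string; // defaults to "0.5"
}

// Escapes the five XML special characters; assumes the input is not already escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// `today` is injectable (an assumption) so the output is predictable in tests.
function generateSitemapXml(
  urls: SitemapUrl[],
  today: string = new Date().toISOString().slice(0, 10),
): string {
  const entries = urls.map((url) =>
    [
      "  <url>",
      `    <loc>${escapeXml(url.loc)}</loc>`,
      `    <lastmod>${escapeXml(url.lastmod ?? today)}</lastmod>`,
      `    <changefreq>${escapeXml(url.changefreq ?? "weekly")}</changefreq>`,
      `    <priority>${escapeXml(url.priority ?? "0.5")}</priority>`,
      "  </url>",
    ].join("\n"),
  );

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```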
The endpoint does not maintain its own list of pages. Instead, it follows a fetch-and-filter approach:
- Fetch the standard /sitemap.xml from the same site via an internal HTTP request.
- Parse the XML response using regex to extract <url> blocks with their <loc>, <lastmod>, <changefreq>, and <priority> values.
- Filter each URL through two pattern lists:
  - EXCLUDED_PATTERNS -- URLs matching any of these are always rejected.
  - ALLOWED_PATTERNS -- Only URLs matching at least one of these (and not excluded) are included.
- Generate a new XML sitemap containing only the filtered URLs.
- Fallback -- If the standard sitemap is unavailable or empty, a hardcoded set of site-specific URLs is used instead.
- Error fallback -- On any unrecoverable error, a minimal sitemap with just the home page is returned.
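The parse and fallback steps above can be sketched as follows. The network fetch is left out so the example stays synchronous: the caller passes the sitemap text, or null when it could not be retrieved. The function names, the SitemapUrl type, and the exact regular expressions are assumptions, not the starters' code.

```typescript
interface SitemapUrl {
  loc: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

// Returns the trimmed text of the first <tag>...</tag> in a block, if any.
function extractTag(block: string, tag: string): string | undefined {
  const match = block.match(new RegExp(`<${tag}>\\s*([^<]*?)\\s*</${tag}>`));
  return match?.[1] || undefined;
}

// Regex-based parse: splits the document into <url> blocks and reads each child tag.
function parseSitemapXml(xml: string): SitemapUrl[] {
  const blocks = xml.match(/<url>[\s\S]*?<\/url>/g) ?? [];
  const urls: SitemapUrl[] = [];
  for (const block of blocks) {
    const loc = extractTag(block, "loc");
    if (!loc) continue; // <loc> is required; skip malformed entries
    urls.push({
      loc,
      lastmod: extractTag(block, "lastmod"),
      changefreq: extractTag(block, "changefreq"),
      priority: extractTag(block, "priority"),
    });
  }
  return urls;
}

// Uses the fallback list when the standard sitemap is unavailable or empty,
// otherwise keeps only the URLs accepted by the caller's filter.
function selectUrls(
  standardSitemapXml: string | null,
  isAllowed: (loc: string) => boolean,
  fallback: SitemapUrl[],
): SitemapUrl[] {
  const parsed = standardSitemapXml ? parseSitemapXml(standardSitemapXml) : [];
  if (parsed.length === 0) return fallback;
  return parsed.filter((url) => isAllowed(url.loc));
}
```

A regex parse is adequate here because the input is the site's own generated sitemap with a known, flat structure; arbitrary third-party XML would call for a real parser.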
In the App Router starters, the route handler is located at:
src/app/api/sitemap-llm/route.ts
Key aspects:
- export const dynamic = 'force-dynamic' -- Ensures the route is generated per-request, not statically built.
- new URL(request.url).origin -- Resolves the base URL from the incoming request.
- NextRequest/NextResponse -- Uses App Router request/response types.
- Internal fetch -- Calls ${baseUrl}/sitemap.xml to retrieve the standard sitemap.
In the Pages Router starter, the route handler is located at:
src/pages/api/sitemap-llm.ts
Differences from the App Router version:
- Uses NextApiRequest/NextApiResponse instead of NextRequest/NextResponse.
- Includes an explicit HTTP method check -- returns 405 for non-GET requests.
- Resolves the base URL from x-forwarded-proto and host headers (instead of new URL(request.url).origin).
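The Pages Router differences mostly come down to header handling, since Node-style request headers may be a string, an array of strings, or undefined. The sketch below shows one way to handle that; the helper names and the https default are assumptions, and minimal local types are used instead of NextApiRequest so the example runs on its own.

```typescript
type HeaderValue = string | string[] | undefined;

// Normalizes a header to its first value: handles arrays and comma-separated lists.
function firstHeaderValue(value: HeaderValue): string | undefined {
  const raw = Array.isArray(value) ? value[0] : value;
  return raw?.split(",")[0]?.trim() || undefined;
}

// Builds the base URL from x-forwarded-proto and host; defaulting to https is an assumption.
function resolveBaseUrl(
  headers: Record<string, HeaderValue>,
  defaultProto: string = "https",
): string | undefined {
  const host = firstHeaderValue(headers["host"]);
  if (!host) return undefined;
  const proto = firstHeaderValue(headers["x-forwarded-proto"]) ?? defaultProto;
  return `${proto}://${host}`;
}

// Explicit method check: anything other than GET should be answered with 405.
function isMethodAllowed(method: string | undefined): boolean {
  return method === "GET";
}
```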
All starters include a rewrite in next.config.ts (or next.config.js):
{
source: '/sitemap-llm.xml',
destination: '/api/sitemap-llm',
locale: false,
}
The locale: false setting ensures the sitemap is accessible at exactly /sitemap-llm.xml without locale prefixes.
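For a kit starter that exposes all five endpoints, the rewrite entries shown throughout this document could sit together in a single rewrites() function. The sketch below uses a local Rewrite type rather than importing from next so that it runs standalone; the geoRewrites and nextConfig names are illustrative, and in an actual next.config.ts the config object would be the default export.

```typescript
// Local stand-in for the rewrite shape; only the fields used in this document.
interface Rewrite {
  source: string;
  destination: string;
  locale: false;
}

// Source and destination pairs are the ones documented for each endpoint.
const geoRewrites: Rewrite[] = [
  { source: "/.well-known/ai.txt", destination: "/api/well-known/ai-txt", locale: false },
  { source: "/ai/summary.json", destination: "/api/ai/summary", locale: false },
  { source: "/ai/faq.json", destination: "/api/ai/faq", locale: false },
  { source: "/ai/service.json", destination: "/api/ai/service", locale: false },
  { source: "/sitemap-llm.xml", destination: "/api/sitemap-llm", locale: false },
];

// In next.config.ts this object would be exported as the default export.
const nextConfig = {
  async rewrites(): Promise<Rewrite[]> {
    return geoRewrites;
  },
};
```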
Each starter defines its own ALLOWED_PATTERNS to include only content pages relevant for LLM consumption. All starters share the same EXCLUDED_PATTERNS.
These patterns are always rejected, regardless of the allowed list:
| Pattern | What It Excludes |
|---|---|
| /404 | Not-found error page |
| /api/ | API routes |
| /500 | Server error page |
| /error | Error pages |
| /_ | Internal/system routes |
| sitemap | Sitemap files themselves (prevents recursion) |
| /robots | Robots.txt route |
| .xml | Other XML files |
| .json, .txt, .css, .js, .ico, image/font files | Static assets |
| ? | URLs with query strings |
| Starter | Allowed Patterns | Description |
|---|---|---|
| basic-nextjs | / | Home page only |
| kit-nextjs-article-starter | /, /Article-Page, /Articles/** | Home + article content pages |
| kit-nextjs-location-finder | /, /Products/Aero, /Products/Terra, /Products/Nexa, /Test Drive | Home + product pages + test drive |
| kit-nextjs-product-listing | /, /Speakers/**, /Video | Home + product categories |
| kit-nextjs-skate-park | /, /About | Home + about page |
| basic-nextjs-pages-router | / | Home page only |
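To show how the two pattern lists interact, the sketch below uses the article starter's allowed paths. The regular expressions are one possible rendering of the tables above, not the starters' actual arrays. In particular, matching against the path plus query string, the image/font extension list, and treating /Articles/** as including /Articles itself are assumptions.

```typescript
// Always rejected, regardless of the allowed list.
const EXCLUDED_PATTERNS: RegExp[] = [
  /\/404/,
  /\/api\//,
  /\/500/,
  /\/error/,
  /\/_/,
  /sitemap/i,
  /\/robots/,
  /\.xml$/i,
  /\.(json|txt|css|js|ico|png|jpe?g|gif|svg|webp|woff2?|ttf)$/i,
  /\?/,
];

// Article starter: home page, /Article-Page, and everything under /Articles.
const ALLOWED_PATTERNS: RegExp[] = [/^\/$/, /^\/Article-Page\/?$/, /^\/Articles(\/.*)?$/];

// A URL is kept only if it matches no excluded pattern and at least one allowed pattern.
function isLlmRelevant(loc: string): boolean {
  let target: string;
  try {
    const url = new URL(loc);
    target = url.pathname + url.search; // match on the path and query, not the host
  } catch {
    return false; // not an absolute URL
  }
  if (EXCLUDED_PATTERNS.some((pattern) => pattern.test(target))) return false;
  return ALLOWED_PATTERNS.some((pattern) => pattern.test(target));
}
```

Checking exclusions first means a broad allowed pattern such as /Articles/** can never re-admit an asset or query-string URL.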
The /sitemap-llm.xml endpoint is implemented in all 6 starters:
| Starter | Router | Route File | Config File |
|---|---|---|---|
| basic-nextjs | App Router | examples/basic-nextjs/src/app/api/sitemap-llm/route.ts | examples/basic-nextjs/next.config.ts |
| kit-nextjs-article-starter | App Router | examples/kit-nextjs-article-starter/src/app/api/sitemap-llm/route.ts | examples/kit-nextjs-article-starter/next.config.ts |
| kit-nextjs-location-finder | App Router | examples/kit-nextjs-location-finder/src/app/api/sitemap-llm/route.ts | examples/kit-nextjs-location-finder/next.config.ts |
| kit-nextjs-product-listing | App Router | examples/kit-nextjs-product-listing/src/app/api/sitemap-llm/route.ts | examples/kit-nextjs-product-listing/next.config.ts |
| kit-nextjs-skate-park | App Router | examples/kit-nextjs-skate-park/src/app/api/sitemap-llm/route.ts | examples/kit-nextjs-skate-park/next.config.ts |
| basic-nextjs-pages-router | Pages Router | examples/basic-nextjs-pages-router/src/pages/api/sitemap-llm.ts | examples/basic-nextjs-pages-router/next.config.js |
Note: The handler logic (fetch, parse, filter, generate XML) is identical across all starters. The only differences are the ALLOWED_PATTERNS regex array and the hardcoded fallback URLs, which are tailored to each site's content structure.
The sitemap-llm.xml URL is referenced from two other GEO endpoints:
- ai.txt -- Sitemap: ${siteUrl}/sitemap-llm.xml (in generateAiTxtContent())
- robots.txt -- Sitemap: ${baseUrl}/sitemap-llm.xml (in the robots route handler)
The GEO endpoint route handlers should be updated when:
ai.txt:
- Adding a new AI endpoint -- Add a new AI-Endpoint line in the generateAiTxtContent() function.
- Removing an AI endpoint -- Remove the corresponding AI-Endpoint line.
- Changing crawler permissions -- Add or modify User-Agent/Allow/Disallow directives (e.g., to block a new crawler or restrict access to additional paths).
- Adding a new sitemap -- Add a new Sitemap line.
summary.json:
- Changing the site name or description (3 kit starters) -- Update the content in the Sitecore Content Editor at /sitecore/content/{templateName}/{siteName}/Data/AI Config/Summary. The title and description fields are fetched from Experience Edge at runtime. No code changes needed.
- Changing the site name or description (skate-park) -- Update the hardcoded title and description values in the starter's route.ts.
- Changing the description limit -- Modify the MAX_DESCRIPTION_LENGTH constant in the route handler (must be coordinated with the GEO contract requirements).
- Changing the content path or GraphQL type -- Update the SUMMARY_DATA_PATH_SUFFIX or SUMMARY_GRAPHQL_TYPE constants in src/lib/summary-from-edge.ts.
faq.json:
- Adding or removing FAQ items (3 kit starters) -- Update the content in the Sitecore Content Editor under /sitecore/content/{templateName}/{siteName}/Data/AI Config/FAQ. Items are fetched from Experience Edge at runtime using the AIFAQItem template. No code changes needed.
- Adding or removing FAQ items (skate-park) -- Edit the src/data/faq.json file. No changes to the route handler are needed.
- Changing item count limits -- Modify the MIN_ITEMS or MAX_ITEMS constants in the route handler (must be coordinated with the GEO contract requirements).
- Changing the content path or GraphQL type -- Update the FAQ_DATA_PATH_SUFFIX or FAQ_GRAPHQL_TYPE constants in src/lib/faq-from-edge.ts.
service.json:
- Adding or removing services (3 kit starters) -- Update the content in the Sitecore Content Editor under /sitecore/content/{templateName}/{siteName}/Data/AI Config/Services. Items are fetched from Experience Edge at runtime using the AIService template. No code changes needed.
- Adding or removing services (skate-park, basic-nextjs) -- Modify the inline services array in the starter's route.ts.
- Changing the revalidation period -- Modify the revalidate constant (currently 3600 seconds / 1 hour).
- Changing the content path or GraphQL type -- Update the SERVICE_DATA_PATH_SUFFIX or SERVICE_GRAPHQL_TYPE constants in src/lib/service-from-edge.ts.
sitemap-llm.xml:
- Adding new content pages -- Add a new regex to the `ALLOWED_PATTERNS` array in the starter's route handler. Also add corresponding fallback URLs in the empty-sitemap fallback block.
- Removing content pages from the LLM sitemap -- Remove the corresponding regex from `ALLOWED_PATTERNS` and its fallback URL.
- Excluding additional paths -- Add a new regex to the `EXCLUDED_PATTERNS` array. Since this array is shared logic, the same exclusion should be applied across all starters.
- Changing the standard sitemap source -- The handler fetches `${baseUrl}/sitemap.xml` by default. If the standard sitemap URL changes, update the fetch URL in the handler.
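
The interaction between the two pattern arrays and the fallback block can be sketched as follows. The specific regexes, the `FALLBACK_PATHS` name, and both helper functions are hypothetical; only the `ALLOWED_PATTERNS` and `EXCLUDED_PATTERNS` names come from this document, and the assumption that exclusions take precedence is mine.

```typescript
// Illustrative patterns only; each starter defines its own site-specific list.
const ALLOWED_PATTERNS: RegExp[] = [/^\/$/, /^\/articles(\/.*)?$/];
const EXCLUDED_PATTERNS: RegExp[] = [/^\/api\//, /^\/_next\//];

// Hypothetical fallback used when filtering leaves nothing to list.
const FALLBACK_PATHS: string[] = ['/', '/articles'];

// Assumption: an exclusion wins over an allow rule.
function isLlmSitemapPath(path: string): boolean {
  if (EXCLUDED_PATTERNS.some((pattern) => pattern.test(path))) {
    return false;
  }
  return ALLOWED_PATTERNS.some((pattern) => pattern.test(path));
}

function selectLlmPaths(paths: string[]): string[] {
  const selected = paths.filter((path) => isLlmSitemapPath(path));
  return selected.length > 0 ? selected : FALLBACK_PATHS;
}
```

Under this layout, adding a content page means one new entry in `ALLOWED_PATTERNS` plus a matching fallback path, which mirrors the two-step instruction above.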
All 6 starters contain their own copy of the ai.txt and sitemap-llm route handlers, 4 kit starters contain the summary.json and faq.json route handlers, and 5 App Router starters contain the service.json route handler. When making changes:
- Update the route handler (and library file, if applicable) in one starter.
- Copy the same change to all other starters, adjusting the content path root (`solterra`, `sync`, `alaris`) in the library file as needed.
- For the Pages Router starter (`basic-nextjs-pages-router`), adapt the change to the Pages Router API conventions (`NextApiRequest`/`NextApiResponse`, explicit method check, array header handling).
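
The Pages Router conventions mentioned in the last step might look roughly like this. To keep the sketch runnable without Next.js, `ApiRequestLike` and `ApiResponseLike` are minimal stand-ins for `NextApiRequest` and `NextApiResponse`; the `firstHeader()` helper, the forwarded-header lookup, and the response body are assumptions rather than the starter's actual code.

```typescript
// Minimal stand-ins for NextApiRequest / NextApiResponse (assumption for this sketch).
interface ApiRequestLike {
  method?: string;
  headers: Record<string, string | string[] | undefined>;
}
interface ApiResponseLike {
  status(code: number): ApiResponseLike;
  setHeader(name: string, value: string): unknown;
  send(body: string): void;
  end(): void;
}

// Node can expose a header as an array, so take the first value.
function firstHeader(value: string | string[] | undefined): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

function handler(req: ApiRequestLike, res: ApiResponseLike): void {
  // Explicit method check. HEAD is allowed because `curl -I` sends HEAD requests.
  if (req.method !== 'GET' && req.method !== 'HEAD') {
    res.setHeader('Allow', 'GET, HEAD');
    res.status(405).end();
    return;
  }
  const proto = firstHeader(req.headers['x-forwarded-proto']) ?? 'https';
  const host =
    firstHeader(req.headers['x-forwarded-host']) ?? firstHeader(req.headers['host']) ?? 'localhost';
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  // Placeholder body; the real handler would emit the full ai.txt content.
  res.status(200).send(`# Site: ${proto}://${host}\n`);
}
```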
Route handler files to update across starters:
- `src/app/api/well-known/ai-txt/route.ts` (App Router starters x5)
- `src/pages/api/well-known/ai-txt.ts` (Pages Router starter x1)
- `src/app/api/ai/summary/route.ts` (Kit starters x4)
- `src/app/api/ai/faq/route.ts` (Kit starters x4)
- `src/app/api/ai/service/route.ts` (Kit starters x4)
- `src/app/ai/service.json/route.ts` (`basic-nextjs` only -- still uses old App Router directory convention)
- `src/app/api/sitemap-llm/route.ts` (App Router starters x5 -- allowed patterns are site-specific)
- `src/pages/api/sitemap-llm.ts` (Pages Router starter x1)
Experience Edge library files (3 kit starters only -- article-starter, product-listing, location-finder):
- `src/lib/faq-from-edge.ts` -- content path root is site-specific
- `src/lib/service-from-edge.ts` -- content path root is site-specific
- `src/lib/summary-from-edge.ts` -- content path root is site-specific
Static data files (skate-park only):
- `src/data/faq.json` -- FAQ content is site-specific
If a new starter application is added to the repository:
For ai.txt:
- Copy the `src/app/api/well-known/ai-txt/route.ts` file (or the Pages Router equivalent) into the new starter's API directory.
- Add the rewrite rule to the new starter's `next.config.ts`:

  ```typescript
  {
    source: '/.well-known/ai.txt',
    destination: '/api/well-known/ai-txt',
    locale: false,
  }
  ```

- Ensure the new starter's `tsconfig.json` includes the `.sitecore/*` path mapping so the `sites.json` import resolves correctly.
For summary.json:
- Copy `src/app/api/ai/summary/route.ts` from an existing kit starter into the new starter's API directory.
- Copy `src/lib/summary-from-edge.ts` from an existing kit starter.
- Update the content path root in `buildSummaryPath()` to match the new starter's Sitecore content root (e.g., replace `solterra` with the new site's content root name).
- Ensure a `Summary` item exists in Sitecore content editor at `/sitecore/content/{templateName}/{siteName}/Data/AI Config/Summary`, using the `AISummary` template with `title` and `description` fields.
- Add the rewrite rule to the new starter's `next.config.ts`:

  ```typescript
  {
    source: '/ai/summary.json',
    destination: '/api/ai/summary',
    locale: false,
  }
  ```
For faq.json:
- Copy `src/app/api/ai/faq/route.ts` from an existing kit starter into the new starter's `src/app/api/ai/faq/` directory.
- Copy `src/lib/faq-from-edge.ts` from an existing kit starter.
- Update the content path root in `buildFaqPath()` to match the new starter's Sitecore content root.
- Ensure FAQ items exist in Sitecore content editor under `/sitecore/content/{templateName}/{siteName}/Data/AI Config/FAQ`, using the `AIFAQItem` template with `question` and `answer` fields. Include at least 3 items.
- Add the rewrite rule to the new starter's `next.config.ts`:

  ```typescript
  {
    source: '/ai/faq.json',
    destination: '/api/ai/faq',
    locale: false,
  }
  ```
For service.json:
- Copy `src/app/api/ai/service/route.ts` from an existing kit starter into the new starter's `src/app/api/ai/service/` directory.
- Copy `src/lib/service-from-edge.ts` from an existing kit starter.
- Update the content path root in `buildServicePath()` to match the new starter's Sitecore content root.
- Ensure service items exist in Sitecore content editor under `/sitecore/content/{templateName}/{siteName}/Data/AI Config/Services`, using the `AIService` template with `description` and `category` fields. The `name` field is the built-in Sitecore item name.
- Add the rewrite rule to the new starter's `next.config.ts`:

  ```typescript
  {
    source: '/ai/service.json',
    destination: '/api/ai/service',
    locale: false,
  }
  ```
For sitemap-llm.xml:
- Copy `src/app/api/sitemap-llm/route.ts` (or the Pages Router equivalent) into the new starter's API directory.
- Update the `ALLOWED_PATTERNS` array with regex patterns matching the new starter's key content pages.
- Update the fallback URLs in the empty-sitemap block to match the allowed patterns.
- Add the rewrite rule to the new starter's `next.config.ts`:

  ```typescript
  {
    source: '/sitemap-llm.xml',
    destination: '/api/sitemap-llm',
    locale: false,
  }
  ```
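
For orientation, the five rewrite rules listed in this section could sit together in one `rewrites()` function. This is a sketch under assumptions: the `RewriteRule` interface and the `geoRewrites` name are hypothetical, and an existing `next.config.ts` may already define other rewrites that these would need to be merged with.

```typescript
// Hypothetical local type so the sketch compiles without importing from Next.js.
interface RewriteRule {
  source: string;
  destination: string;
  locale: false;
}

// The sources and destinations below are the ones given in this section.
const geoRewrites: RewriteRule[] = [
  { source: '/.well-known/ai.txt', destination: '/api/well-known/ai-txt', locale: false },
  { source: '/ai/summary.json', destination: '/api/ai/summary', locale: false },
  { source: '/ai/faq.json', destination: '/api/ai/faq', locale: false },
  { source: '/ai/service.json', destination: '/api/ai/service', locale: false },
  { source: '/sitemap-llm.xml', destination: '/api/sitemap-llm', locale: false },
];

// In next.config.ts, this object would be the default export.
const nextConfig = {
  async rewrites(): Promise<RewriteRule[]> {
    return geoRewrites;
  },
};
```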
The rewrite rule uses `locale: false`, which means the `/.well-known/ai.txt` path works directly without any locale prefix. Crawlers access it at exactly `https://your-site/.well-known/ai.txt` regardless of the site's i18n configuration.
The route is marked with `export const dynamic = 'force-dynamic'`, so Next.js will not statically generate this page at build time. The content is generated on every request. This ensures:
- The `Last-Modified` date is always current.
- The site URL is resolved from the actual request headers (important for multi-domain deployments or when behind reverse proxies).
Despite being dynamically generated, the response includes cache headers:

```
Cache-Control: public, max-age=86400, s-maxage=86400
```

This allows CDN and edge servers to cache the response for up to 24 hours, reducing load on the origin server. The `s-maxage` directive specifically targets shared caches (CDNs, reverse proxies).
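
As a small illustration of the cache header described above, the sketch below derives the 86400-second value and assembles the response headers listed in the verification steps for `ai.txt`. The `buildAiTxtHeaders()` helper and the `ONE_DAY_SECONDS` constant are hypothetical names, not taken from the starters.

```typescript
// 24 hours * 60 minutes * 60 seconds = 86400 seconds.
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Hypothetical helper returning the headers expected on the ai.txt response.
function buildAiTxtHeaders(): Record<string, string> {
  return {
    'Content-Type': 'text/plain; charset=utf-8',
    // max-age sets the freshness lifetime for any cache;
    // s-maxage overrides it for shared caches such as CDNs and reverse proxies.
    'Cache-Control': `public, max-age=${ONE_DAY_SECONDS}, s-maxage=${ONE_DAY_SECONDS}`,
    'X-Content-Type-Options': 'nosniff',
  };
}
```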
The endpoint is publicly accessible without any authentication. This is required because AI crawlers do not send authentication tokens. The endpoint does not expose any sensitive information.
After deploying, verify the endpoints are working correctly:
Verify ai.txt:
```bash
# Check that the endpoint returns 200 with correct content type
curl -I https://your-site/.well-known/ai.txt

# Expected response headers:
# HTTP/2 200
# content-type: text/plain; charset=utf-8
# cache-control: public, max-age=86400, s-maxage=86400
# x-content-type-options: nosniff
```

```bash
# View the full response body
curl https://your-site/.well-known/ai.txt
```

Verify that:
- The response status is `200`.
- The `Content-Type` header is `text/plain; charset=utf-8`.
- The `Cache-Control` header includes `max-age=86400`.
- The response body contains the correct site URL (not `localhost` or a placeholder).
- All `AI-Endpoint` URLs are reachable.
- The `Last-Modified` date is the current date.
Verify summary.json:
```bash
# Check headers
curl -I https://your-site/ai/summary.json

# Expected response headers:
# HTTP/2 200
# content-type: application/json; charset=utf-8
# cache-control: public, max-age=86400
```

```bash
# View the full JSON payload
curl https://your-site/ai/summary.json
```

Verify that:
- The response status is `200`.
- The `Content-Type` header is `application/json; charset=utf-8`.
- The JSON is valid (passes JSON lint).
- The `title` field is present and non-empty.
- The `description` field is present and does not exceed 800 characters.
- The `lastModified` field is a valid ISO 8601 timestamp.
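
The payload checks in this list can also be scripted. The sketch below is one possible validator, assuming the parsed JSON is passed in as an unknown value; `checkSummaryPayload()`, `DESCRIPTION_LIMIT`, and the timestamp regex (which accepts the `Date.toISOString()` shape and numeric UTC offsets, not every ISO 8601 variant) are illustrative choices, not part of the repository.

```typescript
const DESCRIPTION_LIMIT = 800;

// Accepts e.g. 2024-01-02T03:04:05Z, 2024-01-02T03:04:05.000Z, 2024-01-02T03:04:05+02:00.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Returns a list of problems; an empty list means every check passed.
function checkSummaryPayload(payload: unknown): string[] {
  if (typeof payload !== 'object' || payload === null) {
    return ['payload is not a JSON object'];
  }
  const { title, description, lastModified } = payload as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof title !== 'string' || title.trim() === '') {
    problems.push('title is missing or empty');
  }
  if (typeof description !== 'string') {
    problems.push('description is missing');
  } else if (description.length > DESCRIPTION_LIMIT) {
    problems.push(`description exceeds ${DESCRIPTION_LIMIT} characters`);
  }
  if (
    typeof lastModified !== 'string' ||
    !ISO_8601.test(lastModified) ||
    Number.isNaN(Date.parse(lastModified))
  ) {
    problems.push('lastModified is not a valid ISO 8601 timestamp');
  }
  return problems;
}
```

A caller would typically pass the result of `JSON.parse()` on the response body and treat any returned problem as a failed verification.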
Verify faq.json:
```bash
# Check headers
curl -I https://your-site/ai/faq.json

# Expected response headers:
# HTTP/2 200
# content-type: application/json
# cache-control: public, max-age=86400
```

```bash
# View the full JSON payload
curl https://your-site/ai/faq.json
```

Verify that:
- The response status is `200`.
- The `Content-Type` header is `application/json`.
- The JSON is valid (passes JSON lint).
- The response is an array of objects, each with `question` and `answer` fields.
- The array contains between 3 and 10 items.
- The convenience path also works: `curl https://your-site/faq.json`.
Verify service.json:
```bash
# Check headers
curl -I https://your-site/ai/service.json

# Expected response headers:
# HTTP/2 200
# content-type: application/json
# cache-control: public, max-age=3600, s-maxage=3600, stale-while-revalidate=86400
```

```bash
# View the full JSON payload
curl https://your-site/ai/service.json
```

Verify that:
- The response status is `200`.
- The `Content-Type` header is `application/json`.
- The JSON is valid (passes JSON lint).
- The `services` field is an array of objects, each with `name`, `description`, and `category`.
- The `lastModified` field is a valid ISO 8601 timestamp.
- Services reflect real site capabilities (not marketing copy).
Verify sitemap-llm.xml:
```bash
# Check headers
curl -I https://your-site/sitemap-llm.xml

# Expected response headers:
# HTTP/2 200
# content-type: application/xml; charset=utf-8
# cache-control: public, max-age=3600, s-maxage=3600
```

```bash
# View the full XML response
curl https://your-site/sitemap-llm.xml
```

Verify that:
- The response status is `200`.
- The `Content-Type` header is `application/xml; charset=utf-8`.
- The XML is well-formed and valid.
- The `<urlset>` uses the standard sitemaps namespace (`http://www.sitemaps.org/schemas/sitemap/0.9`).
- Only expected content pages are listed (no API routes, error pages, or static assets).
- Each `<url>` entry includes `<loc>`, `<lastmod>`, `<changefreq>`, and `<priority>`.
- The URLs use the correct public-facing domain (not `localhost`).
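
Several of the sitemap checks above can be approximated with a short script. The sketch below uses plain string and regex matching, so it is a rough smoke test rather than a real XML validator; `checkLlmSitemap()` and its constants are hypothetical names, and it does not verify well-formedness or that only expected pages are listed.

```typescript
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';
const REQUIRED_TAGS: string[] = ['loc', 'lastmod', 'changefreq', 'priority'];

// Returns a list of problems; an empty list means the checked items passed.
function checkLlmSitemap(xml: string): string[] {
  const problems: string[] = [];
  if (!xml.includes(`xmlns="${SITEMAP_NS}"`)) {
    problems.push('missing sitemaps namespace on <urlset>');
  }
  // Non-greedy match so each <url>...</url> entry is captured separately.
  const entries: string[] = xml.match(/<url>[\s\S]*?<\/url>/g) ?? [];
  if (entries.length === 0) {
    problems.push('no <url> entries found');
  }
  entries.forEach((entry, index) => {
    for (const tag of REQUIRED_TAGS) {
      if (!entry.includes(`<${tag}>`)) {
        problems.push(`entry ${index + 1} is missing <${tag}>`);
      }
    }
    if (/<loc>[^<]*localhost/.test(entry)) {
      problems.push(`entry ${index + 1} points at localhost`);
    }
  });
  return problems;
}
```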