Commit 7c5841f

Merge pull request #1060 from DuendeSoftware/mb/md-conneg
Enable Markdown content negotiation and implement Markdown file generation
2 parents 5e0e7f7 + 944509b commit 7c5841f

13 files changed

Lines changed: 494 additions & 55 deletions

README.md

Lines changed: 19 additions & 0 deletions
@@ -216,6 +216,25 @@ redirects: {
 This will remove the old page from the navigation structure, but keeps the URL around
 with a redirect to the new location.
 
+## 🤖 AI-Friendly Documentation
+
+We make our docs consumable by AI agents and LLMs, not just humans.
+
+### What we do
+
+* **[`llms.txt`](https://docs.duendesoftware.com/llms.txt) and [`llms-full.txt`](https://docs.duendesoftware.com/llms-full.txt)** — Machine-readable site index and full content dump following the [llms.txt proposal](https://llmstxt.org/), so AI tools can discover and ingest our docs.
+* **Content negotiation** — The server supports `Accept: text/markdown` to return raw Markdown for any docs page, giving AI agents clean content without HTML noise.
+* **`robots.txt` signals** — We don't block AI crawlers. The robots.txt includes references to `llms.txt` so crawlers can find structured content.
+
+Beyond this repo, Duende also provides tools that give AI coding assistants specialized knowledge (see [AI Agent Tools](https://docs.duendesoftware.com/general/ai-agent-tools/)):
+
+* **[Agent Skills](https://github.com/DuendeSoftware/duende-skills)** — Structured `SKILL.md` files following the [Agent Skills format](https://agentskills.io/) that give AI assistants domain expertise on IdentityServer, BFF, token management, and more. Loaded automatically by compatible IDEs.
+* **[MCP Server](https://github.com/DuendeSoftware/products/blob/main/docs-mcp/README.md)** — A local [Model Context Protocol](https://modelcontextprotocol.io/) server that gives AI assistants search and fetch access to the full Duende docs, blog, and sample code via SQLite full-text search.
+
+### Why
+
+Developers increasingly use AI assistants to find answers. If our docs aren't AI-friendly, those assistants hallucinate or point elsewhere. Making content machine-readable means Duende products get accurate representation in AI-generated answers.
+
 ## ⚖️ License
 
 For all licensing information, refer to the relevant license files:
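
The "Content negotiation" bullet in the README hunk above can be exercised from any HTTP client by sending `Accept: text/markdown`. Below is a minimal TypeScript sketch of the client side. The helper name `buildMarkdownRequest` and the `MarkdownRequest` shape are hypothetical and not part of this commit; the HTML fallback weight (`q=0.5`) is likewise an assumption chosen for illustration.

```typescript
// Hypothetical helper (not part of the commit): describes a GET request that
// asks the docs server for Markdown instead of HTML.
export interface MarkdownRequest {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

export function buildMarkdownRequest(pageUrl: string): MarkdownRequest {
  return {
    url: pageUrl,
    init: {
      method: "GET",
      // Prefer Markdown, but still accept HTML for pages that have no .md twin.
      // The q-value is an illustrative assumption, not something the commit prescribes.
      headers: { Accept: "text/markdown, text/html;q=0.5" },
    },
  };
}
```

With a runtime that provides `fetch`, the result can be passed straight through, for example `const { url, init } = buildMarkdownRequest("https://docs.duendesoftware.com/general/ai-agent-tools/"); const response = await fetch(url, init);`. Checking the response's `Content-Type` tells the caller whether Markdown or the HTML fallback came back.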

astro/astro.config.mjs

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@ import * as fs from "node:fs";
 import { duendeOpenGraphImage } from "./src/plugins/duende-og-image.js";
 import removeMarkdownExtensions from "./src/plugins/remove-markdown-extensions.js";
 import staticRedirects from "./src/plugins/static-redirects.js";
+import markdownOutput from "./src/plugins/markdown-output.js";
 
 // https://astro.build/config
 export default defineConfig({
@@ -233,6 +234,7 @@ export default defineConfig({
       contentDir: "./src/content/docs",
     }),
     staticRedirects(),
+    markdownOutput(),
     opengraphImages({
       options: {
         fonts: [

astro/package-lock.json

Lines changed: 3 additions & 0 deletions
Generated file; diff not rendered by default.

astro/package.json

Lines changed: 3 additions & 0 deletions
@@ -34,10 +34,13 @@
     "astro-opengraph-images": "^1.14.3",
     "astro-redirect-from": "^1.3.5",
     "astro-rehype-relative-markdown-links": "^0.19.0",
+    "hast-util-to-text": "^4.0.2",
     "jsdom": "^29.0.2",
     "patch-package": "^8.0.1",
     "react": "^19.2.4",
     "rehype-external-links": "^3.0.0",
+    "rehype-parse": "^9.0.1",
+    "rehype-remark": "^10.0.1",
     "satori": "^0.26.0",
     "sharp": "^0.34.5",
     "starlight-auto-sidebar": "^0.2.0",

astro/src/content/docs/identityserver/overview/big-picture.md

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ depending on your needs)
 and add the IdentityServer middleware to that application. The middleware adds the necessary protocol heads to the
 application so that clients can talk to it using those standard protocols.
 
-![IdentityServer middleware diagram and its relatinship in the ASP.NET Core pipeline](images/middleware.svg)
+![IdentityServer middleware diagram and its relationship in the ASP.NET Core pipeline](images/middleware.svg)
 
 The hosting application can be as complex as you want, but we typically recommend to keep the attack surface as small as
 possible by including
Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
+import url from "node:url";
+import path from "node:path";
+import fs from "node:fs/promises";
+import type { AstroIntegrationLogger } from "astro";
+import { unified } from "unified";
+import rehypeParse from "rehype-parse";
+import rehypeRemark from "rehype-remark";
+import remarkStringify from "remark-stringify";
+import remarkGfm from "remark-gfm";
+import { JSDOM } from "jsdom";
+import { toText } from "hast-util-to-text";
+
+/**
+ * Astro integration that generates a Markdown file (index.md) next to every
+ * rendered HTML file (index.html) in the build output.
+ *
+ * The Markdown is derived from the rendered HTML, so all links, includes,
+ * components, etc. are already resolved.
+ */
+export default function markdownOutput() {
+  let siteUrl = "";
+
+  return {
+    name: "markdown-output",
+    hooks: {
+      "astro:config:done": ({ config }: { config: { site?: string } }) => {
+        siteUrl = config.site ? new URL(config.site).origin : "";
+      },
+      "astro:build:done": async ({
+        dir,
+        pages,
+        logger,
+      }: {
+        dir: URL;
+        pages: Array<{ pathname: string }>;
+        logger: AstroIntegrationLogger;
+      }) => {
+        const outDir = url.fileURLToPath(dir);
+        const processor = unified()
+          .use(rehypeParse, { fragment: true })
+          .use(rehypeRemark, {
+            handlers: {
+              // Preserve language hints on code fences from <pre data-language="...">
+              pre(state: any, node: any) {
+                const lang =
+                  node.properties?.dataLanguage || "";
+                const value = toText(node);
+                const result = {
+                  type: "code" as const,
+                  lang: lang || null,
+                  meta: null,
+                  value: value.replace(/\n$/, ""),
+                };
+                state.patch(node, result);
+                return result;
+              },
+              // Handle <figure> with code blocks: extract title from figcaption
+              figure(state: any, node: any) {
+                // Find figcaption title
+                const figcaption = node.children?.find(
+                  (c: any) => c.tagName === "figcaption",
+                );
+                const titleSpan = figcaption?.children?.find(
+                  (c: any) =>
+                    c.properties?.className?.includes("title"),
+                );
+                const title = titleSpan ? toText(titleSpan).trim() : "";
+
+                // Find <pre> child
+                const pre = node.children?.find(
+                  (c: any) => c.tagName === "pre",
+                );
+                if (!pre) {
+                  // Not a code figure, fall back to default
+                  return state.all(node);
+                }
+
+                const lang = pre.properties?.dataLanguage || "";
+                const value = toText(pre);
+                const codeNode = {
+                  type: "code" as const,
+                  lang: lang || null,
+                  meta: null,
+                  value: value.replace(/\n$/, ""),
+                };
+                state.patch(pre, codeNode);
+
+                if (title) {
+                  const titleNode = {
+                    type: "paragraph" as const,
+                    children: [
+                      {
+                        type: "inlineCode" as const,
+                        value: title,
+                      },
+                      {
+                        type: "text" as const,
+                        value: ":",
+                      },
+                    ],
+                  };
+                  return [titleNode, codeNode];
+                }
+
+                return codeNode;
+              },
+            },
+          })
+          .use(remarkGfm)
+          .use(remarkStringify, {
+            bullet: "-",
+            emphasis: "*",
+            strong: "*",
+            rule: "-",
+          });
+
+        let count = 0;
+        let errors = 0;
+
+        await Promise.all(
+          pages.map(async ({ pathname }) => {
+            const htmlPath = path.join(outDir, pathname, "index.html");
+            const mdPath = path.join(outDir, pathname, "index.md");
+
+            try {
+              const html = await fs.readFile(htmlPath, "utf-8");
+              const dom = new JSDOM(html);
+              const doc = dom.window.document;
+
+              const main = doc.querySelector("main");
+              if (!main) return;
+
+              // Restore mermaid diagrams as code fences
+              main.querySelectorAll("div.mermaid").forEach((el) => {
+                const content = el.getAttribute("data-content");
+                if (content) {
+                  const pre = doc.createElement("pre");
+                  pre.setAttribute("data-language", "mermaid");
+                  pre.textContent = content;
+                  el.replaceWith(pre);
+                }
+              });
+
+              // Remove banner
+              main.querySelectorAll(".sl-banner").forEach((el) => el.remove());
+
+              // Remove "Section titled" anchor links in headings
+              main.querySelectorAll("a").forEach((el) => {
+                if (el.textContent?.trim().startsWith("Section titled")) el.remove();
+              });
+
+              // Remove "Edit page" link and "Last updated" meta section
+              main.querySelectorAll("footer .meta").forEach((el) => el.remove());
+
+              // Resolve image paths: /_astro/... URLs are build artifacts;
+              // rewrite them to absolute URLs so they resolve outside the build output.
+              main.querySelectorAll("img").forEach((img) => {
+                const src = img.getAttribute("src");
+                if (src && src.startsWith("/")) {
+                  img.setAttribute("src", `${siteUrl}${src}`);
+                }
+              });
+
+              // Resolve link hrefs to absolute URLs
+              main.querySelectorAll("a").forEach((a) => {
+                const href = a.getAttribute("href");
+                if (href && href.startsWith("/")) {
+                  a.setAttribute("href", `${siteUrl}${href}`);
+                }
+              });
+
+              // Remove giscus comments
+              main.querySelectorAll("giscus-comments").forEach((el) => el.remove());
+
+              // Remove copyright footer (the <hr> + copyright div)
+              main.querySelectorAll("footer > hr").forEach((el) => el.remove());
+              main.querySelectorAll("footer > div:not(.pagination-links)").forEach((el) => el.remove());
+
+              // Flatten pagination links so Previous/Next text is on one line
+              // Structure: <a> <svg/> <span> Previous <br> <span class="link-title">Title</span> </span> </a>
+              main.querySelectorAll(".pagination-links a").forEach((a) => {
+                a.querySelectorAll("svg").forEach((svg) => svg.remove());
+                a.querySelectorAll("br").forEach((br) => br.remove());
+                const label = a.querySelector("span")?.childNodes[0]?.textContent?.trim(); // "Previous" or "Next"
+                const title = a.querySelector(".link-title")?.textContent?.trim();
+                if (label && title) {
+                  a.textContent = `${label}: ${title}`;
+                }
+              });
+
+              const content = main.innerHTML;
+              const result = await processor.process(content);
+
+              // Add page title and source URL as YAML frontmatter
+              const pageTitle = doc.querySelector("title")?.textContent?.trim() || "";
+              const pageSource = siteUrl ? `${siteUrl}/${pathname}` : `/${pathname}`;
+              const frontmatter = `---\ntitle: ${pageTitle}\nsource: ${pageSource}\n---\n\n`;
+
+              await fs.writeFile(mdPath, frontmatter + String(result));
+              count++;
+            } catch (e: any) {
+              if (e.code === "ENOENT") {
+                // No index.html for this page (e.g. redirects, API routes)
+                return;
+              }
+              errors++;
+              logger.warn(`Failed to generate Markdown for ${pathname}: ${e.message}`);
+            }
+          }),
+        );
+
+        logger.info(
+          `Generated ${count} Markdown files${errors > 0 ? ` (${errors} errors)` : ""}`,
+        );
+      },
+    },
+  };
+}
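
The frontmatter step in the integration above interpolates the page title into the YAML template directly, so a title containing a colon could be misread by a strict YAML parser. Below is a minimal sketch of one possible hardening. `buildFrontmatter` is a hypothetical helper that does not exist in this commit, and quoting the title via `JSON.stringify` is an assumption about what downstream consumers accept, relying on YAML 1.2 being designed as a superset of JSON.

```typescript
// Hypothetical helper (not in the commit): builds the YAML frontmatter block
// that the integration prepends to each generated index.md.
export function buildFrontmatter(title: string, siteUrl: string, pathname: string): string {
  // Same fallback as the integration: root-relative source when no site URL is configured.
  const source = siteUrl ? `${siteUrl}/${pathname}` : `/${pathname}`;
  // JSON.stringify emits a double-quoted, escaped string, so a title that
  // contains ":" or quotes stays a single scalar value.
  return `---\ntitle: ${JSON.stringify(title)}\nsource: ${source}\n---\n\n`;
}
```

For a page at the pathname `identityserver/overview/big-picture/`, a call such as `buildFrontmatter("Big Picture: Overview", siteUrl, pathname)` keeps the whole title on one quoted line; the title string here is made up for illustration.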
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+namespace Docs.Web.Middleware;
+
+/// <summary>
+/// Middleware that serves .md files when the client sends Accept: text/markdown.
+/// </summary>
+public class MarkdownContentNegotationMiddleware(IWebHostEnvironment environment) : IMiddleware
+{
+    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
+    {
+        var accept = context.Request.Headers.Accept.ToString();
+        if (accept.Contains("text/markdown", StringComparison.OrdinalIgnoreCase))
+        {
+            var requestPath = context.Request.Path.Value?.TrimEnd('/') ?? "";
+
+            // Try the exact path with .md extension, then index.md inside the directory
+            var candidates = new[]
+            {
+                Path.Combine(environment.WebRootPath, requestPath.TrimStart('/') + ".md"),
+                Path.Combine(environment.WebRootPath, requestPath.TrimStart('/'), "index.md")
+            };
+
+            foreach (var mdPath in candidates)
+            {
+                if (File.Exists(mdPath))
+                {
+                    context.Response.ContentType = "text/markdown; charset=utf-8";
+                    context.Response.Headers.TryAdd("content-signal", "ai-train=yes, search=yes, ai-input=yes");
+                    await context.Response.SendFileAsync(mdPath);
+                    return;
+                }
+            }
+        }
+
+        await next(context);
+    }
+}
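
The lookup in the middleware above has two parts: deciding whether the client asked for Markdown, and deriving the two candidate file paths. The TypeScript sketch below restates both as pure functions for illustration. The function names are hypothetical, the paths are returned relative to the web root with `/` separators, and the q-value handling in `acceptsMarkdown` is a stricter variation, not what the C# code does (it uses a case-insensitive substring check).

```typescript
// Hypothetical restatement of the middleware's decision logic; not part of the commit.

// Returns true when the Accept header lists text/markdown with a non-zero weight.
// Variation on the commit: the C# middleware only checks for the substring.
export function acceptsMarkdown(acceptHeader: string): boolean {
  return acceptHeader.split(",").some((entry) => {
    const [mediaType, ...params] = entry.split(";").map((part) => part.trim().toLowerCase());
    if (mediaType !== "text/markdown") return false;
    const q = params.find((param) => param.startsWith("q="));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}

// Mirrors the candidate order in the middleware: "<path>.md" first, then
// "<path>/index.md". Results are relative to the web root.
export function markdownCandidates(requestPath: string): string[] {
  const trimmed = requestPath.replace(/\/+$/, "").replace(/^\/+/, "");
  return [`${trimmed}.md`, trimmed ? `${trimmed}/index.md` : "index.md"];
}
```

Because the negotiated response varies by request header, a deployment behind a shared cache may also want a `Vary: Accept` response header; the commit as shown does not set one, so treat that as a point to verify rather than a confirmed gap.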
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+namespace Docs.Web.Middleware;
+
+/// <summary>
+/// Middleware that serves a custom 404.html page when the response status code is 404.
+/// </summary>
+public class NotFoundMiddleware(IWebHostEnvironment environment) : IMiddleware
+{
+    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
+    {
+        await next(context);
+
+        if (context.Response.StatusCode == 404 && !context.Response.HasStarted)
+        {
+            var notFoundPath = Path.Combine(environment.WebRootPath, "404.html");
+
+            if (File.Exists(notFoundPath))
+            {
+                context.Response.ContentType = "text/html";
+                await context.Response.SendFileAsync(notFoundPath);
+            }
+        }
+    }
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+namespace Docs.Web.Middleware;
+
+/// <summary>
+/// Middleware that redirects old URLs to new destinations using a preloaded redirect map (301 permanent).
+/// </summary>
+public class RedirectMiddleware(IReadOnlyDictionary<string, string> redirectMap) : IMiddleware
+{
+    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
+    {
+        var path = context.Request.Path.Value?.TrimEnd('/') ?? "";
+
+        if (redirectMap.TryGetValue(path, out var destination))
+        {
+            var queryString = context.Request.QueryString.HasValue ? context.Request.QueryString.Value : "";
+            context.Response.StatusCode = 301;
+            context.Response.Headers.Location = $"{destination}{queryString}";
+            return;
+        }
+
+        await next(context);
+    }
+}
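
The redirect middleware above does three things: it trims trailing slashes from the request path, looks the result up in the preloaded map, and appends the original query string to the destination. The TypeScript sketch below restates that logic as a pure function. `resolveRedirect` and `RedirectResult` are hypothetical names, and the map entries in the usage note are made up; how the real map is loaded is not shown in this commit view.

```typescript
// Hypothetical restatement of the redirect lookup; not part of the commit.
export interface RedirectResult {
  status: 301;
  location: string;
}

export function resolveRedirect(
  redirectMap: ReadonlyMap<string, string>,
  requestPath: string,
  queryString: string, // includes the leading "?" when present, "" otherwise
): RedirectResult | null {
  // Normalize the key the same way the middleware does: drop trailing slashes.
  const key = requestPath.replace(/\/+$/, "");
  const destination = redirectMap.get(key);
  if (destination === undefined) return null; // no redirect; continue the pipeline
  // Preserve the query string so deep links keep their parameters.
  return { status: 301, location: `${destination}${queryString}` };
}
```

For example, with an illustrative map `new Map([["/old-page", "/new-page/"]])`, a request for `/old-page/?tab=2` resolves to a 301 with `Location: /new-page/?tab=2`. Since keys are compared after trimming, map entries would need to be stored without a trailing slash to match.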
