Commit f4a1d8f

Create compliant robots.txt (#2004)
`robots.txt` must adhere to SAP's AI and chatbot crawler policies (see AI IP Protection, e.g. https://www.sap.com/robots.txt):
- Block all crawlers by default.
- Allow access for search engines (like Google).
- Grant specific exceptions to authorized AI agents and chatbots.
1 parent ae971e2 commit f4a1d8f

2 files changed

Lines changed: 165 additions & 1 deletion

.vitepress/config.js

Lines changed: 3 additions & 1 deletion
@@ -200,7 +200,9 @@ import { promises as fs } from 'node:fs'
 config.buildEnd = async ({ outDir, site }) => {
   const sitemapURL = new URL(siteURL.href)
   sitemapURL.pathname = path.join(sitemapURL.pathname, 'sitemap.xml')
-  await fs.writeFile(path.resolve(outDir, 'robots.txt'), `Sitemap: ${sitemapURL}\n`)
+  console.debug('✓ writing robots.txt with sitemap URL', sitemapURL.href) // eslint-disable-line no-console
+  const robots = (await fs.readFile(path.resolve(__dirname, 'robots.txt'))).toString().replace('{{SITEMAP}}', sitemapURL.href)
+  await fs.writeFile(path.join(outDir, 'robots.txt'), robots)
 
   // disabled by default to avoid online fetches during local build
   if (process.env.VITE_CAPIRE_EXTRA_ASSETS) {
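
The change above fills a `{{SITEMAP}}` placeholder in a template file at build time. As a minimal sketch of that substitution step in isolation (the helper name `renderRobots` and the example.com URL are hypothetical, not part of the commit): `String.prototype.replace` with a string pattern replaces only the first occurrence and interprets `$` sequences in the replacement string, so a variant that might be more robust uses `replaceAll` with a replacer function.

```javascript
// Hypothetical helper, not part of the commit: fills every {{SITEMAP}}
// placeholder in a robots.txt template with the given sitemap URL.
function renderRobots(template, sitemapHref) {
  // A replacer function inserts the value literally, so "$" sequences
  // in the URL are not treated as replacement patterns.
  return template.replaceAll('{{SITEMAP}}', () => sitemapHref)
}

// Example with an assumed URL:
const rendered = renderRobots(
  'User-agent: *\nDisallow: /\n\nSitemap: {{SITEMAP}}\n',
  'https://example.com/docs/sitemap.xml'
)
console.log(rendered)
```

For the template in this commit, which contains a single placeholder and a plain URL, the committed `.replace('{{SITEMAP}}', sitemapURL.href)` call gives the same result.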

.vitepress/robots.txt

Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
+# Based on https://www.sap.com/robots.txt
+
+User-agent: *
+Disallow: /
+
+# Search Engines
+
+User-agent: Googlebot
+Allow: /
+
+User-agent: Googlebot-Image
+Allow: /
+
+User-agent: Googlebot-News
+Allow: /
+
+User-agent: Googlebot-Video
+Allow: /
+
+User-agent: GoogleOther
+Allow: /
+
+User-agent: Storebot-Google
+Allow: /
+
+User-agent: Bingbot
+Allow: /
+
+User-agent: 360Spider
+Allow: /
+
+User-agent: Baiduspider
+Allow: /
+
+User-agent: coccocbot
+Allow: /
+
+User-agent: Daum
+Allow: /
+
+User-agent: DuckDuckBot
+Allow: /
+
+User-agent: Ecosia_bot
+Allow: /
+
+User-agent: MojeekBot
+Allow: /
+
+User-agent: Yeti
+Allow: /
+
+User-agent: SeznamBot
+Allow: /
+
+User-agent: Sogou web spider
+Allow: /
+
+User-agent: Yahoo! Slurp
+Allow: /
+
+User-agent: YandexAccessibilityBot
+Allow: /
+
+User-agent: YandexMobileBot
+Allow: /
+
+User-agent: Yandex
+Allow: /
+
+# AI and Chat Agents
+
+User-agent: Amazonbot
+Allow: /
+
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: CCBot
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+User-agent: FacebookBot
+Allow: /
+
+User-agent: MistralAI-User
+Allow: /
+
+User-agent: GPTBot
+Allow: /
+
+User-agent: ChatGPT-User
+Allow: /
+
+User-agent: PerplexityBot
+Allow: /
+
+User-agent: Perplexity-User
+Allow: /
+
+# Fetcher and other
+
+User-agent: AdsBot-Google
+Allow: /
+
+User-agent: AdsBot-Google-Mobile
+Allow: /
+
+User-agent: AhrefsBot
+Allow: /
+
+User-agent: Google-Safety
+Allow: /
+
+User-agent: Mediapartners-Google
+Allow: /
+
+User-agent: facebookexternalhit
+Allow: /
+
+User-agent: Google-Read-Aloud
+Allow: /
+
+User-agent: Hatena
+Allow: /
+
+User-agent: linkedinbot
+Allow: /
+
+User-agent: Pinterestbot
+Allow: /
+
+User-agent: SchemaBot
+Allow: /
+
+User-agent: Slackbot-LinkExpanding
+Allow: /
+
+User-agent: Telegram
+Allow: /
+
+User-agent: Twitterbot
+Allow: /
+
+User-agent: SiteAuditBot
+Allow: /
+
+User-agent: Chrome-Lighthouse
+Allow: /
+
+User-agent: Google-InspectionTool
+Allow: /
+
+User-agent: BingPreview
+Allow: /
+
+User-agent: archive.org_bot
+Allow: /
+
+Sitemap: {{SITEMAP}}
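
The file relies on a specific group for a crawler taking precedence over the `*` group, so the listed agents are allowed while everything else is blocked. A rough sketch of that evaluation follows, under simplifying assumptions: the function names `parseRobots` and `isAllowed` are hypothetical, user agents are compared by exact case-insensitive match, and wildcards in paths are not handled. Real crawlers implement more complete matching.

```javascript
// Hypothetical, simplified robots.txt evaluator for illustration only.
// Parses the text into groups: { agents: [...], rules: [{ allow, path }] }.
function parseRobots(text) {
  const groups = []
  let current = null
  let lastWasAgent = false
  for (const rawLine of text.split('\n')) {
    const line = rawLine.replace(/#.*$/, '').trim() // drop comments
    const idx = line.indexOf(':')
    if (!line || idx === -1) continue
    const field = line.slice(0, idx).trim().toLowerCase()
    const value = line.slice(idx + 1).trim()
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] }
        groups.push(current)
      }
      current.agents.push(value.toLowerCase())
      lastWasAgent = true
    } else if ((field === 'allow' || field === 'disallow') && current) {
      current.rules.push({ allow: field === 'allow', path: value })
      lastWasAgent = false
    }
  }
  return groups
}

// Returns true if `userAgent` may fetch `path` under the given robots.txt text.
function isAllowed(text, userAgent, path) {
  const groups = parseRobots(text)
  const token = userAgent.toLowerCase()
  const specific = groups.filter(g => g.agents.includes(token))
  // Fall back to the "*" group only when no specific group matches.
  const selected = specific.length ? specific : groups.filter(g => g.agents.includes('*'))
  let best = null
  for (const rule of selected.flatMap(g => g.rules)) {
    if (rule.path === '' || !path.startsWith(rule.path)) continue
    // Longest matching path wins; Allow wins a tie.
    const longer = !best || rule.path.length > best.path.length
    const tie = best && rule.path.length === best.path.length && rule.allow
    if (longer || tie) best = rule
  }
  return best ? best.allow : true
}

const sample = 'User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nAllow: /\n'
console.log(isAllowed(sample, 'Googlebot', '/guides/'))
console.log(isAllowed(sample, 'SomeUnlistedBot', '/guides/'))
```

With this layout, adding or removing a crawler is a matter of adding or deleting one two-line group; the default-deny `*` group does not need to change.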
