docs: add llms.txt ecosystem hub at site root #22003

Open
timsaucer wants to merge 1 commit into apache:main from timsaucer:feat/add-llms-txt-to-site

Conversation

@timsaucer
Member

@timsaucer timsaucer commented May 3, 2026

Which issue does this PR close?

Rationale for this change

llms.txt is an emerging convention for exposing a machine-readable, agent-facing entry point at a site's docs root. Subprojects in the DataFusion ecosystem are starting to publish their own (apache/datafusion-python#1505 added one). The main datafusion.apache.org site is the natural top-level discovery point for the whole ecosystem, so it should expose a hub llms.txt that points agents at:

  • the core DataFusion (Rust) user / library / contributor guides and Rust API docs,
  • each subproject's docs root, where agents following the llmstxt.org convention can probe <docs root>/llms.txt for project-specific guidance.

Net effect: an agent fetching https://datafusion.apache.org/llms.txt lands in a categorized directory of the entire ecosystem's agent guidance.
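As a rough illustration of the probe convention described above, the sketch below builds the candidate llms.txt URL for a docs root. The helper name `probe_url` and the example.org docs root are hypothetical and are not part of this PR; only the hub URL comes from the description.

```python
from urllib.parse import urljoin


def probe_url(docs_root: str) -> str:
    """Return the URL an agent would probe for a docs root's llms.txt."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalize the root first.
    root = docs_root if docs_root.endswith("/") else docs_root + "/"
    return urljoin(root, "llms.txt")


if __name__ == "__main__":
    # The first root is the hub from this PR; the second is a placeholder.
    for root in ["https://datafusion.apache.org/", "https://example.org/docs"]:
        print(probe_url(root))
```

An agent would request each candidate URL and treat a missing file as "no project-specific guidance yet", which is why the hub can link to docs roots instead of pending llms.txt URLs.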

What changes are included in this PR?

  • docs/source/llms.txt: new file, llmstxt.org schema. Sections: Core DataFusion (Rust), Subprojects, Optional. The Subprojects section links to docs roots (not pending llms.txt URLs) and includes a one-line note describing the probe convention so the hub stays correct as subprojects ship their own files.
  • docs/source/conf.py: sets html_extra_path = ["llms.txt"] so Sphinx copies the file verbatim to the build output root, served at https://datafusion.apache.org/llms.txt.
  • dev/release/rat_exclude_files.txt: excludes docs/source/llms.txt from the RAT license-header check (the file body is rendered markdown and cannot carry the standard .. comment header without breaking the format).
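For reference, the Sphinx side of the change amounts to a single setting. The fragment below is a sketch of how that line might look in conf.py; the surrounding comments are illustrative and the rest of the project's configuration is omitted.

```python
# docs/source/conf.py (fragment; all other settings omitted)

# Sphinx copies the files listed in html_extra_path into the root of the
# HTML output without processing them. Relative paths are resolved against
# the directory that contains conf.py, so this refers to docs/source/llms.txt.
html_extra_path = ["llms.txt"]
```

Because the file is copied rather than rendered, it is not run through the reStructuredText or Markdown parsers, so its llmstxt.org formatting reaches the published site unchanged.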

Are these changes tested?

No automated tests. The change is a single static file plus a Sphinx config line that mirrors a pattern already used in apache/datafusion-python (html_extra_path = ["llms.txt"], PR apache/datafusion-python#1505). Verification will be done at deploy time: confirm https://datafusion.apache.org/llms.txt resolves and renders.
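The deploy-time check could be scripted along the following lines. This is a minimal sketch, not something included in the PR: the function names are hypothetical, and the "first non-blank line is an H1" test is only a heuristic based on the llmstxt.org requirement that the file open with an H1 title.

```python
from typing import Callable, Tuple
from urllib.error import HTTPError
from urllib.request import urlopen

HUB_URL = "https://datafusion.apache.org/llms.txt"


def fetch(url: str) -> Tuple[int, str]:
    """Perform a real HTTP GET and return (status code, body text)."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; report the status with an empty body.
        return err.code, ""


def looks_like_llms_txt(status: int, body: str) -> bool:
    """Heuristic: HTTP 200 and the first non-blank line is a markdown H1."""
    if status != 200:
        return False
    for line in body.splitlines():
        if line.strip():
            return line.startswith("# ")
    return False


def verify(url: str = HUB_URL,
           fetcher: Callable[[str], Tuple[int, str]] = fetch) -> bool:
    """Fetch the URL and apply the heuristic; fetcher is injectable for tests."""
    return looks_like_llms_txt(*fetcher(url))


if __name__ == "__main__":
    print("llms.txt OK" if verify() else "llms.txt missing or malformed")
```

Passing a stub `fetcher` lets the check run without network access, for example in CI against a local build.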

Are there any user-facing changes?

Yes — adds a new public URL https://datafusion.apache.org/llms.txt. No existing pages are modified. No API changes.

Adds docs/source/llms.txt following the llmstxt.org schema as a directory
hub for the DataFusion ecosystem: links to the core Rust user/library/
contributor guides, Rust API docs, and the Python/Ballista/Comet
subproject docs roots. Configures Sphinx html_extra_path so the file is
served verbatim at https://datafusion.apache.org/llms.txt, and excludes
it from the RAT license-header check (markdown body cannot carry the
standard "..." comment header).

Per the convention noted in the file, agents can probe each subproject
docs root for its own llms.txt — keeps the hub future-proof without
hardcoding pending URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the documentation and development-process labels May 3, 2026
@timsaucer timsaucer self-assigned this May 3, 2026
@timsaucer timsaucer requested review from Copilot and xudong963 May 3, 2026 16:41
Contributor

Copilot AI left a comment

Pull request overview

Adds an agent-facing llms.txt “ecosystem hub” at the documentation site root (https://datafusion.apache.org/llms.txt) to improve automated discovery of DataFusion and subproject documentation.

Changes:

  • Adds docs/source/llms.txt describing core DataFusion docs, subproject docs roots, and optional links.
  • Configures Sphinx to copy llms.txt verbatim to the built site root via html_extra_path.
  • Excludes docs/source/llms.txt from the Apache RAT license-header check.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/source/llms.txt | New hub file linking to core DataFusion docs and subproject docs roots. |
| docs/source/conf.py | Copies llms.txt into the built HTML output root. |
| dev/release/rat_exclude_files.txt | Excludes the new llms.txt from RAT header enforcement. |

Comment thread docs/source/llms.txt