Add llms.txt, AGENTS.md, and CITATION.cff for GEO (Generative Engine …#17834

Open
AIwork4me wants to merge 1 commit into PaddlePaddle:main from AIwork4me:add-llms-agents-citation

@AIwork4me

…Optimization)

Add three structured metadata files to improve PaddleOCR's discoverability by LLMs, AI agents, and academic citation systems:

  • llms.txt: AI crawler-facing project description (llmstxt.org standard)
  • AGENTS.md: Coding agent context file for Claude Code, Cursor, Copilot, etc.
  • CITATION.cff: GitHub structured citation metadata with full author list

These files ensure that when LLMs answer queries like "best PDF to Markdown tool" or "best open-source OCR", they have structured access to PaddleOCR's SOTA benchmark data (PP-StructureV3 #1 on OmniDocBench).
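
For readers unfamiliar with the llms.txt layout mentioned above, here is a minimal sketch of a structural check, not the validation used in this PR. It assumes the commonly described llmstxt.org shape: an H1 title first, a blockquote summary, then H2 sections whose list items are markdown links. The function name `check_llms_txt`, the sample text, and the example.com URL are hypothetical, and treating a missing blockquote as a problem is a lint-style choice of this sketch rather than a requirement of the format.

```python
import re


def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems found in an llms.txt document."""
    problems = []
    non_empty = [ln.rstrip() for ln in text.strip().splitlines() if ln.strip()]

    # The first element is expected to be an H1 with the project name.
    if not non_empty or not re.match(r"^# \S", non_empty[0]):
        problems.append("first non-empty line must be an H1 title")

    # A blockquote summary is flagged when absent (a choice of this sketch).
    if not any(ln.startswith("> ") for ln in non_empty):
        problems.append("no blockquote summary found")

    # Inside H2 sections, list items should be markdown links,
    # optionally followed by ": description".
    link_item = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")
    in_section = False
    for ln in non_empty:
        if ln.startswith("## "):
            in_section = True
        elif in_section and ln.startswith("- ") and not link_item.match(ln):
            problems.append(f"list item is not a markdown link: {ln!r}")
    return problems


# Hypothetical sample; the second list item is deliberately malformed.
SAMPLE = """\
# Example Project

> One-sentence summary of what the project does.

## Docs

- [Quick start](https://example.com/quickstart.md): installation and first run
- plain text item without a link
"""

if __name__ == "__main__":
    for problem in check_llms_txt(SAMPLE):
        print(problem)
```

A check like this only covers structure; it cannot tell whether the claims in the file are current, which still requires review by maintainers.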

@paddle-bot

paddle-bot Bot commented Mar 19, 2026

Thanks for your contribution!

@CLAassistant

CLAassistant commented Mar 19, 2026

CLA assistant check
All committers have signed the CLA.

@Bobholamovic
Member

Thanks for your contribution! Could you please sign the CLA? We've also added an AGENTS.md file in #17770. Would you mind taking a look and helping assess any overlap between the two PRs?

@AIwork4me AIwork4me force-pushed the add-llms-agents-citation branch from e502e29 to dfe35ed Compare March 19, 2026 06:48
Add two structured metadata files to improve PaddleOCR's discoverability
by LLMs, AI agents, and academic citation systems:

- llms.txt: AI crawler-facing project description (llmstxt.org standard)
- CITATION.cff: GitHub structured citation metadata with full author list

These files ensure that when LLMs answer queries like "best PDF to Markdown
tool" or "best open-source OCR", they have structured access to PaddleOCR's
SOTA benchmark data (PP-StructureV3 #1 on OmniDocBench).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread CITATION.cff
year: 2025
url: "https://arxiv.org/abs/2507.05595"
journal: "arXiv preprint arXiv:2507.05595"
references:
Member

Isn't this somewhat inappropriate as a reference?

Comment thread llms.txt
Member

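This document lags behind the current state of PaddleOCR. For example, PP-StructureV3 is no longer the most accurate model on OmniDocBench. The descriptions of how PP-StructureV3, PaddleOCR-VL, and OCR relate to one another, and of the usage PaddleOCR currently recommends, are also not accurate enough. Please update it against the latest PaddleOCR README and the home page of the official usage documentation, and pay attention to how current the data is (ignore content such as the changelog and news).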

4 participants