docs: add Chinese translation for README #332
@coderabbitai full review
✅ Actions performed
Full review triggered.
📝 Walkthrough
Documentation updates add language localization support to the OpenDataLoader PDF project. A language navigation link is added to the English README, and a comprehensive new Chinese README is introduced, covering the project overview, feature descriptions, quickstart instructions, usage workflows, output formats, accessibility pipeline details, and contribution guidelines.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
README.zh.md (1)
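16-19: Consider adding a reverse language switcher at the top of the Chinese README. The English README already has one; adding the line
[English](README.md) | [中文版](README.zh.md) here as well would be more consistent and improve discoverability.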
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.zh.md`:
- Line 33: The README contains broken internal anchors (`#langchain-集成` and
`#自动标记预览-2026-q2-发布`) referenced in the sentence about RAG on Line 33/38; locate
the hyperlink targets (the section headings or TOC entries) that should match
these anchors and either rename those headings to exactly "LangChain 集成" and
"自动标记预览 2026 Q2 发布" (or the exact anchor strings) or update the links in the
Line 33/38 text to match the existing heading anchors; ensure the anchor
casing/spacing/encoding matches the generated markdown anchor for the headings
referenced by the symbols '#langchain-集成' and '#自动标记预览-2026-q2-发布'.
---
Nitpick comments:
In `@README.zh.md`:
- Around line 16-19: Add a language-switch line at the top of README.zh.md to
match the English README; insert a single-line navigation like
"[English](README.md) | [中文版](README.zh.md)" immediately above or below the main
title/header so readers can quickly switch languages and improve
discoverability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: deea6042-4946-4dc8-9c9e-150c1416c42f
📒 Files selected for processing (2)
README.md, README.zh.md
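- **How accurate is it?** Ranked #1 in benchmarks: 0.90 overall accuracy and 0.93 table accuracy across 200 real-world PDFs, including multi-column layouts and scientific papers. Deterministic local mode, plus an AI hybrid mode for complex pages ([View benchmarks](#提取基准测试)).
- **Does it support scanned documents and OCR?** Yes. Hybrid mode includes built-in OCR (80+ languages). Works on low-quality scans at 300 DPI or higher ([Hybrid mode details](#混合模式复杂-pdf-的准确度冠军)).
- **Does it handle tables, formulas, images, and charts?** Yes. Hybrid mode handles complex and borderless tables, LaTeX formulas, and AI-generated descriptions of images and charts ([Hybrid mode details](#混合模式复杂-pdf-的准确度冠军)).
- **How do I use it for RAG?** `pip install opendataloader-pdf`, then convert with 3 lines of code. Outputs structured Markdown for chunking, JSON with bounding boxes for source citations, and HTML. LangChain integration is supported. SDKs are available for Python, Node.js, and Java ([Quick start](#30-秒快速上手) | [LangChain integration](#langchain-集成)).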
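Fix broken internal anchor links in the README
The fragment link targets on Line 33 and Line 38 do not exist in the current document, so clicking them does not navigate anywhere (`#langchain-集成`, `#自动标记预览-2026-q2-发布`).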
🔧 Suggested diff
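-- **How do I use it for RAG?** `pip install opendataloader-pdf`, then convert with 3 lines of code. Outputs structured Markdown for chunking, JSON with bounding boxes for source citations, and HTML. LangChain integration is supported. SDKs are available for Python, Node.js, and Java ([Quick start](#30-秒快速上手) | [LangChain integration](#langchain-集成)).
+- **How do I use it for RAG?** `pip install opendataloader-pdf`, then convert with 3 lines of code. Outputs structured Markdown for chunking, JSON with bounding boxes for source citations, and HTML. LangChain integration is supported. SDKs are available for Python, Node.js, and Java ([Quick start](#30-秒快速上手) | [LangChain integration](https://opendataloader.org/docs/langchain-opendataloader-pdf)).
-- **Which features are free?** Layout analysis + auto-tagging (2026 Q2, Apache 2.0 license). Converts untagged PDFs into Tagged PDFs without depending on a proprietary SDK ([Auto-tagging preview](#自动标记预览-2026-q2-发布)).
+- **Which features are free?** Layout analysis + auto-tagging (2026 Q2, Apache 2.0 license). Converts untagged PDFs into Tagged PDFs without depending on a proprietary SDK ([Workflow](#无障碍流水线)).
Also applies to: 38-38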
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 33-33: Link fragments should be valid
(MD051, link-fragments)
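The MD051 warning comes down to how GitHub derives anchor IDs from heading text. Below is a minimal Python sketch for checking fragments locally, assuming GitHub's rules are roughly those mirrored by the github-slugger package: lowercase the text, drop punctuation (including full-width CJK punctuation), turn each space into a hyphen, and suffix duplicates with `-1`, `-2`, and so on. `github_style_slug` and `broken_fragments` are hypothetical helpers, not part of markdownlint or this repository, and the heading regex is deliberately naive (it does not skip fenced code blocks or handle setext headings).

```python
import re
import unicodedata


def github_style_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor conversion.

    Assumption: modeled on github-slugger; exact punctuation rules may differ.
    """
    kept = []
    for ch in heading.strip().lower():
        category = unicodedata.category(ch)
        # Keep letters (including CJK), combining marks, digits, underscores,
        # spaces, and hyphens; drop all other punctuation and symbols.
        if ch in " -" or category[0] in "LMN" or category == "Pc":
            kept.append(ch)
    # Each space becomes a hyphen (runs of spaces are not collapsed).
    return "".join(kept).replace(" ", "-")


def broken_fragments(markdown: str) -> list[str]:
    """Return in-page link fragments (#...) that match no heading slug."""
    slugs = set()
    seen: dict[str, int] = {}
    # ATX headings only: "## Title" with an optional closing "##" sequence.
    heading_re = r"^#{1,6}[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$"
    for match in re.finditer(heading_re, markdown, flags=re.MULTILINE):
        base = github_style_slug(match.group(1))
        count = seen.get(base, 0)
        # A repeated heading gets "-1", "-2", ... appended to its slug.
        slugs.add(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    # Collect targets of links like [text](#fragment).
    fragments = re.findall(r"\]\(#([^)\s]+)\)", markdown)
    return [f for f in fragments if f not in slugs]
```

Under these assumptions, a heading such as `## 混合模式:复杂 PDF 的准确度冠军` slugs to `混合模式复杂-pdf-的准确度冠军`, while a link to `#langchain-集成` is reported whenever no heading slugs to that exact value, which is the situation the review flags on Lines 33 and 38.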
Thanks for adding Chinese localization! Great community contribution. A few items to address before we can merge:
Must fix:
Should fix:
Minor:
Happy to merge once the must-fix items are resolved!
Thanks for the translation; the quality is solid. We've decided not to maintain translated README files in the repository. The README changes frequently, and keeping translations in sync is a maintenance burden we can't commit to right now. Appreciate the effort, and sorry we can't merge this one!
Thank you for your reply. Now I understand the situation. Thank you very much.
Description
This PR introduces a high-quality Chinese translation of the README (`README.zh.md`) and adds a language switcher to the main `README.md`.
Motivation
OpenDataLoader PDF is a top-tier tool for AI-ready data extraction and PDF accessibility. Given the booming interest in RAG (Retrieval-Augmented Generation) and document intelligence within the Chinese developer community, localized documentation will significantly lower the barrier for users to leverage this project's capabilities, especially its #1 benchmark-ranked extraction accuracy and its hybrid-mode support for Chinese OCR (`ch_sim`, `ch_tra`).
This translation covers:
Changes
`README.zh.md` with localized technical terminology (e.g., RAG, Bounding Boxes, Auto-tagging).
Translated and contributed by @JasonYeYuhe
Summary by CodeRabbit
Documentation