Commit 2d800f1

docs: sync English README with Chinese update
1 parent 78ae8b9 commit 2d800f1

2 files changed

Lines changed: 198 additions & 30 deletions

README.md

Lines changed: 86 additions & 2 deletions
@@ -6,9 +6,38 @@

 English README. For Chinese documentation, see [README.zh-CN.md](./README.zh-CN.md).

-`code2skill` turns a real Python repository into structured repository knowledge, AI-ready Skill documents, and target-specific rule files for Cursor, Claude Code, Codex, GitHub Copilot, and Windsurf.
+`code2skill` is a CLI for real Python repositories. It turns source code into structured project knowledge, Skill documents that AI coding assistants can consume directly, and rule files adapted for tools such as Cursor, Claude Code, Codex, GitHub Copilot, and Windsurf.

-It is built for repositories that need repeatable AI context instead of one-off prompts. The pipeline scans code first, extracts structure and evidence, plans focused skills, and then generates grounded Markdown that can be reused locally or in CI.
+It provides a full chain from repository scanning and structural analysis to Skill generation and rule adaptation, with incremental updates based on diffs and historical state. The generated outputs are written to disk so they can be reviewed, committed, reused, and continuously integrated into local development and CI workflows.
+
+## Why Skills Matter
+
+In traditional software development, the `README` is the standard entry document for a project. It is written for human developers and usually covers project introduction, installation, usage, development setup, and examples.
+
+In the AI IDE era, AI tools also read READMEs, documentation, and source code to understand a project. At that point, a repository needs a form of knowledge that is better suited for direct AI consumption. READMEs still matter, but they often mix user guidance, developer guidance, background context, historical notes, sample snippets, and presentation-oriented material. That structure is natural for human readers. For AI systems, however, project conventions, important patterns, and execution boundaries are more useful when they are presented in a more unified and structured form.
+
+A Skill is that AI-oriented project document form.
+
+In practice, a Skill can be treated as an engineering-grade README for AI. It organizes implementation-relevant knowledge into stable, clear, and maintainable documents so that AI can read consistent project context across different tools, sessions, and stages of work.
+
+Skills let a repository express information such as:
+
+- the core structure of the project and the responsibility boundaries of modules
+- the important roles, call relationships, and behavioral constraints in the code
+- existing patterns, conventions, and preferred extension paths
+- the implementation path and modification style expected for specific tasks
+- a unified source for downstream tool-specific rule files
+
+Once that information is materialized as Skills, it can be consumed directly by AI IDEs, agents, and automation workflows. Developers can also iterate on collaboration practices around those Skills and turn "how this repository should be worked on" into an auditable, commit-friendly, evolvable engineering asset.
+
+## What code2skill Provides
+
+`code2skill` builds project knowledge from real Python repositories and generates a set of outputs that can be written to disk, tracked over time, and integrated into normal engineering workflows.
+
+It covers the full chain from repository scanning, structural analysis, Skill planning, and document generation to tool-specific rule adaptation. It also supports incremental regeneration so Skills can stay aligned as the repository evolves.
+
+For one-off local analysis, `code2skill` can scan an entire repository and generate the full result set.
+For ongoing development workflows, it can combine historical state and code diffs to rebuild only the affected Skills, reducing repeated generation cost and making CI-based updates practical.

 ## What It Guarantees

@@ -39,6 +68,61 @@ From one Python repository, `code2skill` can produce:
 - `report.json` for execution metrics, token estimates, and impact summaries
 - `state/analysis-state.json` for incremental CI reuse

+## The Role Of Skills In A Repository
+
+Skills are the standardized AI-facing expression layer of repository knowledge.
+
+They connect repository structure, implementation details, team conventions, and tool rules so an AI system can enter the project with one consistent context source instead of repeatedly reconstructing it from README files, scattered docs, previous implementations, and chat history.
+
+In engineering practice, that creates direct value:
+
+- it gives AI IDEs a unified, stable, low-noise project entry point
+- it lets developers turn recurring implementation patterns into reusable guidance
+- it helps future changes follow the same boundaries and extension paths already present in the repository
+- it gives rule-file generation a single consistent source of truth
+- it keeps repository knowledge incrementally maintained as code changes, instead of periodically rewritten by hand
+
+That is why `code2skill` is really about organizing, transmitting, and updating repository knowledge for AI collaboration.
+
+## Incremental Updates And Ongoing Maintenance
+
+Repository knowledge needs to evolve with the code.
+
+`code2skill` supports incremental regeneration based on historical analysis state and the current change scope. After code changes, it can identify the affected areas, rebuild the relevant Skills, and preserve outputs that are still valid. That makes it suitable for local development loops, pull request checks, and continuous CI automation.
+
+This workflow has several practical benefits:
+
+- it reduces the cost of repeated full regeneration on larger repositories
+- it keeps Skills synchronized with the current code state
+- it moves project-knowledge maintenance into the normal development process
+- it makes generated outputs reviewable, comparable, and commit-friendly
+
+Skills therefore become a long-lived engineering asset rather than a temporary prompt artifact.
+
+## Adapting To Multiple AI Tools
+
+Different AI coding tools use different rule file formats, but they all need high-quality project context.
+
+`code2skill` first generates a unified Skill-centered knowledge layer, then uses `adapt` to copy or merge that layer into target-specific formats, including:
+
+- `AGENTS.md`
+- `CLAUDE.md`
+- `.cursor/rules/*`
+- `.github/copilot-instructions.md`
+- `.windsurfrules`
+
+That approach lets a repository maintain one core knowledge representation and distribute consistent context and constraints to multiple AI tools without duplicating maintenance effort.
+
+## When To Use code2skill
+
+`code2skill` is a good fit for:
+
+- Python repositories that want a stable project context for AI IDEs
+- teams that want repository knowledge committed as files instead of kept in chat threads
+- engineering workflows that need CI-based updates for AI rule files
+- projects that want diff-aware control over regeneration scope and cost
+- repositories that need one knowledge source adapted to multiple AI coding tools
+
 ## Pipeline

 ### Phase 1: Structural Scan
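
The "Adapting To Multiple AI Tools" section in this file says `adapt` copies or merges the Skill layer into tool-specific rule files. The following is only a rough sketch of what "merge" and "copy" could mean at the file level, not the tool's actual implementation. The helper names `merge_skills` and `copy_skills` are hypothetical, the Skill file contents are placeholders, and any front matter or file extension a given tool expects is left out.

```python
from pathlib import Path
import shutil
import tempfile


def merge_skills(skills_dir: Path, target: Path) -> None:
    """Concatenate every Skill document into one rule file (for example AGENTS.md)."""
    parts = [p.read_text(encoding="utf-8").strip() for p in sorted(skills_dir.glob("*.md"))]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n\n".join(parts) + "\n", encoding="utf-8")


def copy_skills(skills_dir: Path, target_dir: Path) -> None:
    """Copy each Skill document as its own file (for example into .cursor/rules/)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    for p in sorted(skills_dir.glob("*.md")):
        shutil.copy(p, target_dir / p.name)


# Demo in a throwaway directory with two placeholder Skill files (hypothetical content).
root = Path(tempfile.mkdtemp())
skills = root / "skills"
skills.mkdir()
(skills / "index.md").write_text("# Index\n", encoding="utf-8")
(skills / "backend-architecture.md").write_text("# Backend\n", encoding="utf-8")

merge_skills(skills, root / "AGENTS.md")          # one merged file
copy_skills(skills, root / ".cursor" / "rules")   # one file per Skill
```

Keeping both operations as pure functions over a `skills/` directory reflects the idea in the text that one knowledge source feeds every target; only the last step differs per tool.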

README.zh-CN.md

Lines changed: 112 additions & 28 deletions
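@@ -6,38 +6,122 @@

 Chinese documentation. For the English home page, see [README.md](./README.md).

-`code2skill` is a CLI for real Python repositories. It converts a repository into structured project knowledge, Skill documents that AI coding assistants can consume directly, and rule files adapted for Cursor, Claude Code, Codex, GitHub Copilot, and Windsurf.
+`code2skill` is a CLI for real Python repositories that converts a code repository into structured project knowledge, Skill documents for direct consumption by AI coding assistants, and rule files adapted for tools such as Cursor, Claude Code, Codex, GitHub Copilot, and Windsurf.

-Its goal is not a one-off "repository summary" but to distill repository knowledge into file artifacts that are reusable, reviewable, committable, and incrementally updatable.
+It provides a complete chain from repository scanning and structural analysis to Skill generation and rule adaptation, and supports incremental updates based on diffs and historical state. Generated results are written directly to disk, where they can be reviewed, committed, reused, and continuously integrated into local development and CI workflows.
+
+## Why Skills Are Needed
+
+In traditional software development, the `README` is a project's standard entry document. It is written for human developers and covers the project introduction, installation and run instructions, development conventions, and usage examples.
+
+In the AI IDE era, AI also reads READMEs, documentation, and source code to understand a project. At that point, a project needs a knowledge carrier better suited to AI consumption. The README still matters, but it usually carries user instructions, developer instructions, background information, historical discussion, sample snippets, and presentational content all at once. For human readers, that organization is natural; for AI, project conventions, key patterns, and execution boundaries need to be presented in a more unified, more structured form.
+
+A Skill is that AI-oriented form of project documentation.
+
+In a concrete project, a Skill can be seen as an engineering-grade README written for AI to read. It organizes knowledge directly related to implementation into stable, clear, and sustainably maintainable documents, so that AI reads consistent project context across different tools, sessions, and stages.
+
+Skills let a project explicitly express the following information:
+
+- the project's core structure and module responsibilities
+- the key roles, call relationships, and constraint boundaries in the code
+- existing patterns, conventions, and recommended extension approaches
+- the implementation path and modification style that specific tasks should follow
+- the unified source required by tool-level rule files
+
+Once this information takes the form of Skills, it can be consumed directly by AI IDEs, agents, and automation workflows. Developers can also keep iterating on project collaboration practices around Skills, distilling "how to work in this repository" into a reviewable, committable, evolvable engineering asset.
+
+## What code2skill Provides
+
+`code2skill` builds project knowledge around real Python repositories and generates a set of artifacts that can be written to disk, tracked, and integrated into the development workflow.
+
+It covers the complete chain from repository scanning, structural analysis, Skill planning, and document generation to rule adaptation, and supports incremental updates so that Skills stay in sync as the project evolves.
+
+For one-off local analysis, `code2skill` can scan the entire repository and generate the full results.
+For ongoing development, `code2skill` can combine historical state with code diffs to rebuild only the affected Skills, reducing repeated generation cost and making automatic updates in CI practical.

 ## What It Guarantees
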
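 - Python-first analysis, built on `ast`, the import graph, role inference, and pattern detection
 - Built-in prompts with explicit constraints: English output, no emoji, and inference only from evidence
-- Outputs are written to disk and do not depend on chat context
+- Outputs are written directly to disk and do not depend on chat context
 - Every run generates `report.json`, which tracks file counts, character counts, impact scope, and cost estimates
 - Supports incremental execution; CI can reuse historical state and rebuild only the affected Skills

 ## Command Model

-| Command | Calls LLM? | Writes output? | Main purpose |
-|---|---|---|---|
-| `scan` | Yes, unless `--structure-only` is passed | Yes | Full local generation |
-| `estimate` | No | Only writes `report.json` | Estimate cost and impact scope |
-| `ci` | Yes, unless `--structure-only` is passed | Yes | Automatically choose full or incremental in CI |
-| `adapt` | No | Yes | Copy or merge Skills into the rule files the target tool needs |
+| Command    | Calls LLM?                               | Writes output?            | Main purpose                                                   |
+| ---------- | ---------------------------------------- | ------------------------- | -------------------------------------------------------------- |
+| `scan`     | Yes, unless `--structure-only` is passed | Yes                       | Full local generation                                          |
+| `estimate` | No                                       | Only writes `report.json` | Estimate cost and impact scope                                 |
+| `ci`       | Yes, unless `--structure-only` is passed | Yes                       | Automatically choose full or incremental in CI                 |
+| `adapt`    | No                                       | Yes                       | Copy or merge Skills into the rule files the target tool needs |

 ## What It Generates

 From one Python repository, `code2skill` can produce:

-- `project-summary.md`, a human-oriented project overview
-- `skill-blueprint.json`, the Phase 1 structural blueprint
-- `skill-plan.json`, the list of Skills planned by the LLM
-- `skills/index.md` and `skills/*.md`, the Skill documents actually consumed by AI assistants
+- `project-summary.md`: a human-oriented project overview
+- `skill-blueprint.json`: the Phase 1 structural blueprint
+- `skill-plan.json`: the list of Skills planned by the LLM
+- `skills/index.md` and `skills/*.md`: the Skill documents actually consumed by AI assistants
 - `AGENTS.md`, `CLAUDE.md`, `.cursor/rules/*`, `.github/copilot-instructions.md`, and `.windsurfrules`, generated through `adapt`
-- `report.json`, which records execution metrics, token estimates, and impact summaries
-- `state/analysis-state.json`, used for incremental reuse in CI
+- `report.json`: records execution metrics, token estimates, and impact summaries
+- `state/analysis-state.json`: used for incremental reuse in CI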
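+
+## The Role Of Skills In A Project
+
+Skills are the standardized AI-facing expression layer of project knowledge.
+
+They connect repository structure, code implementation, team conventions, and tool rules, so that AI entering a project can read a unified context directly instead of repeatedly piecing information together from the README, scattered documents, historical implementations, and conversation context.
+
+In engineering practice, Skills offer several direct benefits:
+
+- they give AI IDEs a unified, stable, low-noise project entry point
+- they let developers distill existing implementation patterns into reusable conventions
+- they let later development naturally continue the approaches and boundaries described in the Skills
+- they give rule-file generation a consistent data source
+- they let project knowledge be maintained incrementally as code changes, rather than rewritten periodically
+
+Therefore, what `code2skill` handles is the organization, transmission, and updating of project knowledge in an AI collaboration environment.
+
+## Incremental Updates And Ongoing Maintenance
+
+Project knowledge needs to evolve together with the code.
+
+`code2skill` supports incremental updates based on historical analysis state and the scope of changes. After the repository is modified, the tool can identify the affected parts, rebuild the related Skills, and keep the unaffected results. This mechanism is suited to continuous use in local development, pull request checks, and CI automation.
+
+This workflow brings several practical benefits:
+
+- lower cost of repeated full generation on large repositories
+- Skills stay in sync with the current code state
+- project-knowledge updates become part of the regular development process
+- generated results can be reviewed, compared, and committed
+
+Skills thus become an engineering asset that can be maintained over the long term.
+
+## Adapting To Different AI Tools
+
+Different AI coding tools have different rule file formats, but their need for high-quality project context is the same.
+
+`code2skill` first generates unified, Skill-centered knowledge artifacts, then uses `adapt` to copy or merge them into the formats required by target tools, including:
+
+- `AGENTS.md`
+- `CLAUDE.md`
+- `.cursor/rules/*`
+- `.github/copilot-instructions.md`
+- `.windsurfrules`
+
+This way, a project maintains only one core knowledge representation and can distribute consistent context and constraints to multiple AI tools, reducing duplicated maintenance and rule drift.
+
+## When To Use It
+
+`code2skill` is suited to the following scenarios:
+
+- Python repositories that want to give AI IDEs stable project context
+- teams that want repository knowledge distilled into committable files rather than chat content
+- projects that need to update AI rule files automatically in CI
+- projects that need diff-based control over update scope and cost
+- codebases that want unified adaptation to multiple AI coding tools

 ## Overall Flow
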
@@ -134,19 +218,19 @@

 The numbers below were measured on `2026-03-17` against this repository's then-current commit `3714510`, on Windows with Python `3.10.6`, using the default limits and default heuristic pricing.

-| Metric | Result |
-|---|---|
-| `scan --structure-only` duration | `1.33s` |
-| `estimate` duration | `1.30s` |
-| Candidate files / final selected files | `51 / 31` |
-| Bytes read in full structural scan | `314,585` |
-| Final retained context size | `119,984 chars` |
-| Heuristically recommended Skill count | `2` |
-| First-generation estimate | `6,138` input tokens, `1,610` output tokens |
-| Per-Skill estimate | `project-overview: 450 in / 850 out`, `backend-architecture: 5,688 in / 760 out` |
-| Second `ci --mode auto` run after reusing state | `incremental` |
-| Bytes read incrementally with no diff | `20,939 bytes` |
-| Skills affected incrementally with no diff | `0` |
+| Metric                                          | Result                                                                           |
+| ----------------------------------------------- | -------------------------------------------------------------------------------- |
+| `scan --structure-only` duration                | `1.33s`                                                                          |
+| `estimate` duration                             | `1.30s`                                                                          |
+| Candidate files / final selected files          | `51 / 31`                                                                        |
+| Bytes read in full structural scan              | `314,585`                                                                        |
+| Final retained context size                     | `119,984 chars`                                                                  |
+| Heuristically recommended Skill count           | `2`                                                                              |
+| First-generation estimate                       | `6,138` input tokens, `1,610` output tokens                                      |
+| Per-Skill estimate                              | `project-overview: 450 in / 850 out`, `backend-architecture: 5,688 in / 760 out` |
+| Second `ci --mode auto` run after reusing state | `incremental`                                                                    |
+| Bytes read incrementally with no diff           | `20,939 bytes`                                                                   |
+| Skills affected incrementally with no diff      | `0`                                                                              |

 Additional notes:
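
The benchmark table in this hunk reports that a second `ci --mode auto` run with reused state ran as `incremental` and, with no diff, affected `0` Skills. The sketch below shows one way such a decision could work; it is not taken from the tool. It assumes a hypothetical rule (use incremental mode whenever a previous state file exists) and a hypothetical mapping from each Skill to its source files. The real `state/analysis-state.json` schema and selection criteria do not appear in this commit.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def choose_mode(state_file: Path) -> str:
    """Assumed rule: reuse state if a previous run left a state file, else run in full."""
    return "incremental" if state_file.is_file() else "full"


def affected_skills(skill_sources: Dict[str, List[str]], changed_files: Iterable[str]) -> Set[str]:
    """Return the Skills whose recorded source files overlap the changed files."""
    changed = set(changed_files)
    return {name for name, files in skill_sources.items() if changed.intersection(files)}


# Hypothetical mapping: the Skill names come from the table above, the paths are invented.
skill_sources = {
    "project-overview": ["README.md", "pyproject.toml"],
    "backend-architecture": ["src/core.py", "src/api.py"],
}

no_diff = affected_skills(skill_sources, [])                # empty diff
one_change = affected_skills(skill_sources, ["src/api.py"])  # one changed file
print(len(no_diff), sorted(one_change))
```

With an empty diff the result set is empty, which is consistent with the `0` affected Skills row; a change to one file selects only the Skill that lists it.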