
Commit ea2a968

committed
doc: add docs
1 parent 4a853dd commit ea2a968

9 files changed

Lines changed: 2137 additions & 46 deletions

File tree

README.md

Lines changed: 8 additions & 21 deletions
@@ -1,34 +1,22 @@
1-
<!--
2-
Copyright 2025 CloudWeGo Authors
3-
4-
Licensed under the Apache License, Version 2.0 (the "License");
5-
you may not use this file except in compliance with the License.
6-
You may obtain a copy of the License at
7-
8-
https://www.apache.org/licenses/LICENSE-2.0
9-
10-
Unless required by applicable law or agreed to in writing, software
11-
distributed under the License is distributed on an "AS IS" BASIS,
12-
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13-
See the License for the specific language governing permissions and
14-
limitations under the License.
15-
-->
16-
171
# ABCoder: AI-Based Coder (AKA: A Brand-new Coder)
182

193
![ABCoder](images/ABCoder.png)
204

21-
ABCoder, an AI-oriented code-processing tool, is designed to enhance the coding context for large language models (LLMs) and simplify the AI-assisted coding process.
5+
ABCoder is an AI-oriented code-processing SDK designed to enrich the coding context available to large language models (LLMs) and to speed up the development of AI-assisted coding workflows.
226

237
## Features
248

25-
- Universal Abstract Syntax Tree (UniAST), a language-independent and AI-friendly coding-context AST specification, providing ample and recursive code information for both AI and humans.
9+
- Universal Abstract Syntax Tree (UniAST), a language-independent, AI-friendly code-structure specification, providing flexible and structured coding context for both AI and humans.
2610

2711
- Universal Parser, parses arbitrary languages to UniAST.
2812

2913
- Universal Writer, transforms UniAST back to code.
14+
15+
- (Coming Soon) Universal Iterator, provides a set of interfaces and tools that help developers implement their agents without deep knowledge of the UniAST structure.
16+
17+
- (Coming Soon) Code RAG, provides a set of tools that help the LLM understand your code more deeply.
3018

31-
Based on these features, developers can easily implement or enhance their AI-assisted-coding agent or workflow, such as reviewing, optimizing, translating...
19+
Based on these features, developers can easily implement or enhance their AI-assisted-coding workflows (or agents), such as reviewing, optimizing, translating...
3220

3321
## Getting Started
3422

@@ -67,7 +55,6 @@ ABCoder currently supports the following languages:
6755

6856
We encourage developers to contribute and make this tool more powerful. If you are interested in contributing to the ABCoder
6957
project, kindly check out our Getting Involved Guide:
70-
- [Parser Extension](docs/parser_extension-zh.md)
71-
- [Writer Extension](docs/writer_extension-zh.md)
58+
- [Parser Extension](docs/parser-zh.md)
7259

7360
> Note: This is a dynamic README and is subject to changes as the project evolves.

docs/parser-zh.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
# ABCoder - Language Parser Introduction
2+
3+
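ABCoder currently implements its Parser on top of the [LSP](https://microsoft.github.io/language-server-protocol/) protocol, which enables precise dependency collection and makes later multi-language extension easier.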
4+
5+
## Code Structure
6+
7+
The code lives under the [lang](/lang) package, which includes:
8+
9+
- uniast: the Go definition of the unified AST structure
10+
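- lsp: the LSP protocol client. It provides interfaces for file parsing, reference lookup, syntax-tree parsing, definition lookup, and so on, as well as the **generic language specification interface, LanguageSpec**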
11+
- collect: collects symbols via LSP and exports them as UniAST; this is the core processing logic
12+
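- rust: mainly the Rust-language implementation of the lsp#Spec interface. It also contains some call logic specific to the concrete LSP server (rust-analyzer), and an implementation of lookup by id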
13+
- go: the Go parser and writer implementations
14+
15+
## Processing Flow
16+
17+
![lang-parser](../images/lang-parser.png)
18+
19+
1. Identify the language from the command-line arguments, start the corresponding LSP server, and pass in the initialization parameters
20+
2. Walk the repository files and call the `textDocument/documentSymbol` method to get all symbols in each file. For each symbol:
21+
   1. Call the `textDocument/semanticTokens/range` method to get the tokens in the symbol's code
22+
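   2. Identify the tokens that are valid entities and call `textDocument/definition` to jump to the corresponding symbol's location, thereby establishing the dependency relationships between nodes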
23+
3. Repeat step 2 until all files have been processed. Finally, convert the collected LSP symbols into UniAST format and output them
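
The flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Client`, `Symbol`, `Token`, `CollectDeps`, and the in-memory `fakeClient` are hypothetical names invented for this sketch and are not ABCoder's actual types; a real collector would issue the three LSP requests over a live connection to the language server.

```go
package main

import "fmt"

// Symbol and Token are simplified, hypothetical stand-ins for LSP results.
type Symbol struct {
	Name string
	File string
}

type Token struct {
	Text     string
	IsEntity bool // whether the token refers to an entity worth resolving
}

// Client abstracts the three LSP requests named in the steps above.
type Client interface {
	DocumentSymbols(file string) []Symbol // textDocument/documentSymbol
	SemanticTokens(sym Symbol) []Token    // textDocument/semanticTokens/range
	Definition(tok Token) (Symbol, bool)  // textDocument/definition
}

// CollectDeps walks every file and records, for each symbol, the names of
// the other symbols that its entity tokens resolve to.
func CollectDeps(c Client, files []string) map[string][]string {
	deps := make(map[string][]string)
	for _, file := range files {
		for _, sym := range c.DocumentSymbols(file) {
			if _, ok := deps[sym.Name]; !ok {
				deps[sym.Name] = nil // register symbols that have no dependencies
			}
			for _, tok := range c.SemanticTokens(sym) {
				if !tok.IsEntity {
					continue
				}
				// Skip unresolved tokens and self-references.
				if def, ok := c.Definition(tok); ok && def.Name != sym.Name {
					deps[sym.Name] = append(deps[sym.Name], def.Name)
				}
			}
		}
	}
	return deps
}

// fakeClient answers requests from in-memory tables instead of a real server.
type fakeClient struct {
	symbols map[string][]Symbol
	tokens  map[string][]Token
	defs    map[string]Symbol
}

func (f fakeClient) DocumentSymbols(file string) []Symbol { return f.symbols[file] }
func (f fakeClient) SemanticTokens(sym Symbol) []Token    { return f.tokens[sym.Name] }
func (f fakeClient) Definition(tok Token) (Symbol, bool) {
	s, ok := f.defs[tok.Text]
	return s, ok
}

func newFakeClient() fakeClient {
	mainSym := Symbol{Name: "main", File: "main.rs"}
	helperSym := Symbol{Name: "helper", File: "util.rs"}
	return fakeClient{
		symbols: map[string][]Symbol{"main.rs": {mainSym}, "util.rs": {helperSym}},
		tokens: map[string][]Token{
			"main":   {{Text: "fn"}, {Text: "main", IsEntity: true}, {Text: "helper", IsEntity: true}},
			"helper": {{Text: "fn"}, {Text: "helper", IsEntity: true}},
		},
		defs: map[string]Symbol{"main": mainSym, "helper": helperSym},
	}
}

func main() {
	deps := CollectDeps(newFakeClient(), []string{"main.rs", "util.rs"})
	fmt.Println(deps["main"]) // prints [helper]
}
```

Hiding the server behind a small interface keeps the traversal testable without starting a real language server.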
24+
25+
## Extending to Other Languages
26+
27+
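Because UniAST is not fully equivalent to LSP, some language-specific behavior interfaces must be implemented before the conversion can work. Using the lang/rust package as a reference, roughly the following capabilities need to be implemented: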
28+
29+
- GetDefaultLSP(): maps the user-supplied language to a concrete lsp.Language and the name of the corresponding LSP server
30+
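- CheckRepo(): checks the state of the user's repository, handles toolchain issues and the like according to each language's conventions, and returns the first file to open by default (used to trigger the LSP server) together with the time to wait for the server to finish initializing (decided by repository size)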
31+
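- **LanguageSpec interface**: the core module, which handles syntax information that is not generic to LSP, such as determining whether a token is a standard-library symbol or parsing function signatures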
32+
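- ModulePatcher: a post-processing module for language-specific information collection, such as collecting Rust `use` symbols (which LSP does not collect). Implementing it is optional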
33+
34+
### LanguageSpec
35+
36+
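37+
Supplies the information needed to convert LSP symbols into UniAST during symbol collection; this information is not part of the generic LSP definitions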
38+
39+
```go
40+
41+
// Detailed implementation used to collect LSP symbols and transform them to UniAST
42+
type LanguageSpec interface {
43+
// initialize a root workspace, and return all modules [modulename=>abs-path] inside
44+
WorkSpace(root string) (map[string]string, error)
45+
46+
// given an absolute file path, returns its module name and package path
47+
// external paths should also be supported
48+
// FIXME: some languages (like rust) may have sub-mods inside a file, but we still consider it as a single mod here
49+
NameSpace(path string) (string, string, error)
50+
51+
// tells if a file belongs to the language AST
52+
ShouldSkip(path string) bool
53+
54+
// return the first declaration token of a symbol, as Type-Name
55+
DeclareTokenOfSymbol(sym DocumentSymbol) int
56+
57+
// tells if a token is an AST entity
58+
IsEntityToken(tok Token) bool
59+
60+
// tells if a token is a std token
61+
IsStdToken(tok Token) bool
62+
63+
// return the SymbolKind of a token
64+
TokenKind(tok Token) SymbolKind
65+
66+
// tells if a symbol is a main function
67+
IsMainFunction(sym DocumentSymbol) bool
68+
69+
// tells if a symbol is a language symbol (func, type, variable, etc) in workspace
70+
IsEntitySymbol(sym DocumentSymbol) bool
71+
72+
// tells if a symbol is public in workspace
73+
IsPublicSymbol(sym DocumentSymbol) bool
74+
75+
// declares whether the language has impl symbols
76+
// if it returns true, ImplSymbol() will be called
77+
HasImplSymbol() bool
78+
// if a symbol is an impl symbol, return the token index of interface type, receiver type and first-method start (-1 means not found)
79+
// otherwise the collector will use FunctionSymbol() as receiver type token index (-1 means not found)
80+
ImplSymbol(sym DocumentSymbol) (int, int, int)
81+
82+
// if a symbol is a Function or Method symbol, return the token index of Receiver (-1 means not found), TypeParameters, InputParameters and Outputs
83+
FunctionSymbol(sym DocumentSymbol) (int, []int, []int, []int)
84+
}
85+
```
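
As a rough illustration of what implementing part of this interface might look like, here is a minimal sketch for an imaginary language. Everything in it is an assumption made for the example: the reduced `DocumentSymbol` struct (with only `Name` and `Kind`), the `.toy` file extension, and the rule that public names start with an uppercase letter. It covers only four of the predicates and is not how the Rust spec is implemented.

```go
package main

import (
	"fmt"
	"path/filepath"
	"unicode"
	"unicode/utf8"
)

// DocumentSymbol is a reduced, hypothetical stand-in for the lsp package's
// type; only the two fields needed by this sketch are modeled.
type DocumentSymbol struct {
	Name string
	Kind string // e.g. "function", "struct", "variable", "comment"
}

// ToySpec implements a few LanguageSpec-style predicates for an imaginary
// language whose sources end in ".toy".
type ToySpec struct{}

// ShouldSkip reports whether a file is outside the language (not a .toy file).
func (ToySpec) ShouldSkip(path string) bool {
	return filepath.Ext(path) != ".toy"
}

// IsMainFunction reports whether the symbol is the program entry point.
func (ToySpec) IsMainFunction(sym DocumentSymbol) bool {
	return sym.Kind == "function" && sym.Name == "main"
}

// IsEntitySymbol reports whether the symbol is a function, type, or variable.
func (ToySpec) IsEntitySymbol(sym DocumentSymbol) bool {
	switch sym.Kind {
	case "function", "struct", "variable":
		return true
	}
	return false
}

// IsPublicSymbol assumes, for this toy language only, that public names
// start with an uppercase letter. An empty name is never public.
func (ToySpec) IsPublicSymbol(sym DocumentSymbol) bool {
	first, _ := utf8.DecodeRuneInString(sym.Name)
	return unicode.IsUpper(first)
}

func main() {
	spec := ToySpec{}
	run := DocumentSymbol{Name: "Run", Kind: "function"}
	fmt.Println(spec.ShouldSkip("README.md"), spec.IsEntitySymbol(run), spec.IsPublicSymbol(run))
}
```

The token-index methods (`DeclareTokenOfSymbol`, `ImplSymbol`, `FunctionSymbol`) depend on how a given language server tokenizes declarations, so they are left out of this sketch.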
86+
87+
- Rust-parser implementation: [RustSpec](/lang/rust/spec.go)
88+
89+
90+
91+
### ModulePatcher
92+
93+
Post-processes the module information once collection is complete
94+
95+
```go
96+
// ModulePatcher supplements some information for module
97+
type ModulePatcher interface {
98+
// Patch is called after all symbols are collected
99+
Patch(ast *parse.Module)
100+
}
101+
```
102+
103+
- Rust-parser implementation: [RustModulePatcher](/lang/rust/patch.go)
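
To make the post-processing idea concrete, here is a minimal sketch of a patcher that gathers Rust-style `use` paths after collection. It rests on several assumptions: the `Module` struct below (with `Sources` and `Imports` fields) is a hypothetical stand-in for the real module type, and the line-based scan is deliberately naive (it ignores `pub use`, multi-line `use` groups, and comments). It is not the logic of RustModulePatcher.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// Module is a hypothetical, reduced stand-in for the collected module.
type Module struct {
	Sources map[string]string // file path -> source text
	Imports []string          // filled in by the patcher
}

// ModulePatcher mirrors the shape of the interface shown above.
type ModulePatcher interface {
	Patch(m *Module)
}

// UsePatcher adds every single-line `use path;` statement to Module.Imports.
type UsePatcher struct{}

func (UsePatcher) Patch(m *Module) {
	seen := make(map[string]bool)
	for _, imp := range m.Imports {
		seen[imp] = true // keep imports that were already collected
	}
	for _, src := range m.Sources {
		sc := bufio.NewScanner(strings.NewReader(src))
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if !strings.HasPrefix(line, "use ") || !strings.HasSuffix(line, ";") {
				continue
			}
			path := strings.TrimSpace(strings.TrimSuffix(strings.TrimPrefix(line, "use "), ";"))
			if path != "" && !seen[path] {
				seen[path] = true
				m.Imports = append(m.Imports, path)
			}
		}
	}
	// Map iteration order is random, so sort for a deterministic result.
	sort.Strings(m.Imports)
}

func main() {
	m := &Module{Sources: map[string]string{
		"a.rs": "use std::fmt;\nfn main() {}\n",
		"b.rs": "use crate::util;\nuse std::fmt;\n",
	}}
	var p ModulePatcher = UsePatcher{}
	p.Patch(m)
	fmt.Println(m.Imports) // prints [crate::util std::fmt]
}
```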
