Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/regression.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ jobs:
['id']
['Path']
['ToolVersion']
['ASTVersion']
['Modules']['a.b/c']['Dependencies']['a.b/c']
['Modules']['a.b/c/cmdx']['Dependencies']['a.b/c/cmdx']
steps:
Expand Down
51 changes: 49 additions & 2 deletions docs/uniast-en.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Universal Abstract-Syntax-Tree Specification (v0.1.3)
# Universal Abstract-Syntax-Tree Specification (v0.1.5)

Universal Abstract-Syntax-Tree is a LLM-friendly, language-agnostic code context data structure established by ABCoder. It represents a unified abstract syntax tree of a repository's code, collecting definitions of language entities (functions, types, constants/variables) and their interdependencies for subsequent AI understanding and coding-workflow development.

Expand Down Expand Up @@ -370,6 +370,23 @@ Function type AST Node entity, corresponding to [NodeType] as FUNC, including fu

- Vars: Global variables referenced within the current function, including variables and constants

- Extra: Additional information for storing language-specific details or extra metadata


- AnonymousFunctions: Anonymous functions defined in the function, each element is the FileLine of the corresponding function


- File: The filename where it is located


- Line: **Line number of the starting position in the file (starting from 1)**


- StartOffset: **Byte offset of the code starting position relative to the file header**


- EndOffset: **Byte offset of the code ending position relative to the file header**


###### Dependency

Expand All @@ -384,7 +401,10 @@ Represents a dependency relationship, containing the dependent node Id, dependen
"File": "manager.go",
"Line": 140,
"StartOffset": 3547,
"EndOffset": 3564
"EndOffset": 3564,
"Extra": {
"FunctionIsCall": true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionIsCall -> FunctionIsCalled ?

}
}
```

Expand All @@ -409,6 +429,12 @@ Represents a dependency relationship, containing the dependent node Id, dependen
- EndOffset: Offset of the ending position of the dependency point (not the dependent node) token relative to the code file


- Extra: Additional information for storing language-specific details or extra metadata


- FunctionIsCall: If the Dependency is a function call, whether it actually executes the function call or just references the function


##### Type

Type definition, [NodeType] is TYPE, including type definitions in specific languages such as structs, enums, interfaces, type aliases, etc.
Expand Down Expand Up @@ -490,6 +516,9 @@ Type definition, [NodeType] is TYPE, including type definitions in specific lang
- Implements: Which interfaces this type implements Identity


- Extra: Additional information for storing language-specific details or extra metadata


##### Var

Global variables, including variables and constants, **but must be global**
Expand Down Expand Up @@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
- Groups: Group definitions, such as `const( A=1, B=2, C=3)` in Go, Groups would be `[C=3, B=2]` (assuming A is the variable itself)


- Extra: Additional information for storing language-specific details or extra metadata


- AnonymousFunctions: Anonymous functions defined in the initialization function of the current variable. Each element is the FileLine of the corresponding function


- File: The filename where it is located


- Line: **Line number of the starting position in the file (starting from 1)**


- StartOffset: **Byte offset of the code starting position relative to the file header**


- EndOffset: **Byte offset of the code ending position relative to the file header**


### Graph

The dependency topology graph of all AST Nodes in the repository. Formatted as Identity => Node mapping, where each Node contains dependency relationships with other nodes.
Expand Down
53 changes: 50 additions & 3 deletions docs/uniast-zh.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Universal Abstract-Syntax-Tree Specification (v0.1.3)
# Universal Abstract-Syntax-Tree Specification (v0.1.5)

Universal Abstract-Syntax-Tree 是 ABCoder 建立的一种 LLM 亲和、语言无关的代码上下文数据结构,表示某个仓库代码的统一抽象语法树。收集了语言实体(函数、类型、常(变)量)的定义及其相互依赖关系,用于后续的 AI 理解、coding-workflow 开发。

Expand Down Expand Up @@ -371,20 +371,40 @@ Universal Abstract-Syntax-Tree 是 ABCoder 建立的一种 LLM 亲和、语言
- Vars: 当前函数内引用的全局量,包括变量和常量


- Extra: 额外信息,用于存储一些语言特定的信息,或者是一些额外的元数据


- AnonymousFunctions: 函数中所定义的匿名函数,每个元素为对应函数的 FileLine


- File: 所在的文件名


- Line: **起始位置文件的行号(从1开始)**


- StartOffset: 代码起始位置**相对文件头的字节偏移量**


- EndOffset: 代码结束位置**相对文件头的字节偏移量**

###### Dependency

表示一个依赖关系,包含依赖节点 Id、依赖产生位置等信息,方便 LLM 准确识别


```
```json
{
"ModPath": "github.com/cloudwego/localsession",
"PkgPath": "github.com/cloudwego/localsession",
"Name": "transmitSessionIdentity",
"File": "manager.go",
"Line": 140,
"StartOffset": 3547,
"EndOffset": 3564
"EndOffset": 3564,
"Extra": {
"FunctionIsCall": true
}
}
```

Expand All @@ -409,6 +429,12 @@ Universal Abstract-Syntax-Tree 是 ABCoder 建立的一种 LLM 亲和、语言
- EndOffset: 依赖点(不是被依赖节点)token 结束位置相对代码文件的偏移


- Extra: 额外信息,用于存储一些语言特定的信息,或者是一些额外的元数据


- FunctionIsCall: 如果 Dependency 是一个函数调用,是否真正执行了函数调用,而不是只是引用了函数


##### Type

类型定义,【NodeType】为 TYPE,包括具体语言中的类型定义,如 结构体、枚举、接口、类型别名等
Expand Down Expand Up @@ -490,6 +516,9 @@ Universal Abstract-Syntax-Tree 是 ABCoder 建立的一种 LLM 亲和、语言
- Implements: 该类型实现了哪些接口 **Identity**


- Extra: 额外信息,用于存储一些语言特定的信息,或者是一些额外的元数据


##### Var

全局量,包括变量和常量,**但是必须是全局**
Expand Down Expand Up @@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
- Groups: 同组定义, 如 Go 中的 `const( A=1, B=2, C=3)`,Groups 为 `[C=3, B=2]`(假设 A 为变量自身)


- Extra: 额外信息,用于存储一些语言特定的信息,或者是一些额外的元数据


- AnonymousFunctions: 在当前变量的初始化函数中,所定义的匿名函数。每个元素为对应函数的 FileLine


- File: 所在的文件名


- Line: **起始位置文件的行号(从1开始)**


- StartOffset: 代码起始位置**相对文件头的字节偏移量**


- EndOffset: 代码结束位置**相对文件头的字节偏移量**


### Graph

整个仓库的 AST Node 依赖拓扑图。形式为 Identity => Node 的映射,其中每个 Node 包含对其它节点的依赖关系。基于该拓扑图,可以实现**任意节点上下文的递归获取**。
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ go 1.23.4

require (
github.com/Knetic/govaluate v3.0.1-0.20171022003610-9aa49832a739+incompatible
github.com/bytedance/sonic v1.14.1
github.com/cloudwego/eino v0.3.52
github.com/cloudwego/eino-ext/components/model/ark v0.1.16
github.com/cloudwego/eino-ext/components/model/claude v0.1.1
Expand Down Expand Up @@ -44,7 +45,6 @@ require (
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/bytedance/gopkg v0.1.3 // indirect
github.com/bytedance/sonic v1.14.1 // indirect
github.com/bytedance/sonic/loader v0.3.0 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cloudwego/base64x v0.1.6 // indirect
Expand Down
63 changes: 60 additions & 3 deletions lang/golang/parser/file.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,11 @@ import (
. "github.com/cloudwego/abcoder/lang/uniast"
)

const (
ExtraKey_FunctionIsCall = "FunctionIsCall"
ExtraKey_AnonymousFunctions = "AnonymousFunctions"
)

func (p *GoParser) parseFile(ctx *fileContext, f *ast.File) error {
cont := true
ast.Inspect(f, func(node ast.Node) bool {
Expand Down Expand Up @@ -121,7 +126,9 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool

// collect func value dependencies, in case of var a = func() {...}
if val != nil && !isConst {
collects := collectInfos{}
collects := collectInfos{
directCalls: map[FileLine]bool{},
}
ast.Inspect(*val, func(n ast.Node) bool {
return p.parseASTNode(ctx, n, &collects)
})
Expand All @@ -137,6 +144,16 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool
for _, dep := range collects.tys {
v.Dependencies = InsertDependency(v.Dependencies, dep)
}
if len(collects.directCalls) > 0 {
for i, dep := range v.Dependencies {
if collects.directCalls[dep.FileLine] {
v.Dependencies[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
}
if len(collects.anonymousFunctions) > 0 {
v.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
}
}

if vspec.Type != nil {
Expand Down Expand Up @@ -392,12 +409,19 @@ func (p *GoParser) parseSelector(ctx *fileContext, expr *ast.SelectorExpr, infos
type collectInfos struct {
functionCalls, methodCalls []Dependency
tys, globalVars []Dependency

directCalls map[FileLine]bool
anonymousFunctions []FileLine // record anonymous function
}

func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collectInfos) bool {
switch expr := node.(type) {
case *ast.SelectorExpr:
return p.parseSelector(ctx, expr, collect)
case *ast.CallExpr:
p.parseCall(ctx, expr, collect)
case *ast.FuncLit:
collect.anonymousFunctions = append(collect.anonymousFunctions, ctx.FileLine(expr))
case *ast.Ident:
callName := expr.Name
// println("[parseFunc] ast.Ident:", callName)
Expand Down Expand Up @@ -462,6 +486,22 @@ func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collec
return true
}

// parseCall collect direct call info
func (p *GoParser) parseCall(ctx *fileContext, expr *ast.CallExpr, collect *collectInfos) {
var ident *ast.Ident

switch idt := expr.Fun.(type) {
case *ast.Ident:
ident = idt
case *ast.SelectorExpr:
ident = idt.Sel
}

if ident != nil {
collect.directCalls[ctx.FileLine(ident)] = true
}
}

// parseFunc parses all function declaration in one file
func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Function, bool) {
// method receiver
Expand Down Expand Up @@ -511,7 +551,9 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
// collect content
content := string(ctx.GetRawContent(funcDecl))

collects := collectInfos{}
collects := collectInfos{
directCalls: map[FileLine]bool{},
}
if funcDecl.Body == nil {
goto set_func
}
Expand All @@ -521,7 +563,6 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
})

set_func:

if fname == "init" && p.repo.GetFunction(NewIdentity(ctx.module.Name, ctx.pkgPath, fname)) != nil {
// according to https://go.dev/ref/spec#Program_initialization_and_execution,
// duplicated init() is allowed and never be referenced, thus add a subfix
Expand All @@ -544,6 +585,22 @@ set_func:
f.Types = InsertDependency(f.Types, t)
}
f.Signature = string(sig)

if len(collects.directCalls) > 0 {
for i, dep := range f.FunctionCalls {
if collects.directCalls[dep.FileLine] {
f.FunctionCalls[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
for i, dep := range f.MethodCalls {
if collects.directCalls[dep.FileLine] {
f.MethodCalls[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
}
if len(collects.anonymousFunctions) > 0 {
f.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
}
return f, false
}

Expand Down
Loading
Loading