Merged
22 changes: 0 additions & 22 deletions .github/workflows/rust.yml

This file was deleted.

18 changes: 0 additions & 18 deletions .github/workflows/simple_checks.yml

This file was deleted.

1 change: 1 addition & 0 deletions .gitignore
@@ -74,3 +74,4 @@ src/lang/testdata
*.json

tools
abcoder
28 changes: 0 additions & 28 deletions Cargo.toml

This file was deleted.

128 changes: 33 additions & 95 deletions README.md
@@ -1,126 +1,64 @@
<!--
Copyright 2025 CloudWeGo Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# ABCoder: AI-Based Coder(AKA: A Brand-new Coder)

![ABCoder](images/ABCoder.png)

ABCoder, an AI-powered tool, streamlines coding by keeping real-time status updates, providing lossless code compression, and giving development guidance. It enhances testing by identifying quality, generating reports, and auto-creating test cases. It also offers guidance for refactoring, including language stack switches.

# Table of Contents

- [ABCoder: AI-Based Coder(AKA: A Brand-new Coder)](#abcoder-ai-based-coderaka-a-brand-new-coder)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Running through Coze OpenAPI](#running-through-coze-openapi)
- [Status Update](#status-update)
- [Lossless Compression](#lossless-compression)
- [Development Guide](#development-guide)
- [Testing Enhancements](#testing-enhancements)
- [Refactor/Rewrite Guide](#refactorrewrite-guide)
- [Getting Involved](#getting-involved)

# Overview
ABCoder is an AI-oriented code-processing SDK designed to give Large Language Models (LLMs) richer coding context and to speed up the development of AI-assisted coding applications.

ABCoder is a comprehensive open-source software development tool that aims to utilize artificial intelligence to enhance
the process of coding. This project focuses on various aspects of software development ranging from repository analysis,
issue and pull request tracking, to automated code compression, development guidance, testing enhancement, and
refactoring guidance.

# Quick Start
## Features

## Prerequisites
- install git and set your access token for github on cmd-line
- install [rust-toolchain](https://www.rust-lang.org/tools/install) (stable)
- (optional) install [ollama](https://github.com/ollama/ollama) and run your LLM
- (optional) create a [Coze](https://www.coze.com/docs/developer_guides/coze_api_overview?_lang=en) agent and set its OpenAPI key
- Universal Abstract Syntax Tree (UniAST), a language-independent, AI-friendly specification of code information that provides a flexible, structural coding context for both AI and humans.

- General Parser, parses code in arbitrary languages to UniAST.

## Running through Coze OpenAPI
1. Set .env file for configuration on ABCoder's working directory. Taking Coze as an example:
```
# cache for repo,AST and so on
WORK_DIR=tmp_abcoder
- General Writer, transforms UniAST back to code.

- (Coming Soon) General Iterator, a framework for visiting the UniAST easily and implementing batch code-processing workflows.

# exclude dirs for repo parsing, separated by comma
EXCLUDE_DIRS=target,gen-codes
- (Coming Soon) Code RAG, provides a set of tools and functions to help the LLM understand your code more deeply.

# LLM's api type
API_TYPE=coze # coze|ollama
Based on these features, developers can implement or enhance their own AI-assisted coding applications, such as code review, optimization, and translation.

# LLM's output language
LANGUAGE=zh

# Coze options
COZE_API_TOKEN="{YOUR_COZE_API_TOKEN}"
COZE_BOT_ID={YOUR_COZE_BOT_ID}
```
## Universal-Abstract-Syntax-Tree Specification

2. compile the parsers
```
./script/make_parser.sh
```
See the [UniAST Specification](docs/uniast-zh.md).

3. compile and run ABCoder
```
cargo run --bin cmd compress https://xxx.git
```

4. Once triggered, ABCoder will take three steps:
1. Download the repository in {REPO_DIR}
2. Parse the repository and store the AST in {CACHE_DIR}
3. Call the LLM to compress the repository codes, and refresh the AST for each call.
You can stop the process at any time after step 2 and resume the compression by running the same command again.
# Getting Started

5. Export the compressed results
1. Install ABCoder:
```bash
go install github.com/cloudwego/abcoder@latest
```
cargo run --bin cmd export https://xxx.git --out-dir {OUTPUT_DIR}
2. Use ABCoder to parse a repository to UniAST (JSON)
```bash
abcoder parse {language} {repo-path} > ast.json
```
3. Do your magic with UniAST...
4. Use ABCoder to write a UniAST back to code
```bash
abcoder write {language} ast.json
```

# Status Update

The system automatically fetches the latest data from GitHub whenever a relevant task is triggered, so the
repository status is always up to date. It can answer queries about functionality and defects based on issue and PR
information. For more details, check out our Issues and Pull Requests sections on GitHub.

# Lossless Compression

The system also offers a lossless compression feature for repository code. The specific implementation methods are being
optimized, and more details will be available soon.

# Development Guide

We welcome all developers wishing to contribute to ABCoder. Our system provides detailed guidance for manual development
and also supports auto-generation of instructions. Check out our Contribution Guide for more information.
# Supported Languages

# Testing Enhancements
ABCoder currently supports the following languages:

The system is designed to analyze existing functions and corresponding tests, identify the overall quality of testing,
produce reports, and automatically generate test cases for weakly covered items. Our goal is to help repositories
enhance and perfect their test cases.
| Language | Parser | Writer |
| -------- | ----------- | ----------- |
| Go | ✅ | ✅ |
| Rust | ✅ | Coming Soon |
| C | Coming Soon | ❌ |

# Refactor/Rewrite Guide

We offer guidance for both small-scale feature iterations and large-scale rewrites, including language stack switches.
Our system provides a detailed guide for manual development and also supports automated guidance generation.

# Getting Involved

We encourage developers to contribute and make this tool more powerful. If you are interested in contributing to the ABCoder
project, kindly check out our Getting Involved Guide.
project, kindly check out our Getting Involved Guide:
- [Parser Extension](docs/parser-zh.md)

> Note: This is a dynamic README and is subject to changes as the project evolves.
102 changes: 102 additions & 0 deletions docs/parser-zh.md
@@ -0,0 +1,102 @@
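# ABCoder - Introduction to the Language Parser

ABCoder currently implements its Parser on top of the [LSP](https://microsoft.github.io/language-server-protocol/) protocol, which allows precise dependency collection and makes it easier to add more languages later.

## Code Structure

The code lives in the [lang](/lang) package, which includes:

- uniast: the Go definition of the unified AST structure
- lsp: the LSP client, which provides interfaces for file parsing, reference lookup, syntax-tree parsing, definition lookup, and so on, as well as **the generic language specification interface LanguageSpec**
- collect: collects LSP symbols and exports them as UniAST; this is the core processing logic
- {language}: mainly the implementation of the lsp#Spec interface for the {language} specification, plus some invocation logic specific to that language's LSP server

## Processing Flow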

![lang-parser](../images/lang-parser.png)

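1. Identify the language from the command-line arguments, start the corresponding LSP server, and pass in the initialization parameters
2. Walk the repository files and call the `textDocument/documentSymbol` method to get all symbols in each file. For each symbol:
   1. Call the `textDocument/semanticTokens/range` method to get the tokens in the symbol's code
   2. Identify the tokens that are valid entities and call `textDocument/definition` to jump to the corresponding symbol's location, which establishes the dependency relationships between nodes
3. Repeat step 2 until all files have been processed. Finally, convert the collected LSP symbols to UniAST format and output them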

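## Extending to Other Languages

Because UniAST is not fully equivalent to LSP, some language-specific behavior interfaces must be implemented before the conversion can work. Using the lang/rust package as a reference, the following capabilities are generally required:

- GetDefaultLSP(): maps the user-supplied language to a concrete lsp.Language and the corresponding LSP name
- CheckRepo(): checks the state of the user's repository, handles toolchain and similar issues according to each language's conventions, and returns the first file to open by default (used to trigger the LSP server) together with how long to wait for the server to finish initializing (determined by repository size)
- **LanguageSpec interface**: the core module, which handles syntax information that is not generic to LSP, such as deciding whether a token is a standard-library symbol or parsing function signatures:
- ModulePatcher: a post-processing module for language-specific information collection, such as collecting Rust `use` symbols (which LSP does not collect). Implementing it is optional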

### LanguageSpec

Provides the information needed to convert to UniAST during LSP symbol collection; this information is not part of the generic LSP definitions.

```go

// Detailed implementation used to collect LSP symbols and transform them to UniAST
type LanguageSpec interface {
	// initialize a root workspace, and return all modules [modulename=>abs-path] inside
	WorkSpace(root string) (map[string]string, error)

	// given an absolute file path, returns its module name and package path
	// external paths should also be supported
	// FIXME: some languages (like rust) may have sub-mods inside a file, but we still consider it as a single mod here
	NameSpace(path string) (string, string, error)

	// tells if a file belongs to the language AST
	ShouldSkip(path string) bool

	// return the first declaration token of a symbol, as Type-Name
	DeclareTokenOfSymbol(sym DocumentSymbol) int

	// tells if a token is an AST entity
	IsEntityToken(tok Token) bool

	// tells if a token is a std token
	IsStdToken(tok Token) bool

	// return the SymbolKind of a token
	TokenKind(tok Token) SymbolKind

	// tells if a symbol is a main function
	IsMainFunction(sym DocumentSymbol) bool

	// tells if a symbol is a language symbol (func, type, variable, etc) in workspace
	IsEntitySymbol(sym DocumentSymbol) bool

	// tells if a symbol is public in workspace
	IsPublicSymbol(sym DocumentSymbol) bool

	// declares if the language has impl symbols
	// if it returns true, ImplSymbol() will be called
	HasImplSymbol() bool
	// if a symbol is an impl symbol, return the token index of interface type, receiver type and first-method start (-1 means not found)
	// otherwise the collector will use FunctionSymbol() as receiver type token index (-1 means not found)
	ImplSymbol(sym DocumentSymbol) (int, int, int)

	// if a symbol is a Function or Method symbol, return the token index of Receiver (-1 means not found), TypeParameters, InputParameters and Outputs
	FunctionSymbol(sym DocumentSymbol) (int, []int, []int, []int)
}
```

- Rust parser implementation: [RustSpec](/lang/rust/spec.go)
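
As a rough illustration of what implementing part of this interface might involve, the sketch below covers three of its methods for a Go-like language. Everything here is hypothetical: `DocumentSymbol` is a stand-in that models only a `Name` field, `demoSpec` is an invented type, and the rules (only `.go` files belong to the AST, exported names start with an upper-case letter) are assumptions chosen for the example rather than ABCoder's actual Go spec.

```go
package main

import (
	"fmt"
	"path/filepath"
	"unicode"
	"unicode/utf8"
)

// DocumentSymbol is a stand-in for the lsp package's type.
// Only the Name field is modeled; the real type carries more information.
type DocumentSymbol struct {
	Name string
}

// demoSpec is a hypothetical, partial spec for a Go-like language.
type demoSpec struct{}

// ShouldSkip reports whether a file should be ignored.
// Assumption: only files ending in ".go" belong to the language AST.
func (demoSpec) ShouldSkip(path string) bool {
	return filepath.Ext(path) != ".go"
}

// IsPublicSymbol applies Go's export rule: a name is public
// when its first letter is upper case.
func (demoSpec) IsPublicSymbol(sym DocumentSymbol) bool {
	r, _ := utf8.DecodeRuneInString(sym.Name)
	return unicode.IsUpper(r)
}

// HasImplSymbol returns false because Go has no separate impl blocks,
// so the collector would not call ImplSymbol for this language.
func (demoSpec) HasImplSymbol() bool { return false }

func main() {
	var spec demoSpec
	fmt.Println(spec.ShouldSkip("README.md"))
	fmt.Println(spec.IsPublicSymbol(DocumentSymbol{Name: "Parse"}))
	fmt.Println(spec.IsPublicSymbol(DocumentSymbol{Name: "parse"}))
}
```

The remaining methods (token classification, function-signature indices, and so on) depend on the token stream returned by the specific LSP server, so they are left out of this sketch.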


### ModulePatcher

Post-processes the module information once collection is complete

```go
// ModulePatcher supplements some information for a module
type ModulePatcher interface {
	// Patch is called after all symbols have been collected
	Patch(ast *parse.Module)
}
```

- Rust parser implementation: [RustModulePatcher](/lang/rust/patch.go)