
Commit 9060832

Add structural evidence benchmark results
1 parent 987f8f5 commit 9060832

11 files changed

Lines changed: 781 additions & 2 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ Detailed notes for each tagged release live under [`docs/releases/`](./docs/rele

 ## Unreleased

+- Added a reproducible structural evidence benchmark with path-only and AST-symbol baselines, README chart, result JSON, and benchmark notes.
+- Fixed Python entrypoint role scoring so `main.py` and similar entry files are not mislabeled as root configuration files.
 - Added semantic Python extraction for call targets, dynamic imports, type references, raised exceptions, class attributes, and lightweight data-flow edges.
 - Improved internal import resolution for detailed `from ... import ...` records and dynamic imports.
 - Fed call/type/data-flow evidence into file prioritization, planning prompts, generated skeletons, and project summaries.
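
For readers unfamiliar with the kinds of evidence the changelog entries mention (call targets, raised exceptions, main guards), the following is a minimal sketch using only the standard-library `ast` module. It is not the `code2skill` scanner; the function names `dotted_name` and `extract_evidence` and the sample source are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical sample module used only for this illustration.
SOURCE = '''
import importlib

def load(name):
    if not name:
        raise ValueError("empty plugin name")
    return importlib.import_module(name)

if __name__ == "__main__":
    load("json")
'''


def dotted_name(node):
    """Return a dotted name for Name/Attribute chains, or None otherwise."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def is_main_guard(node):
    """True for an `if __name__ == "__main__":` statement."""
    test = node.test
    return (
        isinstance(test, ast.Compare)
        and dotted_name(test.left) == "__name__"
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "__main__"
    )


def extract_evidence(source):
    """Collect call targets, raised exception names, and main-guard presence."""
    calls, raises, main_guard = set(), set(), False
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name:
                calls.add(name)
        elif isinstance(node, ast.Raise) and node.exc is not None:
            # `raise Foo(...)` wraps the exception class in a Call node.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            name = dotted_name(exc)
            if name:
                raises.add(name)
        elif isinstance(node, ast.If) and is_main_guard(node):
            main_guard = True
    return {"calls": sorted(calls), "raises": sorted(raises), "main_guard": main_guard}


evidence = extract_evidence(SOURCE)
```

A symbols-only pass would typically stop at function and class names; walking `Call`, `Raise`, and `If` nodes is one plausible way to recover the extra edges, under the assumption that names can be resolved syntactically.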

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -3,7 +3,9 @@ include README.md
 include README.zh-CN.md
 include CHANGELOG.md
 include docs/*.md
+recursive-include docs/assets *.svg
 recursive-include docs/releases *.md
+recursive-include benchmarks *.py *.json
 prune docs/superpowers
 recursive-include src/code2skill py.typed
 prune .code2skill

README.md

Lines changed: 21 additions & 0 deletions
@@ -41,6 +41,26 @@ Use it when a Python project needs coding assistants to follow the current modul
 | Platform automation | A DevEx team runs the workflow across many Python services | Python API returns structured results and readiness status |
 | Contributor onboarding | New contributors need project-specific implementation rules | Generated Skills and docs describe the repo's working contracts |

+## Benchmark
+
+`code2skill` is evaluated on structural evidence extraction before any LLM call. The benchmark compares two simple baselines against the semantic scanner used by the Skill generation pipeline.
+
+![Structural evidence benchmark](docs/assets/structural-evidence-benchmark.svg)
+
+| Method | Gold evidence recall |
+|---|---:|
+| Path-only baseline | 0.048 |
+| AST symbols baseline | 0.357 |
+| code2skill semantic scanner | 1.000 |
+
+The gold set covers route decorators, service calls, type references, data-flow edges, dynamic imports, raised exceptions, main guards, and internal dependency edges. Reproduce it with:
+
+```bash
+python benchmarks/evaluate_structural_evidence.py
+```
+
+Details: [Benchmark Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/benchmarks.md), [result JSON](https://github.com/oceanusXXD/code2skill/blob/main/benchmarks/results/structural-evidence-benchmark.json).
+
 ## Install

 Requires Python 3.10 or newer.
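
The "Gold evidence recall" column in the Benchmark table added by this hunk can be read as the fraction of hand-labelled evidence items that a method recovers. The sketch below shows that calculation on toy data. The `(kind, file, detail)` tuple shape, the file paths, and the `evidence_recall` helper are assumptions made for illustration; they are not taken from `benchmarks/evaluate_structural_evidence.py`, and the numbers are not the benchmark's results.

```python
def evidence_recall(gold, predicted):
    """Fraction of gold evidence items that also appear in the predictions."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold set must not be empty")
    return len(gold & set(predicted)) / len(gold)


# Hypothetical gold items, written as (kind, file, detail) tuples.
GOLD = {
    ("route", "app/api.py", "GET /users"),
    ("call", "app/api.py", "UserService.list_users"),
    ("raise", "app/service.py", "UserNotFound"),
    ("main_guard", "main.py", "__main__"),
}

# A path-only method might see just the entry file; a fuller scan recovers everything.
path_only = {("main_guard", "main.py", "__main__")}
full_scan = set(GOLD)

path_only_recall = evidence_recall(GOLD, path_only)   # 1 of 4 gold items
full_scan_recall = evidence_recall(GOLD, full_scan)   # 4 of 4 gold items
```

Recall alone does not penalise spurious evidence, so a metric like this is usually read alongside the gold set definition in the benchmark notes.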
@@ -241,6 +261,7 @@ For lower-level automation, use `create_scan_config(...)` with `scan_repository(
 - [Python API](https://github.com/oceanusXXD/code2skill/blob/main/docs/python-api.md)
 - [Output Layout](https://github.com/oceanusXXD/code2skill/blob/main/docs/output-layout.md)
 - [Algorithm Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/algorithm-notes.md)
+- [Benchmark Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/benchmarks.md)
 - [Release Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/release.md)
 - [Changelog](https://github.com/oceanusXXD/code2skill/blob/main/CHANGELOG.md)

README.zh-CN.md

Lines changed: 21 additions & 0 deletions
@@ -41,6 +41,26 @@
 | Platform automation | A DevEx team runs the same workflow across multiple Python services | Python API returns structured results and readiness |
 | Open-source contributor onboarding | New contributors need the project's implementation rules before changing code | Generated Skills and docs describe the repository's working contracts |

+## Benchmark
+
+`code2skill` is evaluated on its ability to extract structural evidence before any LLM call. This benchmark compares two simple baselines against the semantic scanner used by the Skill generation pipeline.
+
+![Structural evidence benchmark](docs/assets/structural-evidence-benchmark.svg)
+
+| Method | Gold evidence recall |
+|---|---:|
+| Path-only baseline | 0.048 |
+| AST symbols baseline | 0.357 |
+| code2skill semantic scanner | 1.000 |
+
+The gold set covers route decorators, service calls, type references, data-flow edges, dynamic imports, raised exceptions, main guards, and internal dependency edges. Reproduce it with:
+
+```bash
+python benchmarks/evaluate_structural_evidence.py
+```
+
+Details: [Benchmark Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/benchmarks.md), [result JSON](https://github.com/oceanusXXD/code2skill/blob/main/benchmarks/results/structural-evidence-benchmark.json)
+
 ## Install

 Requires Python 3.10 or newer.
@@ -241,6 +261,7 @@ print(readiness.ready, readiness.score)
 - [Python API](https://github.com/oceanusXXD/code2skill/blob/main/docs/python-api.md)
 - [Output Layout](https://github.com/oceanusXXD/code2skill/blob/main/docs/output-layout.md)
 - [Algorithm Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/algorithm-notes.md)
+- [Benchmark Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/benchmarks.md)
 - [Release Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/release.md)
 - [Changelog](https://github.com/oceanusXXD/code2skill/blob/main/CHANGELOG.md)
