Commit 4e0e953

Improve semantic evidence extraction and benchmark
1 parent 9060832 commit 4e0e953

18 files changed

Lines changed: 562 additions & 62 deletions

README.md

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -49,11 +49,11 @@ Use it when a Python project needs coding assistants to follow the current modul
4949

5050
| Method | Gold evidence recall |
5151
|---|---:|
52-
| Path-only baseline | 0.048 |
53-
| AST symbols baseline | 0.357 |
52+
| Path-only baseline | 0.044 |
53+
| AST symbols baseline | 0.356 |
5454
| code2skill semantic scanner | 1.000 |
5555

56-
The gold set covers route decorators, service calls, type references, data-flow edges, dynamic imports, raised exceptions, main guards, and internal dependency edges. Reproduce it with:
56+
The gold set covers route decorators, service calls, type references, data-flow edges, dynamic imports, re-exported symbol dependencies, raised exceptions, main guards, and internal dependency edges. Reproduce it with:
5757

5858
```bash
5959
python benchmarks/evaluate_structural_evidence.py
@@ -179,8 +179,8 @@ The default artifact directory is `.code2skill/`.
179179
| Path | Purpose |
180180
|---|---|
181181
| `adoption-guide.md` | Repository-specific adoption checklist and next workflow |
182-
| `project-summary.md` | Human-readable repository summary |
183-
| `skill-blueprint.json` | Structural repository blueprint |
182+
| `project-summary.md` | Human-readable repository summary with evidence coverage and import graph signals |
183+
| `skill-blueprint.json` | Structural repository blueprint with evidence counts and dependency graph stats |
184184
| `skill-plan.json` | LLM-planned Skill inventory |
185185
| `references/*.md` | Architecture, style, workflow, and API references |
186186
| `skills/index.md` | Generated Skill index |

README.zh-CN.md

Lines changed: 5 additions & 5 deletions
@@ -49,11 +49,11 @@
4949

5050
| 方法 | Gold evidence recall |
5151
|---|---:|
52-
| Path-only baseline | 0.048 |
53-
| AST symbols baseline | 0.357 |
52+
| Path-only baseline | 0.044 |
53+
| AST symbols baseline | 0.356 |
5454
| code2skill semantic scanner | 1.000 |
5555

56-
Gold set 覆盖 route decorators、service calls、type references、data-flow edges、dynamic imports、raised exceptions、main guards 和 internal dependency edges。复现命令:
56+
Gold set 覆盖 route decorators、service calls、type references、data-flow edges、dynamic imports、re-exported symbol dependencies、raised exceptions、main guards 和 internal dependency edges。复现命令:
5757

5858
```bash
5959
python benchmarks/evaluate_structural_evidence.py
@@ -179,8 +179,8 @@ code2skill scan .
179179
| 路径 | 用途 |
180180
|---|---|
181181
| `adoption-guide.md` | 仓库级采用 checklist 和下一步工作流 |
182-
| `project-summary.md` | 面向人的仓库概要 |
183-
| `skill-blueprint.json` | 结构化仓库蓝图 |
182+
| `project-summary.md` | 面向人的仓库概要,包含 evidence coverage 和 import graph 信号 |
183+
| `skill-blueprint.json` | 结构化仓库蓝图,包含证据计数和依赖图统计 |
184184
| `skill-plan.json` | 模型规划出的 Skill 清单 |
185185
| `references/*.md` | 架构、风格、工作流和 API 参考 |
186186
| `skills/index.md` | 生成的 Skill 索引 |

benchmarks/evaluate_structural_evidence.py

Lines changed: 10 additions & 4 deletions
@@ -83,7 +83,7 @@ def run_benchmark() -> dict[str, object]:
8383
"baselines": [
8484
"path-only: file path and suffix role inference",
8585
"ast-symbols: standard AST imports, classes, functions, and methods",
86-
"code2skill-semantic: AST symbols plus routes, calls, type references, data-flow edges, dynamic imports, raised exceptions, and internal dependency resolution",
86+
"code2skill-semantic: AST symbols plus routes, calls, type references, data-flow edges, dynamic imports, raised exceptions, re-exported symbols, and internal dependency resolution",
8787
],
8888
"scope": (
8989
"This benchmark measures repository evidence extraction before any LLM call. "
@@ -236,7 +236,7 @@ def render_svg(report: dict[str, object]) -> str:
236236
f'<text x="{left + bar_width + 10:.1f}" y="{y + 23}" font-family="Arial, sans-serif" font-size="13" fill="#111827">{value:.3f} ({result["gold_hits"]}/{result["gold_total"]})</text>'
237237
)
238238
lines.append(
239-
'<text x="36" y="314" font-family="Arial, sans-serif" font-size="12" fill="#6b7280">code2skill captures routes, calls, type references, data-flow, dynamic imports, exceptions, and internal dependency edges.</text>'
239+
'<text x="36" y="314" font-family="Arial, sans-serif" font-size="12" fill="#6b7280">code2skill captures routes, calls, type references, data-flow, dynamic imports, re-exported symbols, exceptions, and dependency edges.</text>'
240240
)
241241
lines.append("</svg>")
242242
return "\n".join(lines)
@@ -246,7 +246,8 @@ def gold_facts() -> list[str]:
246246
return [
247247
"role:src/shop/main.py:entrypoint",
248248
"role:src/shop/api/users.py:route",
249-
"import:src/shop/api/users.py:shop.core.ops",
249+
"import:src/shop/__init__.py:shop.core.ops",
250+
"import:src/shop/api/users.py:shop",
250251
"class:src/shop/core/ops.py:UserService",
251252
"class:src/shop/domain/accounts.py:AccountCreate",
252253
"class:src/shop/domain/accounts.py:AccountRecord",
@@ -259,6 +260,8 @@ def gold_facts() -> list[str]:
259260
"type:src/shop/core/ops.py:AccountRecord",
260261
"model:src/shop/domain/accounts.py:AccountCreate",
261262
"model:src/shop/domain/accounts.py:AccountRecord",
263+
"dependency:src/shop/__init__.py->src/shop/core/ops.py",
264+
"dependency:src/shop/api/users.py->src/shop/__init__.py",
262265
"dependency:src/shop/api/users.py->src/shop/core/ops.py",
263266
"dependency:src/shop/api/users.py->src/shop/domain/accounts.py",
264267
"dependency:src/shop/core/ops.py->src/shop/domain/accounts.py",
@@ -291,6 +294,9 @@ def gold_facts() -> list[str]:
291294

292295
def write_fixture_repo(repo_path: Path) -> None:
293296
files = {
297+
"src/shop/__init__.py": """
298+
from shop.core.ops import UserService
299+
""",
294300
"src/shop/main.py": """
295301
from fastapi import FastAPI
296302
from shop.api.users import router
@@ -300,7 +306,7 @@ def write_fixture_repo(repo_path: Path) -> None:
300306
""",
301307
"src/shop/api/users.py": """
302308
from fastapi import APIRouter
303-
from shop.core.ops import UserService
309+
from shop import UserService
304310
from shop.domain.accounts import AccountCreate
305311
306312
router = APIRouter()

benchmarks/results/structural-evidence-benchmark.json

Lines changed: 25 additions & 16 deletions
@@ -1,14 +1,14 @@
11
{
22
"name": "structural-evidence-extraction",
3-
"generated_at": "2026-06-06T15:49:31.523360+00:00",
4-
"gold_total": 42,
3+
"generated_at": "2026-06-06T16:13:09.683771+00:00",
4+
"gold_total": 45,
55
"results": [
66
{
77
"method": "path-only",
8-
"facts_found": 10,
8+
"facts_found": 11,
99
"gold_hits": 2,
10-
"gold_total": 42,
11-
"recall": 0.047619047619047616,
10+
"gold_total": 45,
11+
"recall": 0.044444444444444446,
1212
"hit_facts": [
1313
"role:src/shop/api/users.py:route",
1414
"role:src/shop/main.py:entrypoint"
@@ -31,6 +31,8 @@
3131
"data_flow:src/tool/runner.py:run:state<-load_state",
3232
"dependency:src/app/bootstrap.py->src/app/runtime/loader.py",
3333
"dependency:src/app/runtime/loader.py->src/app/plugins/audit.py",
34+
"dependency:src/shop/__init__.py->src/shop/core/ops.py",
35+
"dependency:src/shop/api/users.py->src/shop/__init__.py",
3436
"dependency:src/shop/api/users.py->src/shop/core/ops.py",
3537
"dependency:src/shop/api/users.py->src/shop/domain/accounts.py",
3638
"dependency:src/shop/core/ops.py->src/shop/domain/accounts.py",
@@ -43,7 +45,8 @@
4345
"function:src/tool/runner.py:run",
4446
"function:src/tool/state_store.py:load_state",
4547
"import:src/app/runtime/loader.py:importlib",
46-
"import:src/shop/api/users.py:shop.core.ops",
48+
"import:src/shop/__init__.py:shop.core.ops",
49+
"import:src/shop/api/users.py:shop",
4750
"main_guard:src/tool/runner.py",
4851
"method:src/app/plugins/audit.py:AuditPlugin.record",
4952
"method:src/shop/core/ops.py:UserService.create",
@@ -58,10 +61,10 @@
5861
},
5962
{
6063
"method": "ast-symbols",
61-
"facts_found": 25,
62-
"gold_hits": 15,
63-
"gold_total": 42,
64-
"recall": 0.35714285714285715,
64+
"facts_found": 26,
65+
"gold_hits": 16,
66+
"gold_total": 45,
67+
"recall": 0.35555555555555557,
6568
"hit_facts": [
6669
"class:src/app/plugins/audit.py:AuditPlugin",
6770
"class:src/shop/core/ops.py:UserService",
@@ -74,7 +77,8 @@
7477
"function:src/tool/runner.py:run",
7578
"function:src/tool/state_store.py:load_state",
7679
"import:src/app/runtime/loader.py:importlib",
77-
"import:src/shop/api/users.py:shop.core.ops",
80+
"import:src/shop/__init__.py:shop.core.ops",
81+
"import:src/shop/api/users.py:shop",
7882
"method:src/app/plugins/audit.py:AuditPlugin.record",
7983
"method:src/shop/core/ops.py:UserService.create",
8084
"method:src/tool/actions.py:ReleaseService.publish"
@@ -92,6 +96,8 @@
9296
"data_flow:src/tool/runner.py:run:state<-load_state",
9397
"dependency:src/app/bootstrap.py->src/app/runtime/loader.py",
9498
"dependency:src/app/runtime/loader.py->src/app/plugins/audit.py",
99+
"dependency:src/shop/__init__.py->src/shop/core/ops.py",
100+
"dependency:src/shop/api/users.py->src/shop/__init__.py",
95101
"dependency:src/shop/api/users.py->src/shop/core/ops.py",
96102
"dependency:src/shop/api/users.py->src/shop/domain/accounts.py",
97103
"dependency:src/shop/core/ops.py->src/shop/domain/accounts.py",
@@ -111,9 +117,9 @@
111117
},
112118
{
113119
"method": "code2skill-semantic",
114-
"facts_found": 85,
115-
"gold_hits": 42,
116-
"gold_total": 42,
120+
"facts_found": 91,
121+
"gold_hits": 45,
122+
"gold_total": 45,
117123
"recall": 1.0,
118124
"hit_facts": [
119125
"call:src/app/bootstrap.py:load_plugin",
@@ -133,6 +139,8 @@
133139
"data_flow:src/tool/runner.py:run:state<-load_state",
134140
"dependency:src/app/bootstrap.py->src/app/runtime/loader.py",
135141
"dependency:src/app/runtime/loader.py->src/app/plugins/audit.py",
142+
"dependency:src/shop/__init__.py->src/shop/core/ops.py",
143+
"dependency:src/shop/api/users.py->src/shop/__init__.py",
136144
"dependency:src/shop/api/users.py->src/shop/core/ops.py",
137145
"dependency:src/shop/api/users.py->src/shop/domain/accounts.py",
138146
"dependency:src/shop/core/ops.py->src/shop/domain/accounts.py",
@@ -145,7 +153,8 @@
145153
"function:src/tool/runner.py:run",
146154
"function:src/tool/state_store.py:load_state",
147155
"import:src/app/runtime/loader.py:importlib",
148-
"import:src/shop/api/users.py:shop.core.ops",
156+
"import:src/shop/__init__.py:shop.core.ops",
157+
"import:src/shop/api/users.py:shop",
149158
"main_guard:src/tool/runner.py",
150159
"method:src/app/plugins/audit.py:AuditPlugin.record",
151160
"method:src/shop/core/ops.py:UserService.create",
@@ -165,7 +174,7 @@
165174
"baselines": [
166175
"path-only: file path and suffix role inference",
167176
"ast-symbols: standard AST imports, classes, functions, and methods",
168-
"code2skill-semantic: AST symbols plus routes, calls, type references, data-flow edges, dynamic imports, raised exceptions, and internal dependency resolution"
177+
"code2skill-semantic: AST symbols plus routes, calls, type references, data-flow edges, dynamic imports, raised exceptions, re-exported symbols, and internal dependency resolution"
169178
],
170179
"scope": "This benchmark measures repository evidence extraction before any LLM call. It does not claim end-to-end SWE-bench issue resolution."
171180
}

docs/algorithm-notes.md

Lines changed: 9 additions & 1 deletion
@@ -23,9 +23,16 @@ evidence for planning and Skill generation.
2323
- Import graph construction uses detailed `ImportInfo`, including `from ...
2424
import ...` names and dynamic imports, so package-level imports resolve to
2525
concrete internal files when possible.
26+
- Symbol-aware dependency resolution maps extracted call targets, instantiated
27+
classes, type references, decorators, and raised exceptions back to internal
28+
files when they match an imported alias, a package re-export, or a unique
29+
repository symbol.
2630
- File priority combines path heuristics with content evidence. Route, service,
2731
model, main-guard, call-target, type-reference, and data-flow signals can
2832
raise selection priority.
33+
- Evidence coverage is summarized in the blueprint and project summary so users
34+
can see how many source files, symbols, routes, calls, types, flows, dynamic
35+
imports, exceptions, and dependency edges were captured.
2936
- Planner prompts receive dependency, call, type, and flow evidence for core
3037
modules. Generation prompts use the same skeleton lines when large files are
3138
summarized instead of inlined.
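
The symbol-aware resolution order described in the notes above (imported alias, package re-export, unique repository symbol) could be sketched roughly as follows. This is a minimal illustration, not the code2skill implementation: `resolve_symbol` and its four lookup tables are hypothetical names, and the example data mirrors the `shop` fixture from the benchmark.

```python
from __future__ import annotations


def resolve_symbol(
    symbol: str,
    import_aliases: dict[str, str],
    module_files: dict[str, str],
    reexports: dict[str, dict[str, str]],
    symbol_index: dict[str, set[str]],
) -> str | None:
    """Return the internal file that defines `symbol`, or None if unresolved.

    All parameters are hypothetical lookup tables:
      import_aliases: local name -> module it was imported from
      module_files:   internal module name -> file path
      reexports:      package name -> {re-exported name -> defining file}
      symbol_index:   symbol name -> files that define it
    """
    module = import_aliases.get(symbol)
    if module is not None:
        # Package re-export: follow the name through the package to its source.
        target = reexports.get(module, {}).get(symbol)
        if target is not None:
            return target
        # Plain imported alias: link to the imported internal module.
        if module in module_files:
            return module_files[module]
    # Fallback: link a bare symbol only when it is unique in the repository.
    candidates = symbol_index.get(symbol, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


# Example data modeled on the benchmark fixture (src/shop/api/users.py).
ALIASES = {"UserService": "shop", "AccountCreate": "shop.domain.accounts"}
MODULES = {
    "shop": "src/shop/__init__.py",
    "shop.core.ops": "src/shop/core/ops.py",
    "shop.domain.accounts": "src/shop/domain/accounts.py",
}
REEXPORTS = {"shop": {"UserService": "src/shop/core/ops.py"}}
SYMBOLS = {
    "AccountRecord": {"src/shop/domain/accounts.py"},
    "Helper": {"src/a.py", "src/b.py"},  # ambiguous on purpose
}

for name in ("UserService", "AccountCreate", "AccountRecord", "Helper"):
    print(name, "->", resolve_symbol(name, ALIASES, MODULES, REEXPORTS, SYMBOLS))
```

Under these assumptions `UserService` resolves through the re-export to `src/shop/core/ops.py`, while the ambiguous `Helper` stays unresolved, which matches the conservative behaviour the notes describe.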
@@ -36,7 +43,8 @@ The extractor is deliberately conservative. It records shallow data-flow edges
3643
from assignments, loops, and context managers, but it does not attempt full
3744
interprocedural static analysis, control-flow reconstruction, type inference, or
3845
runtime import evaluation. Missing or ambiguous evidence should still be marked
39-
as uncertain by generated Skills.
46+
as uncertain by generated Skills. Plain symbol references are linked only when
47+
the symbol is unique in the repository or tied to an import alias/re-export.
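
To make "shallow data-flow edges" concrete, here is a small sketch that uses the standard `ast` module to record which call feeds which local name. It only handles simple assignments (the notes also mention loops and context managers, which are omitted here), and `shallow_data_flow` is a hypothetical helper, not a function from the repository. The edge format imitates the `state<-load_state` suffix seen in the gold facts.

```python
import ast


def shallow_data_flow(source: str) -> list[str]:
    """Collect `function:target<-callee` edges from simple assignments.

    Deliberately shallow: no type inference, no interprocedural analysis.
    Assignments inside nested functions are also attributed to the outer one.
    """
    edges: list[str] = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            # Only `name = some_call(...)` style statements are considered.
            if not isinstance(node, ast.Assign) or not isinstance(node.value, ast.Call):
                continue
            callee = node.value.func
            if isinstance(callee, ast.Name):
                callee_name = callee.id
            elif isinstance(callee, ast.Attribute):
                callee_name = callee.attr
            else:
                continue
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges.append(f"{func.name}:{target.id}<-{callee_name}")
    return edges


SAMPLE = """
def run():
    state = load_state()
    return state
"""

print(shallow_data_flow(SAMPLE))  # ['run:state<-load_state']
```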
4048

4149
## References
4250

docs/assets/structural-evidence-benchmark.svg

Lines changed: 6 additions & 6 deletions

docs/benchmarks.md

Lines changed: 10 additions & 6 deletions
@@ -13,6 +13,8 @@ to the context-selection and evidence-extraction layer that `code2skill` owns.
1313
https://arxiv.org/abs/2310.06770
1414
- RepoBench evaluates repository-level context for code completion:
1515
https://arxiv.org/abs/2306.03091
16+
- CrossCodeEval evaluates cross-file code completion across multiple languages:
17+
https://arxiv.org/abs/2310.11248
1618
- CodeSearchNet evaluates semantic code search and retrieval quality:
1719
https://arxiv.org/abs/1909.09436
1820

@@ -33,9 +35,10 @@ Outputs:
3335
- `benchmarks/results/structural-evidence-benchmark.json`
3436
- `docs/assets/structural-evidence-benchmark.svg`
3537

36-
The fixture repository contains Python route, service, schema, dynamic plugin,
37-
runtime loader, main-guard, state, and exception-handling examples. The gold set
38-
contains 42 structural facts that are useful for writing grounded Skills:
38+
The fixture repository contains Python route, service, schema, package
39+
re-export, dynamic plugin, runtime loader, main-guard, state, and
40+
exception-handling examples. The gold set contains 45 structural facts that are
41+
useful for writing grounded Skills:
3942

4043
- file roles
4144
- imports and internal dependency edges
@@ -44,6 +47,7 @@ contains 42 structural facts that are useful for writing grounded Skills:
4447
- service calls and call chains
4548
- type references
4649
- model/schema signals
50+
- re-exported symbol dependencies
4751
- dynamic imports
4852
- data-flow edges
4953
- raised exceptions
@@ -65,9 +69,9 @@ dependency edges.
6569

6670
| Method | Gold hits | Gold total | Recall |
6771
|---|---:|---:|---:|
68-
| path-only | 2 | 42 | 0.048 |
69-
| ast-symbols | 15 | 42 | 0.357 |
70-
| code2skill-semantic | 42 | 42 | 1.000 |
72+
| path-only | 2 | 45 | 0.044 |
73+
| ast-symbols | 16 | 45 | 0.356 |
74+
| code2skill-semantic | 45 | 45 | 1.000 |
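
The recall column is gold hits divided by gold total. A minimal sketch of that calculation, assuming facts are compared as exact strings (the `recall` helper and the three-fact toy gold set are illustrative, not taken from the benchmark script):

```python
def recall(found: set[str], gold: set[str]) -> tuple[int, int, float]:
    """Return (gold_hits, gold_total, recall) for a set of extracted facts."""
    hits = len(found & gold)  # a fact counts only if it matches a gold fact exactly
    return hits, len(gold), hits / len(gold)


# Toy subset of gold facts, copied from the benchmark's fact format.
gold = {
    "role:src/shop/main.py:entrypoint",
    "import:src/shop/api/users.py:shop",
    "dependency:src/shop/api/users.py->src/shop/core/ops.py",
}
# A method that finds two gold facts plus one fact outside the gold set.
found = {
    "role:src/shop/main.py:entrypoint",
    "import:src/shop/api/users.py:shop",
    "function:src/shop/util.py:helper",
}
print(recall(found, gold))

# The table values follow from the reported hit counts over 45 gold facts.
print(round(2 / 45, 3), round(16 / 45, 3), round(45 / 45, 3))  # 0.044 0.356 1.0
```

Because extra non-gold facts do not lower recall, `facts_found` can exceed `gold_hits` without affecting the score.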
7175

7276
![Structural evidence benchmark](assets/structural-evidence-benchmark.svg)
7377

docs/getting-started.md

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,10 @@ Expected files include:
3232
- `.code2skill/report.json`
3333
- `.code2skill/state/analysis-state.json`
3434

35+
Open `.code2skill/project-summary.md` first. Its evidence coverage section shows
36+
whether the scan captured routes, calls, type references, data-flow edges,
37+
dynamic imports, exceptions, and internal dependency edges before any LLM call.
38+
3539
## 3. Preview Cost And Impact
3640

3741
```bash

docs/output-layout.md

Lines changed: 2 additions & 2 deletions
@@ -34,8 +34,8 @@ This bundle separates generated Skill files from intermediate artifacts used for
3434
### Review And Diagnostic Artifacts
3535

3636
- `adoption-guide.md`: repository-specific adoption checklist and recommended next workflow.
37-
- `project-summary.md`: human-readable repository overview.
38-
- `skill-blueprint.json`: structural analysis output from Phase 1.
37+
- `project-summary.md`: human-readable repository overview, including evidence coverage and import graph signals.
38+
- `skill-blueprint.json`: structural analysis output from Phase 1, including extracted evidence counts and dependency graph stats.
3939
- `skill-plan.json`: planned Skill inventory.
4040
- `references/*.md`: supporting architecture, style, workflow, and API references.
4141
- `report.json`: execution metrics, mode decisions, cost estimates, affected files, affected Skills, and artifact lists.
