oceanusXXD
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 4 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 1 deletion
diff --git a/README.zh-CN.md b/README.zh-CN.md
Lines changed: 2 additions & 1 deletion
diff --git a/docs/algorithm-notes.md b/docs/algorithm-notes.md
Lines changed: 50 additions & 0 deletions
diff --git a/src/code2skill/analyzers/skill_blueprint_builder.py b/src/code2skill/analyzers/skill_blueprint_builder.py
Lines changed: 5 additions & 1 deletion
@@ -5,6 +5,10 @@ Detailed notes for each tagged release live under [`docs/releases/`](./docs/rele
 
 ## Unreleased
 
+- Added semantic Python extraction for call targets, dynamic imports, type references, raised exceptions, class attributes, and lightweight data-flow edges.
+- Improved internal import resolution for detailed `from ... import ...` records and dynamic imports.
+- Fed call/type/data-flow evidence into file prioritization, planning prompts, generated skeletons, and project summaries.
+- Added algorithm notes documenting the paper-backed ideas behind the scanner improvements.
 - Added `doctor` readiness checks for generated bundles, Skill plans, state snapshots, and adapted target files.
 - Added repository-specific `adoption-guide.md` output and updated README/docs around first adoption, CI refresh, and multi-tool publishing workflows.
 - Changed merge-style adapters to preserve hand-written content through a managed code2skill block.
 
@@ -14,7 +14,7 @@ Use it when a Python project needs coding assistants to follow the current modul
 
 ## What This Repository Can Do
 
-- Analyze a Python repository with AST parsing, import graph checks, config extraction, and file-role inference.
+- Analyze a Python repository with AST semantic extraction, import graph checks, call/type/data-flow evidence, config extraction, and file-role inference.
 - Write a `.code2skill/` bundle with a project summary, references, a Skill plan, generated Skills, a report, and incremental state.
 - Estimate model cost and affected Skills before generation.
 - Generate Skill Markdown from repository evidence using OpenAI Responses API, OpenAI-compatible Responses endpoints, Claude, or Qwen.
@@ -240,6 +240,7 @@ For lower-level automation, use `create_scan_config(...)` with `scan_repository(
 - [CI Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/ci.md)
 - [Python API](https://github.com/oceanusXXD/code2skill/blob/main/docs/python-api.md)
 - [Output Layout](https://github.com/oceanusXXD/code2skill/blob/main/docs/output-layout.md)
+- [Algorithm Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/algorithm-notes.md)
 - [Release Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/release.md)
 - [Changelog](https://github.com/oceanusXXD/code2skill/blob/main/CHANGELOG.md)
 
 
@@ -14,7 +14,7 @@
 
 ## 这个仓库可以做什么
 
-- 用 AST、import graph、配置抽取和文件角色推断分析 Python 仓库。
+- 用 AST 语义抽取、import graph、调用/类型/data-flow 证据、配置抽取和文件角色推断分析 Python 仓库。
 - 写出 `.code2skill/` bundle，包括项目概要、参考文档、Skill plan、生成的 Skills、执行报告和增量 state。
 - 在生成前估算模型成本和受影响 Skills。
 - 使用 OpenAI Responses API、OpenAI-compatible Responses endpoint、Claude 或 Qwen，从仓库证据生成 Skill Markdown。
@@ -240,6 +240,7 @@ print(readiness.ready, readiness.score)
 - [CI Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/ci.md)
 - [Python API](https://github.com/oceanusXXD/code2skill/blob/main/docs/python-api.md)
 - [Output Layout](https://github.com/oceanusXXD/code2skill/blob/main/docs/output-layout.md)
+- [Algorithm Notes](https://github.com/oceanusXXD/code2skill/blob/main/docs/algorithm-notes.md)
 - [Release Guide](https://github.com/oceanusXXD/code2skill/blob/main/docs/release.md)
 - [Changelog](https://github.com/oceanusXXD/code2skill/blob/main/CHANGELOG.md)
 
 
@@ -0,0 +1,50 @@
+# Algorithm Notes
+
+`code2skill` does not try to train a model over the repository. It builds a
+small structural graph and a compact AST skeleton, then gives the LLM grounded
+evidence for planning and Skill generation.
+
+## What Was Borrowed
+
+- AST path evidence: inspired by code2vec, which showed that paths through code
+  structure are stronger signals than plain token lists.
+- Program graph evidence: inspired by graph-based program representation work
+  and Code Property Graphs, which combine syntax and semantic edges instead of
+  treating code as isolated files.
+- Data-flow evidence: inspired by GraphCodeBERT, which uses data-flow structure
+  to connect variables and operations beyond lexical proximity.
+
+## Current Implementation
+
+- Python AST extraction records imports, exports, functions, classes, methods,
+  route decorators, model/schema signals, call targets, type references,
+  raised exceptions, dynamic imports, class attributes, and simple data-flow
+  edges such as `scope:target<-source`.
+- Import graph construction uses detailed `ImportInfo`, including `from ...
+  import ...` names and dynamic imports, so package-level imports resolve to
+  concrete internal files when possible.
+- File priority combines path heuristics with content evidence. Route, service,
+  model, main-guard, call-target, type-reference, and data-flow signals can
+  raise selection priority.
+- Planner prompts receive dependency, call, type, and flow evidence for core
+  modules. Generation prompts use the same skeleton lines when large files are
+  summarized instead of inlined.
+
+## Boundaries
+
+The extractor is deliberately conservative. It records shallow data-flow edges
+from assignments, loops, and context managers, but it does not attempt full
+interprocedural static analysis, control-flow reconstruction, type inference, or
+runtime import evaluation. Missing or ambiguous evidence should still be marked
+as uncertain by generated Skills.
+
+## References
+
+- Alon et al., code2vec: Learning Distributed Representations of Code:
+  https://arxiv.org/abs/1803.09473
+- Allamanis et al., Learning to Represent Programs with Graphs:
+  https://arxiv.org/abs/1711.00740
+- Yamaguchi et al., Modeling and Discovering Vulnerabilities with Code Property
+  Graphs: https://ieeexplore.ieee.org/document/6956581
+- Guo et al., GraphCodeBERT: Pre-training Code Representations with Data Flow:
+  https://arxiv.org/abs/2009.08366
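
The shallow data-flow edges described in the algorithm notes (`scope:target<-source`, taken from assignments, loops, and context managers) could be approximated with the standard `ast` module. This is a minimal sketch under stated assumptions, not the extractor shipped in this change: the names `extract_flow_edges` and `_names` are hypothetical, the scope label is simply the nearest enclosing function or class name, and every name read on the right-hand side (callee names included) is treated as a source.

```python
import ast


def _names(node: ast.AST) -> list[str]:
    """Return every plain variable name under a node, in source order."""
    found = [n for n in ast.walk(node) if isinstance(n, ast.Name)]
    found.sort(key=lambda n: (n.lineno, n.col_offset))
    return [n.id for n in found]


def extract_flow_edges(source: str) -> list[str]:
    """Emit 'scope:target<-source' strings (hypothetical helper)."""
    edges: list[str] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            # A nested function or class starts a new scope label.
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                visit(child, child.name)
                continue

            pairs: list[tuple[ast.AST, ast.AST]] = []
            if isinstance(child, ast.Assign):
                pairs = [(target, child.value) for target in child.targets]
            elif isinstance(child, ast.AnnAssign) and child.value is not None:
                pairs = [(child.target, child.value)]
            elif isinstance(child, (ast.For, ast.AsyncFor)):
                pairs = [(child.target, child.iter)]
            elif isinstance(child, (ast.With, ast.AsyncWith)):
                pairs = [
                    (item.optional_vars, item.context_expr)
                    for item in child.items
                    if item.optional_vars is not None
                ]

            for target, value in pairs:
                for target_name in _names(target):
                    for source_name in _names(value):
                        edges.append(f"{scope}:{target_name}<-{source_name}")

            # Keep walking so statements nested in loop and with bodies are seen.
            visit(child, scope)

    visit(ast.parse(source), "module")
    return edges


SAMPLE = '''
def load(path):
    with open(path) as handle:
        for line in handle:
            text = line.strip()
    return text
'''

if __name__ == "__main__":
    for edge in extract_flow_edges(SAMPLE):
        print(edge)
```

For `SAMPLE`, this yields `load:handle<-open`, `load:handle<-path`, `load:line<-handle`, and `load:text<-line`. In line with the stated boundaries, nothing here follows values across function calls or reconstructs control flow.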
@@ -201,7 +201,8 @@ def _build_important_apis(
                     )
                 )
             if summary.inferred_role == "service":
-                for function_name in summary.functions[:3]:
+                service_entries = [*summary.functions, *summary.methods]
+                for function_name in service_entries[:4]:
                     apis.append(
                         ApiSummary(
                             kind=summary.inferred_role,
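
The hunk above widens the service API sample from the first three functions to the first four entries of functions followed by methods. A small standalone sketch of that selection follows; `ServiceSummary` and `pick_service_entries` are hypothetical names for illustration, and the real `SourceFileSummary` presumably carries many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceSummary:
    """Hypothetical stand-in exposing only the two fields the hunk reads."""

    functions: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)


def pick_service_entries(summary: ServiceSummary, limit: int = 4) -> list[str]:
    # Star-unpacking flattens both lists into one flat list of names.
    # Writing [summary.functions, summary.methods] would instead build a
    # two-element list of lists, and the slice would return lists, not names.
    return [*summary.functions, *summary.methods][:limit]


if __name__ == "__main__":
    summary = ServiceSummary(
        functions=["create_user"],
        methods=["UserService.get", "UserService.update", "UserService.delete"],
    )
    print(pick_service_entries(summary))
```

Because functions come first in the combined list, a class-based service file with few module-level functions can still contribute its methods to the API summary.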
@@ -305,6 +306,9 @@ def _core_module_sort_key(summary: SourceFileSummary) -> tuple[int, float, int,
         + len(summary.methods)
         + len(summary.routes) * 2
         + len(summary.internal_dependencies)
+        + min(len(summary.call_targets), 8)
+        + min(len(summary.type_references), 4)
+        + min(len(summary.data_flow_edges), 4)
     )
     return (
         role_priority,
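
The three added terms in `_core_module_sort_key` are each capped with `min(...)`, presumably so that a file with a very large number of calls cannot dominate the ranking on that signal alone. The sketch below isolates just that capped contribution. `EvidenceCounts` and `semantic_bonus` are hypothetical names; only the field names and the caps (8, 4, 4) come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceCounts:
    """Hypothetical stand-in for the three fields the new terms read."""

    call_targets: list[str] = field(default_factory=list)
    type_references: list[str] = field(default_factory=list)
    data_flow_edges: list[str] = field(default_factory=list)


def semantic_bonus(summary: EvidenceCounts) -> int:
    # Each signal adds at most its cap, so the total bonus never exceeds 16.
    return (
        min(len(summary.call_targets), 8)
        + min(len(summary.type_references), 4)
        + min(len(summary.data_flow_edges), 4)
    )


if __name__ == "__main__":
    busy = EvidenceCounts(
        call_targets=[f"helper_{i}" for i in range(20)],
        type_references=["User"],
        data_flow_edges=[f"run:x{i}<-y{i}" for i in range(10)],
    )
    print(semantic_bonus(busy))  # 8 + 1 + 4 = 13
```

In the diff this bonus is added to the existing count of methods, routes, and internal dependencies, so it adjusts the ordering within a role rather than replacing `role_priority`.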