Commit 8a8ade2

gkorland and Copilot committed
Add Python import tracking to code graph
Migrated from FalkorDB/code-graph-backend PR #97. Original issue: FalkorDB/code-graph-backend#61 Resolves #535 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c372a5e commit 8a8ade2

File tree

12 files changed: +530 -0 lines changed

CI_OPTIMIZATION.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@

# CI Pipeline Optimization Analysis (Staging Branch)

## Current Workflows on Staging

The staging branch has 3 workflow files (identical to main):

| Workflow | File | Trigger | ~Duration |
|---|---|---|---|
| **Build** | `nextjs.yml` | All PRs + push to main | **~1 min** |
| **Playwright Tests** | `playwright.yml` | PRs + push to main/staging | **~10 min** (x2 shards) |
| **Release image** | `release-image.yml` | Tags + main push | release-only |

Additionally, **CodeQL** runs on staging pushes.

## Playwright Tests — The Bottleneck

This is the critical path. It runs 2 shards in parallel, each taking ~10 min. Measured from recent staging runs:

| Step | Shard 1 | Shard 2 | % of total |
|---|---|---|---|
| **Seed test data into FalkorDB** | **223s** | **220s** | **37%** |
| **Run Playwright tests** | 264s | 262s | 44% |
| **Install Playwright browsers** | 48s | 51s | 8% |
| Install backend deps (`pip install`) | 28s | 31s | 5% |
| Build frontend | 12s | 12s | 2% |
| Install frontend deps (`npm ci`) | 8s | 8s | 1% |
| Container init + setup | ~15s | ~15s | 3% |

**Total per shard: ~600s (10 min). Total billable: ~20 min.**

## Build Workflow — Wasted Work

The Build workflow (~64s total) installs backend dependencies but does nothing with them:

| Step | Duration |
|---|---|
| Install frontend deps | 7s |
| Build frontend | 14s |
| Lint frontend | <1s |
| **Install backend deps (`pip install`)** | **35s** |

The backend install accounts for **55% of the Build workflow** and serves no purpose.

---

## Optimization Recommendations

### 1. Cache or pre-seed FalkorDB test data (saves **~3.5 min/shard = ~7 min total**)

`seed_test_data.py` clones 2 GitHub repos (GraphRAG-SDK, Flask) and runs full source analysis every run. This is the single biggest time sink at **37% of Playwright runtime**.

**Options:**
- **Best**: Export the seeded graph as an RDB dump, commit it as a test fixture, and restore with `redis-cli`. Eliminates the 220s step entirely.
- **Good**: Cache the cloned repos + analysis output with `actions/cache` keyed on the seed script hash + repo commit SHAs.
- **Minimum**: Cache just the git clones to skip network time.
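For the **Good** option, an `actions/cache` step might look like the sketch below. The cached path, step names, and the seed script location are assumptions, not taken from the actual workflow, and `seed_test_data.py` would need a small change to reuse the cached contents when present:

```yaml
- name: Cache seed repos and analysis output
  id: seed-cache
  uses: actions/cache@v4
  with:
    # Assumed location: wherever seed_test_data.py clones and writes
    path: .seed-cache
    key: seed-${{ runner.os }}-${{ hashFiles('seed_test_data.py') }}

- name: Seed test data into FalkorDB
  # The script should skip cloning/analysis when .seed-cache is warm
  run: python seed_test_data.py
```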
### 2. Cache Playwright browsers (saves **~50s/shard = ~1.5 min total**)

Browsers are installed from scratch every run (`npx playwright install --with-deps`). Add:

```yaml
- name: Cache Playwright browsers
  id: playwright-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

- name: Install Playwright Browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

- name: Install Playwright system deps
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium
```

### 3. Switch `pip install` to `uv` (saves **~15-20s/shard**)

Both workflows use slow `pip install`. `uv sync` is 3-5x faster:

```yaml
- name: Install uv
  uses: astral-sh/setup-uv@v5
  with:
    version: "latest"

- name: Install dependencies
  run: uv sync
```

### 4. Remove unused backend install from Build workflow (saves **~35s**)

`nextjs.yml` installs backend deps but runs no backend tests or lint. Either:
- **Remove** the `Setup Python` and `Install backend dependencies` steps entirely
- **Or** add backend unit tests / pylint to justify the install

### 5. Add concurrency groups (saves **queued minutes**)

The Build workflow has no concurrency group. Rapid pushes queue redundant runs:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```

The Playwright workflow also lacks a concurrency group.

### 6. Add npm cache (saves **~3-5s/shard**)

Neither workflow caches npm. Add to `setup-node`:

```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 24
    cache: 'npm'
    cache-dependency-path: |
      package-lock.json
      app/package-lock.json
```

### 7. Docker build caching for releases (saves **~2-5 min** on releases)

No layer caching on the Docker build. Add:

```yaml
- uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile
    push: true
    tags: ${{ env.TAGS }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

### 8. Deduplicate npm installs in Playwright workflow

The Playwright workflow runs `npm ci` twice — once for frontend (`./app`) and once for root (Playwright). These could be consolidated or at least cached.
---

## Summary

| # | Optimization | Time saved | Effort |
|---|---|---|---|
| 1 | Cache/pre-seed FalkorDB data | **~7 min** | Medium |
| 2 | Cache Playwright browsers | **~1.5 min** | Low |
| 3 | Switch to `uv` from `pip` | **~40s** | Low |
| 4 | Remove unused backend install from Build | **~35s** | Trivial |
| 5 | Add concurrency groups | Variable | Trivial |
| 6 | Add npm cache | ~10s | Trivial |
| 7 | Docker layer caching | ~2-5 min (releases) | Low |
| 8 | Deduplicate npm installs | ~5s | Low |

**Total potential savings: ~9-10 min per CI run**, bringing Playwright from ~10 min/shard down to ~4-5 min/shard (dominated by the actual test execution).

The single biggest win is **pre-seeding FalkorDB data** — it alone accounts for 37% of the Playwright workflow runtime.

api/analyzers/analyzer.py

Lines changed: 29 additions & 0 deletions

```diff
@@ -149,3 +149,32 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_
 
         pass
 
+    @abstractmethod
+    def add_file_imports(self, file: File) -> None:
+        """
+        Add import statements to the file.
+
+        Args:
+            file (File): The file to add imports to.
+        """
+
+        pass
+
+    @abstractmethod
+    def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
+        """
+        Resolve an import statement to entities.
+
+        Args:
+            files (dict[Path, File]): All files in the project.
+            lsp (SyncLanguageServer): The language server.
+            file_path (Path): The path to the file containing the import.
+            path (Path): The path to the project root.
+            import_node (Node): The import statement node.
+
+        Returns:
+            list[Entity]: List of resolved entities.
+        """
+
+        pass
```

api/analyzers/java/analyzer.py

Lines changed: 16 additions & 0 deletions

```diff
@@ -127,3 +127,19 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_
             return self.resolve_method(files, lsp, file_path, path, symbol)
         else:
             raise ValueError(f"Unknown key {key}")
+
+    def add_file_imports(self, file: File) -> None:
+        """
+        Extract and add import statements from the file.
+        Java imports are not yet implemented.
+        """
+        # TODO: Implement Java import tracking
+        pass
+
+    def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
+        """
+        Resolve an import statement to the entities it imports.
+        Java imports are not yet implemented.
+        """
+        # TODO: Implement Java import resolution
+        return []
```

api/analyzers/python/analyzer.py

Lines changed: 92 additions & 0 deletions

```diff
@@ -122,3 +122,95 @@ def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_
             return self.resolve_method(files, lsp, file_path, path, symbol)
         else:
             raise ValueError(f"Unknown key {key}")
+
+    def add_file_imports(self, file: File) -> None:
+        """
+        Extract and add import statements from the file.
+
+        Supports:
+        - import module
+        - import module as alias
+        - from module import name
+        - from module import name1, name2
+        - from module import name as alias
+        """
+        try:
+            import warnings
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                # Query for both import types
+                import_query = self.language.query("""
+                    (import_statement) @import
+                    (import_from_statement) @import_from
+                """)
+
+                captures = import_query.captures(file.tree.root_node)
+
+                # Add all import statement nodes to the file
+                if 'import' in captures:
+                    for import_node in captures['import']:
+                        file.add_import(import_node)
+
+                if 'import_from' in captures:
+                    for import_node in captures['import_from']:
+                        file.add_import(import_node)
+        except Exception as e:
+            logger.debug(f"Failed to extract imports from {file.path}: {e}")
+
+    def resolve_import(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, import_node: Node) -> list[Entity]:
+        """
+        Resolve an import statement to the entities it imports.
+        """
+        res = []
+
+        try:
+            if import_node.type == 'import_statement':
+                # Handle "import module" or "import module as alias"
+                # Find all dotted_name and aliased_import nodes
+                for child in import_node.children:
+                    if child.type == 'dotted_name':
+                        # Try to resolve the module/name
+                        identifier = child.children[0] if child.child_count > 0 else child
+                        resolved = self.resolve_type(files, lsp, file_path, path, identifier)
+                        res.extend(resolved)
+                    elif child.type == 'aliased_import':
+                        # Get the actual name from aliased import (before 'as')
+                        if child.child_count > 0:
+                            actual_name = child.children[0]
+                            if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
+                                identifier = actual_name.children[0]
+                            else:
+                                identifier = actual_name
+                            resolved = self.resolve_type(files, lsp, file_path, path, identifier)
+                            res.extend(resolved)
+
+            elif import_node.type == 'import_from_statement':
+                # Handle "from module import name1, name2"
+                # Find the 'import' keyword to know where imported names start
+                import_keyword_found = False
+                for child in import_node.children:
+                    if child.type == 'import':
+                        import_keyword_found = True
+                        continue
+
+                    # After 'import' keyword, dotted_name nodes are the imported names
+                    if import_keyword_found and child.type == 'dotted_name':
+                        # Try to resolve the imported name
+                        identifier = child.children[0] if child.child_count > 0 else child
+                        resolved = self.resolve_type(files, lsp, file_path, path, identifier)
+                        res.extend(resolved)
+                    elif import_keyword_found and child.type == 'aliased_import':
+                        # Handle "from module import name as alias"
+                        if child.child_count > 0:
+                            actual_name = child.children[0]
+                            if actual_name.type == 'dotted_name' and actual_name.child_count > 0:
+                                identifier = actual_name.children[0]
+                            else:
+                                identifier = actual_name
+                            resolved = self.resolve_type(files, lsp, file_path, path, identifier)
+                            res.extend(resolved)
+
+        except Exception as e:
+            logger.debug(f"Failed to resolve import: {e}")
+
+        return res
```
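The identifier-extraction rule in `resolve_import` (take the first child of a `dotted_name`, and the pre-`as` name of an `aliased_import`) can be illustrated without tree-sitter. `StubNode` and `import_target` below are illustrative stand-ins, not the real tree-sitter API or the project's code:

```python
from dataclasses import dataclass, field

@dataclass
class StubNode:
    """Minimal stand-in for a tree-sitter Node (illustration only)."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)

    @property
    def child_count(self):
        return len(self.children)

def import_target(child: StubNode) -> StubNode:
    """Mirror the diff's rule for picking which identifier to resolve."""
    if child.type == "dotted_name":
        # "import a.b" -> resolve the leading package name "a"
        return child.children[0] if child.child_count > 0 else child
    if child.type == "aliased_import" and child.child_count > 0:
        # "import a.b as c" -> resolve the original name, not the alias
        actual = child.children[0]
        if actual.type == "dotted_name" and actual.child_count > 0:
            return actual.children[0]
        return actual
    return child

# "import os.path as p": the aliased_import wraps dotted_name(os, path)
aliased = StubNode("aliased_import", children=[
    StubNode("dotted_name", children=[StubNode("identifier", "os"),
                                      StubNode("identifier", "path")]),
    StubNode("identifier", "p"),
])
print(import_target(aliased).text)  # -> os
```

Note that only the leading package name is resolved, so `import a.b.c` links the file to whatever entity `a` resolves to, not to the submodule.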

api/analyzers/source_analyzer.py

Lines changed: 13 additions & 0 deletions

```diff
@@ -114,6 +114,10 @@ def first_pass(self, path: Path, files: list[Path], ignore: list[str], graph: Gr
             # Walk thought the AST
             graph.add_file(file)
             self.create_hierarchy(file, analyzer, graph)
+
+            # Extract import statements
+            if not analyzer.is_dependency(str(file_path)):
+                analyzer.add_file_imports(file)
 
     def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
         """
@@ -148,6 +152,8 @@ def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
         for i, file_path in enumerate(files):
            file = self.files[file_path]
            logging.info(f'Processing file ({i + 1}/{files_len}): {file_path}')
+
+            # Resolve entity symbols
            for _, entity in file.entities.items():
                entity.resolved_symbol(lambda key, symbol, fp=file_path: analyzers[fp.suffix].resolve_symbol(self.files, lsps[fp.suffix], fp, path, key, symbol))
                for key, symbols in entity.symbols.items():
@@ -167,6 +173,13 @@ def second_pass(self, graph: Graph, files: list[Path], path: Path) -> None:
                        graph.connect_entities("RETURNS", entity.id, resolved_symbol.id)
                    elif key == "parameters":
                        graph.connect_entities("PARAMETERS", entity.id, resolved_symbol.id)
+
+            # Resolve file imports
+            for import_node in file.imports:
+                resolved_entities = analyzers[file_path.suffix].resolve_import(self.files, lsps[file_path.suffix], file_path, path, import_node)
+                for resolved_entity in resolved_entities:
+                    file.add_resolved_import(resolved_entity)
+                    graph.connect_entities("IMPORTS", file.id, resolved_entity.id)
 
     def analyze_files(self, files: list[Path], path: Path, graph: Graph) -> None:
         self.first_pass(path, files, [], graph)
```
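The two-pass wiring above (collect raw import nodes in the first pass, then resolve them and connect `IMPORTS` edges in the second) can be sketched end to end with stub objects. The stub classes below are illustrative stand-ins for the project's real `File`, `Entity`, `Graph`, and analyzer types, not the actual implementations:

```python
class StubEntity:
    def __init__(self, id):
        self.id = id

class StubFile:
    """Mirrors the import-tracking API added to api/entities/file.py."""
    def __init__(self, id):
        self.id = id
        self.imports = []              # raw import-statement nodes
        self.resolved_imports = set()  # entities this file imports

    def add_import(self, node):
        self.imports.append(node)

    def add_resolved_import(self, entity):
        self.resolved_imports.add(entity)

class StubGraph:
    def __init__(self):
        self.edges = []

    def connect_entities(self, relation, src_id, dst_id):
        self.edges.append((relation, src_id, dst_id))

def resolve_import(node):
    # Stand-in for the analyzer's resolve_import: one entity per node
    return [StubEntity(f"entity:{node}")]

# First pass collects raw imports; second pass resolves and connects
graph = StubGraph()
file = StubFile("file:app.py")
file.add_import("import flask")

for import_node in file.imports:
    for resolved in resolve_import(import_node):
        file.add_resolved_import(resolved)
        graph.connect_entities("IMPORTS", file.id, resolved.id)

print(graph.edges)  # -> [('IMPORTS', 'file:app.py', 'entity:import flask')]
```

Keeping raw nodes on the file until the second pass matters: imports can only be resolved once every file's entities exist in the graph.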

api/entities/file.py

Lines changed: 20 additions & 0 deletions

```diff
@@ -21,10 +21,30 @@ def __init__(self, path: Path, tree: Tree) -> None:
         self.path = path
         self.tree = tree
         self.entities: dict[Node, Entity] = {}
+        self.imports: list[Node] = []
+        self.resolved_imports: set[Entity] = set()
 
     def add_entity(self, entity: Entity):
         entity.parent = self
         self.entities[entity.node] = entity
+
+    def add_import(self, import_node: Node):
+        """
+        Add an import statement node to track.
+
+        Args:
+            import_node (Node): The import statement node.
+        """
+        self.imports.append(import_node)
+
+    def add_resolved_import(self, resolved_entity: Entity):
+        """
+        Add a resolved import entity.
+
+        Args:
+            resolved_entity (Entity): The resolved entity that is imported.
+        """
+        self.resolved_imports.add(resolved_entity)
 
     def __str__(self) -> str:
         return f"path: {self.path}"
```

test-project/a.c

Lines changed: 11 additions & 0 deletions

```diff
@@ -0,0 +1,11 @@
+#include <stdio.h>
+#include "/src/ff.h"
+
+
+/* Create an empty intset. */
+intset* intsetNew(void) {
+    intset *is = zmalloc(sizeof(intset));
+    is->encoding = intrev32ifbe(INTSET_ENC_INT16);
+    is->length = 0;
+    return is;
+}
```
