Commit 6681259

DvirDukhan and Copilot committed
perf(analyzers): memoise compiled tree-sitter queries
AbstractAnalyzer._captures was recompiling its query string on every call. cProfile on pytest-dev/pytest-6202 (204 files) showed tree_sitter.Language.query consuming 3.03s of the 6.36s first_pass (~48% of analyzer time spent rebuilding queries that never change).

Cache them on the analyzer instance, keyed by pattern string. Also switches from the deprecated language.query() to the Query(language, pattern) constructor.

Wall-time on pytest-6202 (CODE_GRAPH_PY_RESOLVER=tree_sitter):
  before: 6.9s
  after:  3.7s

Benefits every tree-sitter analyzer (Python, JavaScript, Kotlin), not just the new static resolver.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
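For readers who want to reproduce this kind of measurement, here is a minimal sketch of how a hotspot like the one described above might be surfaced with cProfile. It does not use tree-sitter: `compile_query` and `first_pass` are hypothetical stand-ins for the real compilation step and analyzer pass, and the two patterns are illustrative only.

```python
import cProfile
import io
import pstats


def compile_query(pattern: str) -> tuple:
    """Hypothetical stand-in for an expensive query compilation step."""
    return tuple(sorted(set(pattern * 200)))


def first_pass(patterns: list[str], files: int) -> int:
    """Recompile every pattern for every file, mimicking the uncached path."""
    compiled = 0
    for _ in range(files):
        for pattern in patterns:
            compile_query(pattern)
            compiled += 1
    return compiled


def profile_first_pass(patterns: list[str], files: int) -> tuple[int, str]:
    """Run first_pass under cProfile and return (call count, text report)."""
    profiler = cProfile.Profile()
    compiled = profiler.runcall(first_pass, patterns, files)

    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return compiled, report.getvalue()


compiled, report = profile_first_pass(
    ["(function_definition) @f", "(class_definition) @c"], files=204
)
print(report)
```

In the report, a function whose cumulative time is a large share of its caller's, and whose call count scales with the number of files rather than the number of distinct inputs, is a likely candidate for memoisation.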
1 parent 60ef625 commit 6681259

1 file changed

Lines changed: 12 additions & 3 deletions

api/analyzers/analyzer.py

@@ -1,7 +1,7 @@
 from pathlib import Path
 from typing import Optional

-from tree_sitter import Language, Node, Parser, Point, QueryCursor
+from tree_sitter import Language, Node, Parser, Point, Query, QueryCursor
 from api.entities.entity import Entity
 from api.entities.file import File
 from abc import ABC, abstractmethod
@@ -11,11 +11,20 @@ class AbstractAnalyzer(ABC):
     def __init__(self, language: Language) -> None:
         self.language = language
         self.parser = Parser(language)
+        # Memoise compiled queries; tree-sitter query compilation is ~370us
+        # each and adds up to seconds on large repos.
+        self._query_cache: dict[str, Query] = {}
+
+    def _get_query(self, pattern: str) -> Query:
+        q = self._query_cache.get(pattern)
+        if q is None:
+            q = Query(self.language, pattern)
+            self._query_cache[pattern] = q
+        return q

     def _captures(self, pattern: str, node: Node) -> dict:
         """Run a tree-sitter query and return captures dict."""
-        query = self.language.query(pattern)
-        cursor = QueryCursor(query)
+        cursor = QueryCursor(self._get_query(pattern))
         return cursor.captures(node)

     def find_parent(self, node: Node, parent_types: list) -> Node:
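
The caching pattern in this diff can be exercised without tree-sitter installed. The sketch below isolates the same get-or-compile logic behind a hypothetical `QueryCache` class and uses a fake compile function (`fake_compile`, also hypothetical) that records each call, so the number of real compilations is observable. It illustrates the technique only and is not code from this repository.

```python
from typing import Callable, Generic, TypeVar

Q = TypeVar("Q")


class QueryCache(Generic[Q]):
    """Per-instance memo of compiled queries, keyed by pattern string."""

    def __init__(self, compile_fn: Callable[[str], Q]) -> None:
        self._compile = compile_fn
        self._cache: dict[str, Q] = {}

    def get(self, pattern: str) -> Q:
        # Same shape as _get_query in the diff: look up, compile on miss, store.
        q = self._cache.get(pattern)
        if q is None:
            q = self._compile(pattern)
            self._cache[pattern] = q
        return q


compile_calls: list[str] = []


def fake_compile(pattern: str) -> tuple[str, ...]:
    """Hypothetical stand-in for Query(language, pattern); records each call."""
    compile_calls.append(pattern)
    return tuple(pattern.split())


cache = QueryCache(fake_compile)
pattern = "(function_definition name: (identifier) @name)"

first = cache.get(pattern)            # miss: compiles once
for _ in range(203):                  # hits: the same object is returned
    assert cache.get(pattern) is first
other = cache.get("(class_definition) @c")  # a new pattern compiles once more

print(len(compile_calls))  # → 2
```

Keeping the dictionary on the instance, as the commit does, ties each compiled query to the language it was built for. The cache is unbounded, which seems reasonable here only because the commit message describes the patterns as a fixed set that never changes; a caller that generates patterns dynamically would probably want a size limit.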
