This document describes the architecture of Tree-sitter Analyzer, including its plugin system, MCP integration, and core components.
┌─────────────────────────────────────────────────────────────────────────┐
│ User Interfaces │
├────────────────────┬────────────────────┬───────────────────────────────┤
│ CLI Interface │ MCP Server │ Python API │
│ (tree-sitter- │ (AI Assistant │ (Direct Import) │
│ analyzer) │ Integration) │ │
└────────┬───────────┴────────┬───────────┴────────────┬──────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Core Engine │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Analyzer │ │ Query │ │ Formatter │ │ Cache │ │
│ │ Engine │ │ Engine │ │ Registry │ │ Service │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘ │
│ │ │ │ │ │
└─────────┼─────────────────┼─────────────────┼─────────────────┼─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Language Plugins │
├────────┬────────┬────────┬────────┬────────┬────────┬────────┬─────────┤
│ Java │ Python │ TS │ SQL │ HTML │ CSS │ Rust │ ... │
└────────┴────────┴────────┴────────┴────────┴────────┴────────┴─────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Tree-sitter Parsers │
└─────────────────────────────────────────────────────────────────────────┘
The analyzer engine is the central component that orchestrates code analysis.
Location: tree_sitter_analyzer/core/analyzer.py
Responsibilities:
- Parse source code using tree-sitter
- Extract code elements (classes, methods, functions, etc.)
- Calculate metrics (lines, complexity, etc.)
- Coordinate with language plugins
Key Classes:
- CodeAnalyzer: Main analysis orchestrator
- AnalysisResult: Container for analysis results
- ElementExtractor: Base class for element extraction
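The sketch below shows how an orchestrator in the spirit of `CodeAnalyzer` might tie detection, plugin dispatch, and metrics together. It makes several assumptions. Plugins are looked up by file extension. Each plugin is a plain callable that returns element dicts. The names `analyze_source`, `register_plugin`, `EXTENSION_TO_LANGUAGE`, and `stub_python_plugin` are hypothetical and do not come from the project. Real tree-sitter parsing is replaced by a stub so the example runs without grammars.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical extension -> language map; the real project has a dedicated language detector.
EXTENSION_TO_LANGUAGE = {".py": "python", ".java": "java", ".ts": "typescript"}

# In this sketch a plugin is any callable that turns source text into element dicts.
PluginFn = Callable[[str], List[dict]]


@dataclass
class AnalysisResult:
    """Simplified container for analysis results."""
    file_path: str
    language: str
    lines_total: int = 0
    elements: List[dict] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


class CodeAnalyzer:
    """Orchestrates language detection, plugin dispatch, and basic metrics."""

    def __init__(self) -> None:
        self._plugins: Dict[str, PluginFn] = {}

    def register_plugin(self, language: str, plugin: PluginFn) -> None:
        self._plugins[language] = plugin

    def analyze_source(self, file_path: str, source: str) -> AnalysisResult:
        language = EXTENSION_TO_LANGUAGE.get(Path(file_path).suffix.lower(), "unknown")
        result = AnalysisResult(file_path, language, lines_total=len(source.splitlines()))
        plugin = self._plugins.get(language)
        if plugin is None:
            # Record the problem instead of raising, so callers still receive the metrics.
            result.errors.append(f"no plugin registered for language: {language}")
            return result
        result.elements = plugin(source)
        return result


def stub_python_plugin(source: str) -> List[dict]:
    """Stand-in for a tree-sitter query: any line starting with 'def ' becomes a function."""
    return [
        {"type": "function", "name": line.split("def ", 1)[1].split("(", 1)[0], "line": n}
        for n, line in enumerate(source.splitlines(), start=1)
        if line.lstrip().startswith("def ")
    ]
```

To use it, call `register_plugin("python", stub_python_plugin)` and then `analyze_source("app.py", text)`. Unknown extensions still produce an `AnalysisResult` with line counts and an error entry, rather than raising an exception.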
Each supported language has a dedicated plugin that understands its syntax and semantics.
Location: tree_sitter_analyzer/plugins/
Plugin Structure:
plugins/
├── base_plugin.py # Abstract base class
├── java_plugin.py # Java language support
├── python_plugin.py # Python language support
├── typescript_plugin.py # TypeScript/JavaScript support
├── sql_plugin.py # SQL support
├── html_plugin.py # HTML support
├── css_plugin.py # CSS support
├── rust_plugin.py # Rust support
├── go_plugin.py # Go support
├── kotlin_plugin.py # Kotlin support
└── ...
Plugin Responsibilities:
- Define language-specific tree-sitter queries
- Map AST nodes to code elements
- Provide language-specific formatting
- Handle language idioms and patterns
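The responsibilities above suggest a small abstract contract. The following is a minimal sketch of one, assuming a base class with a `language` name, a tuple of `extensions`, a `get_queries()` method that returns tree-sitter query strings, and an `extract_elements()` method. These member names are illustrative and may not match `base_plugin.py`. The query strings use tree-sitter-python node names but are not executed here. Element extraction uses a regex stand-in so the sketch runs without grammars.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List


class BasePlugin(ABC):
    """Hypothetical plugin contract: one subclass per language."""

    language: str = ""
    extensions: tuple = ()

    @abstractmethod
    def get_queries(self) -> Dict[str, str]:
        """Map query keys (e.g. "classes") to tree-sitter query strings."""

    @abstractmethod
    def extract_elements(self, source: str) -> List[dict]:
        """Map parsed nodes to code elements."""

    def supports(self, path: str) -> bool:
        # str.endswith accepts a tuple of suffixes.
        return path.endswith(self.extensions)


class ToyPythonPlugin(BasePlugin):
    language = "python"
    extensions = (".py", ".pyi")

    def get_queries(self) -> Dict[str, str]:
        # Query syntax for the tree-sitter-python grammar (not executed in this sketch).
        return {
            "classes": "(class_definition name: (identifier) @class.name)",
            "functions": "(function_definition name: (identifier) @function.name)",
        }

    def extract_elements(self, source: str) -> List[dict]:
        # Regex stand-in for walking query captures.
        pattern = re.compile(r"^\s*(class|def)\s+(\w+)", re.MULTILINE)
        kinds = {"class": "class", "def": "function"}
        return [{"type": kinds[m.group(1)], "name": m.group(2)} for m in pattern.finditer(source)]
```

Because `BasePlugin` is abstract, it cannot be instantiated directly. A subclass that forgets either method fails at construction time rather than partway through an analysis.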
The formatter system provides flexible output formatting.
Location: tree_sitter_analyzer/formatters/
Components:
- FormatterRegistry: Central registry for formatters
- BaseFormatter: Abstract formatter interface
- Language-specific formatters (Java, SQL, HTML, etc.)
Output Formats:
| Format | Description |
|---|---|
| full | Comprehensive table with all details |
| compact | Abbreviated summary |
| csv | Machine-readable CSV |
| json | Structured JSON |
| text | Human-readable text |
| html | HTML-specific format |
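A registry keyed by format name keeps output selection independent of the formatters themselves. The sketch below assumes that a formatter is a callable taking a list of element dicts and returning a string. The `register` and `format` methods and the three sample formatters are hypothetical. They stand in for the project's `BaseFormatter` subclasses, and they ignore language-specific differences.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Formatter = Callable[[List[dict]], str]


class FormatterRegistry:
    """Hypothetical central registry: format name -> formatter callable."""

    def __init__(self) -> None:
        self._formatters: Dict[str, Formatter] = {}

    def register(self, name: str, formatter: Formatter) -> None:
        if name in self._formatters:
            raise ValueError(f"formatter already registered: {name}")
        self._formatters[name] = formatter

    def format(self, name: str, elements: List[dict]) -> str:
        formatter = self._formatters.get(name)
        if formatter is None:
            raise ValueError(f"unknown format: {name}")
        return formatter(elements)


def format_json(elements: List[dict]) -> str:
    return json.dumps(elements, indent=2)


def format_csv(elements: List[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["type", "name", "line"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(elements)
    return buf.getvalue()


def format_compact(elements: List[dict]) -> str:
    # Abbreviated summary: element counts per type, sorted by type name.
    counts: Dict[str, int] = {}
    for element in elements:
        counts[element["type"]] = counts.get(element["type"], 0) + 1
    return ", ".join(f"{kind}: {n}" for kind, n in sorted(counts.items()))
```

Rejecting duplicate names on `register` surfaces accidental overrides early. If a plugin needs to replace a built-in format, it would have to do so explicitly.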
The query engine enables targeted code element extraction.
Location: tree_sitter_analyzer/core/query.py
Features:
- Predefined query keys (methods, classes, functions, etc.)
- Custom tree-sitter query support
- Filter expressions for result refinement
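Filter expressions refine query results after extraction. The syntax in this sketch is an assumption, not taken from the project's documentation. It uses comma-separated clauses that must all match: `key=value` for a case-insensitive exact match and `key=~pattern` for a glob match. The functions `parse_filter`, `matches`, and `apply_filter` are hypothetical.

```python
import fnmatch
from typing import Dict, List, Tuple

Clause = Tuple[str, str, str]  # (key, operator, value)


def parse_filter(expr: str) -> List[Clause]:
    """Parse 'name=~get*,params=0' into clauses (hypothetical syntax)."""
    clauses: List[Clause] = []
    for part in filter(None, (p.strip() for p in expr.split(","))):
        if "=~" in part:  # check the glob operator before plain '='
            key, value = part.split("=~", 1)
            clauses.append((key.strip(), "glob", value.strip()))
        elif "=" in part:
            key, value = part.split("=", 1)
            clauses.append((key.strip(), "eq", value.strip()))
        else:
            raise ValueError(f"invalid filter clause: {part!r}")
    return clauses


def matches(element: Dict, clauses: List[Clause]) -> bool:
    for key, op, value in clauses:
        actual = element.get(key)
        if actual is None:
            return False
        if op == "glob":
            if not fnmatch.fnmatchcase(str(actual), value):
                return False
        elif str(actual).lower() != value.lower():  # lets "true" match True
            return False
    return True


def apply_filter(elements: List[Dict], expr: str) -> List[Dict]:
    clauses = parse_filter(expr)
    return [e for e in elements if matches(e, clauses)]
```

All clauses are combined with AND. An element that lacks a filtered key is excluded rather than treated as a match.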
The cache service optimizes repeated operations.
Location: tree_sitter_analyzer/services/cache_service.py
Capabilities:
- Analysis result caching
- File modification detection
- Memory-efficient storage
┌─────────────────────────────────────────────────────────────┐
│ MCP Server │
│ (tree_sitter_analyzer/mcp/server.py) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Tool Registry │ │
│ ├─────────────────────────────────────────────────────┤ │
│ │ • check_code_scale │ │
│ │ • analyze_code_structure │ │
│ │ • extract_code_section │ │
│ │ • query_code │ │
│ │ • list_files │ │
│ │ • search_content │ │
│ │ • find_and_grep │ │
│ │ • set_project_path │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Resource Registry │ │
│ ├─────────────────────────────────────────────────────┤ │
│ │ • code://file/{path} - File content access │ │
│ │ • code://stats/{type} - Project statistics │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Security Layer │ │
│ ├─────────────────────────────────────────────────────┤ │
│ │ • Project boundary validation │ │
│ │ • Path traversal prevention │ │
│ │ • Input sanitization │ │
│ │ • Error message filtering │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Location: tree_sitter_analyzer/mcp/tools/
| Tool | File | Purpose |
|---|---|---|
| check_code_scale | scale_tool.py | File size and complexity assessment |
| analyze_code_structure | table_format_tool.py | Structured code analysis |
| extract_code_section | partial_tool.py | Line-range code extraction |
| query_code | query_tool.py | Element-specific queries |
| list_files | list_files_tool.py | File discovery (fd) |
| search_content | search_content_tool.py | Content search (ripgrep) |
| find_and_grep | find_grep_tool.py | Combined search |
| set_project_path | Various | Project boundary setting |
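The tool registry maps each tool name to a handler, and the server dispatches incoming calls by name. The following is a minimal sketch using a decorator and async handlers. The tool names come from the table above. The handler bodies and their arguments (`content`, `start_line`, `end_line`) are hypothetical simplifications: the real tools operate on files inside the project boundary.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that registers an async handler under an MCP tool name."""
    def decorator(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("check_code_scale")
async def check_code_scale(args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical argument: raw text instead of a project file path.
    return {"lines": len(args["content"].splitlines())}


@tool("extract_code_section")
async def extract_code_section(args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical arguments; line numbers are 1-based and the range is inclusive.
    lines = args["content"].splitlines()
    start, end = args["start_line"], args.get("end_line", len(lines))
    return {"content": "\n".join(lines[start - 1:end])}


async def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return await handler(arguments)
```

For example, `asyncio.run(call_tool("extract_code_section", {...}))` dispatches to the registered handler. An unknown name returns an error payload instead of raising.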
┌──────────────────┐ ┌──────────────────┐
│ fd │ │ ripgrep │
│ (file search) │ │ (content search) │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ External Tool Wrapper │
│ (tree_sitter_analyzer/tools/external/) │
├─────────────────────────────────────────────┤
│ • Process execution │
│ • Output parsing │
│ • Error handling │
│ • Result normalization │
└─────────────────────────────────────────────┘
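The wrapper's four duties (process execution, output parsing, error handling, normalization) can be illustrated with ripgrep. This sketch relies on ripgrep's real `--json` mode, which emits one JSON event per line (`begin`, `match`, `end`, `summary`), and on its documented exit codes: 0 when there are matches, 1 when there are none, and 2 on error. The function names `parse_rg_json` and `search_content` and the normalized dict shape are hypothetical. An fd wrapper would follow the same pattern.

```python
import json
import shutil
import subprocess
from typing import Dict, List


def parse_rg_json(output: str) -> List[Dict]:
    """Normalize `rg --json` output (one JSON event per line) into match dicts."""
    matches: List[Dict] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != "match":
            continue  # skip begin/end/summary events
        data = event["data"]
        matches.append({
            "file": data["path"].get("text", ""),
            "line": data["line_number"],
            "text": data["lines"].get("text", "").rstrip("\n"),
        })
    return matches


def search_content(pattern: str, root: str, timeout: float = 30.0) -> List[Dict]:
    """Run ripgrep and return normalized matches."""
    rg = shutil.which("rg")
    if rg is None:
        raise FileNotFoundError("ripgrep (rg) is not installed or not on PATH")
    proc = subprocess.run(
        [rg, "--json", "--", pattern, root],  # '--' keeps a pattern like '-x' from being read as a flag
        capture_output=True, text=True, timeout=timeout,
    )
    # Exit code 1 only means "no matches"; 2 signals an error.
    if proc.returncode == 2:
        raise RuntimeError(proc.stderr.strip() or "ripgrep failed")
    return parse_rg_json(proc.stdout)
```

Keeping the parser separate from the subprocess call means the normalization logic can be tested against captured output without ripgrep installed.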
Location: tree_sitter_analyzer/models/
# Base element types
CodeElement # Generic code element
ClassElement # Class/interface/struct
MethodElement # Method/function
FieldElement # Field/property/variable
ImportElement # Import/include statement
PackageElement # Package/namespace/module
# Specialized elements
MarkupElement # HTML elements
StyleElement # CSS rules
SQLElement # Database objects

AnalysisResult:
- file_path: str
- language: str
- metrics: FileMetrics
- elements: List[CodeElement]
- errors: List[AnalysisError]
FileMetrics:
- lines_total: int
- lines_code: int
- lines_comment: int
- lines_blank: int
- complexity: ComplexityMetrics

┌─────────────────────────────────────────┐
│ Security Validator │
├─────────────────────────────────────────┤
│ │
│ Request │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ Path Normalization │ │
│ │ (resolve symlinks, canonize) │ │
│ └──────────────┬─────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ Boundary Check │ │
│ │ (is path within project?) │ │
│ └──────────────┬─────────────────┘ │
│ │ │
│ ┌────┴────┐ │
│ │ Valid? │ │
│ └────┬────┘ │
│ Yes │ No │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌─────────┐ │ ┌─────────┐ │
│ │ Allow │ │ │ Reject │ │
│ └─────────┘ │ └─────────┘ │
│ │ │
└─────────────────┴───────────────────────┘
- Path traversal prevention (`../` detection)
- Null byte injection detection
- Unicode normalization attack prevention
- Maximum path length enforcement
- File extension validation
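The validation flow in the diagram, together with the checks listed above, can be sketched with `pathlib`. This sketch makes several assumptions. The length limit is a hypothetical value. The extension allow-list is an illustrative subset. NFC is only one possible choice of Unicode normalization form. The names `validate_path` and `SecurityError` are also hypothetical. `Path.resolve()` collapses `..` segments and follows symlinks, and the boundary check then compares the result against the resolved project root.

```python
import unicodedata
from pathlib import Path

MAX_PATH_LENGTH = 4096  # hypothetical limit
ALLOWED_EXTENSIONS = {".py", ".java", ".ts", ".sql", ".html", ".css", ".rs"}  # illustrative subset


class SecurityError(ValueError):
    """Raised when a requested path fails validation."""


def validate_path(user_path: str, project_root: str) -> Path:
    """Return the resolved path if it is safe and inside project_root, else raise."""
    if "\x00" in user_path:
        raise SecurityError("null byte in path")
    if len(user_path) > MAX_PATH_LENGTH:
        raise SecurityError("path too long")
    # Normalize so that different Unicode encodings of the same name compare equal.
    normalized = unicodedata.normalize("NFC", user_path)
    root = Path(project_root).resolve()
    # Joining with an absolute path discards root, which the boundary check then rejects.
    candidate = (root / normalized).resolve()  # collapses ".." and follows symlinks
    if candidate != root and root not in candidate.parents:
        raise SecurityError("path escapes project boundary")
    if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise SecurityError(f"extension not allowed: {candidate.suffix!r}")
    return candidate
```

Resolving before the boundary check matters. A string-prefix test on the raw input would accept `src/../../x.py`, while the resolved path shows that it leaves the project.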
Error responses automatically remove:
- Passwords and tokens
- Full file system paths
- Stack traces (in production)
- Internal configuration details
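Error filtering can be approximated with a few redaction passes applied before a message leaves the server. The sketch below covers three of the four categories listed above: credentials, full paths, and stack traces. It does not attempt to detect configuration details. The regular expressions, the `production` flag, and the function name `sanitize_error` are hypothetical and are not the project's actual rules.

```python
import re

# Hypothetical redaction rules; the patterns are illustrative only.
_SECRET_RE = re.compile(r"(?i)\b(password|passwd|token|api[_-]?key|secret)\s*[=:]\s*\S+")
# An absolute POSIX or Windows path with at least one directory; group 1 keeps the final component.
_PATH_RE = re.compile(r"(?:[A-Za-z]:\\|/)(?:[^\s/\\]+[/\\])+([^\s/\\]+)")
_TRACEBACK_MARKER = "Traceback (most recent call last)"


def sanitize_error(message: str, production: bool = True) -> str:
    """Redact secrets and directory paths; drop tracebacks in production."""
    message = _SECRET_RE.sub(lambda m: f"{m.group(1)}=***", message)
    message = _PATH_RE.sub(lambda m: f".../{m.group(1)}", message)
    if production and _TRACEBACK_MARKER in message:
        message = message.split(_TRACEBACK_MARKER, 1)[0].rstrip() or "Internal error"
    return message
```

Keeping the file's base name preserves enough context for a user to act on the error without revealing the directory layout.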
┌─────────────────────────────────────────┐
│ Request │
│ │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Cache Check │ │
│ └───────┬───────┘ │
│ │ │
│ Hit │ Miss │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌──────────┐ │ ┌──────────┐ │
│ │ Return │ │ │ Analyze │ │
│ │ Cached │ │ │ & Cache │ │
│ └──────────┘ │ └──────────┘ │
│ │ │
└────────────────┴────────────────────────┘
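The hit/miss flow above pairs naturally with file modification detection. This sketch makes two assumptions: cache keys are resolved paths, and an entry is valid only while the file's `(st_mtime_ns, st_size)` fingerprint is unchanged. Memory is bounded with simple first-in, first-out eviction. The class `AnalysisCache`, its method names, and the eviction policy are hypothetical. The real `cache_service.py` may use a different strategy, such as LRU or content hashing.

```python
import os
from typing import Any, Callable, Dict, Tuple

Fingerprint = Tuple[int, int]  # (mtime in nanoseconds, size in bytes)


class AnalysisCache:
    """Cache analysis results per file; an entry is stale once mtime or size changes."""

    def __init__(self, max_entries: int = 256) -> None:
        self._entries: Dict[str, Tuple[Fingerprint, Any]] = {}
        self._max_entries = max_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _fingerprint(path: str) -> Fingerprint:
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def get_or_analyze(self, path: str, analyze: Callable[[str], Any]) -> Any:
        key = os.path.realpath(path)  # one entry per real file, even via symlinks
        fingerprint = self._fingerprint(key)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == fingerprint:
            self.hits += 1
            return cached[1]
        self.misses += 1
        result = analyze(key)
        if key not in self._entries and len(self._entries) >= self._max_entries:
            # Dicts preserve insertion order, so the first key is the oldest entry (FIFO).
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = (fingerprint, result)
        return result
```

Including the size in the fingerprint helps on filesystems with coarse timestamps, where two writes within the same tick would otherwise leave `st_mtime_ns` unchanged.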
Five levels of token optimization:
| Level | Options | Token Reduction |
|---|---|---|
| 1 | count_only=true | ~70% |
| 2 | summary_only=true | ~80% |
| 3 | suppress_output=true + output_file | ~95% |
| 4 | group_by_file=true | ~60% |
| 5 | total_only=true | ~90% |
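The options trade detail for size by changing what is returned for the same set of matches. This sketch assumes that matches arrive as `{"file", "line", "text"}` dicts. The output shapes chosen for each option are assumptions, and so is the function name `shape_results`. The `suppress_output` + `output_file` combination is omitted. In that mode the full result would go to a file and only a short acknowledgement would be returned. The percentages in the table are not derived from this code.

```python
from collections import Counter
from typing import Any, Dict, List


def shape_results(matches: List[Dict[str, Any]], *, total_only: bool = False,
                  count_only: bool = False, summary_only: bool = False,
                  group_by_file: bool = False, max_preview: int = 3) -> Any:
    """Reduce a list of match dicts according to output options (hypothetical shapes)."""
    if total_only:
        return len(matches)  # a single number
    if count_only:
        return dict(Counter(m["file"] for m in matches))  # per-file counts, no text
    if summary_only:
        per_file = Counter(m["file"] for m in matches)
        return {
            "total_matches": len(matches),
            "file_count": len(per_file),
            "top_files": [f for f, _ in per_file.most_common(max_preview)],
        }
    if group_by_file:
        grouped: Dict[str, List[Dict[str, Any]]] = {}
        for m in matches:
            # The file name is stored once per group instead of once per match.
            grouped.setdefault(m["file"], []).append({"line": m["line"], "text": m["text"]})
        return grouped
    return matches
```

The more aggressive options drop matched text entirely, which suits a first pass that only decides where to look next.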
To add support for a new language:
- Create plugin file: `plugins/new_language_plugin.py`
- Implement the `BasePlugin` interface
- Define tree-sitter queries
- Register with `PluginRegistry`
- Create a formatter if needed
- Add tests
See New Language Support Checklist for detailed guidance.
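The registration step can be illustrated with a registry that indexes plugin classes by language and by file extension. In this sketch, the `PluginRegistry` methods (`register`, `for_file`), the minimal `BasePlugin` stand-in, and the `ZigPlugin` example are all hypothetical. Zig is just a placeholder for "a new language" and does not indicate planned support. Returning the class from `register` lets it double as a decorator.

```python
import os.path
from typing import Dict, Optional, Type


class BasePlugin:
    """Minimal stand-in for the abstract base in plugins/base_plugin.py."""
    language: str = ""
    extensions: tuple = ()


class PluginRegistry:
    """Hypothetical registry mapping languages and file extensions to plugin classes."""

    def __init__(self) -> None:
        self._by_language: Dict[str, Type[BasePlugin]] = {}
        self._by_extension: Dict[str, Type[BasePlugin]] = {}

    def register(self, plugin_cls: Type[BasePlugin]) -> Type[BasePlugin]:
        if not plugin_cls.language:
            raise ValueError(f"{plugin_cls.__name__} must set 'language'")
        self._by_language[plugin_cls.language] = plugin_cls
        for ext in plugin_cls.extensions:
            self._by_extension[ext.lower()] = plugin_cls
        return plugin_cls  # allows use as a class decorator

    def for_file(self, path: str) -> Optional[BasePlugin]:
        ext = os.path.splitext(path)[1].lower()
        plugin_cls = self._by_extension.get(ext)
        return plugin_cls() if plugin_cls else None


registry = PluginRegistry()


@registry.register
class ZigPlugin(BasePlugin):  # hypothetical new language plugin
    language = "zig"
    extensions = (".zig",)
```

Extension lookup is case-insensitive here, so `MAIN.ZIG` and `main.zig` resolve to the same plugin. Whether the real registry behaves this way is not stated in this document.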
To add a new MCP tool:
- Create a tool class in `mcp/tools/`
- Implement the `BaseTool` interface
- Define input/output schemas
- Register with the MCP server
- Add documentation
- Add tests
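The steps for a new tool map onto a small class: a name, a description, a JSON Schema for inputs, and an async `execute`. This sketch assumes that shape. `definition()` returns the `name`, `description`, and `inputSchema` fields used in MCP tool listings. `BaseTool`'s members and the `count_lines` tool are hypothetical. The validation covers only the schema's `required` list and is not a full JSON Schema validator.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseTool(ABC):
    """Hypothetical tool contract: metadata, an input schema, and an async execute()."""

    name: str = ""
    description: str = ""
    input_schema: Dict[str, Any] = {}

    def definition(self) -> Dict[str, Any]:
        # Fields as they appear in an MCP tool listing.
        return {"name": self.name, "description": self.description, "inputSchema": self.input_schema}

    def validate(self, arguments: Dict[str, Any]) -> None:
        # Checks only the "required" keys; types are not validated in this sketch.
        missing = [k for k in self.input_schema.get("required", []) if k not in arguments]
        if missing:
            raise ValueError(f"missing required arguments: {', '.join(missing)}")

    @abstractmethod
    async def execute(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        ...


class CountLinesTool(BaseTool):
    name = "count_lines"
    description = "Count non-blank lines in the given text."
    input_schema = {
        "type": "object",
        "properties": {"content": {"type": "string"}},
        "required": ["content"],
    }

    async def execute(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.validate(arguments)
        lines = [line for line in arguments["content"].splitlines() if line.strip()]
        return {"non_blank_lines": len(lines)}
```

Registering with the server would then amount to exposing `definition()` in the tool list and routing calls by `name` to `execute`.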
To add a new output formatter:
- Create a formatter in `formatters/`
- Implement the `BaseFormatter` interface
- Register with `FormatterRegistry`
- Add a CLI option if needed
- Add tests
tree_sitter_analyzer/
├── __init__.py
├── __main__.py # CLI entry point
├── core/
│ ├── analyzer.py # Main analyzer
│ ├── query.py # Query engine
│ └── language_detector.py # Language detection
├── models/
│ ├── elements.py # Code element models
│ ├── results.py # Analysis result models
│ └── metrics.py # Metric models
├── plugins/
│ ├── base_plugin.py # Plugin base class
│ ├── java_plugin.py
│ ├── python_plugin.py
│ └── ...
├── formatters/
│ ├── base_formatter.py # Formatter base class
│ ├── registry.py # Formatter registry
│ ├── java_formatter.py
│ └── ...
├── mcp/
│ ├── server.py # MCP server
│ ├── tools/ # MCP tool implementations
│ └── resources/ # MCP resource handlers
├── services/
│ ├── cache_service.py # Caching
│ ├── file_service.py # File operations
│ └── security_service.py # Security validation
└── tools/
└── external/ # External tool wrappers
├── fd_wrapper.py
└── ripgrep_wrapper.py
- Installation Guide - Setup instructions
- CLI Reference - Command-line usage
- MCP Tools Specification - API details
- Features Overview - Language support
- SMART Workflow - Usage methodology
- Contributing Guide - Development guidelines