Commit b2a2900

ctoth and claude committed
feat: Restructure as clean package with proper entry points
- Move MCP server to code_extractor/server.py package structure
- Add __main__.py for module execution support
- Update pyproject.toml with new console script entry point
- Create CLAUDE.md with development and usage guidance
- Support multiple execution methods:
  * uv run mcp-server-code-extractor (console script)
  * uv run python -m code_extractor (module)
  * uvx mcp-server-code-extractor (future)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 4ce490a commit b2a2900

4 files changed: +631 -2 lines changed

CLAUDE.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

**Testing:**
```bash
# Run all tests using pytest
uv run pytest

# Run individual test files
uv run pytest tests/test_extractor.py
uv run pytest tests/test_languages.py
uv run pytest tests/test_models.py

# Quick functional test of core MCP server
python test_new_mcp.py
```

**Running the MCP Server:**
```bash
# Run as package with UV (recommended)
uv run mcp-server-code-extractor

# Run as Python module
uv run python -m code_extractor

# Test with MCP Inspector
npx @modelcontextprotocol/inspector uv run mcp-server-code-extractor

# For uvx usage (after publishing)
uvx mcp-server-code-extractor
```

**Development Dependencies:**
```bash
# Install development dependencies (testing, formatting, linting)
uv add --dev pytest black flake8 mypy
```

**Code Quality:**
```bash
# Format code with Black
uv run black .

# Lint with flake8
uv run flake8 .

# Type checking with mypy
uv run mypy .
```

## Architecture Overview

This is an MCP (Model Context Protocol) server that provides precise code extraction using tree-sitter parsing. The codebase has a **clean package structure**:

### Package Structure (`code_extractor/`)
- **MCP Server** (`server.py`) - FastMCP server with 5 extraction tools
- **Core Library** (`extractor.py`) - Query-driven extraction engine using tree-sitter
- **Data Models** (`models.py`) - Rich symbol representations with hierarchical relationships
- **Language Support** (`languages.py`) - Detection and mapping for 30+ programming languages
- **Tree-sitter Queries** (`queries/`) - Language-specific syntax parsing patterns
- **Entry Points** (`__main__.py`) - Module execution support

### Entry Points
- **Console Script**: `mcp-server-code-extractor` - Direct execution via uvx/pip
- **Module Execution**: `python -m code_extractor` - Run as Python module
- **Package Import**: `from code_extractor import CodeExtractor` - Library usage

### Key Architectural Decisions

**Method vs Function Classification:**
The core innovation is distinguishing methods (functions defined inside classes) from top-level functions using tree-sitter query patterns. This addresses a context problem: a function definition looks the same wherever it appears, so deciding whether it is a method requires knowing the containment hierarchy around it.
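
The containment idea can be illustrated without tree-sitter. The sketch below is not the extractor's implementation: it swaps in Python's standard-library `ast` module as a stand-in parser, and `classify_functions` is a hypothetical helper name. It labels a definition as a method only when its direct parent is a class.

```python
from __future__ import annotations

import ast

SOURCE = '''
def helper():
    pass

class Greeter:
    def greet(self, name):
        def inner():
            pass
        return name
'''


def classify_functions(source: str) -> dict[str, str]:
    """Label each def as 'method' or 'function' based on its direct parent node."""
    kinds: dict[str, str] = {}

    def visit(node: ast.AST, parent: ast.AST | None) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Only a def whose immediate parent is a class counts as a method.
            is_method = isinstance(parent, ast.ClassDef)
            kinds[node.name] = "method" if is_method else "function"
        for child in ast.iter_child_nodes(node):
            visit(child, node)

    visit(ast.parse(source), None)
    return kinds


result = classify_functions(SOURCE)
```

Here `greet` is classified as a method, while `helper` and the nested `inner` are classified as functions, because neither has a class as its direct parent.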

**Two-Layer Symbol Processing:**
1. **Query capture phase**: Tree-sitter queries extract syntax nodes with semantic labels
2. **Symbol building phase**: Raw captures are processed into rich `CodeSymbol` objects with hierarchical relationships
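
The second phase can be pictured roughly as follows. This is a minimal sketch under stated assumptions: `Capture` is a hypothetical stand-in for a raw query capture, and the `CodeSymbol` fields shown here are illustrative guesses, since the real model in `models.py` is not part of this diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Hypothetical stand-in for one raw capture from the query phase."""
    label: str  # e.g. "method.name"
    start_byte: int
    end_byte: int
    text: str


@dataclass
class CodeSymbol:
    """Illustrative symbol shape; the real CodeSymbol fields may differ."""
    kind: str
    start_byte: int
    end_byte: int
    name: str = ""
    parameters: str = ""


def build_symbols(captures: list[Capture]) -> list[CodeSymbol]:
    # Every "<kind>.definition" capture becomes an empty symbol shell.
    symbols = [
        CodeSymbol(c.label.split(".")[0], c.start_byte, c.end_byte)
        for c in captures
        if c.label.endswith(".definition")
    ]
    # Remaining captures attach to the smallest enclosing definition of the same kind.
    for c in captures:
        kind, _, part = c.label.partition(".")
        if part not in ("name", "parameters"):
            continue
        owners = [
            s for s in symbols
            if s.kind == kind and s.start_byte <= c.start_byte and c.end_byte <= s.end_byte
        ]
        if owners:
            owner = min(owners, key=lambda s: s.end_byte - s.start_byte)
            setattr(owner, part, c.text)
    return symbols


# ASCII-only source, so character offsets equal byte offsets here.
source = "class A:\n    def f(self, x):\n        pass\n"
def_start = source.index("def f")
def_end = source.index("pass") + len("pass")
name_start = source.index("f(")
params_start = source.index("(self")
params_end = source.index(")") + 1

captures = [
    Capture("method.definition", def_start, def_end, source[def_start:def_end]),
    Capture("method.name", name_start, name_start + 1, "f"),
    Capture("method.parameters", params_start, params_end, source[params_start:params_end]),
]
symbols = build_symbols(captures)
```

Keeping capture and symbol building separate means the queries stay purely syntactic, while naming and grouping rules live in ordinary Python.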

**Clean MCP Interface:**
The server uses FastMCP for simple tool registration and exposes 5 core extraction tools with consistent function signatures and error handling.

## Working with Tree-Sitter Queries

Tree-sitter queries are stored in `code_extractor/queries/` and use the S-expression format:

```scheme
; Extract methods inside classes
(class_definition
  body: (block
    (function_definition
      name: (identifier) @method.name
      parameters: (parameters) @method.parameters) @method.definition))
```

**Query Structure:**
- Capture names use `category.type` format (e.g., `method.name`, `function.definition`)
- The extractor groups captures by their definition nodes to build complete symbols
- Parent-child relationships are determined by byte range containment
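
Byte range containment, as described in the last bullet, can be sketched like this. The `Span` class and `assign_parents` function are hypothetical names for illustration, and the quadratic loop is chosen for readability rather than taken from the extractor.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical symbol reduced to a name and a byte range."""
    name: str
    start_byte: int
    end_byte: int
    parent: Optional[str] = None


def assign_parents(spans: list[Span]) -> None:
    """Set each span's parent to the smallest strictly larger span that contains it."""
    for child in spans:
        child_size = child.end_byte - child.start_byte
        enclosing = [
            s for s in spans
            if s is not child
            and s.start_byte <= child.start_byte
            and child.end_byte <= s.end_byte
            and (s.end_byte - s.start_byte) > child_size
        ]
        if enclosing:
            child.parent = min(enclosing, key=lambda s: s.end_byte - s.start_byte).name


spans = [
    Span("Outer", 0, 100),
    Span("method", 10, 50),
    Span("inner", 20, 30),
    Span("top_level", 110, 150),
]
assign_parents(spans)
parents = {s.name: s.parent for s in spans}
```

Choosing the smallest enclosing range gives `inner` the parent `method` rather than `Outer`, which is what makes the hierarchy nest correctly.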

## Language Support

New languages require:
1. Adding language mapping in `code_extractor/languages.py`
2. Creating tree-sitter query file in `code_extractor/queries/`
3. Testing with language-specific syntax patterns

The system automatically detects language from file extensions and falls back gracefully for unsupported languages.
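
Extension-based detection with a fallback might look like the sketch below. The mapping, the `detect_language` name, and the `"text"` default are all assumptions for illustration; the actual table and fallback behaviour in `languages.py` are not shown in this diff.

```python
from pathlib import Path

# Hypothetical subset of an extension table; the real mapping covers 30+ languages.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".rs": "rust",
    ".go": "go",
}


def detect_language(file_path: str, default: str = "text") -> str:
    """Map a file path to a language name, returning `default` when the extension is unknown."""
    suffix = Path(file_path).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix, default)


known = detect_language("src/app.PY")
unknown = detect_language("notes.xyz")
no_extension = detect_language("Makefile")
```

Returning a default instead of raising lets callers still serve line-based requests for files whose language has no query file.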

## MCP Tools Interface

The server exposes 5 tools to AI assistants:

1. **`get_symbols`** - Primary entry point for code discovery (uses modern core library)
2. **`get_function`** - Extract specific functions (legacy tree traversal)
3. **`get_class`** - Extract specific classes (legacy tree traversal)
4. **`get_lines`** - Extract line ranges by number
5. **`get_signature`** - Get function signatures only

**Best Practice**: Always use `get_symbols` first for code exploration, then use specific extraction tools for detailed analysis.
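
As a rough illustration of the simplest tool, line-range extraction can be sketched in plain Python. This is not the server's `get_lines` implementation: the function name, the 1-indexed inclusive range, and the clamping of an out-of-range end line are assumptions.

```python
def get_lines_sketch(text: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line of text (assumed 1-indexed, inclusive)."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    lines = text.splitlines()
    # Slicing clamps silently when end_line runs past the last line.
    return "\n".join(lines[start_line - 1:end_line])


sample = "a\nb\nc\nd"
middle = get_lines_sketch(sample, 2, 3)
tail = get_lines_sketch(sample, 3, 10)
```

In the recommended workflow, the line numbers passed to such a function would come from a prior `get_symbols` call rather than from guesswork.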

code_extractor/__main__.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/usr/bin/env python3
"""
MCP Code Extractor - Main entry point

Entry point for running the MCP server as a module:
- python -m code_extractor
- uvx mcp-server-code-extractor
"""

from .server import main

if __name__ == "__main__":
    main()
