Commit 9c4e8d6

ctoth and aider-chat-bot committed
docs: Add comprehensive README for MCP Code Extractor package
Co-authored-by: aider (claude-opus-4-20250514) <aider@aider.chat>
1 parent 9fe7ca9 commit 9c4e8d6

1 file changed: +230 -0 lines changed

README.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
# MCP Code Extractor

A Model Context Protocol (MCP) server that provides precise code extraction tools using tree-sitter parsing. Extract functions, classes, and code snippets from 30+ programming languages without manual parsing.

## Why MCP Code Extractor?

When working with AI coding assistants like Claude, you often need to:
- Extract specific functions or classes from large codebases
- Get an overview of what's in a file without reading the entire thing
- Retrieve precise code snippets with accurate line numbers
- Avoid manual parsing and grep/sed/awk gymnastics

MCP Code Extractor solves these problems by providing structured, tree-sitter-powered code extraction tools directly within your AI assistant.

## Features

- **🎯 Precise Extraction**: Uses tree-sitter parsing for accurate code boundary detection
- **🌍 30+ Languages**: Supports Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, and many more
- **📍 Line Numbers**: Every extraction includes precise line number information
- **🔍 Code Discovery**: List all functions and classes in a file before extracting
- **⚡ Fast & Lightweight**: Single-file implementation with minimal dependencies
- **🤖 AI-Optimized**: Designed specifically for use with AI coding assistants

## Installation

### Quick Start with UV (Recommended)

```bash
# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone this repository
git clone https://github.com/yourusername/mcp-code-extractor
cd mcp-code-extractor

# Run directly with UV (no installation needed!)
uv run mcp_code_extractor.py
```

### Traditional Installation

```bash
# Quote the extras specifier so shells such as zsh do not treat [cli] as a glob pattern
pip install "mcp[cli]" tree-sitter-languages tree-sitter==0.21.3
```

### Configure with Claude Desktop

Add to your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "code-extractor": {
      "command": "uv",
      "args": ["run", "/path/to/mcp_code_extractor.py"]
    }
  }
}
```

Or with traditional Python:

```json
{
  "mcpServers": {
    "code-extractor": {
      "command": "python",
      "args": ["/path/to/mcp_code_extractor.py"]
    }
  }
}
```

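If the configuration file already lists other servers, the new entry belongs alongside them under `mcpServers` rather than replacing the whole object. The following is a minimal sketch of that merge on an in-memory dictionary. The helper `add_server` and the `other-server` entry are hypothetical and only illustrate the shape of the data. Reading and writing the actual configuration file is left out because its location depends on your platform.

```python
import json


def add_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one more entry under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is untouched
    servers[name] = {"command": command, "args": args}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}

updated = add_server(
    existing, "code-extractor", "uv", ["run", "/path/to/mcp_code_extractor.py"]
)
print(json.dumps(updated, indent=2))
```
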
## Available Tools

### 1. `get_symbols` - Discover Code Structure
List all functions, classes, and other symbols in a file.

**Use this first** when exploring a new codebase!

```
Returns:
- name: Symbol name
- type: function/class/method/etc
- start_line/end_line: Line numbers
- preview: First line of the symbol
```

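To make the shape of a `get_symbols` result concrete, here is a rough sketch that produces the same fields for Python source only. It uses the standard-library `ast` module as a stand-in for tree-sitter, so it is not how the server itself works. The helper names `list_symbols`, `_entry`, and `_collect` are hypothetical.

```python
import ast

SAMPLE = '''\
class User:
    def __init__(self, name):
        self.name = name

def greet(user):
    return f"Hello, {user.name}"
'''


def _entry(node, kind, lines):
    """Build one symbol record with the fields listed above."""
    return {
        "name": node.name,
        "type": kind,
        "start_line": node.lineno,
        "end_line": node.end_lineno,
        "preview": lines[node.lineno - 1].strip(),
    }


def _collect(node, inside_class, lines, out):
    """Walk the tree in source order, labelling defs inside a class as methods."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.ClassDef):
            out.append(_entry(child, "class", lines))
            _collect(child, True, lines, out)
        elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(_entry(child, "method" if inside_class else "function", lines))
            _collect(child, False, lines, out)
        else:
            _collect(child, inside_class, lines, out)


def list_symbols(source: str) -> list:
    out = []
    _collect(ast.parse(source), False, source.splitlines(), out)
    return out


for symbol in list_symbols(SAMPLE):
    print(symbol)
```
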
### 2. `get_function` - Extract Complete Functions
Extract a complete function with all its code.

```
Parameters:
- file_path: Path to the source file
- function_name: Name of the function to extract

Returns:
- code: Complete function code
- start_line/end_line: Precise boundaries
- language: Detected language
```

### 3. `get_class` - Extract Complete Classes
Extract an entire class definition including all methods.

```
Parameters:
- file_path: Path to the source file
- class_name: Name of the class to extract

Returns:
- code: Complete class code
- start_line/end_line: Precise boundaries
- language: Detected language
```

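`get_function` and `get_class` return the same three fields, so one sketch can illustrate both. The version below handles Python source only and uses the standard-library `ast` module where the server uses tree-sitter. The name `extract_definition` and the hard-coded `"python"` language value are assumptions made for the example.

```python
import ast
from typing import Optional

SOURCE = '''\
class User:
    def __init__(self, name):
        self.name = name

def greet(user):
    return f"Hello, {user.name}"
'''

_KINDS = {
    "function": (ast.FunctionDef, ast.AsyncFunctionDef),
    "class": (ast.ClassDef,),
}


def extract_definition(source: str, name: str, kind: str) -> Optional[dict]:
    """Return code and 1-based inclusive boundaries of the first matching definition."""
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _KINDS[kind]) and node.name == name:
            return {
                "code": "\n".join(lines[node.lineno - 1:node.end_lineno]),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "language": "python",  # fixed here; the real tool detects it
            }
    return None  # no definition with that name


print(extract_definition(SOURCE, "greet", "function"))
print(extract_definition(SOURCE, "User", "class"))
```
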
### 4. `get_lines` - Extract Specific Line Ranges
Get exact line ranges when you know the line numbers.

```
Parameters:
- file_path: Path to the source file
- start_line: Starting line (1-based)
- end_line: Ending line (inclusive)

Returns:
- code: Extracted lines
- line numbers and metadata
```

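The 1-based, inclusive convention is easy to get wrong by one line, so a small sketch of the slicing may help. The function name `get_line_range`, the `total_lines` field, and the decision to clamp `end_line` to the end of the text are assumptions for illustration. The server's actual handling of out-of-range values may differ.

```python
def get_line_range(text: str, start_line: int, end_line: int) -> dict:
    """Return lines start_line..end_line of `text`, 1-based and inclusive."""
    lines = text.splitlines()
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    end_line = min(end_line, len(lines))  # assumption: clamp instead of failing
    # Python slices are 0-based and exclusive at the end, hence the -1 on the start only.
    selected = lines[start_line - 1:end_line]
    return {
        "code": "\n".join(selected),
        "start_line": start_line,
        "end_line": end_line,
        "total_lines": len(lines),
    }


text = "alpha\nbeta\ngamma\ndelta"
print(get_line_range(text, 2, 3))
```
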
### 5. `get_signature` - Get Function Signatures
Quickly get just the function signature without the body.

```
Parameters:
- file_path: Path to the source file
- function_name: Name of the function

Returns:
- signature: Function signature only
- start_line: Where the function starts
```

## Usage Examples
145+
146+
### Example 1: Exploring a Python File
147+
148+
```python
149+
# First, see what's in the file
150+
symbols = get_symbols("src/main.py")
151+
# Returns: List of all functions and classes with line numbers
152+
153+
# Extract a specific function
154+
result = get_function("src/main.py", "process_data")
155+
# Returns: Complete function code with line numbers
156+
157+
# Get just a function signature
158+
sig = get_signature("src/main.py", "process_data")
159+
# Returns: "def process_data(input_file: str, output_dir: Path) -> Dict[str, Any]:"
160+
```
161+
### Example 2: Working with Classes

```python
# Extract an entire class
result = get_class("models/user.py", "User")
# Returns: Complete User class with all methods

# Get specific lines (e.g., just the __init__ method)
lines = get_lines("models/user.py", 10, 25)
# Returns: Lines 10-25 of the file
```

### Example 3: Multi-Language Support

The tool calls look the same whatever language the target file is written in:

```python
# Works with JavaScript/TypeScript
symbols = get_symbols("app.ts")
func = get_function("app.ts", "handleRequest")
```

```python
# Works with Go
symbols = get_symbols("main.go")
method = get_function("main.go", "ServeHTTP")
```

## Supported Languages

- Python, JavaScript, TypeScript, JSX/TSX
- Go, Rust, C, C++, C#, Java
- Ruby, PHP, Swift, Kotlin, Scala
- Bash, PowerShell, SQL
- Haskell, OCaml, Elixir, Clojure
- And many more...

## Best Practices

1. **Always use `get_symbols` first** when exploring a new file
2. **Use `get_function` or `get_class`** instead of reading entire files
3. **Use `get_lines`** when you know exact line numbers
4. **Use `get_signature`** for quick API exploration

## Why Not Just Use Read?

Traditional file reading tools require you to:
- Read entire files (inefficient for large files)
- Manually parse code to find functions/classes
- Count lines manually for extraction
- Deal with complex syntax and edge cases

MCP Code Extractor:
- ✅ Extracts exactly what you need
- ✅ Provides structured data with metadata
- ✅ Handles complex syntax automatically
- ✅ Works across 30+ languages consistently

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details.

## Acknowledgments

- Built on [tree-sitter](https://tree-sitter.github.io/) for robust parsing
- Uses [tree-sitter-languages](https://github.com/grantjenks/py-tree-sitter-languages) for language support
- Implements the [Model Context Protocol](https://modelcontextprotocol.io/) specification
