| 1 | +# MCP Code Extractor |
| 2 | + |
| 3 | +A Model Context Protocol (MCP) server that provides precise code extraction tools using tree-sitter parsing. Extract functions, classes, and code snippets from 30+ programming languages without manual parsing. |
| 4 | + |
| 5 | +## Why MCP Code Extractor? |
| 6 | + |
| 7 | +When working with AI coding assistants like Claude, you often need to: |
| 8 | +- Extract specific functions or classes from large codebases |
| 9 | +- Get an overview of what's in a file without reading the entire thing |
| 10 | +- Retrieve precise code snippets with accurate line numbers |
| 11 | +- Avoid manual parsing and grep/sed/awk gymnastics |
| 12 | + |
| 13 | +MCP Code Extractor solves these problems by providing structured, tree-sitter-powered code extraction tools directly within your AI assistant. |
| 14 | + |
| 15 | +## Features |
| 16 | + |
| 17 | +- **🎯 Precise Extraction**: Uses tree-sitter parsing for accurate code boundary detection |
| 18 | +- **🌍 30+ Languages**: Supports Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, and many more |
| 19 | +- **📍 Line Numbers**: Every extraction includes precise line number information |
| 20 | +- **🔍 Code Discovery**: List all functions and classes in a file before extracting |
| 21 | +- **⚡ Fast & Lightweight**: Single-file implementation with minimal dependencies |
| 22 | +- **🤖 AI-Optimized**: Designed specifically for use with AI coding assistants |
| 23 | + |
| 24 | +## Installation |
| 25 | + |
| 26 | +### Quick Start with UV (Recommended) |
| 27 | + |
| 28 | +```bash |
| 29 | +# Install UV if you haven't already |
| 30 | +curl -LsSf https://astral.sh/uv/install.sh | sh |
| 31 | + |
| 32 | +# Clone this repository |
| 33 | +git clone https://github.com/yourusername/mcp-code-extractor |
| 34 | +cd mcp-code-extractor |
| 35 | + |
| 36 | +# Run directly with UV (no installation needed!) |
| 37 | +uv run mcp_code_extractor.py |
| 38 | +``` |
| 39 | + |
| 40 | +### Traditional Installation |
| 41 | + |
| 42 | +```bash |
| 43 | +pip install "mcp[cli]" tree-sitter-languages tree-sitter==0.21.3 |
| 44 | +``` |
| 45 | + |
| 46 | +### Configure with Claude Desktop |
| 47 | + |
| 48 | +Add to your Claude Desktop configuration: |
| 49 | + |
| 50 | +```json |
| 51 | +{ |
| 52 | + "mcpServers": { |
| 53 | + "code-extractor": { |
| 54 | + "command": "uv", |
| 55 | + "args": ["run", "/path/to/mcp_code_extractor.py"] |
| 56 | + } |
| 57 | + } |
| 58 | +} |
| 59 | +``` |
| 60 | + |
| 61 | +Or with traditional Python: |
| 62 | + |
| 63 | +```json |
| 64 | +{ |
| 65 | + "mcpServers": { |
| 66 | + "code-extractor": { |
| 67 | + "command": "python", |
| 68 | + "args": ["/path/to/mcp_code_extractor.py"] |
| 69 | + } |
| 70 | + } |
| 71 | +} |
| 72 | +``` |
| 73 | + |
| 74 | +## Available Tools |
| 75 | + |
| 76 | +### 1. `get_symbols` - Discover Code Structure |
| 77 | +List all functions, classes, and other symbols in a file. |
| 78 | + |
| 79 | +**Use this first** when exploring a new codebase! |
| 80 | + |
| 81 | +``` |
| 82 | +Returns: |
| 83 | +- name: Symbol name |
| 84 | +- type: function/class/method/etc |
| 85 | +- start_line/end_line: Line numbers |
| 86 | +- preview: First line of the symbol |
| 87 | +``` |
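To make the shape of these records concrete, here is a minimal sketch that builds the same five fields for Python source only. It uses the standard-library `ast` module as a stand-in for tree-sitter, and the name `list_symbols` is hypothetical; the actual server may structure or name things differently.

```python
import ast


def list_symbols(source: str) -> list[dict]:
    """Return one record per class, function, and method, in source order.

    Stand-in sketch: uses Python's `ast` module, not tree-sitter, so it
    only handles Python source. `list_symbols` is a hypothetical name.
    """
    lines = source.splitlines()
    symbols = []

    def visit(node, inside_class=False):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind = "class"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if inside_class else "function"
            else:
                # Definitions nested in if/try blocks are skipped in this sketch.
                continue
            symbols.append({
                "name": child.name,
                "type": kind,
                "start_line": child.lineno,      # 1-based
                "end_line": child.end_lineno,    # inclusive
                "preview": lines[child.lineno - 1].strip(),
            })
            # Recurse so that methods inside classes are listed too.
            visit(child, inside_class=isinstance(child, ast.ClassDef))

    visit(ast.parse(source))
    return symbols
```

The 1-based, inclusive line numbers in each record can be passed straight to a line-range extraction such as `get_lines`.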
| 88 | + |
| 89 | +### 2. `get_function` - Extract Complete Functions |
| 90 | +Extract a complete function with all its code. |
| 91 | + |
| 92 | +``` |
| 93 | +Parameters: |
| 94 | +- file_path: Path to the source file |
| 95 | +- function_name: Name of the function to extract |
| 96 | + |
| 97 | +Returns: |
| 98 | +- code: Complete function code |
| 99 | +- start_line/end_line: Precise boundaries |
| 100 | +- language: Detected language |
| 101 | +``` |
| 102 | + |
| 103 | +### 3. `get_class` - Extract Complete Classes |
| 104 | +Extract an entire class definition including all methods. |
| 105 | + |
| 106 | +``` |
| 107 | +Parameters: |
| 108 | +- file_path: Path to the source file |
| 109 | +- class_name: Name of the class to extract |
| 110 | + |
| 111 | +Returns: |
| 112 | +- code: Complete class code |
| 113 | +- start_line/end_line: Precise boundaries |
| 114 | +- language: Detected language |
| 115 | +``` |
| 116 | + |
| 117 | +### 4. `get_lines` - Extract Specific Line Ranges |
| 118 | +Get exact line ranges when you know the line numbers. |
| 119 | + |
| 120 | +``` |
| 121 | +Parameters: |
| 122 | +- file_path: Path to the source file |
| 123 | +- start_line: Starting line (1-based) |
| 124 | +- end_line: Ending line (inclusive) |
| 125 | + |
| 126 | +Returns: |
| 127 | +- code: Extracted lines |
| 128 | +- line numbers and metadata |
| 129 | +``` |
| 130 | + |
| 131 | +### 5. `get_signature` - Get Function Signatures |
| 132 | +Quickly get just the function signature without the body. |
| 133 | + |
| 134 | +``` |
| 135 | +Parameters: |
| 136 | +- file_path: Path to the source file |
| 137 | +- function_name: Name of the function |
| 138 | +
|
| 139 | +Returns: |
| 140 | +- signature: Function signature only |
| 141 | +- start_line: Where the function starts |
| 142 | +``` |
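As a rough illustration of what "signature only" means, the sketch below rebuilds a Python function's signature from its syntax tree and reports the line it starts on. It relies on the standard-library `ast` module (Python 3.9+ for `ast.unparse`) rather than tree-sitter, and `signature_of` is a hypothetical name; the real tool may instead return the original source text of the signature.

```python
import ast
from typing import Optional


def signature_of(source: str, function_name: str) -> Optional[dict]:
    """Return the signature and start line of the first matching function.

    Stand-in sketch for Python source only; returns None if no function
    with that name exists.
    """
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                and node.name == function_name):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            signature = f"{prefix} {node.name}({ast.unparse(node.args)})"
            if node.returns is not None:
                signature += f" -> {ast.unparse(node.returns)}"
            return {"signature": signature + ":", "start_line": node.lineno}
    return None
```

Because the signature is regenerated from the tree, original spacing, line breaks, and comments inside the parameter list are not preserved.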
| 143 | + |
| 144 | +## Usage Examples |
| 145 | + |
| 146 | +### Example 1: Exploring a Python File |
| 147 | + |
| 148 | +```python |
| 149 | +# First, see what's in the file |
| 150 | +symbols = get_symbols("src/main.py") |
| 151 | +# Returns: List of all functions and classes with line numbers |
| 152 | + |
| 153 | +# Extract a specific function |
| 154 | +result = get_function("src/main.py", "process_data") |
| 155 | +# Returns: Complete function code with line numbers |
| 156 | + |
| 157 | +# Get just a function signature |
| 158 | +sig = get_signature("src/main.py", "process_data") |
| 159 | +# Returns: "def process_data(input_file: str, output_dir: Path) -> Dict[str, Any]:" |
| 160 | +``` |
| 161 | + |
| 162 | +### Example 2: Working with Classes |
| 163 | + |
| 164 | +```python |
| 165 | +# Extract an entire class |
| 166 | +result = get_class("models/user.py", "User") |
| 167 | +# Returns: Complete User class with all methods |
| 168 | + |
| 169 | +# Get specific lines (e.g., just the __init__ method) |
| 170 | +lines = get_lines("models/user.py", 10, 25) |
| 171 | +# Returns: Lines 10-25 of the file |
| 172 | +``` |
| 173 | + |
| 174 | +### Example 3: Multi-Language Support |
| 175 | + |
| 176 | +```python |
| 177 | +# Works with JavaScript/TypeScript source files |
| 178 | +symbols = get_symbols("app.ts") |
| 179 | +func = get_function("app.ts", "handleRequest") |
| 180 | +``` |
| 181 | + |
| 182 | +```python |
| 183 | +# Works with Go source files |
| 184 | +symbols = get_symbols("main.go") |
| 185 | +method = get_function("main.go", "ServeHTTP") |
| 186 | +``` |
| 187 | + |
| 188 | +## Supported Languages |
| 189 | + |
| 190 | +- Python, JavaScript, TypeScript, JSX/TSX |
| 191 | +- Go, Rust, C, C++, C#, Java |
| 192 | +- Ruby, PHP, Swift, Kotlin, Scala |
| 193 | +- Bash, PowerShell, SQL |
| 194 | +- Haskell, OCaml, Elixir, Clojure |
| 195 | +- And many more... |
| 196 | + |
| 197 | +## Best Practices |
| 198 | + |
| 199 | +1. **Always use `get_symbols` first** when exploring a new file |
| 200 | +2. **Use `get_function` or `get_class`** instead of reading entire files |
| 201 | +3. **Use `get_lines`** when you know exact line numbers |
| 202 | +4. **Use `get_signature`** for quick API exploration |
| 203 | + |
| 204 | +## Why Not Just Use Read? |
| 205 | + |
| 206 | +Traditional file reading tools require you to: |
| 207 | +- Read entire files (inefficient for large files) |
| 208 | +- Manually parse code to find functions/classes |
| 209 | +- Count lines manually for extraction |
| 210 | +- Deal with complex syntax and edge cases |
| 211 | + |
| 212 | +MCP Code Extractor: |
| 213 | +- ✅ Extracts exactly what you need |
| 214 | +- ✅ Provides structured data with metadata |
| 215 | +- ✅ Handles complex syntax automatically |
| 216 | +- ✅ Works across 30+ languages consistently |
| 217 | + |
| 218 | +## Contributing |
| 219 | + |
| 220 | +Contributions are welcome! Please feel free to submit a Pull Request. |
| 221 | + |
| 222 | +## License |
| 223 | + |
| 224 | +MIT License - see LICENSE file for details. |
| 225 | + |
| 226 | +## Acknowledgments |
| 227 | + |
| 228 | +- Built on [tree-sitter](https://tree-sitter.github.io/) for robust parsing |
| 229 | +- Uses [tree-sitter-languages](https://github.com/grantjenks/py-tree-sitter-languages) for language support |
| 230 | +- Implements the [Model Context Protocol](https://modelcontextprotocol.io/) specification |