This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Testing:
# Run all tests using pytest
uv run pytest
# Run individual test files
uv run pytest tests/test_extractor.py
uv run pytest tests/test_languages.py
uv run pytest tests/test_models.py
uv run pytest tests/test_git_support.py
# Quick functional test of core MCP server
python test_new_mcp.py
# Test semantic search functionality (includes directory search tests)
python test_semantic_search.pyRunning the MCP Server:
# Run as package with UV (recommended)
uv run mcp-server-code-extractor
# Run as Python module
uv run python -m code_extractor
# Test with MCP Inspector
npx @modelcontextprotocol/inspector uv run mcp-server-code-extractor
# Example search_code_tool usage in MCP Inspector:
# Single file search:
# search_type: "function-calls"
# target: "get_file_content"
# scope: "code_extractor/server.py"
# Directory search (searches all files in directory):
# search_type: "function-calls"
# target: "get_file_content"
# scope: "code_extractor"
# file_patterns: ["*.py"]
# max_results: 50
# Directory search with exclusions:
# search_type: "function-calls"
# target: "print"
# scope: "code_extractor"
# file_patterns: ["*.py"]
# exclude_patterns: ["test_*", "__pycache__/*"]
# For uvx usage (after publishing)
uvx mcp-server-code-extractorDevelopment Dependencies:
# Install development dependencies (testing, formatting, linting)
uv add --dev pytest black flake8 mypyCode Quality:
# Format code with Black
uv run black .
# Lint with flake8
uv run flake8 .
# Type checking with mypy
uv run mypy .- To bump the version use uv version --bump <patch|minor|major> commit (only the changes to pyproject.toml) push, and monitor the workflow run to verify it actually gets deployed.
- IMPORTANT: The deployment workflow is triggered by git tags, not pushes. After pushing version changes, create and push a tag:
git tag v<version> && git push origin v<version>
This is an MCP (Model Context Protocol) server that provides precise code extraction using tree-sitter parsing. The codebase has a clean package structure:
- MCP Server (
server.py) - FastMCP server with 5 extraction tools - Core Library (
extractor.py) - Query-driven extraction engine using tree-sitter - Data Models (
models.py) - Rich symbol representations with hierarchical relationships - Language Support (
languages.py) - Detection and mapping for 30+ programming languages - Tree-sitter Queries (
queries/) - Language-specific syntax parsing patterns - File Reading (
file_reader.py) - Unified file reading with VCS and URL support - URL Fetching (
url_fetcher.py) - HTTP fetching with caching and error handling - VCS Support (
vcs/) - Pluggable version control system abstraction - Entry Points (
__main__.py) - Module execution support
- Console Script:
mcp-server-code-extractor- Direct execution via uvx/pip - Module Execution:
python -m code_extractor- Run as Python module - Package Import:
from code_extractor import CodeExtractor- Library usage
Method vs Function Classification: The core innovation is distinguishing methods (functions inside classes) from top-level functions using tree-sitter query patterns. This solves the context problem where traditional parsers can't determine if a function is a class method without understanding the containment hierarchy.
Two-Layer Symbol Processing:
- Query capture phase: Tree-sitter queries extract syntax nodes with semantic labels
- Symbol building phase: Raw captures are processed into rich
CodeSymbolobjects with hierarchical relationships
Clean MCP Interface: The server uses FastMCP for simple tool registration and exposes 5 core extraction tools with consistent function signatures and error handling.
Pluggable VCS Architecture:
The codebase includes a pluggable VCS abstraction layer (code_extractor/vcs/) that enables git revision support while maintaining extensibility for future version control systems. All file reading is centralized through file_reader.py which automatically detects git repositories and delegates to the appropriate VCS provider.
Tree-sitter queries are stored in code_extractor/queries/ and use the S-expression format:
; Extract methods inside classes
(class_definition
body: (block
(function_definition
name: (identifier) @method.name
parameters: (parameters) @method.parameters) @method.definition))Query Structure:
- Capture names use
category.typeformat (e.g.,method.name,function.definition) - The extractor groups captures by their definition nodes to build complete symbols
- Parent-child relationships are determined by byte range containment
New languages require:
- Adding language mapping in
code_extractor/languages.py - Creating tree-sitter query file in
code_extractor/queries/ - Testing with language-specific syntax patterns
The system automatically detects language from file extensions and falls back gracefully for unsupported languages.