Skip to content

Latest commit

 

History

History
178 lines (138 loc) · 6.27 KB

File metadata and controls

178 lines (138 loc) · 6.27 KB

Large Repository Handling Guide

This guide explains how to handle large repositories with the CodeCodePrism MCP server, including memory management, filtering options, and performance optimization.

New Improved Defaults ✨

Good news! The MCP server now has much better defaults based on real-world usage:

  • Memory limit: 4GB (was 1GB)
  • Batch size: 30 files (was 50, optimized for memory efficiency)
  • File filtering: Automatically includes common programming languages
  • Directory exclusion: Common build/dependency directories excluded by default

This means for most repositories, you can simply run:

./target/release/codeprism-mcp /path/to/your/repository

Default Configuration

Memory & Performance

  • Memory limit: 4GB (suitable for modern development machines)
  • Batch size: 30 files (optimized balance of speed and memory usage)
  • Streaming mode: Automatically enabled for repositories with >10,000 files

File Filtering (New Defaults)

Excluded directories (automatic):

  • Version Control: .git
  • Package Management: node_modules, vendor
  • Build Artifacts: target, build, dist, coverage
  • Python Virtual Environments: .venv, venv, .tox, .env, env
  • Python Caches: __pycache__, .pytest_cache, .mypy_cache, .ruff_cache
  • Web Build Artifacts: .next, .nuxt
  • IDE/Editor: .vscode, .idea
  • OS Files: .DS_Store, Thumbs.db

Included file extensions (automatic):

  • Web: js, ts, jsx, tsx
  • Systems: rs, c, cpp, h, hpp, go
  • Enterprise: java, kt, swift
  • Scripting: py, rb, php

When to Override Defaults

Dependency Analysis Options

New feature! 🎯 You can now control how dependencies are handled:

1. Minimal (Default) - Fast but Limited

./target/release/codeprism-mcp /path/to/repo
  • Excludes: All dependency directories (.tox, venv, node_modules, etc.)
  • Pros: Fast indexing, low memory usage, focuses on your code
  • Cons: Can't follow imports into dependencies, missing external API intelligence

2. Smart Dependency Scanning - Balanced

./target/release/codeprism-mcp --smart-deps /path/to/repo
  • Includes: Public APIs and commonly used dependency files only
  • Excludes: Internal implementation details, tests, documentation
  • Includes: __init__.py, index.js, lib.rs, top-level modules
  • Excludes: */tests/, */internal/, */_private/, deep nested files
  • Best for: Following imports while keeping performance reasonable

3. Complete Analysis - Comprehensive but Slow

./target/release/codeprism-mcp --include-deps /path/to/repo
  • Includes: Everything including full dependency source code
  • Pros: Complete code intelligence, full import following, comprehensive analysis
  • Cons: Much slower indexing, high memory usage, may hit memory limits

Comparison Examples

Your specific use case: Following imports in your Agent class

# Your code can now follow these imports in smart mode:
from rustic_ai.core.agents.commons.message_formats import ErrorMessagefrom pydantic import BaseModel  ✅ (public API)
from some_lib.internal._private import InternalHelper  ❌ (excluded in smart mode)

Recommended Approach

  1. Start with smart mode for most development:
./target/release/codeprism-mcp --smart-deps --memory-limit 4096 /path/to/repo
  1. Use complete analysis when you need full dependency intelligence:
./target/release/codeprism-mcp --include-deps --memory-limit 8192 /path/to/repo
  1. Use minimal mode for CI/CD or when performance is critical:
./target/release/codeprism-mcp /path/to/repo  # Default

Very Large Repositories (>50,000 files)

./target/release/codeprism-mcp --memory-limit 8192 --batch-size 20 /path/to/huge/repo

Specific Language Focus

# Only Python and JavaScript
./target/release/codeprism-mcp --include-extensions py,js,ts /path/to/repo

# Only Rust projects  
./target/release/codeprism-mcp --include-extensions rs,toml /path/to/repo

Memory-Constrained Systems

# For systems with limited RAM
./target/release/codeprism-mcp --memory-limit 2048 --batch-size 15 /path/to/repo

Include All File Types

# Override defaults to include all files (not recommended for large repos)
./target/release/codeprism-mcp --include-extensions "*" /path/to/repo

Command Line Options Reference

Option New Default Description
--memory-limit MB 4096 Memory limit in MB
--batch-size SIZE 30 Files processed in parallel
--max-file-size MB 10 Skip files larger than this
--exclude-dirs DIRS Smart defaults Comma-separated directories to exclude
--include-extensions EXTS Programming languages Comma-separated file extensions
--disable-memory-limit false Disable memory checking
--verbose false Enable verbose logging

Default Excluded Directories

  • .git, node_modules, target, .venv, __pycache__, build, dist, vendor
  • .tox, venv, .env, env, .pytest_cache, .mypy_cache, .ruff_cache
  • .next, .nuxt, coverage, .coverage, .vscode, .idea, .DS_Store, Thumbs.db

Default Included Extensions

  • py, js, ts, jsx, tsx, rs, java, cpp, c, h, hpp, go, php, rb, kt, swift

Migration from Old Defaults

If you were using the old defaults and want to include all file types like before:

# Old behavior (include all files, 1GB limit)
./target/release/codeprism-mcp --memory-limit 1024 --include-extensions "*" /path/to/repo

Examples

Simple Usage (Recommended)

# Uses optimized defaults - works for most repositories
./target/release/codeprism-mcp /path/to/your/repository

Custom Configuration Examples

# Minimal memory usage
./target/release/codeprism-mcp --memory-limit 1024 --batch-size 15

# Maximum performance (if you have lots of RAM)
./target/release/codeprism-mcp --memory-limit 16384 --batch-size 50

# Disable memory limits entirely
./target/release/codeprism-mcp --disable-memory-limit

The new defaults should handle most real-world repositories without any configuration needed!