GitIngest – AI Agent Integration Guide

Turn any Git repository into a prompt-ready text digest. GitIngest fetches, cleans, and formats source code so AI agents and Large Language Models can reason over complete projects programmatically.

🤖 For AI Agents: Use the CLI or the Python package for automated integration. The web UI is designed for human interaction only.


1. Installation

1.1 CLI Installation (Recommended for Scripts & Automation)

# Best practice: Use pipx for CLI tools (isolated environment)
pipx install gitingest

# Alternative: Use pip (may conflict with other packages)
pip install gitingest

# Verify installation
gitingest --help

1.2 Python Package Installation (For Code Integration)

# For projects/notebooks: Use pip in virtual environment
python -m venv gitingest-env
source gitingest-env/bin/activate  # On Windows: gitingest-env\Scripts\activate
pip install gitingest

# Or add to requirements.txt
echo "gitingest" >> requirements.txt
pip install -r requirements.txt

# For self-hosting: Install with server dependencies
# (quoted so shells such as zsh do not try to expand the brackets)
pip install "gitingest[server]"

# For development: Install with dev dependencies
pip install "gitingest[dev,server]"

1.3 Installation Verification

# Test CLI installation
gitingest --version

# Test Python package
python -c "from gitingest import ingest; print('GitIngest installed successfully')"

# Quick functionality test
gitingest https://github.com/octocat/Hello-World -o test_output.txt

2. Quick-Start for AI Agents

Method    Best for                                  One-liner
CLI       Scripts, automation, pipelines            gitingest https://github.com/user/repo -o - | your-llm
Python    Code integration, notebooks, async tasks  from gitingest import ingest; s,t,c = ingest('repo-url'); process(c)
URL Hack  Quick web scraping (limited)              Replace github.com with gitingest.com in any GitHub URL
Web UI    Human use only                            Not recommended for AI agents
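
The URL hack in the table above is a host swap, which can be sketched with the standard library. This is a minimal illustration: the helper name `to_gitingest_url` is hypothetical and not part of the gitingest package.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping the path and query."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    # SplitResult is a namedtuple, so _replace returns a modified copy
    return urlunsplit(parts._replace(netloc="gitingest.com"))


print(to_gitingest_url("https://github.com/octocat/Hello-World"))
# prints: https://gitingest.com/octocat/Hello-World
```

The resulting URL points at the web UI, so the same limitations apply as for any web-based access.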

3. Output Format for AI Processing

GitIngest returns structured plain text, optimized for LLM consumption, in three distinct sections:

3.1 Repository Summary

Repository: owner/repo-name
Files analyzed: 42
Estimated tokens: 15.2k

Contains basic metadata: repository name, file count, and token estimation for LLM planning.
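
One way an agent might turn the summary into structured data is sketched below. The function names are hypothetical helpers, not part of the gitingest API, and the handling of the `k` / `M` suffixes is an assumption based on the `15.2k` example above.

```python
import re

# Assumed multipliers for abbreviated token counts such as "15.2k"
_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_token_estimate(text: str) -> int:
    """Convert '29' or '15.2k' into an integer token count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([kKmM]?)", text.strip())
    if not match:
        raise ValueError(f"unrecognized token estimate: {text!r}")
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES.get(suffix.lower(), 1))


def parse_summary(summary: str) -> dict:
    """Parse the 'Key: value' lines of the summary section."""
    fields = {}
    for line in summary.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    files = fields.get("Files analyzed")
    tokens = fields.get("Estimated tokens")
    return {
        "repository": fields.get("Repository"),
        "files_analyzed": int(files) if files is not None else None,
        "estimated_tokens": parse_token_estimate(tokens) if tokens is not None else None,
    }


info = parse_summary("Repository: owner/repo-name\nFiles analyzed: 42\nEstimated tokens: 15.2k")
print(info["estimated_tokens"])  # prints: 15200
```

The parsed token estimate can then be compared against a model's context window before the full digest is sent.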

3.2 Directory Structure

Directory structure:
└── project-name/
    ├── src/
    │   ├── main.py
    │   └── utils.py
    ├── tests/
    │   └── test_main.py
    └── README.md

Hierarchical tree view showing the complete project structure for context and navigation.
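
The tree can be converted back into a flat list of file paths. The sketch below assumes the layout shown above: four characters of indentation per level, `├── ` or `└── ` connectors, and a trailing `/` on directories. `tree_to_paths` is a hypothetical helper, not a gitingest function.

```python
import re

# One indentation unit is "│   " or four spaces; the connector precedes the name
_ENTRY = re.compile(r"^((?:[│ ] {3})*)[├└]── (.+)$")


def tree_to_paths(tree: str) -> list[str]:
    """Return file paths relative to the root directory of the tree."""
    paths = []
    stack = []  # directory names from the root down to the current parent
    for line in tree.splitlines():
        match = _ENTRY.match(line)
        if not match:
            continue  # skips the "Directory structure:" header and blank lines
        depth = len(match.group(1)) // 4
        name = match.group(2)
        del stack[depth:]  # climb back up to this entry's parent
        if name.endswith("/"):
            stack.append(name.rstrip("/"))
        else:
            # stack[0] is the root directory, which file headers do not include
            paths.append("/".join(stack[1:] + [name]))
    return paths


sample_tree = (
    "Directory structure:\n"
    "└── project-name/\n"
    "    ├── src/\n"
    "    │   ├── main.py\n"
    "    │   └── utils.py\n"
    "    ├── tests/\n"
    "    │   └── test_main.py\n"
    "    └── README.md"
)
print(tree_to_paths(sample_tree))
# prints: ['src/main.py', 'src/utils.py', 'tests/test_main.py', 'README.md']
```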

3.3 File Contents

Each file is wrapped with clear delimiters:

================================================
FILE: src/main.py
================================================
def hello_world():
    print("Hello, World!")

if __name__ == "__main__":
    hello_world()


================================================
FILE: README.md
================================================
# Project Title

This is a sample project...
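
Given the delimiters above, the content section can be split back into individual files. This is a minimal sketch: `split_digest_files` is a hypothetical helper, and the pattern accepts any run of ten or more `=` characters rather than assuming an exact delimiter width.

```python
import re

# A header is: a line of '=', a "FILE: <path>" line, then another line of '='
_FILE_HEADER = re.compile(r"^={10,}\nFILE: (.+)\n={10,}\n", re.MULTILINE)


def split_digest_files(content: str) -> dict[str, str]:
    """Map each file path to its text, in the order the files appear."""
    files = {}
    matches = list(_FILE_HEADER.finditer(content))
    for index, match in enumerate(matches):
        # A file's body runs until the next header, or to the end of the digest
        end = matches[index + 1].start() if index + 1 < len(matches) else len(content)
        files[match.group(1).strip()] = content[match.end():end].strip("\n")
    return files


sep = "=" * 48
sample_content = (
    f"{sep}\nFILE: src/main.py\n{sep}\nprint('hi')\n\n\n"
    f"{sep}\nFILE: README.md\n{sep}\n# Project Title\n"
)
for path, text in split_digest_files(sample_content).items():
    print(path, len(text))
```

A file whose own text contains the same three-line header pattern would be split incorrectly, so treat the result as best-effort.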

3.4 Usage Example

# Python package usage
from gitingest import ingest

summary, tree, content = ingest("https://github.com/octocat/Hello-World")

# Returns three strings, for example:
# summary = "Repository: octocat/hello-world\nFiles analyzed: 1\nEstimated tokens: 29"
# tree = "Directory structure:\n└── octocat-hello-world/\n    └── README"
# content = "================================================\nFILE: README\n================================================\nHello World!\n\n\n"

# For AI processing, combine all sections:
full_context = f"{summary}\n\n{tree}\n\n{content}"

# CLI usage - pipe directly to your AI system
gitingest https://github.com/octocat/Hello-World -o - | your_llm_processor

# Output streams the complete formatted text:
# Repository: octocat/hello-world
# Files analyzed: 1
# Estimated tokens: 29
#
# Directory structure:
# └── octocat-hello-world/
#     └── README
#
# ================================================
# FILE: README
# ================================================
# Hello World!
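
The receiving end of such a pipe could look like the sketch below, which reads the streamed digest and recovers the three sections. It assumes the layout shown above (a `Directory structure:` line, then the first `=` delimiter). `split_sections` and `main` are hypothetical names, and `your_llm_processor` stands for whatever script you supply.

```python
import io
import sys
from typing import TextIO, Tuple

TREE_MARKER = "Directory structure:"


def split_sections(stream: TextIO) -> Tuple[str, str, str]:
    """Read a whole digest from a stream and return (summary, tree, content)."""
    text = stream.read()
    summary, marker, rest = text.partition(TREE_MARKER)
    if not marker:
        raise ValueError("no 'Directory structure:' line found in input")
    # The first delimiter line (a run of '=') marks the start of file contents
    cut = rest.find("\n" + "=" * 10)
    if cut == -1:
        tree_body, content = rest, ""
    else:
        tree_body, content = rest[:cut], rest[cut + 1:]
    return summary.strip(), (TREE_MARKER + tree_body).rstrip(), content


def main() -> None:
    """Entry point for use in a pipeline: gitingest <url> -o - | python this_script.py"""
    summary, tree, content = split_sections(sys.stdin)
    print(f"{len(content)} characters of file content", file=sys.stderr)


# Demo on an in-memory stream instead of stdin
sample = io.StringIO(
    "Repository: octocat/hello-world\nFiles analyzed: 1\nEstimated tokens: 29\n\n"
    "Directory structure:\n└── octocat-hello-world/\n    └── README\n\n"
    + "=" * 48 + "\nFILE: README\n" + "=" * 48 + "\nHello World!\n"
)
summary, tree, content = split_sections(sample)
print(summary.splitlines()[0])  # prints: Repository: octocat/hello-world
```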

4. AI Agent Integration Methods

4.1 CLI Integration (Recommended for Automation)

# Basic usage - pipe directly to your AI system
gitingest https://github.com/user/repo -o - | your_ai_processor

# Advanced filtering for focused analysis (long flags)
gitingest https://github.com/user/repo \
  --include-pattern "*.py" --include-pattern "*.js" --include-pattern "*.md" \
  --max-size 102400 \
  -o - | python your_analyzer.py

# Same command with short flags (more concise)
gitingest https://github.com/user/repo \
  -i "*.py" -i "*.js" -i "*.md" \
  -s 102400 \
  -o - | python your_analyzer.py

# Exclude unwanted files and directories (long flags)
gitingest https://github.com/user/repo \
  --exclude-pattern "node_modules/*" --exclude-pattern "*.log" \
  --exclude-pattern "dist/*" \
  -o - | your_analyzer

# Same with short flags
gitingest https://github.com/user/repo \
  -e "node_modules/*" -e "*.log" -e "dist/*" \
  -o - | your_analyzer

# Private repositories with token (short flag)
export GITHUB_TOKEN="ghp_your_token_here"
gitingest https://github.com/user/private-repo -t "$GITHUB_TOKEN" -o -

# Specific branch analysis (short flag)
gitingest https://github.com/user/repo -b main -o -

# Save to file (default: digest.txt in current directory)
gitingest https://github.com/user/repo -o my_analysis.txt

# Ultra-concise example for small files only
gitingest https://github.com/user/repo -i "*.py" -s 51200 -o -

Key Parameters for AI Agents:

  • -s / --max-size: Maximum file size in bytes to process (default: no limit)
  • -i / --include-pattern: Include files matching Unix shell-style wildcards
  • -e / --exclude-pattern: Exclude files matching Unix shell-style wildcards
  • -b / --branch: Specify branch to analyze (defaults to repository's default branch)
  • -t / --token: GitHub personal access token for private repositories
  • -o / --output: Stream to STDOUT with - (default saves to digest.txt)
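
When an agent launches the CLI from Python, building the argument list programmatically avoids shell-quoting mistakes with wildcard patterns. The sketch below uses only the long flags listed above. `build_gitingest_argv` is a hypothetical helper, and the token flag is left out on purpose because command-line arguments are visible in process listings.

```python
import shlex
from typing import Optional, Sequence


def build_gitingest_argv(
    repo_url: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    max_size: Optional[int] = None,
    branch: Optional[str] = None,
    output: str = "-",
) -> list[str]:
    """Build an argv list for the gitingest CLI; '-' streams to STDOUT."""
    argv = ["gitingest", repo_url]
    for pattern in include:
        argv += ["--include-pattern", pattern]
    for pattern in exclude:
        argv += ["--exclude-pattern", pattern]
    if max_size is not None:
        argv += ["--max-size", str(max_size)]
    if branch:
        argv += ["--branch", branch]
    argv += ["--output", output]
    return argv


argv = build_gitingest_argv("https://github.com/user/repo", include=["*.py"], max_size=51200)
print(shlex.join(argv))  # shell-safe rendering, useful for logging

# To run it (requires gitingest on PATH):
#   import subprocess
#   digest = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

Passing a list to `subprocess.run` bypasses the shell, so patterns such as `*.py` reach gitingest unexpanded.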

4.2 Python Package (Best for Code Integration)

from gitingest import ingest, ingest_async
import asyncio

# Synchronous processing
def analyze_repository(repo_url: str):
    summary, tree, content = ingest(repo_url)

    # Process metadata
    repo_info = parse_summary(summary)

    # Analyze structure
    file_structure = parse_tree(tree)

    # Process code content
    return analyze_code(content)

# Asynchronous processing (recommended for AI services)
async def batch_analyze_repos(repo_urls: list):
    tasks = [ingest_async(url) for url in repo_urls]
    results = await asyncio.gather(*tasks)
    return [process_repo_data(*result) for result in results]

# Memory-efficient processing for large repos
def stream_process_repo(repo_url: str):
    summary, tree, content = ingest(
        repo_url,
        max_file_size=51200,  # 50KB max per file
        include_patterns=["*.py", "*.js"],  # Focus on code files
    )

    # Process in chunks to manage memory
    for file_content in split_content(content):
        yield analyze_file(file_content)

# Filtering with exclude patterns
def analyze_without_deps(repo_url: str):
    summary, tree, content = ingest(
        repo_url,
        exclude_patterns=[
            "node_modules/*", "*.lock", "dist/*",
            "build/*", "*.min.js", "*.log"
        ]
    )
    return analyze_code(content)
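
For larger batches, an unbounded `asyncio.gather` starts every ingestion at once. A sketch of a bounded variant follows. `bounded_ingest` is a hypothetical helper that takes the ingest coroutine as a parameter (for example `ingest_async`), and the demo substitutes a fake so it runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def bounded_ingest(
    repo_urls: Sequence[str],
    ingest_fn: Callable[[str], Awaitable[tuple]],
    limit: int = 4,
) -> list:
    """Ingest repositories concurrently, with at most `limit` running at a time.

    Results keep the input order; a failed ingestion yields its exception
    object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str):
        async with semaphore:
            return await ingest_fn(url)

    return await asyncio.gather(*(worker(url) for url in repo_urls), return_exceptions=True)


# Demo with a fake ingest coroutine that records peak concurrency
state = {"active": 0, "peak": 0}


async def fake_ingest(url: str):
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    if url.endswith("/broken"):
        raise RuntimeError("simulated failure")
    return ("summary", "tree", url)


urls = [f"https://github.com/user/{name}" for name in ("a", "b", "c", "d", "broken")]
results = asyncio.run(bounded_ingest(urls, fake_ingest, limit=2))
print(state["peak"], type(results[-1]).__name__)
```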

Python Integration Patterns:

  • Batch Processing: Use ingest_async for multiple repositories
  • Memory Management: Use max_file_size and pattern filtering for large repos
  • Error Handling: Wrap calls in try/except for network/auth issues
  • Caching: Store results to avoid repeated API calls
  • Pattern Filtering: Use include_patterns and exclude_patterns lists
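
The caching pattern from the list above could be sketched as a small on-disk cache keyed by repository URL. `cached_ingest` is a hypothetical helper, and the ingest function is passed in so that the demo can use a fake. A real cache would probably also key on branch and filter settings and expire stale entries.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Tuple

Digest = Tuple[str, str, str]  # (summary, tree, content)


def cached_ingest(repo_url: str, ingest_fn: Callable[[str], Digest], cache_dir: Path) -> Digest:
    """Return a cached digest when present; otherwise ingest and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        summary, tree, content = json.loads(path.read_text(encoding="utf-8"))
        return summary, tree, content
    summary, tree, content = ingest_fn(repo_url)
    path.write_text(json.dumps([summary, tree, content]), encoding="utf-8")
    return summary, tree, content


# Demo: the second call is served from disk, so the fake runs only once
calls = []


def fake_ingest(url: str) -> Digest:
    calls.append(url)
    return ("summary", "tree", "content")


with tempfile.TemporaryDirectory() as tmp:
    first = cached_ingest("https://github.com/user/repo", fake_ingest, Path(tmp))
    second = cached_ingest("https://github.com/user/repo", fake_ingest, Path(tmp))

print(len(calls))  # prints: 1
```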

4.3 Web UI (❌ Not for AI Agents)

The web interface at https://gitingest.com is designed for human interaction only.

Why AI agents should avoid the web UI:

  • Requires manual interaction and browser automation
  • No programmatic access to results
  • Rate limiting and CAPTCHA protection
  • Inefficient for automated workflows

Use CLI or Python package instead for all AI agent integrations.


5. AI Agent Best Practices

5.1 Repository Analysis Workflows

# Pattern 1: Full repository analysis
def full_repo_analysis(repo_url: str):
    summary, tree, content = ingest(repo_url)
    return {
        'metadata': extract_metadata(summary),
        'structure': analyze_structure(tree),
        'code_analysis': analyze_all_files(content),
        'insights': generate_insights(summary, tree, content)
    }

# Pattern 2: Selective file processing
def selective_analysis(repo_url: str, file_patterns: list):
    summary, tree, content = ingest(
        repo_url,
        include_patterns=file_patterns
    )
    return focused_analysis(content)

# Pattern 3: Streaming for large repos
def stream_analysis(repo_url: str):
    # First pass: get structure and metadata only
    summary, tree, _ = ingest(
        repo_url,
        include_patterns=["*.md", "*.txt"],
        max_file_size=10240  # 10KB limit for docs
    )

    # Then process code files selectively by language
    for pattern in ["*.py", "*.js", "*.go", "*.rs"]:
        _, _, content = ingest(
            repo_url,
            include_patterns=[pattern],
            max_file_size=51200  # 50KB limit for code
        )
        yield process_language_specific(content, pattern)
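
Once a digest has been split into a mapping of path to text, large repositories can be fed to a model in batches that respect a size budget. The greedy packer below is one possible approach. `pack_files` is a hypothetical helper, and the budget is counted in characters because the mapping from characters to tokens depends on the tokenizer.

```python
def pack_files(files: dict[str, str], max_chars: int) -> list[list[str]]:
    """Group file paths, in order, so each group's text stays within max_chars.

    A single file larger than the budget is placed in a group of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path, text in files.items():
        size = len(text)
        # Close the current batch when adding this file would overflow it
        if current and used + size > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        batches.append(current)
    return batches


demo_files = {"a.py": "x" * 60, "b.py": "y" * 50, "c.py": "z" * 30}
print(pack_files(demo_files, max_chars=100))
# prints: [['a.py'], ['b.py', 'c.py']]
```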

5.2 Error Handling for AI Agents

from gitingest import ingest
from gitingest.utils.exceptions import GitIngestError
import time

def robust_ingest(repo_url: str, retries: int = 3):
    for attempt in range(retries):
        try:
            return ingest(repo_url)
        except GitIngestError as e:
            if attempt == retries - 1:
                return None, None, f"Failed to ingest: {e}"
            time.sleep(2 ** attempt)  # Exponential backoff
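
The same backoff idea can be written as a generic wrapper that is testable without network access. This sketch differs from the function above in one respect: it re-raises the last error instead of returning an error tuple, so callers cannot mistake a failure for content. `retry_with_backoff` is a hypothetical helper, and the exception types to retry on are left to the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `retries` times, doubling the delay after each failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fn()
        except exceptions:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed; delays are recorded instead of slept
delays = []
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, retries=3, sleep=delays.append)
print(result, delays)  # prints: ok [1.0, 2.0]
```

In practice `fn` would be something like `lambda: ingest(repo_url)`, with `exceptions` narrowed to the errors worth retrying.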

5.3 Private Repository Access

import os
from gitingest import ingest

# Method 1: Environment variable
def ingest_private_repo(repo_url: str):
    token = os.getenv('GITHUB_TOKEN')
    if not token:
        raise ValueError("GITHUB_TOKEN environment variable required")
    return ingest(repo_url, token=token)

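# Method 2: Secure token management
# NOTE: AuthenticationError is a placeholder name and is not imported here;
# substitute the exception your gitingest version raises on auth failure.
def ingest_with_token_rotation(repo_url: str, token_manager):
    token = token_manager.get_active_token()
    try:
        return ingest(repo_url, token=token)
    except AuthenticationError:
        # The first token was rejected: rotate once and retry
        token = token_manager.rotate_token()
        return ingest(repo_url, token=token)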

6. Integration Scenarios for AI Agents

Use Case                  Recommended Method          Example Implementation
Code Review Bot           Python async                await ingest_async(pr_repo) → analyze changes
Documentation Generator   CLI with filtering          gitingest repo -i "*.py" -i "*.md" -o -
Vulnerability Scanner     Python with error handling  Batch process multiple repos
Code Search Engine        CLI → Vector DB             gitingest repo -o - | embed | store
AI Coding Assistant       Python integration          Load repo context into conversation
CI/CD Analysis            CLI integration             gitingest repo -o - | analyze_pipeline
Repository Summarization  Python with streaming       Process large repos in chunks
Dependency Analysis       CLI exclude patterns        gitingest repo -e "node_modules/*" -e "*.lock" -o -
Security Audit            CLI with size limits        gitingest repo -i "*.py" -i "*.js" -s 204800 -o -

7. Support & Resources for AI Developers

GitIngest – Purpose-built for AI agents to understand entire codebases programmatically.