Turn any Git repository into a prompt-ready text digest. GitIngest fetches, cleans, and formats source code so AI agents and Large Language Models can reason over complete projects programmatically.
🤖 For AI agents: use the CLI or the Python package for automated integration. The web UI is designed for human interaction only.
```bash
# Best practice: use pipx for CLI tools (isolated environment)
pipx install gitingest

# Alternative: use pip (may conflict with other packages)
pip install gitingest

# Verify installation
gitingest --help
```

```bash
# For projects/notebooks: use pip in a virtual environment
python -m venv gitingest-env
source gitingest-env/bin/activate  # On Windows: gitingest-env\Scripts\activate
pip install gitingest

# Or add to requirements.txt
echo "gitingest" >> requirements.txt
pip install -r requirements.txt

# For self-hosting: install with server dependencies (quoted so shells like zsh don't expand the brackets)
pip install "gitingest[server]"

# For development: install with dev dependencies
pip install "gitingest[dev,server]"
```

```bash
# Test CLI installation
gitingest --version

# Test Python package
python -c "from gitingest import ingest; print('GitIngest installed successfully')"

# Quick functionality test
gitingest https://github.com/octocat/Hello-World -o test_output.txt
```

| Method | Best for | One-liner |
|---|---|---|
| CLI | Scripts, automation, pipelines | `gitingest https://github.com/user/repo -o - \| your-llm` |
| Python | Code integration, notebooks, async tasks | `from gitingest import ingest; s, t, c = ingest('repo-url'); process(c)` |
| URL Hack | Quick web scraping (limited) | Replace `github.com` → `gitingest.com` in any GitHub URL |
| Web UI | Human use only | Open https://gitingest.com in a browser |
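The URL hack in the table is a plain host swap. A minimal sketch of it in Python, assuming only what the table states (the path after the host is kept as-is); `to_gitingest_url` is a hypothetical helper name, not part of gitingest:

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping scheme, path and query."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    # Only the host changes; everything after it is passed through unchanged.
    return urlunsplit(parts._replace(netloc="gitingest.com"))
```

Parsing the URL instead of doing a raw string replace avoids accidentally rewriting `github.com` if it appears elsewhere in the path or query.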
GitIngest returns structured plain text optimized for LLM consumption, organized into three distinct sections:
```text
Repository: owner/repo-name
Files analyzed: 42
Estimated tokens: 15.2k
```
The summary contains basic metadata: the repository name, the file count, and an estimated token count for LLM context planning.
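The summary lines can be turned into structured data before planning an LLM call. This is one possible implementation of the `parse_summary` helper that the integration examples leave undefined; it is not part of gitingest, and it assumes the `Key: value` layout and the `15.2k` token notation shown in the sample summary.

```python
import re


def parse_summary(summary: str) -> dict:
    """Turn the "Key: value" lines of a GitIngest summary into a dict."""
    info = {}
    for line in summary.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "Key: value" shape
            info[key.strip()] = value.strip()

    # Convert the numeric fields shown in the sample; leave others as strings.
    if "Files analyzed" in info:
        info["Files analyzed"] = int(info["Files analyzed"])
    tokens = info.get("Estimated tokens")
    if tokens:
        match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([kKmM]?)", tokens)
        if match:
            scale = {"": 1, "k": 1_000, "m": 1_000_000}[match.group(2).lower()]
            info["Estimated tokens"] = round(float(match.group(1)) * scale)
    return info


print(parse_summary("Repository: owner/repo-name\nFiles analyzed: 42\nEstimated tokens: 15.2k"))
# {'Repository': 'owner/repo-name', 'Files analyzed': 42, 'Estimated tokens': 15200}
```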
```text
Directory structure:
└── project-name/
    ├── src/
    │   ├── main.py
    │   └── utils.py
    ├── tests/
    │   └── test_main.py
    └── README.md
```
Hierarchical tree view showing the complete project structure for context and navigation.
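To navigate the structure programmatically, the tree can be flattened back into file paths. The sketch below is one way to write the `parse_tree` helper referenced (but not defined) in the integration examples; it is not a gitingest API, and it assumes each nesting level adds a 4-character prefix (`"│   "` or `"    "`) and that directory names end in `/`, as in the sample tree.

```python
import re

# One entry per line: an indentation prefix made of 4-character units
# ("│   " or "    "), a branch marker ("├── " or "└── "), then the name.
_ENTRY = re.compile(r"^((?:[│ ]   )*)[├└]── (.+)$")


def parse_tree(tree: str) -> list[str]:
    """Return the file paths listed in a GitIngest directory tree."""
    paths: list[str] = []
    dirs: list[str] = []  # directory names on the path to the current depth
    for line in tree.splitlines():
        match = _ENTRY.match(line)
        if not match:
            continue  # e.g. the "Directory structure:" header line
        depth = len(match.group(1)) // 4
        name = match.group(2)
        del dirs[depth:]  # drop directories at this depth or deeper
        if name.endswith("/"):
            dirs.append(name.rstrip("/"))
        else:
            paths.append("/".join(dirs + [name]))
    return paths
```

For the sample tree this yields `project-name/src/main.py`, `project-name/src/utils.py`, `project-name/tests/test_main.py` and `project-name/README.md`, in that order.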
Each file is wrapped with clear delimiters:
```text
================================================
FILE: src/main.py
================================================
def hello_world():
    print("Hello, World!")

if __name__ == "__main__":
    hello_world()

================================================
FILE: README.md
================================================
# Project Title
This is a sample project...
```
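Because every file is framed by the same delimiters, the content section can be split back into individual files. A minimal sketch, assuming the header layout shown above (a line of `=`, a `FILE: <path>` line, another line of `=`); `split_files` is a hypothetical helper, not part of gitingest:

```python
import re

# Assumed header layout: "=" line, "FILE: <path>" line, "=" line.
_FILE_HEADER = re.compile(r"^={10,}\nFILE: (.+)\n={10,}\n", re.MULTILINE)


def split_files(content: str) -> dict[str, str]:
    """Map each file path in a GitIngest content section to its text."""
    files: dict[str, str] = {}
    headers = list(_FILE_HEADER.finditer(content))
    for i, match in enumerate(headers):
        start = match.end()
        # A file's body runs until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(content)
        files[match.group(1).strip()] = content[start:end].rstrip("\n")
    return files
```

A file whose own text happens to contain a line of `=` followed by a `FILE:` line would be split incorrectly; for typical source code this is rare, but it is a limitation of delimiter-based parsing.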
```python
# Python package usage
from gitingest import ingest

summary, tree, content = ingest("https://github.com/octocat/Hello-World")

# Returns strings like these (exact values depend on the repository's current contents):
# summary = "Repository: octocat/hello-world\nFiles analyzed: 1\nEstimated tokens: 29"
# tree = "Directory structure:\n└── octocat-hello-world/\n    └── README"
# content = "================================================\nFILE: README\n================================================\nHello World!\n\n\n"

# For AI processing, combine all sections:
full_context = f"{summary}\n\n{tree}\n\n{content}"
```

```bash
# CLI usage: pipe directly to your AI system
gitingest https://github.com/octocat/Hello-World -o - | your_llm_processor

# Output streams the complete formatted text:
# Repository: octocat/hello-world
# Files analyzed: 1
# Estimated tokens: 29
#
# Directory structure:
# └── octocat-hello-world/
#     └── README
#
# ================================================
# FILE: README
# ================================================
# Hello World!
```

```bash
# Basic usage: pipe directly to your AI system
gitingest https://github.com/user/repo -o - | your_ai_processor

# Advanced filtering for focused analysis (long flags)
gitingest https://github.com/user/repo \
  --include-pattern "*.py" --include-pattern "*.js" --include-pattern "*.md" \
  --max-size 102400 \
  -o - | python your_analyzer.py

# Same command with short flags (more concise)
gitingest https://github.com/user/repo \
  -i "*.py" -i "*.js" -i "*.md" \
  -s 102400 \
  -o - | python your_analyzer.py

# Exclude unwanted files and directories (long flags)
gitingest https://github.com/user/repo \
  --exclude-pattern "node_modules/*" --exclude-pattern "*.log" \
  --exclude-pattern "dist/*" \
  -o - | your_analyzer

# Same with short flags
gitingest https://github.com/user/repo \
  -e "node_modules/*" -e "*.log" -e "dist/*" \
  -o - | your_analyzer

# Private repositories with token (short flag)
export GITHUB_TOKEN="ghp_your_token_here"
gitingest https://github.com/user/private-repo -t "$GITHUB_TOKEN" -o -

# Specific branch analysis (short flag)
gitingest https://github.com/user/repo -b main -o -

# Save to file (default: digest.txt in current directory)
gitingest https://github.com/user/repo -o my_analysis.txt

# Ultra-concise example: Python files up to 50 KB only
gitingest https://github.com/user/repo -i "*.py" -s 51200 -o -
```

Key Parameters for AI Agents:

- `-s/--max-size`: Maximum file size in bytes to process (default: no limit)
- `-i/--include-pattern`: Include files matching Unix shell-style wildcards (repeatable)
- `-e/--exclude-pattern`: Exclude files matching Unix shell-style wildcards (repeatable)
- `-b/--branch`: Branch to analyze (defaults to the repository's default branch)
- `-t/--token`: GitHub personal access token for private repositories
- `-o/--output`: Output file path; use `-` to stream to STDOUT (default: `digest.txt`)
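When an agent drives the CLI from Python, building the argument list explicitly keeps wildcard patterns out of shell quoting. A minimal sketch using only the flags listed above; `build_gitingest_cmd` and `run_gitingest` are hypothetical helper names, and it assumes that with `-o -` the digest is the only thing written to STDOUT.

```python
import subprocess
from typing import Optional, Sequence


def build_gitingest_cmd(
    repo_url: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    max_size: Optional[int] = None,
    branch: Optional[str] = None,
) -> list[str]:
    """Build an argv list for the gitingest CLI that streams to STDOUT."""
    cmd = ["gitingest", repo_url]
    for pattern in include:
        cmd += ["-i", pattern]  # the flag is repeated once per pattern
    for pattern in exclude:
        cmd += ["-e", pattern]
    if max_size is not None:
        cmd += ["-s", str(max_size)]
    if branch:
        cmd += ["-b", branch]
    cmd += ["-o", "-"]  # "-" streams the digest to STDOUT
    return cmd


def run_gitingest(cmd: list[str], timeout: float = 300) -> str:
    """Run the CLI and return its STDOUT; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout


# Usage (requires gitingest on PATH and network access):
# digest = run_gitingest(build_gitingest_cmd("https://github.com/user/repo", include=["*.py"], max_size=51200))
```

Passing a list to `subprocess.run` (no `shell=True`) means `*.py` reaches gitingest literally instead of being expanded against the local directory.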
```python
from gitingest import ingest, ingest_async
import asyncio

# parse_summary, parse_tree, analyze_code, process_repo_data, split_content
# and analyze_file are placeholders for your own logic (not part of gitingest).

# Synchronous processing
def analyze_repository(repo_url: str):
    summary, tree, content = ingest(repo_url)
    # Process metadata
    repo_info = parse_summary(summary)
    # Analyze structure
    file_structure = parse_tree(tree)
    # Process code content
    return analyze_code(content)

# Asynchronous processing (recommended for AI services)
async def batch_analyze_repos(repo_urls: list[str]):
    tasks = [ingest_async(url) for url in repo_urls]
    results = await asyncio.gather(*tasks)
    return [process_repo_data(*result) for result in results]

# Memory-efficient processing for large repos
def stream_process_repo(repo_url: str):
    summary, tree, content = ingest(
        repo_url,
        max_file_size=51200,  # 50 KB max per file
        include_patterns={"*.py", "*.js"},  # Focus on code files
    )
    # Process in chunks so each downstream request stays small
    for file_content in split_content(content):
        yield analyze_file(file_content)

# Filtering with exclude patterns
def analyze_without_deps(repo_url: str):
    summary, tree, content = ingest(
        repo_url,
        exclude_patterns={
            "node_modules/*", "*.lock", "dist/*",
            "build/*", "*.min.js", "*.log",
        },
    )
    return analyze_code(content)
```

Python Integration Patterns:

- Batch Processing: Use `ingest_async` for multiple repositories
- Memory Management: Use `max_file_size` and pattern filtering for large repos
- Error Handling: Wrap calls in `try`/`except` to handle network and authentication failures
- Caching: Store results to avoid repeatedly fetching the same repository
- Pattern Filtering: Pass `include_patterns` and `exclude_patterns` as sets of Unix shell-style wildcards
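The `batch_analyze_repos` example above starts every ingestion at once and fails the whole batch if one repository fails. A sketch of a more defensive batch runner: it de-duplicates URLs, caps concurrency with `asyncio.Semaphore`, and returns exceptions in place of results. `gather_bounded` is a hypothetical helper; the fetch function is injected, so with gitingest you would pass `ingest_async` (assumed to accept a URL as its first argument, as in the examples above).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar, Union

T = TypeVar("T")


async def gather_bounded(
    fetch: Callable[[str], Awaitable[T]],
    repo_urls: Iterable[str],
    max_concurrency: int = 4,
) -> Dict[str, Union[T, BaseException]]:
    """Run fetch(url) for each unique URL, at most max_concurrency at a time.

    A failure is stored under its URL instead of cancelling the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    unique_urls = list(dict.fromkeys(repo_urls))  # de-duplicate, keep order

    async def run_one(url: str):
        async with semaphore:  # limits simultaneous fetches
            return await fetch(url)

    results = await asyncio.gather(*(run_one(url) for url in unique_urls), return_exceptions=True)
    return dict(zip(unique_urls, results))


# Usage with gitingest (network access required):
# results = asyncio.run(gather_bounded(ingest_async, urls, max_concurrency=3))
# for url, result in results.items():
#     if isinstance(result, BaseException):
#         ...  # log and retry later
```

De-duplicating before fetching doubles as a simple per-batch cache: each repository is ingested once even if it appears several times in the input.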
The web interface at https://gitingest.com is designed for human interaction only.
Why AI agents should avoid the web UI:
- Requires manual interaction and browser automation
- No programmatic access to results
- Rate limiting and CAPTCHA protection
- Inefficient for automated workflows
Use the CLI or the Python package instead for all AI agent integrations.
```python
from gitingest import ingest

# extract_metadata, analyze_structure, analyze_all_files, generate_insights,
# focused_analysis and process_language_specific are placeholders for your own logic.

# Pattern 1: Full repository analysis
def full_repo_analysis(repo_url: str):
    summary, tree, content = ingest(repo_url)
    return {
        'metadata': extract_metadata(summary),
        'structure': analyze_structure(tree),
        'code_analysis': analyze_all_files(content),
        'insights': generate_insights(summary, tree, content),
    }

# Pattern 2: Selective file processing
def selective_analysis(repo_url: str, file_patterns: list[str]):
    summary, tree, content = ingest(
        repo_url,
        include_patterns=set(file_patterns),
    )
    return focused_analysis(content)

# Pattern 3: Streaming for large repos
def stream_analysis(repo_url: str):
    # First pass: lightweight overview from documentation files only
    summary, tree, _ = ingest(
        repo_url,
        include_patterns={"*.md", "*.txt"},
        max_file_size=10240,  # 10 KB limit for docs
    )
    # Then process code files selectively by language
    for pattern in ["*.py", "*.js", "*.go", "*.rs"]:
        _, _, content = ingest(
            repo_url,
            include_patterns={pattern},
            max_file_size=51200,  # 50 KB limit for code
        )
        yield process_language_specific(content, pattern)
```

```python
from gitingest import ingest
from gitingest.utils.exceptions import GitIngestError
import time

def robust_ingest(repo_url: str, retries: int = 3):
    for attempt in range(retries):
        try:
            return ingest(repo_url)
        except GitIngestError as e:
            if attempt == retries - 1:
                return None, None, f"Failed to ingest: {e}"
            time.sleep(2 ** attempt)  # Exponential backoff: 1 s, 2 s, 4 s, ...
```

```python
import os
from gitingest import ingest

# Method 1: Environment variable
def ingest_private_repo(repo_url: str):
    token = os.getenv('GITHUB_TOKEN')
    if not token:
        raise ValueError("GITHUB_TOKEN environment variable required")
    return ingest(repo_url, token=token)

# Method 2: Secure token management
# token_manager and AuthenticationError are placeholders: substitute your own
# secret manager and the exception your stack raises on authentication failure.
def ingest_with_token_rotation(repo_url: str, token_manager):
    token = token_manager.get_active_token()
    try:
        return ingest(repo_url, token=token)
    except AuthenticationError:
        token = token_manager.rotate_token()
        return ingest(repo_url, token=token)
```

| Use Case | Recommended Method | Example Implementation |
|---|---|---|
| Code Review Bot | Python async | `await ingest_async(pr_repo)` → analyze changes |
| Documentation Generator | CLI with filtering | `gitingest repo -i "*.py" -i "*.md" -o -` |
| Vulnerability Scanner | Python with error handling | Batch process multiple repos |
| Code Search Engine | CLI → Vector DB | `gitingest repo -o - \| embed \| store` |
| AI Coding Assistant | Python integration | Load repo context into conversation |
| CI/CD Analysis | CLI integration | `gitingest repo -o - \| analyze_pipeline` |
| Repository Summarization | Python with streaming | Process large repos in chunks |
| Dependency Analysis | CLI exclude patterns | `gitingest repo -e "node_modules/*" -e "*.lock" -o -` |
| Security Audit | CLI with size limits | `gitingest repo -i "*.py" -i "*.js" -s 204800 -o -` |
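For the repository-summarization row ("process large repos in chunks"), the content section can be packed into context-sized chunks without ever cutting a file in half. This is one possible version of the hypothetical `split_content` helper used earlier; it assumes the `FILE:` header layout from the output-format section, and it budgets in characters, using the common rough heuristic of about 4 characters per token (so the default of 40,000 characters is roughly 10k tokens).

```python
import re
from typing import List

# Assumed header layout: "=" line, "FILE: <path>" line, "=" line.
_HEADER = re.compile(r"^={10,}\nFILE: .+\n={10,}\n", re.MULTILINE)


def split_content(content: str, max_chars: int = 40_000) -> List[str]:
    """Pack whole files into chunks of at most max_chars characters.

    A single file larger than max_chars becomes a chunk on its own.
    """
    starts = [m.start() for m in _HEADER.finditer(content)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first header
    bounds = starts + [len(content)]
    files = [content[a:b] for a, b in zip(bounds, bounds[1:])]

    chunks: List[str] = []
    current = ""
    for block in files:
        if current and len(current) + len(block) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = ""
        current += block
    if current:
        chunks.append(current)
    return chunks
```

Concatenating the chunks reproduces the original content exactly, so nothing is dropped; only the grouping changes.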
- Web UI official instance: https://gitingest.com
- GitHub Repository: https://github.com/coderamp-labs/gitingest
- Python Package: https://pypi.org/project/gitingest/
- Community Support: https://discord.gg/zerRaGK9EC
GitIngest – Purpose-built for AI agents to understand entire codebases programmatically.