- Scans a repo and produces both human-friendly Markdown and LLM-ready JSON/JSONL in one shot.
- NEW v1.2.2: Root-safe symlink handling + large-file masking (no bypass) + deprecation-free pathspec matching.
- NEW v1.2.1: Enterprise-grade security (5 critical fixes) + flexible 3-tier configuration system.
- NEW v1.2.0: Zero-config intelligence (60-70% token reduction) with Gravitas compression, smart query, and AST sampling.
- Smart sampling + token budgeting to avoid overloading models.
- Built-in masking for secrets and a Spicy 5-level risk report (on by default).
SIDRCE Grade: A (90-94/100) — Security: A+ | Reliability: A | Performance: A
- Onboarding: New teammates see the structure, key files, and summaries in under a minute.
- PR / Code Review: Drop the JSONL straight into your LLM to review a large codebase without token waste.
- Safety & Quality: Spicy highlights risky spots; masking protects tokens/keys/PEM/JWT/DB URLs.
- Reproducible: Cache/venv noise is auto-excluded; `--no-timestamp` keeps outputs stable for CI.
- NEW v1.2.2: Root-safe symlink traversal + large-file masking + cleaner pathspec matching.
- NEW v1.2.1: Enterprise security (Markdown injection prevention, RCE elimination) + flexible configuration.
- NEW v1.2.0: Automatic 60-70% token reduction with zero configuration overhead.
- Gravitas Compression: 30-50% token reduction using symbolic compression (auto-enabled in the `pro`/`ai` presets)
- Smart Query Processing: Typo correction + synonym expansion, raising query accuracy from 60% to 90% (auto-enabled when a query is given)
- AST Semantic Sampling: Python structure extraction, 30-40% additional reduction (auto-enabled for .py files)
- Security Fixes: 5 critical issues resolved (Markdown injection, RCE, silent failures, glob expansion, externalized config)
- 3-Tier Priority System: User CLI > Project config (pyproject.toml) > System defaults (defaults.json)
- Flexible Configuration: `--defaults-file` for custom defaults; an `excludes` list under `[tool.dir2md]` in `pyproject.toml` for project-level patterns
- Symlink Guard: Symlinked directories are followed only if they resolve inside the scan root
- Large-File Masking: Masking now processes large inputs in chunks, so large files can no longer bypass it
- PathSpec Engine: Uses the `gitignore` matcher to avoid deprecation warnings
- Sampling & Budgets: Head/tail sampling, token estimates; auto-skip when budgets would overflow.
- SimHash Dedup: Filters near-duplicate files/build artifacts.
- Masking: Basic & advanced patterns (keys, tokens, JWT, DB URLs, PEM, Slack/GitHub tokens) + custom regex.
- Spicy Risk Report: 5 severities (ok/warn/risk/high/critical) with score, counts, and per-file findings; `--spicy-strict` makes CI fail on high/critical findings.
- Modular pipeline: `walker` (tree/filter) → `selector` (sampling/SimHash) → `renderer` (md/json/jsonl) → `orchestrator` (multi-format).
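A symlink guard of this kind can be pictured as a directory walk that follows symlinked directories but prunes any whose resolved path lands outside the scan root. The sketch below assumes Python's `os.walk(followlinks=True)` and `os.path.realpath`; `walk_within_root` and `is_within_root` are hypothetical names, not dir2md's internals.

```python
import os


def is_within_root(path: str, root: str) -> bool:
    """True if `path`, after resolving symlinks, lies inside `root`."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    try:
        return os.path.commonpath([real_root, real_path]) == real_root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def walk_within_root(root: str) -> list[str]:
    """Hypothetical helper: list files under `root`, following symlinked
    directories only when they resolve inside `root`."""
    found: list[str] = []
    seen: set[str] = set()  # real paths already walked (guards symlink cycles)
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real_dir = os.path.realpath(dirpath)
        if real_dir in seen:
            dirnames[:] = []  # reached again via another link; do not descend
            continue
        seen.add(real_dir)
        # Prune in place so os.walk never descends into escaping symlinks.
        dirnames[:] = [
            d for d in dirnames if is_within_root(os.path.join(dirpath, d), root)
        ]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_within_root(full, root):  # also drops file symlinks that escape
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Pruning `dirnames` in place is how `os.walk` is told not to descend; the `seen` set matters because `followlinks=True` would otherwise loop forever on a link that points back to an ancestor inside the root.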
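Chunked masking only closes the bypass if a secret that straddles a chunk boundary is still caught. One common way to guarantee that is to scan overlapping windows. The sketch below assumes the overlap is at least as long as the longest secret to be matched; `mask_secrets`, the three patterns, and the `[MASKED]` placeholder are illustrative assumptions, not dir2md's actual rules.

```python
import re

# Illustrative patterns only (assumptions, not dir2md's rule set).
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal access token shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def find_secret_spans(text: str, chunk_size: int = 1024,
                      overlap: int = 128) -> list[tuple[int, int]]:
    """Scan overlapping windows; any match no longer than `overlap`
    falls entirely inside at least one window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    spans: set[tuple[int, int]] = set()
    for start in range(0, max(len(text), 1), step):
        window = text[start:start + chunk_size]
        for pattern in PATTERNS:
            for m in pattern.finditer(window):
                spans.add((start + m.start(), start + m.end()))  # global offsets
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return sorted(spans)


def mask_secrets(text: str, chunk_size: int = 1024, overlap: int = 128,
                 placeholder: str = "[MASKED]") -> str:
    """Replace every detected span (merged when overlapping) with `placeholder`."""
    merged: list[list[int]] = []
    for s, e in find_secret_spans(text, chunk_size, overlap):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    parts, last = [], 0
    for s, e in merged:
        parts.append(text[last:s])
        parts.append(placeholder)
        last = e
    parts.append(text[last:])
    return "".join(parts)
```

With `overlap=0` (plain, non-overlapping chunks) a key split across two chunks is never seen whole and slips through, which is exactly the bypass that chunking has to avoid. Patterns that use anchors or lookarounds can also behave differently at window edges, so they need extra care.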
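SimHash maps each file to a 64-bit fingerprint so that near-identical files end up only a few bits apart. Below is a minimal sketch of the standard technique over word 3-gram shingles with a BLAKE2b hash per shingle. The shingle size, the hash, the distance threshold of 3, and the names `simhash`/`dedup` are illustrative assumptions, not dir2md's tuned values or API.

```python
import hashlib
import re


def _shingles(text: str, k: int = 3) -> list[str]:
    """Split text into overlapping word k-grams."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def simhash(text: str, bits: int = 64) -> int:
    """Classic SimHash: each shingle's hash votes +1/-1 on every bit."""
    votes = [0] * bits
    for shingle in _shingles(text):
        digest = hashlib.blake2b(shingle.encode(), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedup(files: dict[str, str], max_distance: int = 3) -> list[str]:
    """Greedy: keep a file unless it is within `max_distance` bits of a kept one."""
    kept: list[tuple[str, int]] = []
    for path in sorted(files):
        fp = simhash(files[path])
        if all(hamming(fp, other) > max_distance for _, other in kept):
            kept.append((path, fp))
    return [path for path, _ in kept]
```

The pairwise check is quadratic in the number of kept files, which is fine for a sketch; larger systems usually bucket fingerprints by bit bands so that only candidates sharing a band are compared.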
- Markdown (.md): Tree, selected snippets, stats, Spicy table (file, line, severity, message, suggestion).
- JSON (.json): Structured manifest of files/stats/spicy for downstream tools.
- JSONL (.jsonl): One line per file (path, content/snippet, meta) plus spicy summary — ideal for LLM ingest/vector DB.
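JSONL streams well into an LLM or a vector store because each line is an independent JSON object. Here is a minimal sketch of writing and reading such a file. The field names (`type`, `path`, `content`, `meta`) and the shape of the trailing summary record are assumptions modeled on the description above, not dir2md's exact schema.

```python
import io
import json


def write_jsonl(stream, files, spicy_summary: dict) -> None:
    """Write one JSON object per file, then one trailing summary record.
    `files` yields (path, content, meta) tuples; field names are assumptions."""
    for path, content, meta in files:
        record = {"type": "file", "path": path, "content": content, "meta": meta}
        # json.dumps escapes newlines inside `content`, so each record stays on one line.
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    summary = {"type": "spicy_summary", **spicy_summary}
    stream.write(json.dumps(summary, ensure_ascii=False) + "\n")


def read_jsonl(stream) -> list[dict]:
    """Parse a JSONL stream into dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    buf = io.StringIO()
    write_jsonl(buf, [("src/app.py", "print('hi')\n", {"lines": 1})], {"score": 0})
    buf.seek(0)
    for rec in read_jsonl(buf):
        print(rec["type"], rec.get("path", "-"))
```

When writing to disk, open the file with `encoding="utf-8"` so that the non-ASCII text kept by `ensure_ascii=False` round-trips intact.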
- v1.2.2: Root-safe symlink traversal + large-file masking (no bypass)
- v1.2.1: Flexible 3-tier exclusion system (user > project > system defaults)
- v1.2.0: Auto-enabled intelligence (Gravitas + Smart Query + AST in `pro`/`ai` presets)
- Spicy ON (disable with `--no-spicy`; gate with `--spicy-strict`)
- Dual outputs (md + jsonl)
- Noise auto-excluded: `.pytest_cache`, `.ruff_cache`, `venv_clean`, etc. (configurable via `defaults.json`)
- UTF-8 writes; human md + machine jsonl in one run
| preset | mode | budget | best for |
|---|---|---|---|
| pro (default) | summary/ref | user-set | balanced CI/PR context |
| raw | inline | unlimited | full code visibility |
| ai | ref | 6000 cap | LLM-focused, query-prioritized |
| fast | off (no contents) | n/a | ultra-light tree + manifest |
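Head/tail sampling under a budget can be sketched like this: keep the first and last lines of each file, estimate the sample's tokens, and skip any file whose sample would overflow what is left. This sketch rests on several assumptions. It uses a rough 4-characters-per-token estimate. It treats the `ai` preset's 6000 cap as a token count. The helper names `sample_head_tail` and `select_within_budget` are hypothetical. dir2md's own estimator and sample sizes may differ.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return (len(text) + 3) // 4


def sample_head_tail(text: str, head: int = 40, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines, marking the elided middle."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


def select_within_budget(files: dict[str, str], budget: int = 6000) -> dict[str, str]:
    """Sample each file, then include it only if it still fits the budget."""
    selected: dict[str, str] = {}
    used = 0
    for path in sorted(files):
        snippet = sample_head_tail(files[path])
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            continue  # auto-skip: including this file would overflow the budget
        selected[path] = snippet
        used += cost
    return selected
```

Skipping an oversized file, rather than stopping at the first overflow, lets smaller files later in the walk still use the remaining budget.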
```bash
# Default: md + jsonl, spicy on, auto-intelligence
dir2md .

# Ultra-light for PR comment (tree + manifest only)
dir2md . --fast

# Query-focused LLM context with all optimizations (v1.2.0)
dir2md . --ai-mode --query "auth flow" --output-format jsonl
# Auto: typo correction + expansion + gravitas medium + AST sampling

# Enforce failure on high/critical risks
dir2md . --spicy --spicy-strict

# NEW v1.2.1: Custom configuration
dir2md . --defaults-file my-defaults.json   # Custom system defaults
dir2md . --exclude-glob "secret-data/"      # User override (highest priority)

# NEW v1.2.1: Project-level config in pyproject.toml
# [tool.dir2md]
# excludes = ["*.log", "temp/", "cache/"]
dir2md .   # Automatically uses project config
```

- Use `--output-format jsonl` for direct LLM ingestion.
- Use `--fast` + artifact upload for PR context.
- Add `--spicy-strict` as a quality gate.
- v1.2.1: Use `--defaults-file` for CI-specific exclusion patterns.
- v1.2.0: Use `--ai-mode` for maximum token efficiency (60-70% reduction).
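The gate that `--spicy-strict` adds can be pictured as a per-severity count plus a failing exit code whenever any high or critical finding exists. This is a minimal sketch under assumptions: the finding shape (a dict with a `severity` key) and the names `summarize`/`strict_exit_code` are hypothetical, and with dir2md itself, passing `--spicy-strict` is what makes CI fail on high/critical findings.

```python
from collections import Counter

SEVERITIES = ("ok", "warn", "risk", "high", "critical")  # the five Spicy levels
BLOCKING = {"high", "critical"}  # levels that fail the build in strict mode


def summarize(findings: list[dict]) -> Counter:
    """Count findings per severity, rejecting unknown labels (hypothetical shape)."""
    counts = Counter(f["severity"] for f in findings)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    return counts


def strict_exit_code(findings: list[dict]) -> int:
    """Return 1 (fail the CI step) if any high/critical finding exists, else 0."""
    counts = summarize(findings)
    return 1 if any(counts[level] for level in BLOCKING) else 0
```

A wrapper script would end with `sys.exit(strict_exit_code(findings))`; warnings and lower-risk findings still appear in the report but never block the merge.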
- Saves time: No manual tree/sampling; outputs tailored for humans and LLMs simultaneously.
- Reduces risk: Masking + Spicy risk surfacing by default; v1.2.1 eliminates critical security vulnerabilities.
- Safer by default: v1.2.2 blocks symlink escapes and preserves masking on large files.
- Minimal setup: Caches/venvs ignored by default; reproducible outputs.
- Intelligent: v1.2.0 auto-optimizations reduce tokens by 60-70% with zero configuration.
- Flexible: v1.2.1 3-tier configuration system adapts to any workflow (user > project > system).
- Production-ready: SIDRCE Grade A (90-94/100) with enterprise-grade quality.
- CLI Reference - All commands, options, and examples
- Features Guide - Technical capabilities and architecture details
- Troubleshooting - Common issues and solutions
- Main README - Project overview
- Usage Examples - Copy-paste recipes
- Contributing - Development guidelines
- GitHub Issues - Bug reports
- GitHub Discussions - Questions and ideas
- Email: info@flamehaven.space (security issues)
Made for developers who want their AI to actually understand their code.