11# TreeMapper
22
3- > Extends [../CLAUDE.md](../CLAUDE.md)
3+ <!-- Extends ../CLAUDE.md -->
4+
5+ [![PyPI](https://img.shields.io/pypi/v/treemapper)](https://pypi.org/project/treemapper/)
6+ [![Downloads](https://img.shields.io/pypi/dm/treemapper)](https://pypi.org/project/treemapper/)
7+ [![License](https://img.shields.io/github/license/nikolay-e/treemapper)](https://github.com/nikolay-e/treemapper/blob/main/LICENSE)
8+
9+ **Export your codebase for AI/LLM context in one command.**
10+
11+ ``` bash
12+ pip install treemapper                  # core (no native extensions)
13+ pip install 'treemapper[tree-sitter]'   # + AST parsing for 10 languages
14+ treemapper . -o context.yaml            # paste into ChatGPT/Claude
15+ ```
16+
17+ ## Why TreeMapper?
18+
19+ Unlike `tree` or `find`, TreeMapper exports **structure + file
20+ contents** in a format optimized for LLM context windows:
21+
22+ ``` yaml
23+ name: myproject
24+ type: directory
25+ children:
26+   - name: main.py
27+     type: file
28+     content: |
29+       def hello():
30+           print("Hello, World!")
31+   - name: utils/
32+     type: directory
33+     children:
34+       - name: helpers.py
35+         type: file
36+         content: |
37+           def add(a, b):
38+               return a + b
39+ ```
40+
41+ ## Usage
42+
43+ ``` bash
44+ treemapper # current dir, YAML to stdout
45+ treemapper . # YAML to stdout + token count
46+ treemapper . -o tree.yaml # save to file
47+ treemapper . -o # save to tree.yaml (default)
48+ treemapper . -o - # explicit stdout output
49+ treemapper . -f json # JSON format
50+ treemapper . -f txt # plain text with indentation
51+ treemapper . -f md # Markdown with fenced code
52+ treemapper . -f yml # YAML (alias)
53+ treemapper . --no-content # structure only
54+ treemapper . --max-depth 3 # limit depth (0=root, 1=children)
55+ treemapper . --max-file-bytes 10000 # skip files > 10KB (default: 10 MB)
56+ treemapper . --max-file-bytes 0 # no limit
57+ treemapper . -i custom.ignore # custom ignore patterns
58+ treemapper . --no-default-ignores # disable .gitignore + defaults
59+ treemapper . --log-level info # log level (default: error)
60+ treemapper . -c # copy to clipboard
61+ treemapper . -c -o tree.yaml # clipboard + save to file
62+ treemapper -v # show version
63+ ```
64+
65+ ## Diff Context Mode
66+
67+ Smart context selection for git diffs — automatically finds the
68+ minimal set of code fragments needed to understand a change:
69+
70+ ``` bash
71+ treemapper . --diff HEAD~1..HEAD # recent changes
72+ treemapper . --diff main..feature # feature branch
73+ treemapper . --diff HEAD~1 --budget 30000 # limit tokens
74+ treemapper . --diff HEAD~1 --full # all changed code
75+ ```
76+
77+ Uses graph-based relevance propagation (Personalized PageRank)
78+ to select the most important context. Output size is controlled
79+ by algorithm convergence (τ-stopping) by default, or an explicit
80+ `--budget` token limit. Understands imports, type references,
81+ config dependencies, and co-change patterns across 15+
82+ programming languages.
83+
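
As a rough illustration of the relevance propagation described above, here is a minimal Personalized PageRank sketch. It is not TreeMapper's implementation: the graph representation, the function name `personalized_pagerank`, the fixed iteration count, and the reading of `alpha` as the probability of following an edge (with `1 - alpha` restarting at a changed fragment) are all assumptions made for the example.

```python
def personalized_pagerank(edges, seeds, alpha=0.60, iterations=50):
    """Power-iteration PPR over a directed, weighted graph.

    edges: dict mapping node -> {neighbor: weight} (hypothetical format)
    seeds: nodes touched by the diff; restart mass is spread evenly over them
    alpha: assumed to be the probability of following an edge
    """
    seeds = set(seeds)
    nodes = set(edges) | {n for nbrs in edges.values() for n in nbrs} | seeds
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        # Every node starts each round with its share of the restart mass
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for node in nodes:
            nbrs = edges.get(node, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: hand its mass back to the seeds
                for n in seeds:
                    nxt[n] += alpha * scores[node] * restart[n]
                continue
            for nbr, weight in nbrs.items():
                nxt[nbr] += alpha * scores[node] * weight / total
        scores = nxt
    return scores


# Toy graph: main.py is the changed file; unrelated.py is not reachable from it
graph = {
    "main.py": {"utils.py": 1.0, "config.py": 1.0},
    "utils.py": {"helpers.py": 1.0},
    "unrelated.py": {"other.py": 1.0},
}
ranks = personalized_pagerank(graph, seeds={"main.py"})
```

Scores decay with distance from the changed file, and anything unreachable from the seeds stays at zero, which is the property that lets a threshold or budget cut off unrelated code.
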
84+ Output format:
85+
86+ ``` yaml
87+ name: myproject
88+ type: diff_context
89+ fragment_count: 5
90+ fragments:
91+   - path: src/main.py
92+     lines: "10-25"
93+     kind: function
94+     symbol: process_data
95+     content: |
96+       def process_data(items):
97+           ...
98+ ```
99+
100+ Options:
101+
102+ | Flag       | Default | Description                                |
103+ |------------|---------|--------------------------------------------|
104+ | `--budget` | none    | Token limit (convergence-based by default) |
105+ | `--alpha`  | 0.60    | PPR damping factor                         |
106+ | `--tau`    | 0.08    | Stopping threshold                         |
107+ | `--full`   | false   | Include all changed code                   |
108+
109+ ## Token Counting
110+
111+ Token count and size are always displayed on stderr:
112+
113+ ``` text
114+ 12,847 tokens (o200k_base), 52.3 KB
115+ ```
116+
117+ For large outputs (>1 MB), counts are approximate and shown with a `~` prefix:
118+
119+ ``` text
120+ ~125,000 tokens (o200k_base), 5.2 MB
121+ ```
122+
123+ Uses tiktoken with `o200k_base` encoding (GPT-4o tokenizer).
124+
125+ ## Clipboard Support
126+
127+ Copy output directly to clipboard with `-c` or `--copy`:
128+
129+ ``` bash
130+ treemapper . -c                 # copy (no stdout)
131+ treemapper . -c -o tree.yaml    # copy + save to file
132+ ```
133+
134+ **System Requirements:**
135+
136+ - **macOS:** `pbcopy` (pre-installed)
137+ - **Windows:** `clip` (pre-installed)
138+ - **Linux (Wayland):** `wl-copy`
139+ - **Linux (X11):** `xclip` or `xsel`
140+
141+ ## Python API
142+
143+ ``` python
144+ from treemapper import map_directory
145+ from treemapper import to_yaml, to_json, to_text, to_markdown
146+
147+ tree = map_directory(
148+     path,                      # directory path
149+     max_depth=None,            # limit traversal depth
150+     no_content=False,          # exclude file contents
151+     max_file_bytes=None,       # skip large files
152+     ignore_file=None,          # custom ignore file
153+     no_default_ignores=False,  # disable default ignores
154+ )
155+
156+ yaml_str = to_yaml(tree)
157+ json_str = to_json(tree)
158+ text_str = to_text(tree)
159+ md_str = to_markdown(tree)
160+ ```
161+
162+ ## Ignore Patterns
163+
164+ Respects `.gitignore` and `.treemapperignore` automatically.
165+ Use `--no-default-ignores` to disable all ignore processing
166+ (`.gitignore`, `.treemapperignore`, and built-in defaults).
167+
168+ - Hierarchical: nested ignore files at each directory level
169+ - Negation patterns: `!important.log` un-ignores a file
170+ - Anchored patterns: `/root_only.txt` matches only in root
171+ - Output file is always auto-ignored
172+
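
To make the negation and anchoring rules above concrete, here is a deliberately simplified matcher. It is a sketch only: TreeMapper delegates real matching to `pathspec`, and the function name `is_ignored` plus the reduced rule set (last matching pattern wins, unanchored patterns match the basename, no directory-only or `**` handling, no nested ignore files) are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def is_ignored(rel_path, patterns):
    """Return True if rel_path (POSIX-style, relative to the root) is ignored.

    Simplified rules: later patterns override earlier ones, a leading "!"
    negates, and a leading "/" anchors the pattern to the root directory.
    """
    ignored = False
    basename = rel_path.rsplit("/", 1)[-1]
    for raw in patterns:
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        if pattern.startswith("/"):
            # Anchored: compare against the full path from the root
            matched = fnmatchcase(rel_path, pattern[1:])
        else:
            # Unanchored: compare against the file name at any depth
            matched = fnmatchcase(basename, pattern)
        if matched:
            ignored = not negated
    return ignored


rules = ["*.log", "!important.log", "/root_only.txt"]
for candidate in ["debug.log", "logs/important.log", "root_only.txt", "sub/root_only.txt"]:
    print(candidate, is_ignored(candidate, rules))
```

Because the last matching pattern decides, placing `!important.log` after `*.log` is what rescues the file; reversing the two lines would ignore it again.
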
173+ ## Content Placeholders
174+
175+ - `<file too large: N bytes>` — exceeds `--max-file-bytes`
176+ - `<binary file: N bytes>` — binary file detected
177+ - `<unreadable content: not utf-8>` — not valid UTF-8
178+ - `<unreadable content>` — permission denied or I/O error
4179
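
The placeholder rules above could be sketched roughly as follows. This is an illustration, not the project's code: the function name `read_content`, the order of the checks, and the NUL-byte test for binary files are assumptions, and the placeholder strings simply mirror the list above.

```python
from pathlib import Path


def read_content(path, max_file_bytes=None):
    """Return file text, or a placeholder string when it cannot be embedded."""
    path = Path(path)
    try:
        size = path.stat().st_size
        # A limit of 0 or None means "no limit"
        if max_file_bytes and size > max_file_bytes:
            return f"<file too large: {size} bytes>"
        data = path.read_bytes()
    except OSError:
        # Covers missing files, permission errors, and other I/O failures
        return "<unreadable content>"
    if b"\x00" in data:  # assumed heuristic: NUL bytes indicate binary data
        return f"<binary file: {size} bytes>"
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return "<unreadable content: not utf-8>"
```

Checking the size before reading means an oversized file is never loaded into memory, which is the main reason to order the checks this way.
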
5180## Development
6181
7182``` bash
8- pip install -e ".[dev]"
183+ pip install -e ".[dev,tree-sitter]"
9184pytest
10185pre-commit run --all-files
11186```
@@ -28,6 +203,8 @@ noise, catching regressions in relevance filtering. Each garbage
28203file uses unique prefixed identifiers (e.g. `GARBAGE_*`) so leaks
29204are unambiguously detectable.
30205
206+ ---
207+
31208## Two Modes of Operation
32209
33210TreeMapper operates in two fundamentally different modes that
@@ -119,7 +296,10 @@ The engine operates as a 7-stage pipeline:
119296 fragments maximizing density (marginal utility per token). Core
120297 fragments are selected first, then expansion candidates ordered
121298 by a max-heap. A τ-based stopping threshold (relative to
122- baseline density median) prevents noise accumulation.
299+ baseline density median) prevents noise accumulation. When no
300+ explicit `--budget` is set, τ-stopping alone controls output
301+ size — the algorithm converges naturally without a hard token
302+ cap.
123303
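
The selection stage described above can be approximated with a small sketch. It is a simplification under stated assumptions: fragment utilities are treated as fixed numbers (the real engine works with marginal utility), the stopping rule is read as "stop when density drops below `tau` times the median core density", and the name `select_fragments` and the tuple layout are hypothetical.

```python
import heapq
from statistics import median


def select_fragments(core, candidates, tau=0.08, budget=None):
    """Pick fragments by density (score per token).

    core / candidates: lists of (name, score, tokens) tuples; core must be
    non-empty and every token count positive.
    tau: stop once a candidate's density falls below tau * median core density.
    budget: optional hard token cap; None leaves tau-stopping in sole control.
    """
    selected, used = [], 0
    for name, _score, tokens in core:
        # Core fragments are always taken first
        selected.append(name)
        used += tokens

    baseline = median(score / tokens for _, score, tokens in core)

    # heapq is a min-heap, so densities are negated to pop the densest first
    heap = [(-score / tokens, name, tokens) for name, score, tokens in candidates]
    heapq.heapify(heap)

    while heap:
        neg_density, name, tokens = heapq.heappop(heap)
        if -neg_density < tau * baseline:
            break  # everything still in the heap is no denser than this
        if budget is not None and used + tokens > budget:
            continue  # does not fit; a smaller candidate still might
        selected.append(name)
        used += tokens
    return selected, used


core = [("a", 10.0, 100), ("b", 20.0, 100)]
extra = [("c", 5.0, 100), ("d", 0.5, 100), ("e", 3.0, 50)]
print(select_fragments(core, extra))               # tau-stopping only
print(select_fragments(core, extra, budget=260))   # with a hard token cap
```

With no budget, the low-density fragment `d` is what ends the loop; with `budget=260`, fragment `c` is skipped for size before the threshold is ever reached.
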
124304### Edge Taxonomy: Six Perspectives on Code Relationships
125305
@@ -200,7 +380,7 @@ broader inclusion.
200380
201381Files are decomposed into semantic fragments using a
202382priority-ordered parser pipeline. Language-specific parsers
203- (tree-sitter for 13+ languages, Python AST, Mistune for Markdown)
383+ (tree-sitter for 10 languages, Python AST, Mistune for Markdown)
204384produce function/class/section-level fragments. Fallback parsers
205385handle config files (key-value boundaries), text (sentence-aware
206386splitting), and generic content (line-count limits). The
@@ -238,12 +418,12 @@ without letting them dominate.
238418
239419### Tunable Parameters
240420
241- | Parameter  | Default | Controls                          |
242- |------------|---------|-----------------------------------|
243- | `--budget` | 50000   | Maximum output tokens             |
244- | `--alpha`  | 0.60    | PPR damping — broader propagation |
245- | `--tau`    | 0.08    | Stopping — stricter = less noise  |
246- | `--full`   | false   | Bypass smart selection            |
421+ | Parameter | Default | Controls |
422+ |------------|---------------|------------------------------------------------|
423+ | `--budget` | none | Token limit (convergence-based by default) |
424+ | `--alpha` | 0.60 | PPR damping — broader propagation |
425+ | `--tau` | 0.08 | Stopping — stricter = less noise |
426+ | `--full` | false | Bypass smart selection |
247427
248428---
249429
@@ -254,7 +434,7 @@ without letting them dominate.
254434| Output | YAML | LLM-readable, literal blocks |
255435| Tokens | tiktoken o200k | GPT-4o standard, exact BPE |
256436| Ignores | pathspec | gitignore-compatible |
257- | Parsing | tree-sitter | 13+ languages, AST-level |
437+ | Parsing | tree-sitter | 10 languages, AST-level |
258438| Ranking | PPR | Relevance with natural decay |
259439| Selection | Lazy greedy | Near-optimal, linear time |
260440| Git | subprocess UTF-8 | Platform-safe, non-ASCII |