
Commit e26a15b

fix: optional tree-sitter, untracked files in diff, binary read optimization
1 parent: bfe070b

13 files changed

Lines changed: 478 additions & 377 deletions

CLAUDE.md

Lines changed: 191 additions & 11 deletions
@@ -1,11 +1,186 @@
 # TreeMapper
 
-> Extends [../CLAUDE.md](../CLAUDE.md)
+<!-- Extends ../CLAUDE.md -->
+
+[![PyPI](https://img.shields.io/pypi/v/treemapper)](https://pypi.org/project/treemapper/)
+[![Downloads](https://img.shields.io/pypi/dm/treemapper)](https://pypi.org/project/treemapper/)
+[![License](https://img.shields.io/github/license/nikolay-e/treemapper)](https://github.com/nikolay-e/treemapper/blob/main/LICENSE)
+
+**Export your codebase for AI/LLM context in one command.**
+
+```bash
+pip install treemapper # core (no native extensions)
+pip install 'treemapper[tree-sitter]' # + AST parsing for 10 languages
+treemapper . -o context.yaml # paste into ChatGPT/Claude
+```
+
+## Why TreeMapper?
+
+Unlike `tree` or `find`, TreeMapper exports **structure + file
+contents** in a format optimized for LLM context windows:
+
+```yaml
+name: myproject
+type: directory
+children:
+  - name: main.py
+    type: file
+    content: |
+      def hello():
+          print("Hello, World!")
+  - name: utils/
+    type: directory
+    children:
+      - name: helpers.py
+        type: file
+        content: |
+          def add(a, b):
+              return a + b
+```
+
+## Usage
+
+```bash
+treemapper # current dir, YAML to stdout
+treemapper . # YAML to stdout + token count
+treemapper . -o tree.yaml # save to file
+treemapper . -o # save to tree.yaml (default)
+treemapper . -o - # explicit stdout output
+treemapper . -f json # JSON format
+treemapper . -f txt # plain text with indentation
+treemapper . -f md # Markdown with fenced code
+treemapper . -f yml # YAML (alias)
+treemapper . --no-content # structure only
+treemapper . --max-depth 3 # limit depth (0=root, 1=children)
+treemapper . --max-file-bytes 10000 # skip files > 10KB (default: 10 MB)
+treemapper . --max-file-bytes 0 # no limit
+treemapper . -i custom.ignore # custom ignore patterns
+treemapper . --no-default-ignores # disable .gitignore + defaults
+treemapper . --log-level info # log level (default: error)
+treemapper . -c # copy to clipboard
+treemapper . -c -o tree.yaml # clipboard + save to file
+treemapper -v # show version
+```
+
+## Diff Context Mode
+
+Smart context selection for git diffs — automatically finds the
+minimal set of code fragments needed to understand a change:
+
+```bash
+treemapper . --diff HEAD~1..HEAD # recent changes
+treemapper . --diff main..feature # feature branch
+treemapper . --diff HEAD~1 --budget 30000 # limit tokens
+treemapper . --diff HEAD~1 --full # all changed code
+```
+
+Uses graph-based relevance propagation (Personalized PageRank)
+to select the most important context. Output size is controlled
+by algorithm convergence (τ-stopping) by default, or an explicit
+`--budget` token limit. Understands imports, type references,
+config dependencies, and co-change patterns across 15+
+programming languages.
+
+Output format:
+
+```yaml
+name: myproject
+type: diff_context
+fragment_count: 5
+fragments:
+  - path: src/main.py
+    lines: "10-25"
+    kind: function
+    symbol: process_data
+    content: |
+      def process_data(items):
+          ...
+```
+
+Options:
+
+| Flag       | Default       | Description                                    |
+|------------|---------------|------------------------------------------------|
+| `--budget` | none          | Token limit (convergence-based by default)     |
+| `--alpha`  | 0.60          | PPR damping factor                             |
+| `--tau`    | 0.08          | Stopping threshold                             |
+| `--full`   | false         | Include all changed code                       |
+
+## Token Counting
+
+Token count and size are always displayed on stderr:
+
+```text
+12,847 tokens (o200k_base), 52.3 KB
+```
+
+For large outputs (>1MB), approximate counts with `~` prefix:
+
+```text
+~125,000 tokens (o200k_base), 5.2 MB
+```
+
+Uses tiktoken with `o200k_base` encoding (GPT-4o tokenizer).
+
+## Clipboard Support
+
+Copy output directly to clipboard with `-c` or `--copy`:
+
+```bash
+treemapper . -c # copy (no stdout)
+treemapper . -c -o tree.yaml # copy + save to file
+```
+
+**System Requirements:**
+
+- **macOS:** `pbcopy` (pre-installed)
+- **Windows:** `clip` (pre-installed)
+- **Linux (Wayland):** `wl-copy`
+- **Linux (X11):** `xclip` or `xsel`
+
+## Python API
+
+```python
+from treemapper import map_directory
+from treemapper import to_yaml, to_json, to_text, to_markdown
+
+tree = map_directory(
+    path,                    # directory path
+    max_depth=None,          # limit traversal depth
+    no_content=False,        # exclude file contents
+    max_file_bytes=None,     # skip large files
+    ignore_file=None,        # custom ignore file
+    no_default_ignores=False,# disable default ignores
+)
+
+yaml_str = to_yaml(tree)
+json_str = to_json(tree)
+text_str = to_text(tree)
+md_str = to_markdown(tree)
+```
+
+## Ignore Patterns
+
+Respects `.gitignore` and `.treemapperignore` automatically.
+Use `--no-default-ignores` to disable all ignore processing
+(`.gitignore`, `.treemapperignore`, and built-in defaults).
+
+- Hierarchical: nested ignore files at each directory level
+- Negation patterns: `!important.log` un-ignores a file
+- Anchored patterns: `/root_only.txt` matches only in root
+- Output file is always auto-ignored
+
+## Content Placeholders
+
+- `<file too large: N bytes>` — exceeds `--max-file-bytes`
+- `<binary file: N bytes>` — binary file detected
+- `<unreadable content: not utf-8>` — not valid UTF-8
+- `<unreadable content>` — permission denied or I/O error
 
 ## Development
 
 ```bash
-pip install -e ".[dev]"
+pip install -e ".[dev,tree-sitter]"
 pytest
 pre-commit run --all-files
 ```
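The Content Placeholders list added in this hunk implies a small per-file decision procedure. The sketch below is a hypothetical `read_content` helper, not TreeMapper's code: the 8 KiB sniff window, the NUL-byte binary heuristic, and reading "10 MB" as 10 × 1024 × 1024 bytes are all assumptions. It checks the size first and then sniffs a short prefix, so a large binary file is never read in full. This is one plausible reading of the commit's "binary read optimization".

```python
from pathlib import Path
from typing import Optional

# Assumption: only the first 8 KiB is sniffed for NUL bytes.
SNIFF_BYTES = 8192


def read_content(path: Path, max_file_bytes: Optional[int] = 10 * 1024 * 1024) -> str:
    """Return a file's text, or one of the placeholders listed in CLAUDE.md."""
    try:
        size = path.stat().st_size
        # None or 0 means "no limit", mirroring `--max-file-bytes 0`.
        if max_file_bytes and size > max_file_bytes:
            return f"<file too large: {size} bytes>"
        with path.open("rb") as fh:
            head = fh.read(SNIFF_BYTES)
            # Assumed heuristic: a NUL byte in the head marks the file as
            # binary, so the rest of a large binary file is never read.
            if b"\x00" in head:
                return f"<binary file: {size} bytes>"
            data = head + fh.read()
    except OSError:  # includes PermissionError and missing files
        return "<unreadable content>"
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return "<unreadable content: not utf-8>"
```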
@@ -28,6 +203,8 @@ noise, catching regressions in relevance filtering. Each garbage
 file uses unique prefixed identifiers (e.g. `GARBAGE_*`) so leaks
 are unambiguously detectable.
 
+---
+
 ## Two Modes of Operation
 
 TreeMapper operates in two fundamentally different modes that
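The context lines of this hunk describe garbage fixture files whose identifiers share a unique prefix so that leaks are easy to detect. The following is a minimal check that assumes the `GARBAGE_` prefix from the example. The helper names `find_leaks` and `assert_no_leaks` are hypothetical. The check scans rendered output with a regular expression:

```python
import re
from typing import List

# Assumption: every identifier in a garbage fixture starts with GARBAGE_.
GARBAGE_RE = re.compile(r"\bGARBAGE_\w+")


def find_leaks(output: str) -> List[str]:
    """Sorted, de-duplicated garbage identifiers found in TreeMapper output."""
    return sorted(set(GARBAGE_RE.findall(output)))


def assert_no_leaks(output: str) -> None:
    leaks = find_leaks(output)
    if leaks:
        raise AssertionError(f"irrelevant fixture content leaked: {leaks}")
```

Because the prefix never appears in real project code, any match is an unambiguous leak rather than a false positive.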
@@ -119,7 +296,10 @@ The engine operates as a 7-stage pipeline:
 fragments maximizing density (marginal utility per token). Core
 fragments are selected first, then expansion candidates ordered
 by a max-heap. A τ-based stopping threshold (relative to
-baseline density median) prevents noise accumulation.
+baseline density median) prevents noise accumulation. When no
+explicit `--budget` is set, τ-stopping alone controls output
+size — the algorithm converges naturally without a hard token
+cap.
 
 ### Edge Taxonomy: Six Perspectives on Code Relationships
 
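The selection step described in this hunk can be sketched with lazy greedy evaluation: density is marginal utility per token, core fragments go first, candidates sit in a max-heap, τ-stopping is relative to the median baseline density, and `--budget` is optional. Everything named here is hypothetical. The `Fragment` fields, the toy symbol-coverage utility, and the exact threshold form (τ times the median of `score / tokens` over all candidates) are assumptions. They were chosen so that marginal utility only shrinks as more is selected, which is what makes lazy re-evaluation valid.

```python
import heapq
from dataclasses import dataclass
from statistics import median
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Fragment:
    name: str
    tokens: int
    score: float             # relevance, e.g. a PPR score (assumed input)
    symbols: FrozenSet[str]  # identifiers the fragment touches (assumed)


def gain(frag: Fragment, covered: Set[str]) -> float:
    """Toy diminishing-returns utility: score times the share of new symbols."""
    if not frag.symbols:
        return frag.score
    return frag.score * len(frag.symbols - covered) / len(frag.symbols)


def select(core: List[Fragment], candidates: List[Fragment],
           tau: float = 0.08, budget: Optional[int] = None) -> List[Fragment]:
    selected: List[Fragment] = list(core)  # core fragments always go first
    covered: Set[str] = set()
    for frag in core:
        covered |= frag.symbols
    used = sum(frag.tokens for frag in core)
    if not candidates:
        return selected
    # Stop once the best marginal density falls below tau * baseline median.
    threshold = tau * median(c.score / c.tokens for c in candidates)
    # Max-heap via negated densities; stored values may be stale upper bounds.
    heap = [(-gain(c, covered) / c.tokens, i, c) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap:
        _, i, frag = heapq.heappop(heap)
        density = gain(frag, covered) / frag.tokens  # lazy re-evaluation
        if heap and density < -heap[0][0]:
            heapq.heappush(heap, (-density, i, frag))  # stale bound: re-queue
            continue
        if density < threshold:
            break  # tau-stopping: every remaining candidate is noise
        if budget is not None and used + frag.tokens > budget:
            continue  # does not fit; smaller fragments still might
        selected.append(frag)
        covered |= frag.symbols
        used += frag.tokens
    return selected
```

With `budget=None`, the loop ends only when the best remaining density drops below the threshold. This matches the convergence-based default described in the hunk. A budget only skips candidates that no longer fit.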
@@ -200,7 +380,7 @@ broader inclusion.
 
 Files are decomposed into semantic fragments using a
 priority-ordered parser pipeline. Language-specific parsers
-(tree-sitter for 13+ languages, Python AST, Mistune for Markdown)
+(tree-sitter for 10 languages, Python AST, Mistune for Markdown)
 produce function/class/section-level fragments. Fallback parsers
 handle config files (key-value boundaries), text (sentence-aware
 splitting), and generic content (line-count limits). The
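Two stages of the priority-ordered parser pipeline in this hunk can be illustrated in a short sketch. The first is a Python parser built on the standard-library `ast` module. The second is a generic fallback that splits by line count. The tuple-shaped fragment, the 40-line chunk size, and the function names are assumptions. In the real ordering, the tree-sitter, Mistune, config, and text parsers would sit between these two stages.

```python
import ast
from typing import List, Optional, Tuple

# Hypothetical fragment shape: (kind, symbol, start_line, end_line), 1-based, inclusive.
Fragment = Tuple[str, Optional[str], int, int]


def python_parser(path: str, text: str) -> Optional[List[Fragment]]:
    """Top-level functions and classes via the stdlib ast module (Python 3.8+)."""
    if not path.endswith(".py"):
        return None
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return None  # defer to the next parser in the pipeline
    frags: List[Fragment] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            frags.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef):
            frags.append(("class", node.name, node.lineno, node.end_lineno))
    return frags or None


def generic_parser(path: str, text: str, max_lines: int = 40) -> List[Fragment]:
    """Last-resort fallback: fixed-size line chunks (40 is an assumed limit)."""
    n = len(text.splitlines())
    return [("chunk", None, start, min(start + max_lines - 1, n))
            for start in range(1, n + 1, max_lines)]


# Highest priority first; a parser returns None (or []) to pass the file on.
PIPELINE = [python_parser, generic_parser]


def fragment_file(path: str, text: str) -> List[Fragment]:
    for parser in PIPELINE:
        result = parser(path, text)
        if result:
            return result
    return []
```

A Python file with a syntax error, or with no top-level definitions, falls through to the generic chunker instead of producing nothing.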
@@ -238,12 +418,12 @@ without letting them dominate.
 
 ### Tunable Parameters
 
-| Parameter  | Default | Controls                          |
-|------------|---------|-----------------------------------|
-| `--budget` | 50000   | Maximum output tokens             |
-| `--alpha`  | 0.60    | PPR damping — broader propagation |
-| `--tau`    | 0.08    | Stopping — stricter = less noise  |
-| `--full`   | false   | Bypass smart selection            |
+| Parameter  | Default       | Controls                                       |
+|------------|---------------|------------------------------------------------|
+| `--budget` | none          | Token limit (convergence-based by default)     |
+| `--alpha`  | 0.60          | PPR damping — broader propagation              |
+| `--tau`    | 0.08          | Stopping — stricter = less noise               |
+| `--full`   | false         | Bypass smart selection                         |
 
 ---
 
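The `--alpha` row describes the damping factor of Personalized PageRank. Below is a minimal power-iteration sketch over an unweighted dependency graph. It assumes `alpha` is the probability of following an edge, so the remaining `1 - alpha` restarts at the changed-code seeds. That reading is consistent with "larger alpha means broader propagation". It also assumes every edge target is a key of `graph`. TreeMapper's real graph weights edges by type, which this sketch omits, and the function name is hypothetical.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seeds: List[str],
                          alpha: float = 0.60, max_iter: int = 100,
                          tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration: r = (1 - alpha) * s + alpha * (one random-walk step from r)."""
    seeds = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    if not seeds:
        raise ValueError("need at least one seed node")
    nodes = list(graph)
    restart = {n: 0.0 for n in nodes}
    for s in seeds:
        restart[s] = 1.0 / len(seeds)
    rank = dict(restart)
    for _ in range(max_iter):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            targets = graph[n] or seeds  # dangling nodes jump back to the seeds
            share = alpha * rank[n] / len(targets)
            for m in targets:
                nxt[m] += share
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank
```

On a chain `diff -> a -> b`, relevance decays with distance from the seed. Raising `alpha` from 0.60 to 0.90 moves noticeably more mass onto `b`, which illustrates the "broader propagation" trade-off in the table.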
@@ -254,7 +434,7 @@ without letting them dominate.
 | Output | YAML | LLM-readable, literal blocks |
 | Tokens | tiktoken o200k | GPT-4o standard, exact BPE |
 | Ignores | pathspec | gitignore-compatible |
-| Parsing | tree-sitter | 13+ languages, AST-level |
+| Parsing | tree-sitter | 10 languages, AST-level |
 | Ranking | PPR | Relevance with natural decay |
 | Selection | Lazy greedy | Near-optimal, linear time |
 | Git | subprocess UTF-8 | Platform-safe, non-ASCII |
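The Ignores row names pathspec as the gitignore-compatible matcher. To make the Ignore Patterns rules from the first hunk concrete (hierarchical files, `!` negation, `/` anchoring), here is a deliberately simplified matcher that uses only the standard library. It is not pathspec and does not implement full gitignore semantics. It omits `**`, trailing-slash directory patterns, and the rule that a file cannot be re-included when its parent directory is excluded. `fnmatch`'s `*` also matches `/`. The `ignore_files` mapping and both function names are hypothetical.

```python
import fnmatch
import posixpath
from typing import Dict, List


def _matches(pattern: str, rel: str) -> bool:
    # A "/" anywhere in the pattern anchors it to the ignore file's directory;
    # otherwise it is tried against the basename at any depth.
    target = rel if "/" in pattern else posixpath.basename(rel)
    return fnmatch.fnmatchcase(target, pattern.lstrip("/"))


def is_ignored(path: str, ignore_files: Dict[str, List[str]]) -> bool:
    """path: POSIX path relative to the project root.
    ignore_files: directory ("" for the root) -> lines of its ignore file."""
    ignored = False
    parts = path.split("/")
    # Visit the root first and deeper directories later, so deeper rules win.
    for depth in range(len(parts)):
        base = "/".join(parts[:depth])
        rel = "/".join(parts[depth:])
        for raw in ignore_files.get(base, []):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            negate = line.startswith("!")
            pattern = line[1:] if negate else line
            if _matches(pattern, rel):
                ignored = not negate  # the last matching rule wins
    return ignored
```

With root rules `*.log`, `!important.log`, `/root_only.txt` and a nested `sub` rule `*.tmp`, `important.log` stays included. `root_only.txt` is ignored only at the root, and `*.tmp` applies only under `sub/`.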
