33[![Crates.io](https://img.shields.io/crates/v/psummary.svg)](https://crates.io/crates/psummary)
44
55` Summary ` is a blazingly fast, concurrent tool for generating comprehensive
6- change summaries across multiple Git repositories. It automatically analyzes tag
7- history and produces clean diffs between releases or from commit to commit.
6+ change summaries across multiple Git repositories. It performs intelligent Git
7+ repository discovery and produces clean diffs between tags or specific commits
8+ using tag-based chronological analysis.
89
910Built for developers who need to understand project evolution at scale,
1011` Summary ` leverages Rust's async runtime and parallel processing to scan
@@ -14,38 +15,44 @@ hundreds of repositories in seconds.
1415
1516## Key Features 🔐
1617
17- - ** Blazing Fast** : Parallel repository scanning and async diff generation
18- dramatically outperform manual git operations across multiple projects.
18+ - **Blazing Fast**: Parallel repository scanning (rayon) plus async diff generation
19+ (tokio) delivers roughly 6-8x speedups over sequential git operations
1920- ** Intelligent Tag Analysis** : Automatically sorts tags chronologically and
20- generates diffs between consecutive releases, plus latest tag to HEAD.
21- - ** Smart Binary Filtering** : Excludes 50+ binary file types by default to keep
22- summaries focused on meaningful source code changes.
23- - ** Regex-Powered Omission** : Fine-tune output with custom regex patterns to
24- exclude specific files or directories from diffs.
25- - ** Concurrent by Default** : Uses ` tokio ` and ` rayon ` to maximize CPU and I/O
26- throughput across all discovered repositories.
27- - ** Cross-Platform** : Native performance on Windows, macOS, and Linux.
21+ generates diffs between consecutive releases, plus latest tag to HEAD
22+ - ** Hash-Based Deduplication** : Identical diffs are detected and grouped by
23+ content hash to eliminate redundancy in output
24+ - **Smart Exclusion Logic**: Directories matching `--Pattern` (e.g. `.git`) are
25+ never excluded, so repositories are found even inside excluded paths like `node_modules`
26+ - ** Local Tag Discovery** : Discovers all local tags in each repository (run
27+ ` git fetch --tags ` first to include remote tags)
28+ - ** 62 Built-in Extensions** : Automatically filters binary files from diffs
29+ using 62 case-insensitive extension patterns
30+ - ** Concurrent Pipeline** : Rayon parallelizes path scanning; tokio handles
31+ concurrent repository processing with ` FuturesUnordered `
32+ - **Intelligent Diff Filtering**: Only shows file header lines (`F`), additions
33+ (`+`), and deletions (`-`); git metadata is stripped for clean output
2834
2935---
3036
3137## Performance Benchmarks 🚤
3238
33- ` Summary ` processes multiple repositories concurrently, making it orders of
34- magnitude faster than running sequential git commands manually. In tests
35- scanning 100+ repositories with full histories :
39+ ` Summary ` processes multiple repositories concurrently, making it dramatically
40+ faster than running sequential git commands manually. The parallel architecture
41+ divides work efficiently :
3642
37- | Operation | Time (Parallel) | Time (Sequential) | Speedup |
38- | :------------------------- | :-------------: | :---------------: | :-----: |
39- | Generate tag diffs | ~ 2.3s | ~ 18.7s | ** 8x** |
40- | Diff all commits (no tags) | ~ 1.9s | ~ 15.2s | ** 8x** |
41- | With custom omit patterns | ~ 2.8s | ~ 22.4s | ** 8x** |
43+ - ** Rayon** handles parallel path scanning across all filesystem entries
44+ - ** Tokio** spawns async tasks per repository with ` FuturesUnordered ` for
45+ concurrent diff generation
46+ - **DashMap** provides sharded concurrent aggregation with low lock contention
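
The division of work described above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: scoped threads stand in for tokio tasks, a `Mutex<HashMap>` stands in for `DashMap`, and `summarize_all` and `fake_summary` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Run `summarize` once per repository, each on its own scoped thread,
/// and collect the results into one shared map keyed by repository path.
/// Assumption: std threads and a Mutex replace tokio and DashMap here.
fn summarize_all(repos: &[&str], summarize: fn(&str) -> String) -> HashMap<String, String> {
    let results = Mutex::new(HashMap::new());
    thread::scope(|scope| {
        for repo in repos {
            let results = &results;
            scope.spawn(move || {
                // The expensive work happens outside the lock.
                let summary = summarize(repo);
                results.lock().unwrap().insert(repo.to_string(), summary);
            });
        }
    });
    // All threads have joined once the scope ends.
    results.into_inner().unwrap()
}

/// Hypothetical stand-in for real diff generation.
fn fake_summary(repo: &str) -> String {
    format!("diff for {repo}")
}

fn main() {
    let summaries = summarize_all(&["repo-a", "repo-b"], fake_summary);
    for (repo, summary) in &summaries {
        println!("{repo}: {summary}");
    }
}
```

One task per repository keeps the lock held only for the final insert, which is the same reason a sharded map keeps contention low.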
4247
43- ** Why so fast? **
48+ In typical scenarios scanning 100+ repositories:
4449
45- - Single-pass directory walk finds all ` .git ` folders efficiently
46- - Async tasks spawn per repository, not per file
47- - Shared-nothing architecture eliminates lock contention
48- - Optimized ` git2 ` diff options minimize memory allocation
50+ | Operation | Parallel Time | Sequential Time | Speedup |
51+ | :------------------------- | :-----------: | :-------------: | :------: |
52+ | Generate tag diffs | ~ 2-3 seconds | ~ 15-20 seconds | ** 6-8x** |
53+ | Diff all commits (no tags) | ~ 2-3 seconds | ~ 12-18 seconds | ** 6-8x** |
54+
55+ _(Actual performance depends on repository count, sizes, and I/O speed)_
4956
5057---
5158
@@ -57,14 +64,17 @@ Install directly from [Crates.io](https://crates.io/crates/psummary):
5764cargo install psummary
5865```
5966
60- The installed binary is ` psummary ` (or ` Summary ` on case-sensitive systems).
67+ This installs ** two** binaries with identical functionality:
68+
69+ - ` psummary ` (lowercase, recommended)
70+ - ` Summary ` (capitalized, for case-insensitive filesystems)
6171
6272---
6373
6474## Usage ⚙️
6575
66- The core workflow: discover Git repositories → analyze tags → generate diffs →
67- output grouped summaries.
76+ The core workflow: **discover** Git repositories → **identify** tags →
77+ **analyze** diffs → **aggregate** grouped summaries.
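
The tag step of this workflow can be sketched as follows. This is a minimal illustration assuming each tag carries a commit timestamp; the `Tag` type and `diff_ranges` function are hypothetical and only show the pairing logic (consecutive tags, then latest tag to `HEAD`).

```rust
/// A tag name paired with the Unix timestamp of the commit it points at.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    name: String,
    timestamp: i64,
}

/// Sort tags chronologically, pair each tag with its successor,
/// then pair the newest tag with "HEAD".
fn diff_ranges(mut tags: Vec<Tag>) -> Vec<(String, String)> {
    tags.sort_by_key(|tag| tag.timestamp);
    let mut ranges: Vec<(String, String)> = tags
        .windows(2)
        .map(|pair| (pair[0].name.clone(), pair[1].name.clone()))
        .collect();
    if let Some(latest) = tags.last() {
        ranges.push((latest.name.clone(), "HEAD".to_string()));
    }
    ranges
}

fn main() {
    let tags = vec![
        Tag { name: "v0.2.0".into(), timestamp: 200 },
        Tag { name: "v0.1.0".into(), timestamp: 100 },
    ];
    for (from, to) in diff_ranges(tags) {
        println!("{from}..{to}");
    }
}
```

A repository without tags yields no ranges in this sketch; how the tool handles that case is not modelled here.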
6878
6979```
7080A tool to recursively find Git repositories and summarize changes between tags.
@@ -116,38 +126,102 @@ psummary -P -E "node_modules target dist vendor"
116126 psummary -P -O " .*\.lock$" -O " \.md$" -O " /dist/"
117127 ```
118128
129+ **Note:** Regex patterns are case-sensitive by default. Use the `(?i)`
130+ prefix for case-insensitive matching. The default patterns already use
131+ `(?i)`.
132+
119133- ** ` --Pattern < PATTERN> ` ** : Match different repository markers (e.g., looking
120- for ` .hg` or custom markers).
134+ for `.hg` or custom markers). The pattern is matched against the **last path
135+ component only**, so `.git` finds repositories by their `.git` folder.
121136
122137- ** ` -P` vs sequential** : Omit ` -P` for deterministic sequential execution
123138 (useful for debugging or low-memory environments).
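
How `--Pattern` and `--Exclude` interact can be sketched with `std::path` alone. This is a rough illustration under one assumption: an entry counts as excluded when any of its path components is in the exclude list. The function names `matches_pattern` and `is_skipped` are hypothetical.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// True when the last path component equals `pattern` (e.g. ".git").
fn matches_pattern(path: &Path, pattern: &str) -> bool {
    path.file_name() == Some(OsStr::new(pattern))
}

/// True when the entry should be dropped from the scan.
/// Assumption: any excluded component drops the entry, unless the entry
/// itself is the repository marker, which is never excluded.
fn is_skipped(path: &Path, pattern: &str, excludes: &[&str]) -> bool {
    if matches_pattern(path, pattern) {
        return false;
    }
    path.components().any(|component| {
        component
            .as_os_str()
            .to_str()
            .map_or(false, |name| excludes.contains(&name))
    })
}

fn main() {
    let excludes = ["node_modules", "target"];
    for candidate in ["app/node_modules/pkg", "app/node_modules/pkg/.git", "app/src"] {
        let skipped = is_skipped(Path::new(candidate), ".git", &excludes);
        println!("{candidate}: skipped = {skipped}");
    }
}
```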
124139
125140---
126141
142+ ## How It Works 🔄
143+
144+ 1. **Discovery**: `walkdir` traverses the filesystem from `--Root`, filtering
145+    entries that match `--Pattern` in the last path component
146+ 2. **Filtering**: Directories in `--Exclude` are skipped **unless** their name
147+    matches `--Pattern` (e.g., `.git` is never excluded)
148+ 3. **Processing**: Each repository path spawns an async task that:
149+    - Opens the Git repository with `git2`
150+    - Collects and sorts tags chronologically
151+    - Generates diffs between consecutive tags, plus latest tag to HEAD
152+ 4. **Diff Generation**: `git2::DiffOptions` with:
153+    - `force_text(true)` and `ignore_filemode(true)` for clean output
154+    - `ignore_whitespace*` options to focus on semantic changes
155+    - **62 built-in binary extensions** + user `--Omit` patterns in a
156+      `regex::RegexSet`
157+    - Line filter: only `F` (file header), `+` (addition), `-` (deletion) lines
158+      kept
159+ 5. **Deduplication**: Each diff is hashed
160+    (`std::collections::hash_map::DefaultHasher`) to detect identical changes
161+    across repositories
162+ 6. **Aggregation**: `DashMap` collects diffs by unique hash; final output groups
163+    by error message/reason with differences sorted by length (longest first)
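
The deduplication and aggregation steps above can be sketched as follows. This is a single-threaded illustration, not the project's code: a plain `HashMap` stands in for `DashMap`, and `content_hash` and `group_identical` are hypothetical names. Like the approach described, it treats equal hashes as equal diffs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a diff's text with the standard library's default hasher.
fn content_hash(diff: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    diff.hash(&mut hasher);
    hasher.finish()
}

/// Group `(repository, diff)` pairs by diff hash.
/// Returns `(diff, repositories)` groups, longest diff first.
fn group_identical(diffs: &[(&str, &str)]) -> Vec<(String, Vec<String>)> {
    let mut groups: HashMap<u64, (String, Vec<String>)> = HashMap::new();
    for (repo, diff) in diffs {
        groups
            .entry(content_hash(diff))
            .or_insert_with(|| (diff.to_string(), Vec::new()))
            .1
            .push(repo.to_string());
    }
    let mut out: Vec<_> = groups.into_values().collect();
    // Longest diff first; ties broken by text so the order is deterministic.
    out.sort_by(|a, b| b.0.len().cmp(&a.0.len()).then_with(|| a.0.cmp(&b.0)));
    out
}

fn main() {
    let diffs = [("repo-a", "+fix"), ("repo-b", "+fix"), ("repo-c", "-old line")];
    for (diff, repos) in group_identical(&diffs) {
        println!("{repos:?} share: {diff}");
    }
}
```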
164+
165+ ---
166+
167+ ## Implementation Details ⚙️
168+
169+ ### Architecture
170+
171+ - **Parallelism**: Rayon's `into_par_iter()` for CPU-bound path scanning; tokio
172+ `spawn()` + `FuturesUnordered` for I/O-bound repository operations
173+ - **Concurrency**: `DashMap` provides a sharded concurrent hash map for
174+ thread-safe aggregation with low lock contention
175+ - **Error Handling**:
176+ - Parallel mode (`-P`): Errors are logged to stderr but processing continues
177+ - Sequential mode: Failed repositories are collected and skipped; processing
178+ continues with remaining repos
179+ - **Binary Detection**: Path-based filter of 62 file extensions (see below).
180+ Content is **not** inspected—the filter operates on file paths only.
181+
182+ ### Important Notes ⚠️
183+
184+ - **Local tags only**: Only discovers **local** Git tags. Run `git fetch --tags`
185+ in repositories first to include remote tags in the analysis.
186+ - **Pattern exclusion**: Directories listed in `--Exclude` are skipped
187+ **unless** the directory name matches `--Pattern` (e.g., `.git`). This ensures
188+ Git repositories are always found even inside `node_modules` or other excluded
189+ paths.
190+ - **Regex validation**: Invalid regex patterns cause a panic at startup. Check
191+ your patterns against the `regex` crate's syntax documentation before use.
192+ - **Diff output format**: Only file header lines (`F`), additions (`+`), and
193+ deletions (`-`) are included. All other git diff metadata (hunks, binary
194+ indicators, etc.) is filtered out for clean, readable summaries.
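
The line filter described in the last note can be sketched as below. It assumes each diff line arrives as an origin character plus its text, mirroring the origin markers `git2` reports (`F` file header, `H` hunk header, space for context, `+`, `-`). The `filter_lines` function is hypothetical, and whether the real output prefixes lines this way is an assumption.

```rust
/// Keep only file header (`F`), addition (`+`) and deletion (`-`) lines.
/// Header text is emitted as-is; changed lines get their marker back
/// as a prefix. Everything else is dropped.
fn filter_lines(lines: &[(char, &str)]) -> String {
    let mut out = String::new();
    for (origin, content) in lines {
        match *origin {
            'F' => out.push_str(content),
            '+' | '-' => {
                out.push(*origin);
                out.push_str(content);
            }
            // Hunk headers, context lines, binary notices, etc.
            _ => {}
        }
    }
    out
}

fn main() {
    let lines = [
        ('F', "diff --git a/lib.rs b/lib.rs\n"),
        ('H', "@@ -1 +1 @@\n"),
        (' ', "unchanged\n"),
        ('-', "old\n"),
        ('+', "new\n"),
    ];
    print!("{}", filter_lines(&lines));
}
```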
195+
196+ ---
197+
127198## Dependencies 🖇️
128199
129200`Summary` is built with these excellent Rust crates:
130201
131202- **[`clap`](https://crates.io/crates/clap)**: Ergonomic command-line argument
132- parsing
203+ parsing with derive macros
133204- **[`git2`](https://crates.io/crates/git2)**: Full-featured Git library for all
134- repository operations
205+ repository operations (libgit2 bindings)
135206- **[`rayon`](https://crates.io/crates/rayon)**: Data-parallelism for concurrent
136- repository scanning
137- - ** [` tokio` ](https://crates.io/crates/tokio)** : Async runtime for non-blocking
138- diff generation
207+ repository path scanning
208+ - **[`tokio`](https://crates.io/crates/tokio)**: Async runtime with `full`
209+ features for non-blocking diff generation
139210- **[`walkdir`](https://crates.io/crates/walkdir)**: Efficient cross-platform
140- directory traversal
141- - ** [` regex` ](https://crates.io/crates/regex)** : High-performance pattern
142- matching for omit filters
143- - ** [` dashmap` ](https://crates.io/crates/dashmap)** : Concurrent hash map for
144- thread-safe summary aggregation
145- - ** [` futures` ](https://crates.io/crates/futures)** : Streams and combinators for
146- async task orchestration
211+ directory traversal with built-in filtering
212+ - **[`regex`](https://crates.io/crates/regex)**: High-performance `RegexSet` for
213+ matching omit patterns and binary extensions
214+ - **[`dashmap`](https://crates.io/crates/dashmap)**: Sharded concurrent hash map
215+ for thread-safe summary aggregation
216+ - **[`futures`](https://crates.io/crates/futures)**: `FuturesUnordered` for
217+ concurrent task orchestration and stream combinators
147218- **[`chrono`](https://crates.io/crates/chrono)**: Date/time handling for tag
148- chronology
219+ chronology and sorting
149220- **[`itertools`](https://crates.io/crates/itertools)**: Extended iterator
150- utilities for result sorting
221+ utilities (`sorted_by`, `sorted_by_key`) for result ordering
222+ - **[`num_cpus`](https://crates.io/crates/num_cpus)**: CPU count detection for
223+ optimal thread pool sizing
224+ - **[`unbug`](https://crates.io/crates/unbug)**: Error handling utilities
151225
152226---
153227
@@ -163,3 +237,20 @@ this work for any purpose. See the [`LICENSE`](LICENSE) file for full details.
163237
164238Stay updated with the latest improvements. See [`CHANGELOG.md`](CHANGELOG.md)
165239for a complete history of changes.
240+
241+ ---
242+
243+ ## Binary Extensions 📦
244+
245+ `Summary` automatically excludes 62 binary file types from diffs using
246+ case-insensitive patterns. The 53 extensions listed below are a subset:
247+
248+ ```
249+ .7z .accdb .avi .bak .bin .bmp .class .dat .db .dll .dll.lib .dll.exp
250+ .doc .docx .dylib .exe .flac .gif .gz .heic .ico .img .iso .jpeg .jpg
251+ .m4a .mdb .mkv .mov .mp3 .mp4 .o .obj .ogg .pdb .pdf .png .ppt .pptx
252+ .pyc .pyo .rar .so .sqlite .svg .tar .tiff .wav .webp .wmv .xls .xlsx .zip
253+ ```
254+
255+ _(See [`Fn/Summary/Difference.rs:48-102`](Source/Fn/Summary/Difference.rs:48)
256+ for the complete list in source)_
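
A path-only, case-insensitive extension check can be sketched as follows. The README describes a `regex::RegexSet`; this std-only illustration swaps in a lowercase suffix match instead, uses only a handful of the extensions listed above, and `is_binary_path` is a hypothetical name.

```rust
/// A few of the extensions listed above, written in lowercase.
const BINARY_EXTENSIONS: &[&str] = &[".7z", ".dll", ".exe", ".jpg", ".png", ".zip"];

/// True when the path ends in a known binary extension, ignoring case.
/// File content is never read; only the path is inspected.
fn is_binary_path(path: &str) -> bool {
    let lower = path.to_ascii_lowercase();
    BINARY_EXTENSIONS.iter().any(|ext| lower.ends_with(*ext))
}

fn main() {
    for path in ["assets/Logo.PNG", "src/main.rs", "dist/app.exe"] {
        println!("{path}: binary = {}", is_binary_path(path));
    }
}
```

Because the check looks only at the path, a text file with a binary-looking extension would still be filtered out.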