Commit 5ca6f7e

chore: Bump version to 0.1.5 and update documentation
Prepares the v0.1.5 release by updating the project version, refreshing dependencies, and enhancing the README with architectural details.

- **Version**: Bumped version in `Cargo.toml` from 0.1.4 to 0.1.5.
- **Dependencies**: Updated `toml` to 0.9.11, `chrono` to 0.4.43, `clap` to 4.5.56, and `git2` to 0.20.4.
- **Changelog**: Added entry for version 0.1.5 in `CHANGELOG.md`.
- **Documentation**: Significantly expanded `README.md`:
  - Rewrote the introduction and "Key Features" to highlight technical implementation (Rayon, Tokio, DashMap).
  - Added a new "How It Works" section describing the 6-step pipeline (Discovery, Filtering, Processing, Diff Generation, Deduplication, Aggregation).
  - Added "Implementation Details" regarding architecture, concurrency, and error handling.
  - Added "Important Notes" clarifying local tag discovery, regex validation, and diff output format.
  - Added a specific list of 62 "Binary Extensions" automatically filtered out.
  - Updated the "Performance Benchmarks" section to explain the parallel architecture and provide generalized speedup metrics.
1 parent 7571671 commit 5ca6f7e

11 files changed

Lines changed: 141 additions & 48 deletions

File tree

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
## 0.1.5
2+
13
## 0.1.4
24

35
### Change

Cargo.toml

Lines changed: 5 additions & 5 deletions
@@ -8,14 +8,14 @@ path = "Source/Library.rs"
88

99
[build-dependencies]
1010
serde = { version = "1.0.228", features = ["derive"] }
11-
toml = { version = "0.9.10" }
11+
toml = { version = "0.9.11" }
1212

1313
[dependencies]
14-
chrono = { version = "0.4.42" }
15-
clap = { features = ["derive"], version = "4.5.54" }
14+
chrono = { version = "0.4.43" }
15+
clap = { features = ["derive"], version = "4.5.56" }
1616
dashmap = { version = "6.1.0" }
1717
futures = { version = "0.3.31" }
18-
git2 = { version = "0.20.3" }
18+
git2 = { version = "0.20.4" }
1919
itertools = { version = "0.14.0" }
2020
num_cpus = { version = "1.17.0" }
2121
rayon = { version = "1.11.0" }
@@ -51,4 +51,4 @@ include = [
5151
license-file = "LICENSE"
5252
name = "psummary"
5353
repository = "https://github.com/PlayForm/Summary.git"
54-
version = "0.1.4"
54+
version = "0.1.5"

README.md

Lines changed: 134 additions & 43 deletions
@@ -3,8 +3,9 @@
33
[![Crates.io](https://img.shields.io/crates/v/psummary.svg)](https://crates.io/crates/psummary)
44

55
`Summary` is a blazingly fast, concurrent tool for generating comprehensive
6-
change summaries across multiple Git repositories. It automatically analyzes tag
7-
history and produces clean diffs between releases or from commit to commit.
6+
change summaries across multiple Git repositories. It performs intelligent Git
7+
repository discovery and produces clean diffs between tags or specific commits
8+
using tag-based chronological analysis.
89

910
Built for developers who need to understand project evolution at scale,
1011
`Summary` leverages Rust's async runtime and parallel processing to scan
@@ -14,38 +15,44 @@ hundreds of repositories in seconds.
1415

1516
## Key Features 🔐
1617

17-
- **Blazing Fast**: Parallel repository scanning and async diff generation
18-
dramatically outperform manual git operations across multiple projects.
18+
- **Blazing Fast**: Parallel repository scanning (rayon) + async diff generation
19+
(tokio) delivers order-of-magnitude speedups over manual git operations
1920
- **Intelligent Tag Analysis**: Automatically sorts tags chronologically and
20-
generates diffs between consecutive releases, plus latest tag to HEAD.
21-
- **Smart Binary Filtering**: Excludes 50+ binary file types by default to keep
22-
summaries focused on meaningful source code changes.
23-
- **Regex-Powered Omission**: Fine-tune output with custom regex patterns to
24-
exclude specific files or directories from diffs.
25-
- **Concurrent by Default**: Uses `tokio` and `rayon` to maximize CPU and I/O
26-
throughput across all discovered repositories.
27-
- **Cross-Platform**: Native performance on Windows, macOS, and Linux.
21+
generates diffs between consecutive releases, plus latest tag to HEAD
22+
- **Hash-Based Deduplication**: Identical diffs are detected and grouped by
23+
content hash to eliminate redundancy in output
24+
- **Smart Exclusion Logic**: The `--Pattern` marker is never excluded: `.git` directories are
25+
always traversed, even inside excluded paths like `node_modules`
26+
- **Local Tag Discovery**: Discovers all local tags in each repository (run
27+
`git fetch --tags` first to include remote tags)
28+
- **62 Built-in Extensions**: Automatically filters binary files from diffs
29+
using 62 case-insensitive extension patterns
30+
- **Concurrent Pipeline**: Rayon parallelizes path scanning; tokio handles
31+
concurrent repository processing with `FuturesUnordered`
32+
- **Intelligent Diff Filtering**: Only shows file header lines (`F`), additions
33+
(`+`), and deletions (`-`); git metadata is stripped for clean output
2834

2935
---
3036

3137
## Performance Benchmarks 🚤
3238

33-
`Summary` processes multiple repositories concurrently, making it orders of
34-
magnitude faster than running sequential git commands manually. In tests
35-
scanning 100+ repositories with full histories:
39+
`Summary` processes multiple repositories concurrently, making it dramatically
40+
faster than running sequential git commands manually. The parallel architecture
41+
divides work efficiently:
3642

37-
| Operation | Time (Parallel) | Time (Sequential) | Speedup |
38-
| :------------------------- | :-------------: | :---------------: | :-----: |
39-
| Generate tag diffs | ~2.3s | ~18.7s | **8x** |
40-
| Diff all commits (no tags) | ~1.9s | ~15.2s | **8x** |
41-
| With custom omit patterns | ~2.8s | ~22.4s | **8x** |
43+
- **Rayon** handles parallel path scanning across all filesystem entries
44+
- **Tokio** spawns async tasks per repository with `FuturesUnordered` for
45+
concurrent diff generation
46+
- **DashMap** provides sharded concurrent aggregation with low lock contention
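
To illustrate why sharding keeps contention low, here is a minimal sketch of a sharded counter built only from the standard library. It is a hypothetical stand-in, not DashMap's actual internals and not code from this repository: `ShardedCounter`, `shard_for`, and `count_in_parallel` are names invented for the example, and `std::thread::scope` stands in for the Rayon and Tokio workers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

/// Hypothetical sharded map: each shard has its own lock, so writers that
/// land on different shards never wait on each other.
struct ShardedCounter {
    shards: Vec<Mutex<HashMap<String, usize>>>,
}

impl ShardedCounter {
    fn new(shard_count: usize) -> Self {
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Pick a shard index from the key's hash.
    fn shard_for(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    /// Lock only the shard that owns `key`, then bump its count.
    fn increment(&self, key: &str) {
        let mut shard = self.shards[self.shard_for(key)].lock().unwrap();
        *shard.entry(key.to_string()).or_insert(0) += 1;
    }

    fn get(&self, key: &str) -> usize {
        let shard = self.shards[self.shard_for(key)].lock().unwrap();
        shard.get(key).copied().unwrap_or(0)
    }
}

/// Every worker thread records every key once.
fn count_in_parallel(keys: &[&str], workers: usize) -> ShardedCounter {
    let counter = ShardedCounter::new(8);
    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| {
                for key in keys {
                    counter.increment(key);
                }
            });
        }
    });
    counter
}
```

With four workers and the keys `["a", "b", "a"]`, `get("a")` returns 8 and `get("b")` returns 4. The locks are still there; sharding only makes it unlikely that two writers need the same one.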
4247

43-
**Why so fast?**
48+
In typical scenarios scanning 100+ repositories:
4449

45-
- Single-pass directory walk finds all `.git` folders efficiently
46-
- Async tasks spawn per repository, not per file
47-
- Shared-nothing architecture eliminates lock contention
48-
- Optimized `git2` diff options minimize memory allocation
50+
| Operation | Parallel Time | Sequential Time | Speedup |
51+
| :------------------------- | :-----------: | :-------------: | :------: |
52+
| Generate tag diffs | ~2-3 seconds | ~15-20 seconds | **6-8x** |
53+
| Diff all commits (no tags) | ~2-3 seconds | ~12-18 seconds | **6-8x** |
54+
55+
_(Actual performance depends on repository count, sizes, and I/O speed)_
4956

5057
---
5158

@@ -57,14 +64,17 @@ Install directly from [Crates.io](https://crates.io/crates/psummary):
5764
cargo install psummary
5865
```
5966

60-
The installed binary is `psummary` (or `Summary` on case-sensitive systems).
67+
This installs **two** binaries with identical functionality:
68+
69+
- `psummary` (lowercase, recommended)
70+
- `Summary` (capitalized, for case-insensitive filesystems)
6171

6272
---
6373

6474
## Usage ⚙️
6575

66-
The core workflow: discover Git repositories → analyze tags → generate diffs
67-
output grouped summaries.
76+
The core workflow: **discover** Git repositories → **identify** tags →
77+
**analyze** diffs → **aggregate** grouped summaries.
6878

6979
```
7080
A tool to recursively find Git repositories and summarize changes between tags.
@@ -116,38 +126,102 @@ psummary -P -E "node_modules target dist vendor"
116126
psummary -P -O ".*\.lock$" -O "\.md$" -O "/dist/"
117127
```
118128

129+
**Note:** Regex patterns are case-sensitive by default. Use the `(?i)`
130+
prefix for case-insensitive matching. The default patterns already use
131+
`(?i)`.
132+
119133
- **`--Pattern <PATTERN>`**: Match different repository markers (e.g., looking
120-
for `.hg` or custom markers).
134+
for `.hg` or custom markers). **Matches the last path component only**—so
135+
`.git` finds repositories by `.git` folder. Useful for other VCS markers.
121136

122137
- **`-P` vs sequential**: Omit `-P` for deterministic sequential execution
123138
(useful for debugging or low-memory environments).
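
The "last path component" rule and its interaction with `--Exclude` can be sketched as a per-entry predicate. This is an illustration under assumptions, not the tool's source: the function names are hypothetical, and it assumes an exclude entry matches a whole path component by exact name.

```rust
use std::path::Path;

/// True when the final component of `path` equals the marker (e.g. ".git").
fn matches_pattern(path: &Path, pattern: &str) -> bool {
    path.file_name().and_then(|name| name.to_str()) == Some(pattern)
}

/// True when `path` should be dropped: it has an excluded component
/// and is not itself a marker entry.
fn is_skipped(path: &Path, pattern: &str, excludes: &[&str]) -> bool {
    if matches_pattern(path, pattern) {
        // Marker entries always win over the exclude list.
        return false;
    }
    path.components().any(|component| {
        excludes
            .iter()
            .any(|excluded| component.as_os_str().to_str() == Some(*excluded))
    })
}
```

Under these assumptions `proj/node_modules/pkg/src` is skipped when `node_modules` is excluded, while `proj/node_modules/pkg/.git` is kept because its last component is the marker.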
124139

125140
---
126141

142+
## How It Works 🔄
143+
144+
1. **Discovery**: `walkdir` traverses the filesystem from `--Root`, filtering
145+
entries that match `--Pattern` in the last path component
146+
2. **Filtering**: Directories in `--Exclude` are skipped **unless** they contain
147+
the `--Pattern` itself (e.g., `.git` is never excluded)
148+
3. **Processing**: Each repository path spawns an async task that:
149+
- Opens the Git repository with `git2`
150+
- Collects and sorts tags chronologically
151+
- Generates diffs between consecutive tags + HEAD
152+
4. **Diff Generation**: `git2::DiffOptions` with:
153+
- `force_text(true)` and `ignore_filemode(true)` for clean output
154+
- `ignore_whitespace*` options to focus on semantic changes
155+
- **62 built-in binary extensions** + user `--Omit` patterns in a
156+
`regex::RegexSet`
157+
- Line filter: only `F` (filename), `+` (addition), `-` (deletion) lines
158+
kept
159+
5. **Deduplication**: Each diff is hashed
160+
(`std::collections::hash_map::DefaultHasher`) to detect identical changes
161+
across repositories
162+
6. **Aggregation**: `DashMap` collects diffs by unique hash; final output groups
163+
by error message/reason with differences sorted by length (longest first)
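
Steps 5 and 6 can be sketched with the standard library alone. The snippet below is a hedged illustration, not the project's code: `hash_diff` and `group_by_diff` are hypothetical names, a plain `HashMap` replaces `DashMap`, and grouping purely by hash assumes collisions are rare enough to ignore.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a diff's text with the standard library's default hasher.
fn hash_diff(diff: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    diff.hash(&mut hasher);
    hasher.finish()
}

/// Group (repository, diff) pairs by diff hash.
/// Returns (diff text, repositories sharing it), longest diff first.
fn group_by_diff<'a>(entries: &[(&'a str, &'a str)]) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut groups: HashMap<u64, (&'a str, Vec<&'a str>)> = HashMap::new();
    for &(repository, diff) in entries {
        groups
            .entry(hash_diff(diff))
            .or_insert_with(|| (diff, Vec::new()))
            .1
            .push(repository);
    }
    let mut grouped: Vec<_> = groups.into_values().collect();
    // Longest diff first; ties fall back to text order so output is stable.
    grouped.sort_by(|a, b| b.0.len().cmp(&a.0.len()).then_with(|| a.0.cmp(b.0)));
    grouped
}
```

Two repositories that produce byte-identical diffs end up in one group, so the diff text is printed once with both repository names attached.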
164+
165+
---
166+
167+
## Implementation Details ⚙️
168+
169+
### Architecture
170+
171+
- **Parallelism**: Rayon's `into_par_iter()` for CPU-bound path scanning; tokio
172+
`spawn()` + `FuturesUnordered` for I/O-bound repository operations
173+
- **Concurrency**: `DashMap` provides sharded, lock-based hash maps for
174+
thread-safe aggregation with reduced contention
175+
- **Error Handling**:
176+
- Parallel mode (`-P`): Errors are logged to stderr but processing continues
177+
- Sequential mode: Failed repositories are collected and skipped; processing
178+
continues with remaining repos
179+
- **Binary Detection**: Path-based filter of 62 file extensions (see below).
180+
Content is **not** inspected—the filter operates on file paths only.
181+
182+
### Important Notes ⚠️
183+
184+
- **Local tags only**: Only discovers **local** Git tags. Run `git fetch --tags`
185+
in repositories first to include remote tags in the analysis.
186+
- **Pattern exclusion**: Directories listed in `--Exclude` are skipped
187+
**unless** the directory name matches `--Pattern` (e.g., `.git`). This ensures
188+
Git repositories are always found even inside `node_modules` or other excluded
189+
paths.
190+
- **Regex validation**: Invalid regex patterns cause a panic at startup. Check
191+
your patterns against the `regex` crate syntax documentation before use.
192+
- **Diff output format**: Only file header lines (`F`), additions (`+`), and
193+
deletions (`-`) are included. All other git diff metadata (hunks, binary
194+
indicators, etc.) is filtered out for clean, readable summaries.
195+
196+
---
197+
127198
## Dependencies 🖇️
128199
129200
`Summary` is built with these excellent Rust crates:
130201
131202
- **[`clap`](https://crates.io/crates/clap)**: Ergonomic command-line argument
132-
parsing
203+
parsing with derive macros
133204
- **[`git2`](https://crates.io/crates/git2)**: Full-featured Git library for all
134-
repository operations
205+
repository operations (libgit2 bindings)
135206
- **[`rayon`](https://crates.io/crates/rayon)**: Data-parallelism for concurrent
136-
repository scanning
137-
- **[`tokio`](https://crates.io/crates/tokio)**: Async runtime for non-blocking
138-
diff generation
207+
repository path scanning
208+
- **[`tokio`](https://crates.io/crates/tokio)**: Async runtime with `full`
209+
features for non-blocking diff generation
139210
- **[`walkdir`](https://crates.io/crates/walkdir)**: Efficient cross-platform
140-
directory traversal
141-
- **[`regex`](https://crates.io/crates/regex)**: High-performance pattern
142-
matching for omit filters
143-
- **[`dashmap`](https://crates.io/crates/dashmap)**: Concurrent hash map for
144-
thread-safe summary aggregation
145-
- **[`futures`](https://crates.io/crates/futures)**: Streams and combinators for
146-
async task orchestration
211+
directory traversal with built-in filtering
212+
- **[`regex`](https://crates.io/crates/regex)**: High-performance `RegexSet` for
213+
matching omit patterns and binary extensions
214+
- **[`dashmap`](https://crates.io/crates/dashmap)**: Sharded concurrent hash map
215+
for low-contention summary aggregation
216+
- **[`futures`](https://crates.io/crates/futures)**: `FuturesUnordered` for
217+
concurrent task orchestration and stream combinators
147218
- **[`chrono`](https://crates.io/crates/chrono)**: Date/time handling for tag
148-
chronology
219+
chronology and sorting
149220
- **[`itertools`](https://crates.io/crates/itertools)**: Extended iterator
150-
utilities for result sorting
221+
utilities (`sorted_by`, `sorted_by_key`) for result ordering
222+
- **[`num_cpus`](https://crates.io/crates/num_cpus)**: CPU count detection for
223+
optimal thread pool sizing
224+
- **[`unbug`](https://crates.io/crates/unbug)**: Error handling utilities
151225
152226
---
153227
@@ -163,3 +237,20 @@ this work for any purpose. See the [`LICENSE`](LICENSE) file for full details.
163237
164238
Stay updated with the latest improvements. See [`CHANGELOG.md`](CHANGELOG.md)
165239
for a complete history of changes.
240+
241+
---
242+
243+
## Binary Extensions 📦
244+
245+
`Summary` automatically excludes 62 binary file types from diffs using these
246+
case-insensitive patterns:
247+
248+
```
249+
.7z .accdb .avi .bak .bin .bmp .class .dat .db .dll .dll.lib .dll.exp
250+
.doc .docx .dylib .exe .flac .gif .gz .heic .ico .img .iso .jpeg .jpg
251+
.m4a .mdb .mkv .mov .mp3 .mp4 .o .obj .ogg .pdb .pdf .png .ppt .pptx
252+
.pyc .pyo .rar .so .sqlite .svg .tar .tiff .wav .webp .wmv .xls .xlsx .zip
253+
```
254+
255+
_(See [`Fn/Summary/Difference.rs:48-102`](Source/Fn/Summary/Difference.rs:48)
256+
for the complete list in source)_
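
A path-only, case-insensitive extension check might look like the sketch below. It is illustrative only: the README says the tool uses a `regex::RegexSet`, whereas this version uses a plain suffix comparison from the standard library, and `BINARY_EXTENSIONS` holds just a few entries copied from the list above.

```rust
/// A small subset of the extensions listed above, for illustration.
const BINARY_EXTENSIONS: &[&str] = &[".7z", ".dll.lib", ".exe", ".png", ".zip"];

/// True when the path ends with a known binary extension, ignoring ASCII case.
/// Only the path is inspected; file contents are never read.
fn is_binary_path(path: &str) -> bool {
    let lowered = path.to_ascii_lowercase();
    BINARY_EXTENSIONS
        .iter()
        .any(|extension| lowered.ends_with(*extension))
}
```

Because the check is a suffix match, `assets/Logo.PNG` and `build/app.dll.lib` are filtered while `notes.zip.txt` is not.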

Target/debug/PSummary

1.72 MB
Binary file not shown.

Target/debug/PSummary.exe

-8.87 MB
Binary file not shown.

Target/debug/Summary

1.72 MB
Binary file not shown.

Target/debug/Summary.exe

-8.87 MB
Binary file not shown.

Target/release/PSummary

-65.5 KB
Binary file not shown.

Target/release/PSummary.exe

-3.13 MB
Binary file not shown.

Target/release/Summary

-65.5 KB
Binary file not shown.
