Commit 6ce1970

increase benchmark length
1 parent 738fab0 commit 6ce1970

2 files changed

Lines changed: 159 additions & 50 deletions

README.md

Lines changed: 34 additions & 19 deletions
@@ -10,7 +10,8 @@ The fastest zip/unzip library and CLI for Rust. Parallel compression and extract
1010
- **Atomic archive writes** -- compression writes to a tempfile, fsyncs, then renames; a crash mid-write never produces a corrupt archive
1111
- **Path traversal prevention** -- rejects `../` attacks, absolute paths, and Windows drive letters before any extraction begins
1212
- **ZIP64 support** -- automatic for >65,535 entries, >4 GB files, or >4 GB offsets
13-
- **Incompressible data detection** -- falls back to Stored when DEFLATE would inflate the data
13+
- **Zstd compression** -- Zstandard (method 93) as an alternative to DEFLATE, with full interop
14+
- **Incompressible data detection** -- falls back to Stored when compression would inflate the data
1415
- **Windows long path support** -- `\\?\` extended-length paths for paths exceeding MAX_PATH (260 chars)
1516
- **Adaptive memory management** -- dynamically sizes the in-memory compression threshold based on available system RAM (up to 400 MB budget), so small files stay in memory while large files stream through temp files
1617
- **Deterministic output** -- archives are byte-identical across runs (entries sorted by path)
@@ -25,24 +26,35 @@ The fastest zip/unzip library and CLI for Rust. Parallel compression and extract
2526

2627
| Scenario | Files | Data | ripzip | zip crate | Speedup |
2728
|----------|------:|-----:|-------:|----------:|--------:|
28-
| 10k small source files | 10,000 | 3 MB | 74 ms | 447 ms | **6.0x** |
29-
| 200 x 5 MB log files | 200 | 1 GB | 75 ms (13.3 GB/s) | 422 ms (2.4 GB/s) | **5.6x** |
30-
| 50 x 20 MB binary blobs | 50 | 1 GB | 68 ms (14.7 GB/s) | 435 ms (2.3 GB/s) | **6.4x** |
31-
| Mixed (5k src + 200 MB assets) | 5,020 | 202 MB | 115 ms (1.8 GB/s) | 332 ms (601 MB/s) | **2.9x** |
29+
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 2.40s (6 MB/s) | **6.3x** |
30+
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 2.20s (2.3 GB/s) | **4.5x** |
31+
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 2.29s (2.2 GB/s) | **10.7x** |
32+
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 1.04s (967 MB/s) | **2.0x** |
3233

3334
### Extraction
3435

3536
| Scenario | Files | Data | ripzip | zip crate | Speedup |
3637
|----------|------:|-----:|-------:|----------:|--------:|
37-
| 10k small source files | 10,000 | 3 MB | 4.39 s | 5.24 s | **1.2x** |
38-
| 200 x 5 MB log files | 200 | 1 GB | 128 ms (7.8 GB/s) | 763 ms (1.3 GB/s) | **6.0x** |
39-
| 50 x 20 MB binary blobs | 50 | 1 GB | 153 ms (6.5 GB/s) | 746 ms (1.3 GB/s) | **4.9x** |
40-
| Mixed (5k src + 200 MB assets) | 5,020 | 202 MB | 1.49 s (135 MB/s) | 2.21 s (91 MB/s) | **1.5x** |
38+
| 50k small source files | 50,000 | 14 MB | 27.47s (0.5 MB/s) | 33.73s (0.4 MB/s) | **1.2x** |
39+
| 500 x 10 MB log files | 500 | 5 GB | 1.13s (4.4 GB/s) | 3.68s (1.4 GB/s) | **3.3x** |
40+
| 100 x 50 MB binary blobs | 100 | 5 GB | 1.18s (4.2 GB/s) | 4.45s (1.1 GB/s) | **3.8x** |
41+
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 4.24s (237 MB/s) | 6.20s (162 MB/s) | **1.5x** |
4142

42-
**Takeaway:** ripzip compresses **2.9--6.4x faster** and extracts **1.2--6.0x faster** across all workloads. Speedup scales with individual file size -- the 1 GB log corpus sees the biggest wins because all 28 threads are saturated with real DEFLATE work. The 10k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.
43+
**Takeaway:** ripzip compresses **2.0--10.7x faster** and extracts **1.2--3.8x faster** across all workloads. Speedup scales with individual file size -- the 5 GB binary blob corpus sees the biggest compression wins (10.7x) because all 28 threads are saturated with real DEFLATE work on large chunks. The 50k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.
4344

4445
Archive sizes are identical between the two -- same DEFLATE algorithm, same compression level.
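
The Speedup column in the compression table can be re-derived from the two timing columns. The sketch below is only a sanity check on the published numbers; the helper name `speedup` is hypothetical and not part of ripzip.

```rust
/// Speedup of `ours` over `baseline`, rounded to one decimal place
/// (the precision used in the benchmark tables).
fn speedup(baseline_ms: f64, ours_ms: f64) -> f64 {
    (baseline_ms / ours_ms * 10.0).round() / 10.0
}

/// (scenario, zip crate ms, ripzip ms) copied from the compression table.
fn compression_rows() -> [(&'static str, f64, f64); 4] {
    [
        ("50k small source files", 2400.0, 378.0),
        ("500 x 10 MB log files", 2200.0, 488.0),
        ("100 x 50 MB binary blobs", 2290.0, 214.0),
        ("Mixed (10k src + 1 GB assets)", 1040.0, 531.0),
    ]
}

/// Recompute the Speedup column for every row.
fn compression_speedups() -> Vec<(&'static str, f64)> {
    compression_rows()
        .iter()
        .map(|&(name, baseline, ours)| (name, speedup(baseline, ours)))
        .collect()
}
```

Calling `compression_speedups()` yields 6.3, 4.5, 10.7, and 2.0 for the four rows, matching the table.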
4546

47+
### Zstd vs Deflate (ripzip, both parallel, level 1)
48+
49+
| Scenario | Files | Data | Deflate | Zstd | Zstd speedup | Deflate archive | Zstd archive |
50+
|----------|------:|-----:|--------:|-----:|-------------:|----------------:|-------------:|
51+
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 1.10s (13 MB/s) | **0.3x** | 10 MB | 10 MB |
52+
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 213ms (23.5 GB/s) | **2.3x** | 62 MB | 592 KB |
53+
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 163ms (30.7 GB/s) | **1.3x** | 64 MB | 495 KB |
54+
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 645ms (1.6 GB/s) | **0.8x** | 36 MB | 24 MB |
55+
56+
**Takeaway:** On large files, Zstd produces dramatically smaller archives (roughly 100x smaller for the log and blob corpora) and compresses 1.3--2.3x faster than Deflate. On many small files Deflate wins (Zstd reaches only 0.3x of Deflate's speed on the 50k-file corpus and 0.8x on the mixed workload) because Zstd's per-file initialization cost is higher. Extraction speeds are nearly identical -- both are I/O-bound at this level of parallelism.
57+
4658
### Run benchmarks yourself
4759

4860
```
@@ -63,11 +75,14 @@ use std::path::Path;
6375
use ripzip::{NoProgress, compress_directory, extract_to_directory};
6476

6577
// Compress a directory
78+
use ripzip::CompressionMethod;
79+
6680
compress_directory(
6781
Path::new("my_project/"),
6882
Path::new("my_project.zip"),
69-
1, // compression level (1=fastest, 9=smallest)
70-
&NoProgress, // or implement ProgressReporter for progress bars
83+
1, // compression level (1=fastest, 9=smallest)
84+
CompressionMethod::Deflate, // or CompressionMethod::Zstd
85+
&NoProgress, // or implement ProgressReporter for progress bars
7186
)?;
7287

7388
// Extract an archive
@@ -112,15 +127,15 @@ cargo install --path ripzip-cli
112127
```
113128

114129
```
115-
ripzip compress <DIR> -o <FILE> [--level 1-9] [--quiet]
130+
ripzip compress <DIR> -o <FILE> [--level 1-9] [--method deflate|zstd] [--quiet]
116131
ripzip extract <ARCHIVE> [-o <DIR>] [--quiet]
117132
ripzip list <ARCHIVE> [--verbose]
118133
```
119134

120135
Aliases: `c`, `x`, `l`.
121136

122137
```
123-
$ ripzip compress my_project/ -o my_project.zip
138+
$ ripzip compress my_project/ -o my_project.zip --method zstd
124139
[00:00:00] [####################################] 142.3MB/142.3MB (1.8GB/s)
125140
Created my_project.zip
126141
@@ -144,7 +159,7 @@ Compressed Original Method Name
144159
3. **Path traversal prevention** -- all archive paths are validated before any extraction. Paths containing `..`, absolute paths, and Windows drive letters are rejected.
145160
4. **ZIP64** -- automatically used when entry counts exceed 65,535, file sizes exceed 4 GB, or offsets exceed 4 GB.
146161
5. **fsync before rename** -- data is flushed to disk before the atomic rename, ensuring durability.
147-
6. **Incompressible data detection** -- if DEFLATE produces output larger than the input, the file is stored uncompressed.
162+
6. **Incompressible data detection** -- if compression produces output larger than the input, the file is stored uncompressed.
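
Three of the guarantees above (path validation, the ZIP64 switch, and the Stored fallback) reduce to small pure checks. The following is a minimal sketch under stated assumptions, not ripzip's actual code: the names `is_safe_entry_name`, `needs_zip64`, `choose_method`, and `Method` are hypothetical, and "4 GB" is taken to mean the 32-bit field limit `0xFFFF_FFFF`.

```rust
/// Returns true if an archive entry name looks safe to extract under a
/// destination root. Hypothetical helper, not part of the ripzip API.
fn is_safe_entry_name(name: &str) -> bool {
    // Treat backslashes as separators so Windows-style names get the same checks.
    let normalized = name.replace('\\', "/");

    // Reject empty names and absolute paths (including UNC-style "//server").
    if normalized.is_empty() || normalized.starts_with('/') {
        return false;
    }

    // Reject Windows drive letters such as "C:".
    let bytes = normalized.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return false;
    }

    // Reject any ".." path segment, even one that would not escape the root.
    !normalized.split('/').any(|segment| segment == "..")
}

/// ZIP64 is needed once any value no longer fits the classic 16/32-bit fields.
/// Assumption: thresholds follow the README wording ("exceed").
fn needs_zip64(entry_count: u64, largest_file: u64, largest_offset: u64) -> bool {
    entry_count > 65_535 || largest_file > 0xFFFF_FFFF || largest_offset > 0xFFFF_FFFF
}

#[derive(Debug, PartialEq)]
enum Method {
    Stored,
    Compressed,
}

/// Fall back to Stored when compression made the data larger.
fn choose_method(original_len: usize, compressed_len: usize) -> Method {
    if compressed_len > original_len {
        Method::Stored
    } else {
        Method::Compressed
    }
}
```

Running the name check over every entry before writing anything mirrors the "validated before any extraction" wording: a single bad entry can then reject the whole archive up front.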
148163
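
The "fsync before rename" guarantee can be illustrated with the standard library alone. This is a sketch of the general tempfile, fsync, rename pattern, not ripzip's implementation: the function name `write_atomically` and the `.tmp` suffix are assumptions, and a production version would use a unique temp name and may also sync the parent directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` so that readers see either the old file or the
/// complete new one, never a partial write. Hypothetical helper.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Place the temp file next to the destination so the rename stays on
    // one filesystem. The ".tmp" suffix is an illustrative choice.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = dest.with_file_name(tmp_name);

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush file contents and metadata to disk before the rename.
        file.sync_all()?;
        // Replace the destination in a single rename step.
        fs::rename(&tmp, dest)
    })();

    // On failure, remove the temp file on a best-effort basis.
    if result.is_err() {
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

If the process dies before the rename, only the temp file is left behind and any existing archive at `dest` is untouched.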

149164
## Architecture
150165

@@ -153,7 +168,7 @@ Compressed Original Method Name
153168
154169
walkdir ──> Vec<FileEntry> ──> rayon::par_iter ──> Vec<CompressedEntry>
155170
|
156-
per-file: read + CRC32 + DEFLATE
171+
per-file: read + CRC32 + DEFLATE/Zstd
157172
(adaptive threshold: in memory or via temp file)
158173
|
159174
v
@@ -172,7 +187,7 @@ Compressed Original Method Name
172187
create directories (sequential)
173188
|
174189
rayon::par_iter (per file):
175-
zero-copy slice from mmap ──> DEFLATE + CRC32 verify ──> write to destination
190+
zero-copy slice from mmap ──> DEFLATE/Zstd + CRC32 verify ──> write to destination
176191
```
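
The per-file parallel stage in the diagrams above can be sketched with the standard library. This is an illustration under assumptions, not ripzip's code: `std::thread::scope` with one thread per file stands in for `rayon::par_iter`, the compression step is omitted, and `ProcessedEntry` and `process_entries` are hypothetical names. The CRC-32 routine is the standard reflected IEEE variant that ZIP uses.

```rust
use std::thread;

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Result of the per-file stage (compression omitted in this sketch).
struct ProcessedEntry {
    path: String,
    checksum: u32,
    size: usize,
}

/// Checksum every (path, contents) pair in parallel and return the results
/// sorted by path, so the output order does not depend on thread timing.
fn process_entries(mut inputs: Vec<(String, Vec<u8>)>) -> Vec<ProcessedEntry> {
    inputs.sort_by(|a, b| a.0.cmp(&b.0));
    thread::scope(|scope| {
        // One scoped thread per file: a stand-in for a real thread pool.
        let handles: Vec<_> = inputs
            .iter()
            .map(|(path, data)| {
                scope.spawn(move || ProcessedEntry {
                    path: path.clone(),
                    checksum: crc32(data),
                    size: data.len(),
                })
            })
            .collect();
        // Joining in spawn order preserves the sorted order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Sorting before the parallel stage and collecting in spawn order is one way to get the byte-identical, path-sorted output the README describes, regardless of which worker finishes first.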
177192

178193
## Project Structure
@@ -209,13 +224,13 @@ ripzip-rs/
209224

210225
## Testing
211226

212-
108 tests: 35 unit tests + 73 integration tests.
227+
117 tests: 35 unit tests + 82 integration tests (3 ZIP64 stress tests are `#[ignore]`).
213228

214229
```
215230
cargo test
216231
```
217232

218-
Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the `zip` crate, Windows long paths.
233+
Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the `zip` crate (Deflate + Zstd), ZIP64, Windows long paths.
219234

220235
## License
221236
