README.md
+34 -19 (34 additions & 19 deletions)
@@ -10,7 +10,8 @@ The fastest zip/unzip library and CLI for Rust. Parallel compression and extract
 - **Atomic archive writes** -- compression writes to a tempfile, fsyncs, then renames; a crash mid-write never produces a corrupt archive
 - **Path traversal prevention** -- rejects `../` attacks, absolute paths, and Windows drive letters before any extraction begins
 - **ZIP64 support** -- automatic for >65,535 entries, >4 GB files, or >4 GB offsets
-- **Incompressible data detection** -- falls back to Stored when DEFLATE would inflate the data
+- **Zstd compression** -- Zstandard (method 93) as an alternative to DEFLATE, with full interop
+- **Incompressible data detection** -- falls back to Stored when compression would inflate the data
 - **Windows long path support** -- `\\?\` extended-length paths for paths exceeding MAX_PATH (260 chars)
 - **Adaptive memory management** -- dynamically sizes the in-memory compression threshold based on available system RAM (up to 400 MB budget), so small files stay in memory while large files stream through temp files
 - **Deterministic output** -- archives are byte-identical across runs (entries sorted by path)
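The tempfile, fsync, rename sequence from the first bullet can be sketched with the standard library alone. This is a minimal illustration, not ripzip's actual code: the function name `write_atomically` and the `.tmp` suffix are assumptions made for the example.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: write `bytes` to a sibling temp file, flush it to
/// disk, then rename it over `dest`. The `.tmp` suffix is an assumption.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Keep the temp file next to the destination so the rename stays on
    // one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // Ask the OS to flush file content and metadata before the rename.
    file.sync_all()?;
    drop(file);

    // Replace the destination only once the temp file is complete;
    // on failure, try to remove the leftover temp file.
    if let Err(err) = fs::rename(&tmp, dest) {
        let _ = fs::remove_file(&tmp);
        return Err(err);
    }
    Ok(())
}
```

With this shape, a reader of `dest` sees either the previous file or the finished new one. Whether the rename itself survives a power loss can also depend on syncing the parent directory, which this sketch leaves out.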
@@ -25,24 +26,35 @@ The fastest zip/unzip library and CLI for Rust. Parallel compression and extract
 | Scenario | Files | Data | ripzip | zip crate | Speedup |
-**Takeaway:** ripzip compresses **2.9--6.4x faster** and extracts **1.2--6.0x faster** across all workloads. Speedup scales with individual file size -- the 1 GB log corpus sees the biggest wins because all 28 threads are saturated with real DEFLATE work. The 10k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.
+**Takeaway:** ripzip compresses **2.0--10.7x faster** and extracts **1.2--3.8x faster** across all workloads. Speedup scales with individual file size -- the 5 GB binary blob corpus sees the biggest compression wins (10.7x) because all 28 threads are saturated with real DEFLATE work on large chunks. The 50k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.
 
 Archive sizes are identical between the two -- same DEFLATE algorithm, same compression level.
 
+### Zstd vs Deflate (ripzip, both parallel, level 1)
+**Takeaway:** Zstd achieves dramatically better compression ratios on large files (100x smaller archives for logs/blobs) while being comparable or faster for compression. On many small files, Deflate wins because Zstd's per-file initialization cost is higher. Extraction speeds are nearly identical -- both are I/O-bound at this level of parallelism.
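Both methods go through the same Stored fallback described in the feature list, so the method recorded for an entry can be sketched as a small decision over the compressed size. The method IDs are the standard ZIP values (0 Stored, 8 Deflate, 93 Zstandard). The `Method` enum and `pick_payload` helper are hypothetical names for this example, not ripzip's API.

```rust
/// ZIP compression method IDs (Stored = 0, Deflate = 8, Zstandard = 93).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Method {
    Stored = 0,
    Deflate = 8,
    Zstd = 93,
}

/// Hypothetical helper: given the raw bytes and the output of whichever
/// compressor was requested, decide what to write into the archive.
/// If the compressed output is larger than the input, store the raw bytes.
fn pick_payload<'a>(
    requested: Method,
    raw: &'a [u8],
    compressed: &'a [u8],
) -> (Method, &'a [u8]) {
    if compressed.len() > raw.len() {
        (Method::Stored, raw)
    } else {
        (requested, compressed)
    }
}
```

The compressor itself is left out on purpose: the caller runs Deflate or Zstd and passes the result in, which keeps the fallback rule independent of the method.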
@@ -144,7 +159,7 @@ Compressed Original Method Name
 3. **Path traversal prevention** -- all archive paths are validated before any extraction. Paths containing `..`, absolute paths, and Windows drive letters are rejected.
 4. **ZIP64** -- automatically used when entry counts exceed 65,535, file sizes exceed 4 GB, or offsets exceed 4 GB.
 5. **fsync before rename** -- data is flushed to disk before the atomic rename, ensuring durability.
-6. **Incompressible data detection** -- if DEFLATE produces output larger than the input, the file is stored uncompressed.
+6. **Incompressible data detection** -- if compression produces output larger than the input, the file is stored uncompressed.
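The checks in item 3 can be sketched as a string-level validator that runs over every entry name before anything is written. This is an illustration under assumptions: `is_safe_entry_path` and `validate_all` are hypothetical names, and `..` is matched per path component, so a file called `notes..txt` is still accepted.

```rust
/// Hypothetical validator: true when an archive entry name is safe to
/// join onto the extraction root.
fn is_safe_entry_path(name: &str) -> bool {
    // Absolute paths, whether `/`-rooted or backslash-rooted.
    if name.starts_with('/') || name.starts_with('\\') {
        return false;
    }
    // Windows drive letters such as `C:` or `c:\`.
    let bytes = name.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return false;
    }
    // Any `..` component, treating both separators alike.
    !name
        .split(|c: char| c == '/' || c == '\\')
        .any(|part| part == "..")
}

/// Validate every name up front so that a single bad entry rejects the
/// whole archive before extraction starts.
fn validate_all<'a>(names: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    for name in names {
        if !is_safe_entry_path(name) {
            return Err(format!("unsafe path in archive: {}", name));
        }
    }
    Ok(())
}
```

Working on the raw string rather than `std::path::Path` keeps the result the same on every platform, since a Unix host does not parse `C:\` as a drive prefix.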
 ## Architecture
@@ -153,7 +168,7 @@ Compressed Original Method Name
@@ -172,7 +187,7 @@ Compressed Original Method Name
 create directories (sequential)
 |
 rayon::par_iter (per file):
-zero-copy slice from mmap ──> DEFLATE + CRC32 verify ──> write to destination
+zero-copy slice from mmap ──> DEFLATE/Zstd + CRC32 verify ──> write to destination
 ```
 
 ## Project Structure
## Project Structure
@@ -209,13 +224,13 @@ ripzip-rs/
 ## Testing
 
-108 tests: 35 unit tests + 73 integration tests.
+117 tests: 35 unit tests + 82 integration tests (3 ZIP64 stress tests are `#[ignore]`).
 
 ```
 cargo test
 ```
 
-Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the `zip` crate, Windows long paths.
+Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the `zip` crate (Deflate + Zstd), ZIP64, Windows long paths.