zipsync is a focused tool for packing and unpacking build cache entries using a constrained subset of the ZIP format for high performance. It optimizes for the common scenario where most files already exist in the target location and are unchanged.

## Goals & Rationale

- **Optimize partial unpack**: Most builds reuse the majority of previously produced outputs. Skipping rewrites of unchanged files preserves filesystem and page cache state.
- **Only write when needed**: Files are written only when their contents differ, which means fewer syscalls.
- **Integrated cleanup**: Removes the need for a separate `rm -rf` pass; extra files and empty directories are pruned automatically.
- **ZIP subset**: Archives stay compatible with malware scanners and other ZIP-compatible programs. The reverse does not hold: zipsync implements only the subset of ZIP features it needs for performance, so it is not guaranteed to unpack archives produced by other tools.
- **Fast inspection**: The central directory can be enumerated without inflating the entire archive (unlike tar+gzip).
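
The integrated cleanup step can be illustrated with a short Python sketch. It is not zipsync's implementation. It assumes the archive's file list is available as a set of relative POSIX paths. The helper name `prune_extraneous` is hypothetical. The walk runs bottom-up, so directories emptied by the deletions are removed in the same pass.

```python
import os


def prune_extraneous(root: str, keep: set[str]) -> list[str]:
    """Hypothetical helper: delete files under `root` whose relative POSIX path
    is not in `keep`, then remove directories left empty.

    Returns the removed paths, relative to `root` (directories end with "/").
    """
    removed = []
    # topdown=False visits children before parents, so a parent directory is
    # checked only after its subdirectories have already been pruned.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel not in keep:
                os.remove(full)
                removed.append(rel)
        # Never delete the target root itself, only empty subdirectories.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(os.path.relpath(dirpath, root).replace(os.sep, "/") + "/")
    return removed
```

A single bottom-up traversal handles both stale files and the empty directories they leave behind. That is the property that makes a separate `rm -rf` pass unnecessary.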

## How It Works

### Pack Flow

For each file, zipsync computes a SHA-1 hash (recorded in the zipsync metadata entry) and the CRC32 required by the ZIP format, compressing the data if needed, all in a single streaming pass:

```
for each file F
    write LocalFileHeader(F)
    stream chunks:
        read -> SHA-1 + CRC32 + maybe compress -> write
    finalize compressor
    write DataDescriptor(F)
add metadata entry (same pattern)
write central directory records
```
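
The per-file streaming step could look like the following minimal Python sketch. It assumes 64 KiB chunks, raw DEFLATE (ZIP method 8) and 32-bit (non-ZIP64) sizes. `stream_entry` and `data_descriptor` are hypothetical helpers, not zipsync's API. Each chunk is read once and fed to SHA-1, CRC32 and the compressor before it is written.

```python
import hashlib
import struct
import zlib

CHUNK_SIZE = 64 * 1024            # hypothetical chunk size
DATA_DESCRIPTOR_SIG = 0x08074B50  # "PK\x07\x08" in little-endian byte order


def stream_entry(src, dst, compress: bool = True) -> tuple[str, int, int, int]:
    """Hypothetical helper: copy `src` to `dst` chunk by chunk, hashing and
    optionally deflating on the fly.

    Returns (sha1_hex, crc32, uncompressed_size, compressed_size).
    """
    sha1 = hashlib.sha1()
    crc = 0
    size = 0
    written = 0
    # wbits=-15 yields raw DEFLATE data with no zlib header, as ZIP expects.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) if compress else None
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        sha1.update(chunk)            # hash for the metadata entry
        crc = zlib.crc32(chunk, crc)  # running CRC32 required by ZIP
        size += len(chunk)
        out = compressor.compress(chunk) if compressor else chunk
        dst.write(out)
        written += len(out)
    if compressor:
        tail = compressor.flush()     # "finalize compressor"
        dst.write(tail)
        written += len(tail)
    return sha1.hexdigest(), crc, size, written


def data_descriptor(crc: int, compressed: int, uncompressed: int) -> bytes:
    """Pack a 32-bit data descriptor: signature, CRC-32, compressed and
    uncompressed sizes, all little-endian (ZIP64 would need 8-byte sizes)."""
    return struct.pack("<IIII", DATA_DESCRIPTOR_SIG, crc, compressed, uncompressed)
```

The data descriptor is written after the data because the CRC and sizes are only known once the file has been streamed. Per the ZIP specification, a writer signals this by setting general-purpose flag bit 3 in the local file header.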

### Unpack Flow

```
load archive -> parse central directory -> read metadata (SHA-1 hashes)
scan filesystem & delete files and folders not in the archive
for each entry (except metadata):
    if unchanged (size + SHA-1 match) => skip
    else extract (decompress if needed)
```
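
The skip-or-extract decision could look like the following Python sketch. It assumes the metadata entry provides an expected size and SHA-1 hex digest for each file. `file_is_unchanged`, `sync_entry` and the `read_entry` callback are hypothetical names, not zipsync's API. Comparing sizes first means most changed files are detected without hashing them.

```python
import hashlib
import os


def file_is_unchanged(path: str, expected_size: int, expected_sha1: str) -> bool:
    """Return True if `path` exists with the expected size and SHA-1 digest."""
    try:
        if os.path.getsize(path) != expected_size:
            return False  # cheap check: no need to hash a file of the wrong size
    except FileNotFoundError:
        return False
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            sha1.update(chunk)
    return sha1.hexdigest() == expected_sha1


def sync_entry(path: str, expected_size: int, expected_sha1: str, read_entry) -> bool:
    """Write the entry only when needed. `read_entry` is a hypothetical callback
    returning the decompressed bytes. Returns True if the file was written."""
    if file_is_unchanged(path, expected_size, expected_sha1):
        return False  # skip: the existing file is left untouched
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(read_entry())
    return True
```

With Python's `zipfile`, `read_entry` could be `lambda: zf.read(name)`. Decompression would then happen only for entries that actually need writing.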

## Why ZIP (vs tar + gzip)

Pros for this scenario:

- The central directory enables cheap listing without decompressing the entire payload.
- Widely understood and tooling-friendly (system file explorers, scanners, CI tooling).
- Per-file compression keeps selective unpack simple (no need to inflate all bytes).
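
Python's standard `zipfile` module shows both properties in practice. This sketch illustrates the ZIP format and does not use zipsync. Opening an archive parses only the central directory, so `infolist()` reports names, sizes and CRCs without inflating any entry. `read()` then decompresses a single member. The helper names are hypothetical.

```python
import io
import zipfile


def build_demo_archive() -> bytes:
    """Create a small in-memory archive with per-file DEFLATE compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("lib/index.js", "module.exports = 42;\n" * 100)
        zf.writestr("lib/big.bin", bytes(1_000_000))
    return buf.getvalue()


def list_entries(archive: bytes) -> list[tuple[str, int, int, int]]:
    """List (name, uncompressed size, compressed size, CRC-32) from the
    central directory; no entry data is decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [(i.filename, i.file_size, i.compress_size, i.CRC) for i in zf.infolist()]


def read_one(archive: bytes, name: str) -> bytes:
    """Inflate a single entry; the other entries are never decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)
```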

Trade-offs:

- Tar+gzip compresses the whole tar stream at once, so it can exploit cross-file redundancy and achieve a smaller compressed size for datasets with many similar files.
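
The trade-off can be measured with a small Python sketch. The helper name is hypothetical. It compares deflating each file separately, as per-file ZIP compression does, with deflating the concatenation of all files as one stream. The concatenation approximates what gzip sees in a tar archive, with tar headers ignored.

```python
import zlib


def per_file_vs_solid(files: list[bytes]) -> tuple[int, int]:
    """Return (total size when each file is compressed on its own,
    size when all files are compressed as one concatenated stream)."""
    per_file = sum(len(zlib.compress(f, 9)) for f in files)
    solid = len(zlib.compress(b"".join(files), 9))
    return per_file, solid
```

For example, 50 identical 1 KiB blobs of random bytes barely compress individually. As a single stream, every copy after the first becomes back-references. DEFLATE back-references reach at most 32 KiB, so gzip only benefits when similar content sits close together in the tar stream.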