❯ zstandard looks cool, in my case it's important that it somehow can't get corrupted if a write fails, disk is full, app is terminated halfway through saving etc
● For crash safety with zstd, the trick is one zstd frame per version rather than one long stream:
- zstd frames are concatenable — cat v1.zst v2.zst v3.zst is a valid zstd stream that decodes to the joined output.
- Each frame has its own content checksum (enable write_checksum=True).
- If the app dies mid-append, the last frame is truncated/corrupt. On read, you iterate frames and stop cleanly at the first one that fails — everything before it is
intact.
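import os
import zstandard as zstd

cctx = zstd.ZstdCompressor(level=19, write_checksum=True)
with open("versions.zst", "ab") as f:
    # One complete, checksummed frame per version
    f.write(cctx.compress(new_version_bytes))
    # Push the frame to stable storage before reporting success
    f.flush()
    os.fsync(f.fileno())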
To still get good cross-version compression with independent frames, train a dictionary from a sample of versions once (zstd.train_dictionary) and reuse it for every
frame. Independent frames + shared dict ≈ most of the savings, none of the fragility. Store the dictionary as carefully as the data: frames compressed with it
can't be decoded without it.
Other crash-safe options to consider:
- SQLite (WAL mode) + zstd blobs per row. SQLite gives you real transactional atomicity — partial writes simply don't appear. Probably the most bulletproof if you
don't mind the dependency.
- One file per version with atomic rename (write a tmp file, fsync it, then os.replace). POSIX rename is atomic: either the new version is fully there or it
isn't. Each file compresses on its own, so you lose cross-version savings unless you also use a shared dict, but it's trivially correct.
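Here's a minimal sketch of the atomic-rename option, assuming a POSIX filesystem and a single writer. The helper name save_version_atomically and the ".tmp-" prefix are hypothetical, and the payload stands in for the already-compressed bytes of one version:

```python
import os
import tempfile


def save_version_atomically(path: str, payload: bytes) -> None:
    """Write payload to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data reaches the disk before the rename
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        # A failed write (disk full, etc.) leaves any existing file untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: POSIX. Sync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling save_version_atomically("v3.zst", compressed_bytes) either produces a complete v3.zst or leaves things as they were. Note that mkstemp creates the file with owner-only permissions, so adjust those if other users need to read it.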
If "append-only single file" is the hard requirement, frame-per-version with a trained dict + fsync is what I'd reach for.
❯ I've seen sqlite do weird things inside of docker containers..
● Yeah, that's a real thing. SQLite's locking relies on fcntl (POSIX advisory lock) semantics that can get flaky on some bind-mount and overlayfs setups, on NFS, and
especially on Docker Desktop's host mounts on macOS/Windows. WAL mode in particular needs shared memory between processes on the same host, so it doesn't work over
network filesystems.
The frame-per-version zstd file dodges all of that — it's just O_APPEND + fsync, which is about as filesystem-portable as it gets. The "is the tail truncated?"
question becomes a simple checksum check rather than a locking/journal recovery dance.
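One extra belt-and-suspenders touch if you want it: prefix each frame with a fixed 4-byte length header before writing. zstd frames are self-delimiting so you don't
strictly need it, but it makes the "scan forward, skip a bad frame, recover later good ones" path easier if you ever care about partial recovery beyond the first
corruption point. One caveat: skipping only works when the length headers themselves are intact, so a torn tail should be truncated away before the next append.
Otherwise its stale header misaligns every record written after it.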
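A rough sketch of that length-prefixed layout, under a few assumptions: a single writer, a little-endian 4-byte header, and hypothetical helper names (scan_records, append_record). Frames are treated as opaque bytes so the sketch runs on the standard library alone. In practice each payload would be one zstd frame, and the checksum check at decompression time is what catches corrupted content, while the scan here only detects a truncated tail:

```python
import os
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length (assumed layout)


def scan_records(path: str):
    """Return (records, good_end): complete payloads and the offset after the last one."""
    records, good_end = [], 0
    if not os.path.exists(path):
        return records, good_end
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of file, or a torn header
            (length,) = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                break  # torn payload from an interrupted append
            records.append(payload)
            good_end = f.tell()
    return records, good_end


def append_record(path: str, frame: bytes) -> None:
    """Drop any torn tail, then append one length-prefixed frame and fsync it."""
    _, good_end = scan_records(path)
    with open(path, "ab") as f:
        f.truncate(good_end)  # no-op when the file is already clean
        f.seek(0, os.SEEK_END)  # make sure the write lands at the new end
        f.write(HEADER.pack(len(frame)) + frame)
        f.flush()
        os.fsync(f.fileno())
```

Rescanning the whole file on every append keeps the sketch short. A real writer would more likely run the scan once at startup and remember the last good offset.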