[feature] zstandard compression for text history

❯ zstandard looks cool, in my case it's important that it can't get corrupted if a write fails, the disk is full, the app is terminated halfway through saving, etc.

● For crash safety with zstd, the trick is one zstd frame per version rather than one long stream:

  - zstd frames are concatenable — cat v1.zst v2.zst v3.zst is a valid zstd stream that decodes to the joined output.
  - Each frame can carry its own content checksum; python-zstandard leaves it off by default, so pass write_checksum=True.
  - If the app dies mid-append, the last frame is truncated/corrupt. On read, you iterate frames and stop cleanly at the first one that fails — everything before it is
  intact.

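  import os
  import zstandard as zstd

  # One independent frame per version; the checksum lets a reader detect a damaged frame.
  cctx = zstd.ZstdCompressor(level=19, write_checksum=True)
  with open("versions.zst", "ab") as f:
      f.write(cctx.compress(new_version_bytes))  # compress() returns one complete frame
      f.flush()
      os.fsync(f.fileno())  # push the bytes to disk before reporting success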

  To still get good cross-version compression with independent frames, train a dictionary from a sample of versions once (zstd.train_dictionary) and reuse it for every
  frame. Independent frames + shared dict ≈ most of the savings, none of the fragility. Keep the dictionary in its own write-once file: frames compressed with it can't be
  decoded without it.

  Other crash-safe options to consider:

  - SQLite (WAL mode) + zstd blobs per row. SQLite gives you real transactional atomicity — partial writes simply don't appear. Probably the most bulletproof if you
  don't mind the dependency.
  - One file per version with atomic rename (tmp → os.replace). POSIX rename is atomic, so a reader sees either the complete new file or no new file at all, provided
  the temp file is fsynced before the rename. Loses some compression vs. shared dict, but trivially correct.
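
  A minimal sketch of the atomic-rename option, using only the standard library. write_version_atomically is a hypothetical helper name, the payload is whatever
  bytes you want stored (for example an already-compressed frame), and the directory fsync is only attempted where the platform exposes O_DIRECTORY.

```python
import os
import tempfile


def write_version_atomically(directory, name, payload):
    """Write payload to directory/name via a temp file plus os.replace."""
    final_path = os.path.join(directory, name)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data must be on disk before the rename
        os.replace(tmp_path, final_path)  # atomic swap into place
    except BaseException:
        # Leave no half-written temp file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if hasattr(os, "O_DIRECTORY"):
        # Persist the directory entry itself on POSIX systems.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return final_path
```

  If the process dies before os.replace, the worst case is a stray .tmp file next to the untouched previous version.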

  If "append-only single file" is the hard requirement, frame-per-version with a trained dict + fsync is what I'd reach for.

❯ ive seen sqlite do weird things inside of docker containers..                                                                                                          

● Yeah, that's a real thing. SQLite's locking relies on POSIX advisory locks (fcntl), which can get flaky on bind mounts, overlayfs, NFS, and especially Docker Desktop's
  host mounts on macOS/Windows. WAL mode in particular needs a shared-memory file that every process can map, so it doesn't work over network filesystems.

  The frame-per-version zstd file dodges all of that — it's just O_APPEND + fsync, which is about as filesystem-portable as it gets. The "is the tail truncated?"
  question becomes a simple checksum check rather than a locking/journal recovery dance.

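  One extra belt-and-suspenders touch if you want it: prefix each frame with a fixed 4-byte length header before writing. zstd frames are self-delimiting so you don't
  strictly need it, but it makes the "scan forward, skip a bad frame, recover later good ones" path easier if you ever care about partial recovery beyond the first
  corruption point. The catch: with a raw length prefix the file is no longer a plain zstd stream that cat/zstd can decode. Storing the length inside a zstd skippable
  frame keeps it compatible, since standard decoders skip those.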
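
  As a rough sketch of that length-prefixed layout: the little-endian byte order and the helper names append_record and scan_records are assumptions, and decode stands
  for any callable that raises on a damaged payload, for example the decompress method of a zstd.ZstdDecompressor.

```python
import struct

# Assumed header layout: 4-byte unsigned little-endian payload length.
LENGTH_HEADER = struct.Struct("<I")


def append_record(f, frame):
    """Write one length-prefixed frame with a single write call."""
    f.write(LENGTH_HEADER.pack(len(frame)) + frame)


def scan_records(data, decode):
    """Return (recovered, skipped): decoded payloads and the count of bad records."""
    recovered, skipped, pos = [], 0, 0
    while pos + LENGTH_HEADER.size <= len(data):
        (length,) = LENGTH_HEADER.unpack_from(data, pos)
        start = pos + LENGTH_HEADER.size
        end = start + length
        if end > len(data):
            break  # the last record was cut short: truncated tail
        try:
            recovered.append(decode(data[start:end]))
        except Exception:
            skipped += 1  # damaged payload; the length still lets us jump past it
        pos = end
    return recovered, skipped
```

  This only recovers past a bad record while the length headers themselves are intact; a damaged header would need some resynchronization scheme on top.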
  


[feature] zstandard compression for text history #4149
