fix: rework database dump workflow #419

Merged: adibarra merged 1 commit into master from fix/db-dump-xz-split on Jun 5, 2026

@adibarra (Contributor) commented Jun 5, 2026

Note

Low Risk
Operational and documentation change only; consumers must switch from zip to the new unpack steps, but application code and DB access are unchanged.

Overview
Replaces weekly DB release packaging from a single .zip with xz-compressed tar archives split into ~1.9 GiB .tar.xz.part* files, so dumps can grow past GitHub’s per-asset size limit while keeping full table dumps intact.
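To make the sizing concrete, here is a minimal Python sketch of how many parts a compressed stream of a given size would yield. `PART_BYTES` is an assumption that approximates the "~1.9 GiB" figure above, not the workflow's actual split size, and `part_count` is a hypothetical helper introduced only for illustration.

```python
GIB = 1024 ** 3

# Assumption: approximates the "~1.9 GiB" part size mentioned in the PR;
# the workflow's exact split size is not shown here.
PART_BYTES = int(1.9 * GIB)


def part_count(compressed_bytes: int, part_bytes: int = PART_BYTES) -> int:
    """Number of fixed-size parts needed for a non-empty compressed stream."""
    if compressed_bytes <= 0 or part_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating point.
    return -(-compressed_bytes // part_bytes)


if __name__ == "__main__":
    for size_gib in (1, 2, 5):
        print(f"{size_gib} GiB compressed -> {part_count(size_gib * GIB)} part(s)")
```

Because each part stays below 2 GiB, the number of release assets grows with the dump while no single asset approaches the per-asset cap.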

The Database Dump workflow now runs on blacksmith-32vcpu-ubuntu-2404, uses set -euo pipefail, pipes tar → xz (lzma2) → split, and attaches all parts to the release via an expanded glob. README and the inferencex-data skill document the new download pattern (gh + cat … | xz -d | tar -x) and the simplified DUMP_DIR example path after unpack.
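For consumers who would rather not shell out, the unpack side of that pattern can be sketched in Python with only the standard library. This is an illustration of the same cat | xz -d | tar -x idea, not the documented procedure: `ConcatReader`, `unpack_parts`, and the `dump.tar.xz.part` prefix are hypothetical names, and the sketch assumes the parts have already been downloaded into one directory and that their names sort in the order they were written.

```python
import io
import lzma
import tarfile
from pathlib import Path


class ConcatReader(io.RawIOBase):
    """Read several files back to back, like `cat part1 part2 ...`."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._current = None

    def readable(self):
        return True

    def readinto(self, buf):
        while True:
            if self._current is None:
                if not self._paths:
                    return 0  # every part consumed: signal EOF
                self._current = open(self._paths.pop(0), "rb")
            n = self._current.readinto(buf)
            if n:
                return n
            # Current part exhausted; move on to the next one.
            self._current.close()
            self._current = None


def unpack_parts(parts_dir, prefix, dest):
    """Concatenate the parts, xz-decompress the stream, and untar into dest."""
    # Assumption: lexicographic order of the file names equals write order.
    parts = sorted(Path(parts_dir).glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}* in {parts_dir}")
    Path(dest).mkdir(parents=True, exist_ok=True)
    with lzma.open(ConcatReader(parts)) as xz_stream:
        # "r|" reads the tar as a forward-only stream; nothing is buffered whole.
        with tarfile.open(fileobj=xz_stream, mode="r|") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")  # safer extraction where available
            else:
                tar.extractall(dest)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # Hypothetical usage: python unpack.py <parts_dir> <dest>
        unpack_parts(sys.argv[1], "dump.tar.xz.part", sys.argv[2])
```

The whole pipeline is streamed, so memory use is bounded by the xz decoder's dictionary rather than by the size of the dump.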

Reviewed by Cursor Bugbot for commit cf39c78. Bugbot is set up for automated code reviews on this repo.

@adibarra marked this pull request as ready for review on June 5, 2026 20:23
vercel Bot commented Jun 5, 2026

The latest updates on your projects.

Project           Deployment  Actions           Updated (UTC)
inferencemax-app  Ready       Preview, Comment  Jun 5, 2026 8:26pm

The weekly dump zipped the per-table JSON into a single release asset, which
now exceeds GitHub's 2 GiB per-asset cap (~23 GB raw -> ~2 GB zip). Rather than
dropping tables (the #418 stopgap, which breaks anyone rebuilding the DB from
the dump), tar the dump, compress with xz/lzma2 (preset=9e, 192 MiB dict), and
split into <2 GiB parts. split always emits >=1 part, so consumers use one
uniform 'cat parts | xz -d | tar -x' flow.
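The packaging side described above can be sketched in Python as well. This is not the workflow's actual script (which pipes the shell tools directly); it is a standard-library approximation under stated assumptions. `SplitWriter`, `pack_dump`, the three-digit numeric suffixes, and the part prefix are all hypothetical. The LZMA2 filter uses preset 9 with the extreme flag and a 192 MiB dictionary to match the settings named in the commit message; note that a dictionary this large needs a correspondingly large amount of encoder memory.

```python
import lzma
import tarfile
from pathlib import Path


class SplitWriter:
    """Write a byte stream into fixed-size numbered parts, like `split -b`."""

    def __init__(self, prefix, part_bytes):
        self._prefix = str(prefix)
        self._part_bytes = part_bytes
        self._room = 0  # bytes still free in the current part
        self._out = None
        self.paths = []

    def write(self, data):
        view = memoryview(data)
        total = len(view)
        while len(view):
            if self._room == 0:
                self._open_next()
            chunk = view[: self._room]
            self._out.write(chunk)
            self._room -= len(chunk)
            view = view[len(chunk):]
        return total

    def _open_next(self):
        if self._out:
            self._out.close()
        # Hypothetical naming: zero-padded numeric suffix so names sort in order.
        path = Path(f"{self._prefix}{len(self.paths):03d}")
        self._out = open(path, "wb")
        self.paths.append(path)
        self._room = self._part_bytes

    def close(self):
        if self._out:
            self._out.close()
            self._out = None


def pack_dump(dump_dir, prefix, part_bytes, dict_size=192 * 1024 * 1024):
    """tar the directory, compress with xz/LZMA2, and split into parts."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 9 | lzma.PRESET_EXTREME,  # equivalent of xz -9e
        "dict_size": dict_size,
    }]
    sink = SplitWriter(prefix, part_bytes)
    try:
        with lzma.open(sink, "wb", format=lzma.FORMAT_XZ, filters=filters) as xz_stream:
            # "w|" writes the tar as a forward-only stream into the compressor.
            with tarfile.open(fileobj=xz_stream, mode="w|") as tar:
                tar.add(dump_dir, arcname=Path(dump_dir).name)
    finally:
        sink.close()
    return sink.paths
```

Since an xz stream always contains at least its header and footer, the writer always produces at least one part, which is what lets consumers treat single-part and multi-part releases identically.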

xz -9e is CPU-bound, so run the job on a 32-vCPU Blacksmith runner to stay
within the 30-min cap. Update README + inferencex-data skill consumer docs to
match. dump-db.ts/load-dump.ts unchanged (still a plain JSON dir).
@adibarra force-pushed the fix/db-dump-xz-split branch from 0d0d751 to cf39c78 on June 5, 2026 20:25
@adibarra merged commit eea4a23 into master on Jun 5, 2026
12 checks passed
@adibarra deleted the fix/db-dump-xz-split branch on June 5, 2026 20:30