Commit eea4a23

fix(db-backup): keep full dump via xz + <2GiB split parts (#419)
1 parent 0c91e4b commit eea4a23

3 files changed

Lines changed: 21 additions & 14 deletions

.claude/skills/inferencex-data/SKILL.md

Lines changed: 4 additions & 3 deletions
@@ -5,11 +5,12 @@ description: Download and analyze InferenceX ML inference benchmark data — GPU
 
 # Setup
 
-Download the latest database dump from GitHub releases:
+Download the latest database dump from GitHub releases. It is xz-compressed and split into
+one or more `.tar.xz.part*` files; reassemble them by piping `cat` through `xz` (requires `xz`):
 
 ```bash
-gh release download --repo SemiAnalysisAI/InferenceX-app --pattern 'inferencex-dump-*.zip' --dir .
-unzip inferencex-dump-*.zip
+gh release download --repo SemiAnalysisAI/InferenceX-app --pattern 'inferencex-dump-*.tar.xz.part*' --dir .
+cat inferencex-dump-*.tar.xz.part* | xz -d -T0 | tar -x
 ```
 
 # Data

.github/workflows/db-backup.yml

Lines changed: 12 additions & 6 deletions
@@ -8,7 +8,7 @@ on:
 jobs:
   backup:
     timeout-minutes: 30
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-32vcpu-ubuntu-2404
     permissions:
       contents: write
     steps:
@@ -27,17 +27,23 @@ jobs:
         env:
           DATABASE_READONLY_URL: ${{ secrets.DATABASE_READONLY_URL }}
         run: |
+          set -euo pipefail
           DUMP_DIR="inferencex-dump-$(date -u +%Y-%m-%d)"
           pnpm admin:db:dump "$DUMP_DIR"
-          cd packages/db && zip -r "${GITHUB_WORKSPACE}/${DUMP_DIR}.zip" "$DUMP_DIR" && cd -
-          echo "DUMP_ARCHIVE=${DUMP_DIR}.zip" >> "$GITHUB_ENV"
+          # Keep every table intact: tar the dump, compress with xz/lzma2, and split into
+          # <2 GiB parts to stay under GitHub's per-release-asset cap (even as data grows).
+          ( cd packages/db
+            tar -cf - "$DUMP_DIR" \
+              | xz -T0 --lzma2=preset=9e,dict=192MiB \
+              | split -b 1900m -d -a 2 - "${GITHUB_WORKSPACE}/${DUMP_DIR}.tar.xz.part" )
+          echo "DUMP_GLOB=${DUMP_DIR}.tar.xz.part*" >> "$GITHUB_ENV"
           echo "TAG=db-dump/$(date -u +%Y-%m-%d)" >> "$GITHUB_ENV"
-
       - name: Create release
         env:
           GH_TOKEN: ${{ github.token }}
         run: |
-          gh release create "$TAG" "$DUMP_ARCHIVE" \
+          # $DUMP_GLOB is intentionally unquoted so the shell expands it to every part file.
+          gh release create "$TAG" $DUMP_GLOB \
             --title "DB Dump $(date -u +%Y-%m-%d)" \
-            --notes "Weekly database dump." \
+            --notes "Weekly full database dump (xz-compressed, split into <2 GiB parts). Reassemble: cat *.tar.xz.part* | xz -d | tar -x" \
             --latest=false
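
Reviewer note: the pack and reassemble pipelines in this workflow can be checked locally with a small round trip. The following is a minimal sketch, not part of the commit. The `demo-dump` directory, the scratch paths, and the 100 KiB part size are hypothetical stand-ins for the real dump and the `1900m` limit, and GNU `split` is assumed for the `-d` flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch data standing in for the real dump directory.
work="$(mktemp -d)"
mkdir -p "$work/src/demo-dump" "$work/out"
head -c 300000 /dev/urandom > "$work/src/demo-dump/table.bin"

# Pack: tar -> xz -> numbered parts (100 KiB each here instead of 1900m).
( cd "$work/src"
  tar -cf - demo-dump \
    | xz -T0 \
    | split -b 100k -d -a 2 - "$work/demo-dump.tar.xz.part" )

# Unpack: the glob sorts part00, part01, ... so cat restores the byte order.
cat "$work"/demo-dump.tar.xz.part* | xz -d | tar -xf - -C "$work/out"

# Report how many parts were written and whether the payload survived intact.
part_count="$(ls "$work"/demo-dump.tar.xz.part* | wc -l | tr -d ' ')"
if cmp -s "$work/src/demo-dump/table.bin" "$work/out/demo-dump/table.bin"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "parts=$part_count roundtrip=$roundtrip"
```

The sketch spells out `tar -xf -` because not every `tar` build reads standard input by default when `-f` is omitted.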

README.md

Lines changed: 5 additions & 5 deletions
@@ -55,17 +55,17 @@ You can run the dashboard against either a live database or a static JSON dump.
 
 #### Option A: JSON Dump (no database required, local dev only)
 
-Download the latest DB dump from [GitHub Releases](https://github.com/SemiAnalysisAI/InferenceX-app/releases), unzip it, and point `DUMP_DIR` at the directory. This only works with `pnpm dev` production builds require a live database.
+Download the latest DB dump from [GitHub Releases](https://github.com/SemiAnalysisAI/InferenceX-app/releases), unpack it, and point `DUMP_DIR` at the directory. The dump is xz-compressed and split into one or more `.tar.xz.part*` files; reassemble them by piping `cat` through `xz`. This only works with `pnpm dev`; production builds require a live database.
 
 ```bash
 cp .env.example .env
 
-# Download and unzip the latest dump
-gh release download db-dump/2026-03-30 -p '*.zip'
-unzip inferencex-dump-2026-03-30.zip -d inferencex-dump
+# Download and unpack the latest dump (requires xz; `brew install xz` on macOS)
+gh release download db-dump/2026-03-30 -p 'inferencex-dump-*.tar.xz.part*'
+cat inferencex-dump-2026-03-30.tar.xz.part* | xz -d -T0 | tar -x
 
 # Add to .env
-echo 'DUMP_DIR=./inferencex-dump/inferencex-dump-2026-03-30' >> .env
+echo 'DUMP_DIR=./inferencex-dump-2026-03-30' >> .env
 ```
 
 Make sure `DATABASE_READONLY_URL` is not set (or is commented out) in your `.env`.
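
Reviewer note: because the dump now arrives as several files, a partial download can leave a gap in the sequence. The following is a minimal sketch of a pre-flight check, not part of the commit. The helper name `check_parts` is hypothetical, and it assumes the two-digit numeric suffixes produced by `split -d -a 2` and a bash shell.

```shell
# check_parts PREFIX: succeed only if PREFIX00, PREFIX01, ... form an unbroken run.
# Prints the number of parts found on success.
check_parts() {
  local prefix="$1" expected=0 part suffix
  for part in "$prefix"[0-9][0-9]; do
    # An unmatched glob stays literal, so the path will not exist.
    [ -e "$part" ] || { echo "no parts match ${prefix}NN" >&2; return 1; }
    suffix="${part#"$prefix"}"
    # 10# forces base 10 so suffixes such as 08 and 09 are not read as octal.
    if [ "$((10#$suffix))" -ne "$expected" ]; then
      printf 'missing part %s%02d\n' "$prefix" "$expected" >&2
      return 1
    fi
    expected=$((expected + 1))
  done
  echo "$expected"
}

# Possible usage before reassembling (file names taken from the README example):
#   check_parts inferencex-dump-2026-03-30.tar.xz.part >/dev/null \
#     && cat inferencex-dump-2026-03-30.tar.xz.part* | xz -d -T0 | tar -xf -
```

This only catches gaps at the start or in the middle of the sequence. A missing final part is not detectable from file names alone, although `xz -d` should then fail on the truncated stream.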
