Each demultiplexed run should have a self-contained sqlite database at demux/RunID/metadata/run.db that makes the run portable and self-describing independent of any central infrastructure.
Created by: the pipeline at the start of process_run(), updated at each pipeline stage completion.
Schema (initial):
CREATE TABLE run (
run_id TEXT PRIMARY KEY,
instrument TEXT,
run_date TEXT,
flowcell_id TEXT,
operator TEXT,
note TEXT,
status TEXT,
created_at TEXT,
updated_at TEXT
);
CREATE TABLE files (
path TEXT PRIMARY KEY,
filename TEXT,
sha512 TEXT,
size_bytes INTEGER,
stage TEXT, -- bcl2fastq, fastqc, multiqc, delivery, etc.
created_at TEXT
);
CREATE TABLE delivery (
destination TEXT, -- NIRD, VIGASP
nird_path TEXT,
status TEXT, -- pending, delivered, failed
delivered_at TEXT,
sha512_verified INTEGER -- boolean
);
CREATE TABLE pipeline_stages (
stage TEXT PRIMARY KEY,
status TEXT, -- pending, running, complete, failed, skipped
started_at TEXT,
completed_at TEXT
);
CREATE TABLE tags (
tag TEXT PRIMARY KEY, -- badrun, control
set_by TEXT,
set_at TEXT
);
Database access control:
Filesystem permissions are the primary access control:
chmod 700 demux/RunID/metadata/
chown <service_user> demux/RunID/metadata/
The database password is derived deterministically from the run metadata so it is computable from first principles without being stored anywhere:
import hashlib
password = hashlib.sha512(
f"{run_id}{flowcell_id}{run_date}{instrument_serial}".encode()
).hexdigest()
Any process that knows the run metadata can recompute the password. No password file, no secrets management, no static string to lose.
The derived password is a lightweight secondary deterrent only. It is intentionally not a strong cryptographic boundary - its purpose is to keep casual users out and make the password self-documenting. Filesystem permissions are what actually controls access.
If stronger encryption is required in the future, SQLCipher is the recommended upgrade path - it is a drop-in sqlite replacement with identical Python API and works with the same password derivation scheme without architectural changes.
Relationship to postgres:
This sqlite file is the portable snapshot. The postgres database is the operational store. Sync direction is sqlite -> postgres on pipeline completion or daemon ingestion. Runs predating postgres can be backfilled by reading their sqlite files.
Lifecycle:
- created at
process_run() start
- updated at each stage transition
- survives
clean - it is metadata, not intermediate output
- removed by
delete along with all of demux/RunID/
- included in
export-rawdata and archive tarballs (see separate ticket)
Acceptance criteria:
demux/RunID/metadata/run.db exists after a successful run.
demux/RunID/metadata/ is owned by the service user with permissions 700.
- all pipeline stage transitions are recorded with timestamps.
- sha512 hashes of all files under
demux/RunID/ are recorded.
- NIRD delivery paths match those used by
deliver_files_to_NIRD().
- database password is deterministically derivable from run metadata alone.
- schema is forward-compatible with the planned postgres migration.
Related: reconstruct-rawdata ticket, reconstruct-samplesheet ticket, --create-samplesheet-with-paths ticket, future postgres migration
Each demultiplexed run should have a self-contained sqlite database at
demux/RunID/metadata/run.dbthat makes the run portable and self-describing independent of any central infrastructure.Created by: the pipeline at the start of
process_run(), updated at each pipeline stage completion.Schema (initial):
Database access control:
Filesystem permissions are the primary access control:
The database password is derived deterministically from the run metadata so it is computable from first principles without being stored anywhere:
Any process that knows the run metadata can recompute the password. No password file, no secrets management, no static string to lose.
The derived password is a lightweight secondary deterrent only. It is intentionally not a strong cryptographic boundary - its purpose is to keep casual users out and make the password self-documenting. Filesystem permissions are what actually controls access.
If stronger encryption is required in the future, SQLCipher is the recommended upgrade path - it is a drop-in sqlite replacement with identical Python API and works with the same password derivation scheme without architectural changes.
Relationship to postgres:
This sqlite file is the portable snapshot. The postgres database is the operational store. Sync direction is sqlite -> postgres on pipeline completion or daemon ingestion. Runs predating postgres can be backfilled by reading their sqlite files.
Lifecycle:
process_run()startclean- it is metadata, not intermediate outputdeletealong with all ofdemux/RunID/export-rawdataandarchivetarballs (see separate ticket)Acceptance criteria:
demux/RunID/metadata/run.dbexists after a successful run.demux/RunID/metadata/is owned by the service user with permissions 700.demux/RunID/are recorded.deliver_files_to_NIRD().Related:
reconstruct-rawdataticket,reconstruct-samplesheetticket,--create-samplesheet-with-pathsticket, future postgres migration