PortOS stores data in two places: PostgreSQL (app-native relational records, search/vector indexes, sync cursors, lineage) and the filesystem under ./data/ (large binary assets, externally-editable prose, model weights, transient queues, and explicitly file-sync-oriented domains).
PostgreSQL is a required install/runtime dependency (see Backup & Restore and scripts/setup-db.js). Files remain first-class for the things a relational DB is bad at. This document is the contract for deciding which home a given domain belongs in — and the checklist a reviewer should apply before any new feature defaults to "just write another data/*.json."
For the full domain-by-domain inventory (every current table and
data/store, with Postgres-fit notes), see the plan doc:docs/plans/2026-06-06-create-postgres-storage-inventory.md. This page covers the contract and decision rules, not the exhaustive list.
| Class | Bytes live | Searchable metadata | Use when | PortOS examples |
|---|---|---|---|---|
db-primary |
PostgreSQL | PostgreSQL | App-native relational records: relationships, indexes, status, lineage, sync cursors, tombstones | catalog_ingredients, catalog_ingredient_relations, memories; target: universes, series, issues, Creative Director, media metadata |
file-primary |
Filesystem | Filesystem (DB may index) | The record IS an external file, or it must survive in a file-sync workflow (iCloud, Git, hand-editing) | Writers Room draft .md bodies, MortalLoom / Health / Meatspace iCloud stores |
asset-file-db-indexed |
Filesystem | PostgreSQL | Large binary payloads whose metadata must be queryable/searchable | Generated images/videos/audio + DB asset rows referencing them via asset_key / media_key |
ephemeral-file |
Filesystem | None (or DB job ref only) | Transient/regenerable runtime state — queues, uploads, caches | data/uploads/*, runtime media job queue, browser profile/cache |
The one rule that ties them together: the DB points to files; it does not absorb the bytes. Any file asset referenced from the DB gets a stable asset_key / media_key row plus integrity metadata. Bytes never go into a column.
Definition. Records that PortOS itself authors and relates: they have foreign keys, statuses, audit trails, search/vector indexes, and federated sync cursors/tombstones. The DB is the source of truth; there is no meaningful file representation of the record.
When to use. The record participates in relationships (series.universeId, issue.seriesId, catalog refs), needs cross-record queries ("everything related to this universe"), needs full-text or vector search, or needs per-table sequence cursors for peer sync.
Where it lives. PostgreSQL via server/lib/db.js + server/scripts/init-db.sql. Service modules own the storage adapter (e.g. server/services/catalogDB.js, server/services/memoryDB.js).
Examples.
catalog_ingredients— typed creative records with JSONB payload, tags, embeddings, generatedsearch_tsv, soft delete, sync sequence.catalog_ingredient_relations— directed ingredient→ingredient graph edges (the strongest argument for Postgres as the catalog graph store).memories/memory_links— long-term memory + pgvector similarity.creative_director_projects— Creative Director project/treatment/scene/run state, one row per project (id/status/timestamps as columns, the full record indataJSONB). Migrated from the monolithicdata/creative-director-projects.jsonin Phase 3 (#997); CD is local-only, so the row carries no sync cursor/tombstone. Adapter:server/services/creativeDirector/projectsDB.js.catalog_user_types— user-defined ingredient types (the registry that defines catalog row semantics), one row per type (idPK, the definition indataJSONB,updated_at/deleted_atmirroring the federation LWW clock + tombstone). Migrated from thedata/settings.jsoncatalogUserTypesslice in Phase 4 lead-in (#1001) so type evolution versions/syncs alongside the catalog data it governs. Federates via the catalog synccatalogTypesenvelope block (wire shape unchanged by the move). Adapter:server/services/catalogUserTypes/db.js, dispatched viastore.js.universes/universe_runs— Universe Builder records (canon bibles, categories, composite sheets, locks, influences) one row per universe with the full sanitized record indataJSONB andname/schema_version/ephemeral/updated_at/deleted/deleted_atmirrored into columns; render-run history one row per run (local-only, capped 200, never federated). Migrated fromdata/universes/{id}/index.json(collectionStore) in Phase 3 Create slice 1 (#1014). NOsync_sequence— universes federate via the EXISTINGdataSyncsnapshot/push model (LWW on the body'supdatedAt), so the storage swap is invisible to peers (no schema-version bump). The store bumps an in-process mutation epoch on every write thatdataSyncfolds into its checksum fingerprint, since a DB edit no longer changes thedata/universes/directory the fingerprint used to watch.universe_runsis intentionally never federated — a regenerable render cache under a 200-row global cap that two producers would mutually evict, while the durable universe record already syncs (ADR tribe + universe-runs local, #1724). Adapter:server/services/universeBuilder/db.js, dispatched viastore.js.tribe_people/tribe_touchpoints/tribe_memory_links— the Tribe relationship/CRM graph (people + their care cadence, contact touchpoints, and cross-links into brainmemories). Intentionally machine-local — never federated (ADR tribe + universe-runs local, #1724): it is relationship-graph data, mirroring the deliberate "memory_links are instance-local" boundary inmemorySync.js(memory nodes federate, the link graph does not), and is coupled to machine-local domains —tribe_memory_linksextends the non-federatedmemory_linkslayer andtribe_touchpointscarry per-machine calendar-account refs. NOsync_sequence, no peer-sync record kind, nodataSynccategory. Adapter:server/services/tribe.js.
Postgres-First target. Pipeline series/issues, Story Builder sessions, and searchable media metadata are still db-primary targets — they currently live in data/ JSON but carry relationships and status that belong in the DB. The schema for the Create domains is designed in docs/plans/2026-06-07-create-relational-schema-design.md (#999), with implementation tracked as #1014–#1018. (Creative Director project/scene/run state moved to Postgres in Phase 3 / #997; catalog user-defined types moved in Phase 4 lead-in / #1001; universes moved in Phase 3 Create slice 1 / #1014 — see the universes entry above. Pipeline series/issues #1015, Story Builder #1016, Writers Room #1017, and the catalog ref resolver #1018 are the remaining slices.)
Definition. The record either is an external file (long prose, a model, a repo) or must remain a file to preserve a sync/editing workflow PortOS does not own (iCloud, external editors, Git). A DB row may index it, but the file is authoritative for the body.
When to use. The payload is long externally-editable prose; the domain syncs through iCloud/file-sync outside PortOS; or forcing the record through the app DB would break an existing sync boundary.
Where it lives. Filesystem under ./data/ (or an OS-managed sync container). DB may hold metadata/index rows (hashes, word counts, segment indexes) but not the body.
Examples.
- Writers Room draft bodies —
data/writers-room/works/{workId}/drafts/{draftId}.md. Keep.mdfile-backed; store metadata/index rows in DB. - MortalLoom / Health / Meatspace health data —
data/health,data/meatspace, MortalLoom iCloud store. Kept file-backed to preserve iCloud/file sync and avoid routing sensitive health records through the app DB before that boundary is designed. - App scaffolds / cloned repos / browser profiles —
data/repos,data/browser-profile— inherently filesystem-oriented. - LoRA training datasets —
data/lora-datasets/{id}/index.json+images/*.png(collectionStore). The record is inseparable from the image bytes it organizes, has no cross-record queries beyond a small characterId scan, and is machine-local likedata/loras/itself (training artifacts tied to this machine's GPU output — never federates, no sync cursor/tombstone). Backed up in full: uploads and hand-edited captions are not re-creatable. Training RUN records aredb-primary(lora_training_runs); run artifacts (checkpoints/samples) live underdata/training-runs/{runId}/with checkpoints/cache excluded from backup.
Definition. Large binary payloads (images, video, audio, model weights) stay on disk as bytes, while their searchable metadata — provenance, gen params, favorites, notes, lineage, collection membership — lives in PostgreSQL as asset rows that reference the file by a stable key.
When to use. You have generated or imported binary assets that the user needs to search, filter, favorite, or relate to other records, but the bytes themselves are large and have no business in a column.
Where it lives. Bytes under ./data/ (data/images/*, data/videos/*, data/audio/*, data/music/*, thumbnails). Metadata in DB asset rows keyed by asset_key / media_key, with integrity metadata (SHA-256 — see server/lib/assetHash.js). The DB row references the file; it never embeds the bytes.
Examples.
- Generated images —
data/images/*bytes +.metadata.jsonsidecars; indexed into themedia_assetstable (#1000) keyedimage:<filename>. Sidecars remain authoritative; the DB row is a derived, queryable mirror. Adapter:server/services/mediaAssetIndex/. - Generated videos —
data/videos/*,data/video-thumbnails/*bytes, tracked indata/video-history.json; indexed intomedia_assetskeyedvideo:<jobId>. History file remains authoritative. - Media collections — many-to-many links over assets/universes/series/catalog media pointers (
db-primarylink tables) pointing atasset-file-db-indexedbytes. Stilldata/media-collections/*JSON today — a follow-up slice of #1000.
Media asset index (media_assets, #1000). One row per generated image/video: media_key (<kind>:<ref>) PK, kind/ref/created_at mirror columns for queries, the full metadata record in data JSONB. It is a derived index — the on-disk sidecars + video-history.json stay authoritative — reconciled from disk at boot (upsert every asset, prune rows whose file is gone) and kept warm by a generation-completed hook. Local-only (rebuilt from disk), so no sync cursor/tombstone. Adapter: server/services/mediaAssetIndex/{logic,db,index}.js.
Standalone media-library federation (mediaLibrary, #1566). For full-sync peers, the standalone media-library bytes (generated images + sidecars, videos, pipeline audio, uploaded music) mirror across the pair — not just bytes referenced by a synced creative record. The sender advertises a library-level manifest at GET /api/peer-sync/library-manifest ({ schemaVersion, manifestHash, assets:[{kind,filename,sha256,sidecarSha256?}] }); the receiver's periodic sweep (syncMediaLibraryFromPeer, driven from initSharing) diffs it against local disk, receiver-pulls missing bytes through the SAME diffAssetManifestAgainstLocal + pullOneAsset machinery as the per-record path, then rebuilds the derived media_assets index. Video thumbnails are regenerated locally on video pull (not byte-federated); video-history.json metadata already union-merges via the videoHistory dataSync category; the generic data/history.jsonl action log is machine-local and never federated. Byte replication is gated to peer.fullSync and honors backup DEFAULT_EXCLUDES (a media dir excluded from backup isn't federated). Manifest envelope versioned by PORTOS_SCHEMA_VERSIONS.mediaLibrary (a non-record category — see NON_RECORD_SCHEMA_CATEGORIES); the receiver gently skips a sender ahead of its version.
Postgres-First target (remaining). data/history.jsonl (action log) and the durable portions of data/media-jobs.json (job history / lineage) are still file-backed — follow-up slices. Do not move generated image/video/audio bytes into PostgreSQL.
Definition. Regenerable, short-lived runtime state. Losing it costs at most an in-flight job or a cache rebuild — never durable user data. It should never be the only home for anything the user expects to persist.
When to use. Upload staging, in-flight job queues, caches, and scratch state. If a record must survive a reinstall or be queryable across records, it is not ephemeral-file — promote it.
Where it lives. Filesystem under ./data/, frequently excluded from backups (see DEFAULT_EXCLUDES in server/services/backup.js). The DB may hold a durable job reference even when the staging bytes are ephemeral.
Examples.
- Upload staging —
data/uploads/*. Ephemeral; do not put in DB except as job references. - Media job queue — runtime queue state can stay file-backed short term, but job history and artifact lineage are
db-primaryand should move to the DB. - Browser CDP profile / downloads —
data/browser-profile/,data/browser-downloads/— cache, non-overridable backup excludes.
The contract draws a single line:
- PostgreSQL owns app-native records, relationships, indexes, sync cursors, tombstones, lineage, status, and searchable metadata.
- Files own large binary payloads, long externally-editable prose bodies, model weights, temporary uploads, and iCloud-backed health/life stores.
- File assets referenced from the DB get a stable
asset_key/media_keyrow plus integrity metadata. The DB points to files; it does not absorb the bytes.
Defaulting a new Create feature to a fresh data/*.json file is the anti-pattern this contract exists to stop. Monolithic JSON in hot paths (media-jobs.json, video-history.json — and creative-director-projects.json before #997 moved it to Postgres) causes write contention and growth risk; string-id cross-references across separate JSON stores drift with no integrity check. New relational surfaces should be db-primary from the start.
Each file→Postgres migrator parks its source aside (<domain>.imported, per-record index.json.imported / manifest.imported.json) instead of deleting it, as a one-release recovery copy. server/scripts/pruneImportedLegacyFiles.js (run at boot from server/index.js, after every store's file→DB warm, registered in the ledger by scripts/migrations/077-prune-imported-legacy-files.js) removes the .imported copies only when it has verified, by record identity, that every migrated record those artifacts hold is still present in the database. It reads the record ids straight off the parked artifacts — the parsed index.json.id of each per-record directory (the same id the migrator inserted, which can differ from the folder name), or the parsed JSON .id of the creative-director export, each writers-room manifest/folder/exercise, the universe config.runs[], and each manifest's drafts[] — and checks those exact ids exist via WHERE id = ANY(...). Identity rather than a row count, because a count can be satisfied by unrelated rows after a wipe+restore to a different record set (and the migrators' imported count comes from INSERT … ON CONFLICT DO NOTHING, so it can undercount). Anything that can't be verified withholds the whole domain's prune: a missing id, a present-but-unparseable artifact, an id-less record, or a domain whose migration is still pending (legacy source on disk, no marker). The prune deliberately does not touch the deeper *.bak-NNN monolith backups (the file→file split migrations 034–036) or the file-split backups (037 history, 059 media-collections): those predate the DB, can hold records the split migrator skipped, and carry soft-deleted records the live DB legitimately lacks, so they're neither identity-verifiable nor safe to auto-delete — they're left for manual cleanup. None of these artifacts are excluded from rsync backups: while a prune is blocked they are the only recovery source and pg_dump is capturing the incomplete DB, so a snapshot must keep them — once pruned from disk they leave subsequent snapshots naturally.
PortOS treats PostgreSQL as a mandatory install/runtime dependency for every install and every federated peer machine (decision: ADR — PostgreSQL as the Primary Datastore). Run it as either:
- System (native) PostgreSQL on
:5432—PGMODE=native, or - Docker PostgreSQL on
:5561—PGMODE=docker(the default).
Provision either path with npm run setup:db (also run automatically by npm run setup and npm start). It auto-detects an already-healthy local PostgreSQL and uses native mode; otherwise it starts/initializes the Docker container, or — when Docker is unavailable — offers to bootstrap native PostgreSQL. See Setup path below.
The file backend (server/services/memory.js, JSON under ./data/) is unsupported for production and for federated peers. It exists only so the test suite (and ad-hoc local development) can boot without a database. It is not a fallback, a "lite" mode, or a way to run PortOS without Postgres:
- It is reached only via the explicit
MEMORY_BACKEND=fileenv var (set fromPGMODE=filein.env, mapped by the launcher) or automatically underNODE_ENV=test. There is no menu choice for it (scripts/setup-db.jsoffers only Docker and Native), andnpm run setup:dbwithPGMODE=fileprints an "unsupported" notice and refuses to provision it. - When
MEMORY_BACKENDis unset, PortOS requires a healthy database and does NOT silently fall back to file storage — an unreachable/unmigrated DB is an error condition.server/services/memoryBackend.jsfails fast with an actionable message (run npm run setup:db) rather than serving a half-broken install. This no-silent-fallback behavior is intentional; do not "fix" it.
Why file storage cannot be a supported mode:
- No creative-catalog / vector equivalent. The catalog graph, memory similarity, and hybrid search depend on PostgreSQL + pgvector (HNSW vector search fused with
tsvectorfull-text). There is no file-backed implementation of these — a file-backed install would serve a half-broken app the moment a user touched the catalog, memory search, or anydb-primaryCreate domain. As each Create domain migrates to Postgres (universes #1014, pipeline #1015, Story Builder #1016, Writers Room #1017, catalog refs #1018), its file path survives only under this dev/test escape hatch. - Federation assumes Postgres. Cross-machine sync (snapshot/push + last-writer-wins) and the
db-primarysequence cursors/tombstones are designed around the database. A file-backed peer is not a supported member of a federation. - Backup assumes Postgres. The backup/restore contract treats the
pg_dumplogical dump as required system state (see Backup & Restore) — a file-backed install has no dump to capture or verify.
The escape hatch is guarded from bitrot by the test suite (tests boot with NODE_ENV=test and exercise the file backend), so the path stays runnable — but "the tests use it" is not an argument that it is a deployment option. It isn't.
npm run setup:db → scripts/setup-db.js is the single command that makes PostgreSQL ready, and is wired into npm run setup and npm start so a normal install never has to think about it. Its happy path:
- Already healthy? If the configured role can authenticate to the configured database and the
memoriestable fromserver/scripts/init-db.sqlexists, it reports ready and exits (fast path — no re-provisioning on everynpm start). PGMODE=docker(default): starts thepgvector/pgvector:pg17container (docker-compose.yml), waits for it to accept connections and finish applying the init schema, then reports ready on:5561. If Docker is missing or not running, it prints platform-specific install/start hints and — if a healthy native PostgreSQL is already present — switches to native automatically.PGMODE=native: runsscripts/db.sh setup-native(idempotent: brew install, role, db, extensions, schema) and verifies at the domain level (role can auth + schema present) before reporting ready on:5432.- Failure is non-zero exit. A started-but-unresponsive container, a failed native bootstrap, or a non-interactive context with no usable DB exits non-zero with an actionable message — so the
&&-chainednpm starthalts here instead of crash-looping under PM2 against an unready database.
PGPASSWORD/PGUSER/PGDATABASE/PGPORT are resolved from process.env first, then .env, then the backward-compatible defaults (portos/portos/portos/5432). The default portos password is an intentional local-development fallback (see the Distribution model note in CLAUDE.md); production deployments override it via PGPASSWORD.
Apply this checklist to every new feature that persists data, and require it in PR review. A new data/*.json store must justify itself against these questions — the default for app-native records is PostgreSQL.
- Which class is it? Tag the domain
db-primary,file-primary,asset-file-db-indexed, orephemeral-file. If you cannot pick one cleanly, the design is probably mixing concerns. - Does it relate to other records? FKs, cross-record queries, "show everything related to X", graph edges →
db-primary. Do not encode relationships as string ids across separate JSON files (they drift with no integrity check). - Does it need search? Full-text or vector search → PostgreSQL (
search_tsv/ pgvector), not an app-level scan over JSON files. - If you chose a new
data/*.json, why not the DB? Acceptable reasons: large binary bytes (asset-file-db-indexed— index the metadata, keep bytes on disk), long externally-editable prose, an iCloud/file-sync workflow PortOS does not own, or genuinely transient runtime state (ephemeral-file). "It was faster to write a JSON file" is not acceptable for app-native relational records. - Bytes vs. pointer. If binary assets are involved, confirm bytes stay on disk and the DB holds only an
asset_key/media_keyrow + integrity metadata. Never store bytes in a column. - Federation. If the record syncs to peers, does it have a per-table/per-record sequence cursor and tombstone strategy? (See
server/lib/schemaVersions.js,server/lib/syncWire.js.) Cross-machine sync is first-class — see the Distribution model inCLAUDE.md. - Migration. On-disk/DB format changes need a migration in
scripts/migrations/(applied-list tracked per install indata/migrations.applied.json) and seed files indata.reference/. Other installs and other federated machines upgrade independently. - Backup coverage. Will the new store be captured by backup?
db-primaryis covered by the Postgres dump;file-primary/asset-file-db-indexedby the rsync snapshot;ephemeral-fileis correctly excluded (DEFAULT_EXCLUDES). Confirm the store lands in the right bucket. See Backup & Restore.
- Backup & Restore — what gets backed up (data/ files + mandatory Postgres dump) and how restore works.
docs/plans/2026-06-06-create-postgres-storage-inventory.md— full inventory + migration phases.server/lib/README.md— storage helpers (collectionStore,fileUtils,db.js,assetHash).