PostgreSQL restore fails with "could not open relation with OID" due to parallel restore race condition #4127
Description
The built-in PostgreSQL restore feature fails consistently, even on a completely fresh database with no existing data. The restore process uses parallel workers that race against each other, causing some tables to have stale OIDs by the time other workers attempt to COPY data into them.
This affects databases with TimescaleDB, foreign key constraints, or any moderately complex schema.
To Reproduce
- Create a PostgreSQL database service in Dokploy with TimescaleDB enabled
- Set up S3 backup destination and create a backup
- Delete the database volume (full prune to ensure clean state)
- Redeploy the database and trigger restore from S3
- Observe the restore failing repeatedly
Current Behavior
The restore loops indefinitely, with errors like:
ERROR: could not open relation with OID 21066
STATEMENT: COPY public.sessions (id, data, expiry_date) FROM stdin;
ERROR: could not open relation with OID 21066
STATEMENT: COPY public.technologies (id, name, category) FROM stdin;
ERROR: could not open relation with OID 21066
STATEMENT: COPY public.urls (id, domain_id, url, crawl_date, crawl_id, created_at) FROM stdin;
This was reproduced twice — once on a dirty volume and once on a completely fresh volume — confirming the issue is in the restore logic itself, not leftover data.
Root Cause
Dokploy's backup command uses the custom format:
pg_dump -Fc --no-acl --no-owner -h localhost -U ${user} --no-password '${database}' | gzip
The custom format (-Fc) requires pg_restore to restore. When pg_restore runs with multiple parallel jobs, the workers race each other during schema creation — one worker drops and recreates a table while another worker already has its old OID cached, resulting in stale-OID errors when COPY is attempted.
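One way to test this hypothesis without changing the backup format is to force a single restore worker: -j 1 serializes the restore and removes the race entirely. A minimal sketch — the function name and paths are placeholders, not Dokploy's actual code:

```shell
# Sketch: single-worker restore to rule out the parallel race.
# restore_single_worker and its arguments are hypothetical names.
restore_single_worker() {
  local db="$1" dump="$2"
  # -j 1 runs every restore step in one worker, so no stale-OID window exists
  pg_restore -j 1 --no-acl --no-owner -d "$db" "$dump"
}

# usage: gunzip -c backup.sql.gz > /tmp/dump.dump
#        restore_single_worker ${database} /tmp/dump.dump
```

If a single-worker restore succeeds where the default invocation fails, that strongly implicates the parallel workers.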
Expected Behavior
The restore should complete successfully without errors, even for complex schemas with extensions like TimescaleDB.
Proposed Fix
Replace the single-shot pg_restore with a 3-section sequential restore:
gunzip -c backup.sql.gz > /tmp/dump.dump
pg_restore --no-acl --no-owner -d ${database} --section=pre-data /tmp/dump.dump
pg_restore --no-acl --no-owner -d ${database} --section=data /tmp/dump.dump
pg_restore --no-acl --no-owner -d ${database} --section=post-data /tmp/dump.dump
This ensures the schema is fully created before any data is inserted, and that constraints and indexes are applied only after all data is loaded.
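Wrapped up, the three-section restore could look like the sketch below (the helper name and temp-file handling are illustrative, not Dokploy's actual implementation):

```shell
# Sketch: sequential 3-section restore of a custom-format dump.
# restore_sequential is a hypothetical helper name.
restore_sequential() {
  local db="$1" backup="$2" dump
  dump="$(mktemp)"
  gunzip -c "$backup" > "$dump"
  # pre-data: schema; data: table contents; post-data: indexes, constraints, triggers
  for section in pre-data data post-data; do
    pg_restore --no-acl --no-owner -d "$db" --section="$section" "$dump" || return 1
  done
  rm -f "$dump"
}

# usage: restore_sequential ${database} backup.sql.gz
```

Because each pg_restore invocation runs to completion before the next starts, no COPY can ever observe a table mid-recreation.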
Alternatively, switch the backup format to plain SQL and restore via psql pipe, which is fully sequential and requires no intermediate file:
# Backup
pg_dump --no-acl --no-owner '${database}' | gzip
# Restore
gunzip -c backup.sql.gz | psql ${database}
Environment
- Dokploy version: latest
- PostgreSQL version: 18.3
- PostgreSQL image: TimescaleDB (timescale/timescaledb-ha:pg18)
- OS: Ubuntu 24.04
- Deployment: same server
Related Issues
- pg_restore continuously retrying #1730 (closed as wontfix) — same root cause, this issue provides additional reproduction steps and a confirmed workaround
Workaround
Until fixed, the only reliable restore method is a direct pipe bypassing Dokploy entirely:
pg_dump --no-acl --no-owner "postgres://user:pass@source:5432/db" | psql "postgres://user:pass@dest:5432/db"
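When using this pipe for a real migration, it may be worth hardening it so psql stops at the first error and rolls the whole restore back, rather than continuing past failures. A sketch — the helper name and connection strings are placeholders:

```shell
# Sketch: direct dump-and-restore pipe with fail-fast behavior.
# migrate_db is a hypothetical helper name.
migrate_db() {
  local src="$1" dest="$2"
  # ON_ERROR_STOP=1 makes psql exit at the first error instead of continuing;
  # --single-transaction rolls back everything if any statement fails.
  pg_dump --no-acl --no-owner "$src" \
    | psql -v ON_ERROR_STOP=1 --single-transaction "$dest"
}

# usage: migrate_db "postgres://user:pass@source:5432/db" "postgres://user:pass@dest:5432/db"
```

Without ON_ERROR_STOP, psql's default is to keep executing after an error, which can leave the destination in a silently partial state.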