PostgreSQL restore fails with "could not open relation with OID" due to parallel restore race condition #4127

@vikyw89

Description

The built-in PostgreSQL restore feature fails consistently, even on a completely fresh database with no existing data. The restore process uses parallel workers that race against each other, causing some tables to have stale OIDs by the time other workers attempt to COPY data into them.

This affects databases with TimescaleDB, foreign key constraints, or any moderately complex schema.

To Reproduce

  1. Create a PostgreSQL database service in Dokploy with TimescaleDB enabled
  2. Set up S3 backup destination and create a backup
  3. Delete the database volume (full prune to ensure clean state)
  4. Redeploy the database and trigger restore from S3
  5. Observe the restore failing repeatedly

Current Behavior

The restore loops indefinitely, with errors like:

ERROR: could not open relation with OID 21066
STATEMENT: COPY public.sessions (id, data, expiry_date) FROM stdin;

ERROR: could not open relation with OID 21066
STATEMENT: COPY public.technologies (id, name, category) FROM stdin;

ERROR: could not open relation with OID 21066
STATEMENT: COPY public.urls (id, domain_id, url, crawl_date, crawl_id, created_at) FROM stdin;

This was reproduced twice, once on a dirty volume and once on a completely fresh volume, confirming the issue lies in the restore logic itself rather than in leftover data.

Root Cause

Dokploy's backup command uses the custom format:

pg_dump -Fc --no-acl --no-owner -h localhost -U ${user} --no-password '${database}' | gzip

The custom format (-Fc) requires pg_restore, which here runs with parallel workers. These workers race against each other during schema creation: one worker drops and recreates a table while another worker still has the old OID cached, producing stale-OID errors when that worker attempts its COPY.
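If the parallel invocation is the culprit, a narrower fix would keep the custom format but pin pg_restore to a single worker with `-j 1` (a standard pg_restore flag). A minimal sketch; the helper function name is hypothetical, not Dokploy's actual code:

```shell
# Hypothetical helper: builds a single-worker restore command for a
# given database and dump file. With -j 1 only one worker exists, so
# no second worker can recreate a table whose OID another has cached.
build_restore_cmd() {
  local database="$1" dump="$2"
  echo "pg_restore --no-acl --no-owner -j 1 -d ${database} ${dump}"
}

build_restore_cmd mydb /tmp/dump.dump
```

This trades restore speed for correctness, which is usually the right call for an automated disaster-recovery path.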

Expected Behavior

The restore should complete successfully without errors, even for complex schemas with extensions like TimescaleDB.

Proposed Fix

Replace the single-shot pg_restore with a three-section sequential restore:

gunzip -c backup.sql.gz > /tmp/dump.dump

pg_restore --no-acl --no-owner -d ${database} --section=pre-data /tmp/dump.dump
pg_restore --no-acl --no-owner -d ${database} --section=data /tmp/dump.dump
pg_restore --no-acl --no-owner -d ${database} --section=post-data /tmp/dump.dump

This ensures schema is fully created before data is inserted, and constraints/indexes are applied only after all data is loaded.
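Wrapped as a small shell function, the three sections could run strictly in order, aborting at the first failure so a half-built schema is never populated with data. A sketch only; the function name and error handling are illustrative, not Dokploy's implementation:

```shell
# Illustrative wrapper: run the pre-data, data, and post-data sections
# sequentially against the same dump file, stopping on the first error.
restore_sections() {
  local database="$1" dump="$2" section
  for section in pre-data data post-data; do
    pg_restore --no-acl --no-owner -d "$database" --section="$section" "$dump" \
      || return 1
  done
}

# Usage:
#   gunzip -c backup.sql.gz > /tmp/dump.dump
#   restore_sections "${database}" /tmp/dump.dump
```

The temporary dump file is needed because `--section` requires a seekable archive, which a gunzip pipe cannot provide.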

Alternatively, switch the backup format to plain SQL and restore via a psql pipe, which is fully sequential and needs no intermediate file:

# Backup
pg_dump --no-acl --no-owner '${database}' | gzip

# Restore
gunzip -c backup.sql.gz | psql ${database}

Environment

  • Dokploy version: latest
  • PostgreSQL version: 18.3
  • PostgreSQL image: TimescaleDB (timescale/timescaledb-ha:pg18)
  • OS: Ubuntu 24.04
  • Deployment: same server

Workaround

Until fixed, the only reliable restore method is a direct pipe bypassing Dokploy entirely:

pg_dump --no-acl --no-owner "postgres://user:pass@source:5432/db" | psql "postgres://user:pass@dest:5432/db"
