
ConsolidateQRepPartitions fails with ZSTD "Data corruption detected" on Snowflake COPY INTO during initial snapshot #4106

@steverob

Description

Summary

The initial snapshot of certain tables fails during the ConsolidateQRepPartitions step with a ZSTD decompression error when loading AVRO-staged data into Snowflake via COPY INTO. The step retries indefinitely against the same corrupt stage files; PeerDB never re-uploads fresh data on retry.

Environment

  • PeerDB version: stable-v0.36.12
  • Source: RDS PostgreSQL 16.8
  • Destination: Snowflake
  • Deployment: Docker Compose (self-hosted on EC2)

Steps to Reproduce

  1. Create a CDC mirror from PostgreSQL to Snowflake
  2. Include a table with ~2M+ rows and a JSONB column
  3. Initial snapshot partitions are read from Postgres successfully (131K rows/batch, multiple parallel partitions)
  4. All partitions are staged as AVRO files in @PEERDB_INTERNAL.peerdb_stage_clone...
  5. ConsolidateQRepPartitions runs COPY INTO with FILE_FORMAT=(TYPE=AVRO), PURGE=TRUE

Error

failed to copy stage to destination: failed to handle append mode: failed to run COPY INTO command:
100079 (22000): Invalid data encountered during decompression for file: 'not_file',
compression type used: 'ZSTD', cause: 'Data corruption detected'

Key Observations

  • Other tables in the same mirror work fine: one with 9M+ rows completed successfully with identical configuration
  • The error is on the Consolidate step, not the partition replication step — data is read from Postgres and staged to Snowflake successfully, but the COPY INTO from stage to destination table fails
  • Retries always fail because PeerDB retries the COPY INTO against the same corrupt stage files rather than re-uploading fresh AVRO data
  • Clearing the stage manually (REMOVE @PEERDB_INTERNAL.peerdb_stage...) does not help: the next retry still fails, suggesting the files are being written corrupt in the first place
  • Warehouse size is not the cause — tested with X-SMALL and MEDIUM, same result
  • Fresh database/schema doesn't help — reproduced after dropping and recreating the entire Snowflake database
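
To narrow down whether the files are already corrupt when written, a staged file can be downloaded locally (for example with Snowflake's GET command) and checked structurally before COPY INTO touches it. Below is a minimal sketch in Python, not PeerDB code: it assumes the staged files are standard Avro object container files using Avro's `zstandard` codec, and the names `inspect_avro` and `read_header` are made up for illustration. It only checks block framing (sizes, sync markers, and the zstd frame magic bytes); it does not decompress the payload, so a clean report does not prove the data is valid.

```python
import io

AVRO_MAGIC = b"Obj\x01"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic 0xFD2FB528, little-endian


def read_long(stream):
    """Decode one Avro long (zigzag-encoded varint) from a binary stream."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of file while reading a long")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of file")
    return data


def read_header(stream):
    """Parse the container header; return (metadata dict, 16-byte sync marker)."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block's byte size (unused here).
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_exact(stream, read_long(stream)).decode("utf-8")
            meta[key] = read_exact(stream, read_long(stream))
    return meta, read_exact(stream, 16)


def inspect_avro(stream):
    """Walk every data block of a seekable binary stream and report framing problems."""
    meta, sync = read_header(stream)
    codec = meta.get("avro.codec", b"null").decode("ascii")
    report = {"codec": codec, "blocks": 0, "rows": 0, "problems": []}
    while True:
        if not stream.read(1):
            break  # clean end of file
        stream.seek(-1, io.SEEK_CUR)
        index = report["blocks"]
        rows = read_long(stream)
        size = read_long(stream)
        payload = stream.read(size) if size >= 0 else b""
        if size < 0 or len(payload) != size:
            report["problems"].append(f"block {index}: truncated or bad size {size}")
            break
        if codec == "zstandard" and payload[:4] != ZSTD_MAGIC:
            report["problems"].append(f"block {index}: missing zstd frame magic")
        if stream.read(16) != sync:
            report["problems"].append(f"block {index}: sync marker mismatch")
            break
        report["blocks"] += 1
        report["rows"] += rows
    return report
```

To use it, open a downloaded file in binary mode and pass the file object to `inspect_avro`. A non-empty `problems` list on a freshly staged file would support the theory that the corruption happens at write or upload time rather than inside Snowflake. A full check would additionally decompress each block with a zstd library.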
