What you'll practice: using `pg_dump` and `pg_restore` from the shell, importing/exporting CSV with `\copy` and `COPY`, and thinking about backup strategies.
Unlike most sections in this course, the commands here run in your shell, not inside psql. pg_dump and pg_restore are command-line tools that ship with PostgreSQL.
- Dump an entire database with `pg_dump`
- Dump a single schema or table
- Use the plain SQL format vs the custom archive format (`-Fc`)
- Restore with `psql -f` (plain SQL) and with `pg_restore` (custom format)
- Export a table to CSV with `\copy`
- Import a CSV back with `\copy`
- Know the difference between `COPY` (server-side) and `\copy` (client-side)
Load the sample schema:

```shell
psql -U postgres -d sql_exercise -f 04-advanced-databases/19-backup-and-restore/schema.sql
```

This creates a tiny `adv_backup` schema with one `city` table.
Create a scratch directory to hold your dumps — anywhere you like, e.g. `~/sql-backups/`:

```shell
mkdir -p ~/sql-backups
pg_dump -U postgres -d sql_exercise \
  --schema=adv_backup --table='adv_backup.city' \
  > ~/sql-backups/city.sql
```

Then look at the file:

```shell
less ~/sql-backups/city.sql
```

Question: what does the file contain? Can a human read it? What commands does it use to recreate the data?
Drop the table, then restore from the file:

```shell
psql -U postgres -d sql_exercise -c 'DROP TABLE adv_backup.city CASCADE;'
psql -U postgres -d sql_exercise -f ~/sql-backups/city.sql
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup.city;'
```

Expected: 8.
```shell
pg_dump -U postgres -d sql_exercise --schema-only --schema=adv_backup \
  > ~/sql-backups/adv_backup_schema.sql
```

Question: look at the file. Which statements are there? (`CREATE TABLE`, `CREATE SEQUENCE`, `ALTER TABLE ... ADD CONSTRAINT`, etc. — but no `INSERT`s.)
```shell
pg_dump -U postgres -d sql_exercise --data-only --schema=adv_backup \
  > ~/sql-backups/adv_backup_data.sql
```

Question: does this file have `CREATE TABLE`s or just `COPY` / `INSERT` statements?
The plain SQL format is readable but slow and can't be restored in parallel. The custom format (`-Fc`) is compressed, binary, and supports selective restore:

```shell
pg_dump -U postgres -d sql_exercise -Fc --schema=adv_backup \
  -f ~/sql-backups/adv_backup.dump
```

Question: try opening the file in `less` — is it human-readable? Compare its size to the plain SQL version.
First drop the schema:

```shell
psql -U postgres -d sql_exercise -c 'DROP SCHEMA adv_backup CASCADE;'
```

Then restore:

```shell
pg_restore -U postgres -d sql_exercise ~/sql-backups/adv_backup.dump
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup.city;'
```

Expected: 8.
Question: pg_restore has a -j N flag for parallel restore. When would that matter?
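A hedged sketch of the flag in use, assuming the `adv_backup.dump` archive from above (overkill for one tiny table, but it shows the shape of the command):

```shell
# Restore with up to 4 parallel worker jobs. Parallelism only pays off
# when the archive contains many large tables and indexes, so different
# jobs can recreate different objects at the same time.
pg_restore -U postgres -d sql_exercise -j 4 ~/sql-backups/adv_backup.dump
```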
```shell
pg_restore --list ~/sql-backups/adv_backup.dump
```

Question: what does the output look like? This is the dump's "table of contents" — useful when you want to restore only a subset.
```shell
pg_dump -U postgres -Fc sql_exercise -f ~/sql-backups/sql_exercise_full.dump
```

Question: how big is the file? `ls -lh ~/sql-backups/sql_exercise_full.dump`.

(If you want to see a plain SQL version of the whole DB for comparison: `pg_dump -U postgres sql_exercise > ~/sql-backups/sql_exercise_full.sql`.)
Back in psql:

```sql
\copy adv_backup.city TO '~/sql-backups/city.csv' CSV HEADER
```

Verify: `cat ~/sql-backups/city.csv` (from your shell) — you should see a comma-separated file with a header row.
Note: \copy is a psql meta-command that runs client-side, so it uses a path on YOUR machine. COPY ... TO '/path' (no backslash) runs server-side and needs a path the server can write to.
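To see the difference first-hand, a sketch from the shell; note that server-side `COPY ... TO` a file also requires superuser rights or membership in the `pg_write_server_files` role, and the paths here are illustrative:

```shell
# Client-side: \copy writes to a path on the machine running psql.
psql -U postgres -d sql_exercise \
  -c "\copy adv_backup.city TO '/tmp/city_client.csv' CSV HEADER"

# Server-side: COPY writes to a path on the database server's filesystem.
# Against a remote server, the file appears THERE, not on your machine,
# and the command fails unless the server process can write to the path.
psql -U postgres -d sql_exercise \
  -c "COPY adv_backup.city TO '/tmp/city_server.csv' CSV HEADER"
```

On a local single-machine setup the two look deceptively similar; the distinction bites the first time you back up a hosted database.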
Empty the table and reload from the CSV:

```sql
TRUNCATE adv_backup.city RESTART IDENTITY;
\copy adv_backup.city (name, country, population) FROM '~/sql-backups/city.csv' CSV HEADER
SELECT COUNT(*) FROM adv_backup.city;
```

Expected: 8.
```sql
\copy (SELECT name, population FROM adv_backup.city WHERE country = 'GB' ORDER BY population DESC) TO '~/sql-backups/gb_cities.csv' CSV HEADER
```

Verify: `cat ~/sql-backups/gb_cities.csv`.
Dump the `adv_backup` schema, then use `sed` (or any text tool) to rewrite it into a different schema name:

```shell
pg_dump -U postgres -d sql_exercise --schema=adv_backup > ~/sql-backups/adv_backup.sql
sed 's/adv_backup/adv_backup_copy/g' ~/sql-backups/adv_backup.sql > ~/sql-backups/adv_backup_copy.sql
psql -U postgres -d sql_exercise -f ~/sql-backups/adv_backup_copy.sql
```

Verify:

```shell
psql -U postgres -d sql_exercise -c '\dn'
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup_copy.city;'
```

Question: this is a quick-and-dirty way to "duplicate a schema". When would you do this? (Hint: making a snapshot before a risky migration.)
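One caveat before trusting the `sed` trick: it rewrites every occurrence of the string, including matches inside row data, not just schema identifiers. A tiny self-contained illustration (no database needed; the table and text are made up):

```shell
# Both the schema reference AND the string literal get rewritten:
printf "SET search_path = adv_backup;\nINSERT INTO note VALUES ('see adv_backup docs');\n" \
  | sed 's/adv_backup/adv_backup_copy/g'
# prints:
#   SET search_path = adv_backup_copy;
#   INSERT INTO note VALUES ('see adv_backup_copy docs');
```

Harmless for this exercise's data, but worth a `grep adv_backup` over the rewritten file before running the trick on a real dump.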
`pg_dump` dumps one database at a time. It does not dump cluster-level objects like roles. For those, use `pg_dumpall --globals-only`:

```shell
pg_dumpall -U postgres --globals-only > ~/sql-backups/globals.sql
```

Question: what's in the file? (Look for `CREATE ROLE`.) Why is this an important piece of a real backup?
Write a shell one-liner that:

- Dumps `sql_exercise` into a timestamped file under `~/sql-backups/`
- Also dumps the globals
- Prints a message when done

Hint:

```shell
TS=$(date +%Y%m%d-%H%M%S)
pg_dump -U postgres -Fc sql_exercise -f ~/sql-backups/sql_exercise_$TS.dump && \
pg_dumpall -U postgres --globals-only > ~/sql-backups/globals_$TS.sql && \
echo "Backup complete: $TS"
```

This is the kind of thing you'd drop into a cron job or a systemd timer for a real backup strategy.
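For example, if you saved the hint above as a script, a crontab entry could run it nightly. The paths below are hypothetical, and this assumes authentication is handled non-interactively (e.g. via `~/.pgpass` or peer auth), since cron can't type a password:

```shell
# crontab -e  -- run the backup script every night at 02:30 (hypothetical paths)
30 2 * * * /home/you/bin/pg-backup.sh >> /home/you/sql-backups/backup.log 2>&1
```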
Read pg_dump --help for -j. Dump a bigger database in parallel (e.g. using the directory format -Fd -j 4). When does parallel dumping actually help?
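A hedged sketch of what that looks like; the directory format is the only one `pg_dump` can write in parallel, and `sql_exercise` is tiny, so don't expect a speedup here:

```shell
# -Fd writes one file per table into a directory; -j 4 dumps up to four
# tables concurrently. This helps when several large tables can be read
# at once, and does nothing for a database dominated by a single table.
pg_dump -U postgres -Fd -j 4 -f ~/sql-backups/sql_exercise_dir sql_exercise
```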
Use pg_restore --list from Exercise 7 to list what's in the archive, then selectively restore only the city table using -L (a list file) or -t (table).
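A sketch of both routes, assuming the archive from Exercise 7 (the `grep` here is a blunt instrument; inspect and hand-edit the list file rather than trusting a pattern match):

```shell
# Route 1: cut the table of contents down to the entries you want,
# then feed the edited list back with -L.
pg_restore --list ~/sql-backups/adv_backup.dump | grep city > /tmp/city.list
pg_restore -U postgres -d sql_exercise -L /tmp/city.list ~/sql-backups/adv_backup.dump

# Route 2: name the table directly.
pg_restore -U postgres -d sql_exercise -t city ~/sql-backups/adv_backup.dump
```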
Read about WAL archiving and pg_basebackup. At a high level, how would you restore your database to its state as of "yesterday at 3:17 PM"? What pieces do you need in place BEFORE the disaster for this to work?
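Not something to run in this course environment, but the moving parts look roughly like this, with hypothetical paths throughout:

```shell
# BEFORE the disaster -- in postgresql.conf:
#   wal_level = replica
#   archive_mode = on
#   archive_command = 'cp %p /mnt/wal-archive/%f'   # hypothetical archive location

# Also BEFORE: take a base backup to recover from.
pg_basebackup -U postgres -D /mnt/base-backup -Fp -Xs -P

# AFTER the disaster: restore the base backup as the new data directory,
# point recovery at the archive and the target time:
#   restore_command = 'cp /mnt/wal-archive/%f %p'
#   recovery_target_time = '...'   # e.g. yesterday at 15:17
# then create an empty recovery.signal file and start the server.
```

The punchline of the exercise: the base backup and the WAL archive must both exist before anything goes wrong; `pg_dump` alone can only return you to the moment a dump was taken.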
```sql
DROP SCHEMA IF EXISTS adv_backup CASCADE;
DROP SCHEMA IF EXISTS adv_backup_copy CASCADE;
```

And from your shell, if you want to remove the backup files:

```shell
rm -rf ~/sql-backups
```