
Section 19 — Backup & Restore

What you'll practice: using pg_dump and pg_restore from the shell, importing/exporting CSV with \copy and COPY, and thinking about backup strategies.

Unlike most sections in this course, the commands here run in your shell, not inside psql. pg_dump and pg_restore are command-line tools that ship with PostgreSQL.

Learning objectives

  • Dump an entire database with pg_dump
  • Dump a single schema or table
  • Use the plain SQL format vs the custom archive format (-Fc)
  • Restore with psql -f (plain SQL) and with pg_restore (custom format)
  • Export a table to CSV with \copy
  • Import a CSV back with \copy
  • Know the difference between COPY (server-side) and \copy (client-side)

Setup

Load the sample schema:

psql -U postgres -d sql_exercise -f 04-advanced-databases/19-backup-and-restore/schema.sql

This creates a tiny adv_backup schema with one city table.

Create a scratch directory to hold your dumps — anywhere you like, e.g. ~/sql-backups/:

mkdir -p ~/sql-backups

Exercises

Exercise 1 — Dump a single table as plain SQL

pg_dump -U postgres -d sql_exercise \
        --schema=adv_backup --table='adv_backup.city' \
        > ~/sql-backups/city.sql

Then look at the file:

less ~/sql-backups/city.sql

Question: what does the file contain? Can a human read it? What commands does it use to recreate the data?


Exercise 2 — Restore from plain SQL

Drop the table, then restore from the file:

psql -U postgres -d sql_exercise -c 'DROP TABLE adv_backup.city CASCADE;'
psql -U postgres -d sql_exercise -f ~/sql-backups/city.sql
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup.city;'

Expected: 8.


Exercise 3 — Dump just the schema (no data)

pg_dump -U postgres -d sql_exercise --schema-only --schema=adv_backup \
        > ~/sql-backups/adv_backup_schema.sql

Question: look at the file. Which statements are there? (CREATE TABLE, CREATE SEQUENCE, ALTER TABLE ... ADD CONSTRAINT, etc. — but no INSERTs.)


Exercise 4 — Dump just the data (no schema)

pg_dump -U postgres -d sql_exercise --data-only --schema=adv_backup \
        > ~/sql-backups/adv_backup_data.sql

Question: does this file have CREATE TABLEs or just COPY / INSERT statements?


Exercise 5 — Custom-format dump

The plain SQL format is readable but slow and can't be restored in parallel. The custom format (-Fc) is compressed, binary, and supports selective restore:

pg_dump -U postgres -d sql_exercise -Fc --schema=adv_backup \
        -f ~/sql-backups/adv_backup.dump

Question: try opening the file in less — is it human-readable? Compare its size to the plain SQL version.


Exercise 6 — Restore from a custom-format dump with pg_restore

First drop the schema:

psql -U postgres -d sql_exercise -c 'DROP SCHEMA adv_backup CASCADE;'

Then restore:

pg_restore -U postgres -d sql_exercise ~/sql-backups/adv_backup.dump
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup.city;'

Expected: 8.

Question: pg_restore has a -j N flag for parallel restore. When would that matter?


Exercise 7 — List what's in a custom-format dump without restoring

pg_restore --list ~/sql-backups/adv_backup.dump

Question: what does the output look like? This is the dump's "table of contents" — useful when you want to restore only a subset.
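The list output can be edited and fed back to pg_restore with -L. Here is a minimal sketch of that editing step, run against a mock table of contents (the IDs and OIDs below are invented for illustration — in practice you'd generate the file with pg_restore --list as above):

```shell
# Mock of a pg_restore --list table of contents (IDs/OIDs are invented).
# In practice: pg_restore --list ~/sql-backups/adv_backup.dump > /tmp/toc.txt
cat > /tmp/toc.txt <<'EOF'
; Archive created at 2024-01-01 12:00:00
5; 2615 16385 SCHEMA - adv_backup postgres
200; 1259 16386 TABLE adv_backup city postgres
3000; 0 16386 TABLE DATA adv_backup city postgres
EOF
# Keep only the city entries; the result can be passed to pg_restore -L.
grep ' city ' /tmp/toc.txt > /tmp/toc_city.txt
cat /tmp/toc_city.txt
```

Restoring with -L /tmp/toc_city.txt would then recreate only the table and its data, skipping everything else in the archive.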


Exercise 8 — Dump the ENTIRE database

pg_dump -U postgres -Fc sql_exercise -f ~/sql-backups/sql_exercise_full.dump

Question: how big is the file? Check with ls -lh ~/sql-backups/sql_exercise_full.dump.

(If you want to see a plain SQL version of the whole DB for comparison: pg_dump -U postgres sql_exercise > ~/sql-backups/sql_exercise_full.sql.)


Exercise 9 — CSV export with \copy (client-side)

Back in psql:

\copy adv_backup.city TO '~/sql-backups/city.csv' CSV HEADER

Verify: cat ~/sql-backups/city.csv (from your shell) — you should see a comma-separated file with a header row.

Note: \copy is a psql meta-command that runs client-side, so it uses a path on YOUR machine. COPY ... TO '/path' (no backslash) runs server-side and needs a path the server can write to.


Exercise 10 — CSV import with \copy

Empty the table and reload from the CSV:

TRUNCATE adv_backup.city RESTART IDENTITY;
\copy adv_backup.city (name, country, population) FROM '~/sql-backups/city.csv' CSV HEADER
SELECT COUNT(*) FROM adv_backup.city;

Expected: 8.

Note: if your city table has an id column, the CSV you exported in Exercise 9 contains it too, and the three-column \copy above will fail with a column-count mismatch. Either add id to the column list, or export only those three columns in the first place: \copy (SELECT name, country, population FROM adv_backup.city) TO '~/sql-backups/city.csv' CSV HEADER.


Exercise 11 — Export a query result (not a full table)

\copy (SELECT name, population FROM adv_backup.city WHERE country = 'GB' ORDER BY population DESC) TO '~/sql-backups/gb_cities.csv' CSV HEADER

Verify: cat ~/sql-backups/gb_cities.csv.


Exercise 12 — Dump with a schema prefix change (cross-schema restore)

Dump the adv_backup schema, then use sed (or any text tool) to rewrite it into a different schema name:

pg_dump -U postgres -d sql_exercise --schema=adv_backup > ~/sql-backups/adv_backup.sql
sed 's/adv_backup/adv_backup_copy/g' ~/sql-backups/adv_backup.sql > ~/sql-backups/adv_backup_copy.sql
psql -U postgres -d sql_exercise -f ~/sql-backups/adv_backup_copy.sql

Verify:

psql -U postgres -d sql_exercise -c '\dn'
psql -U postgres -d sql_exercise -c 'SELECT COUNT(*) FROM adv_backup_copy.city;'

Question: this is a quick-and-dirty way to "duplicate a schema". When would you do this? (Hint: making a snapshot before a risky migration.)
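One caveat with that global sed: it rewrites every occurrence of the string, including ones inside your data rows. A self-contained sketch of the failure mode (the mini-dump below is invented, not your real dump file):

```shell
# Invented mini-dump showing the hazard: a data value happens to contain
# the schema name, and the naive global sed rewrites it along with the
# identifiers.
cat > /tmp/sed_demo.sql <<'EOF'
CREATE TABLE adv_backup.city (name text);
INSERT INTO adv_backup.city VALUES ('see the adv_backup docs');
EOF
sed 's/adv_backup/adv_backup_copy/g' /tmp/sed_demo.sql > /tmp/sed_demo_copy.sql
cat /tmp/sed_demo_copy.sql
```

Fine for a scratch schema like this one, but with real data, skim the rewritten file (or anchor the pattern to the schema-qualified form, e.g. adv_backup\.) before restoring it.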


Exercise 13 — Back up the globals

pg_dump dumps one database at a time. It does not dump cluster-level objects like roles. For those, use pg_dumpall --globals-only:

pg_dumpall -U postgres --globals-only > ~/sql-backups/globals.sql

Question: what's in the file? (Look for CREATE ROLE.) Why is this an important piece of a real backup?


Exercise 14 — Write a backup one-liner

Write a shell one-liner that:

  1. Dumps sql_exercise into a timestamped file under ~/sql-backups/
  2. Also dumps the globals
  3. Prints a message when done

Hint:

TS=$(date +%Y%m%d-%H%M%S)
pg_dump -U postgres -Fc sql_exercise -f ~/sql-backups/sql_exercise_$TS.dump && \
pg_dumpall -U postgres --globals-only > ~/sql-backups/globals_$TS.sql && \
echo "Backup complete: $TS"

This is the kind of thing you'd drop into a cron job or a systemd timer for a real backup strategy.
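A scheduled job like that also needs rotation, or ~/sql-backups grows forever. Here is a sketch of the pruning half, exercised against a throwaway directory (the directory path and the 7-day window are assumptions; touch -d for faking file ages needs GNU coreutils):

```shell
# Prune dumps older than 7 days. This demo uses a throwaway directory and
# fakes the file ages with touch; point BACKUP_DIR at your real one.
BACKUP_DIR=/tmp/sql-backups-demo
mkdir -p "$BACKUP_DIR"
touch "$BACKUP_DIR/sql_exercise_fresh.dump"
touch -d '10 days ago' "$BACKUP_DIR/sql_exercise_stale.dump"   # GNU touch
# The actual retention rule: delete *.dump files older than 7 days.
find "$BACKUP_DIR" -name '*.dump' -mtime +7 -delete
ls "$BACKUP_DIR"
```

Appended after the pg_dump/pg_dumpall lines in the one-liner, this keeps the backup directory bounded while still preserving a week of history.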


Bonus challenges

Bonus 1 — Parallel dump

Read pg_dump --help for -j. Dump a bigger database in parallel (e.g. using the directory format -Fd -j 4). When does parallel dumping actually help?

Bonus 2 — Selective restore

Use pg_restore --list from Exercise 7 to list what's in the archive, then selectively restore only the city table using -L (a list file) or -t (table).

Bonus 3 — Point-in-time recovery (PITR)

Read about WAL archiving and pg_basebackup. At a high level, how would you restore your database to its state as of "yesterday at 3:17 PM"? What pieces do you need in place BEFORE the disaster for this to work?


Cleanup

DROP SCHEMA IF EXISTS adv_backup      CASCADE;
DROP SCHEMA IF EXISTS adv_backup_copy CASCADE;

And from your shell, if you want to remove the backup files:

rm -rf ~/sql-backups