Section 03 — Normalization

What you'll practice: spotting the three classic anomalies (insert, update, delete) in denormalized tables, then rewriting them step by step through 1NF → 2NF → 3NF. You'll also see transitive dependencies and partial dependencies in real data.

Learning objectives

Define 1NF, 2NF, and 3NF in plain English
Spot a repeating group and a multi-valued attribute
Spot a partial dependency (non-key attribute depending on part of a composite PK)
Spot a transitive dependency (non-key attribute depending on another non-key attribute)
Decompose a denormalized table into normalized tables without losing information
Verify your normalized schema with the same sample data

The three normal forms — quick reference

Form	Rule	In English
1NF	Every column holds a single atomic value; no repeating groups	No comma-separated lists; no `phone1`, `phone2`, `phone3` columns
2NF	Already 1NF, and every non-key attribute depends on the entire primary key	Only matters when the PK is composite. If a column only needs part of the PK, it belongs in a different table.
3NF	Already 2NF, and no non-key attribute depends on another non-key attribute	If `employee.dept_code` tells you `dept_location`, then `dept_location` belongs in a `department` table, not on `employee`

Anomalies are why we care. They come in three flavours:

Update anomaly — you have to change the same fact in many rows, and if you miss one, the database contradicts itself
Insert anomaly — you can't record a new fact because there's no row to hang it on (e.g., you can't add a new department without also hiring someone into it)
Delete anomaly — deleting one row accidentally erases facts that should have survived (e.g., firing the last Marketing employee also deletes the fact that Marketing exists)

How to run these exercises

Connect to the sql_exercise database:

psql -U postgres -d sql_exercise

Load the starting (bad) schema and seed data from your shell:

psql -U postgres -d sql_exercise -f 03-database-design/03-normalization/schema.sql
psql -U postgres -d sql_exercise -f 03-database-design/03-normalization/seed.sql

Then inside psql:

SET search_path TO dd_normalization;

Take a look at what you've loaded:

SELECT * FROM bad_student_courses;
SELECT * FROM bad_orders;
SELECT * FROM bad_employees;

Scan the output. You should feel the repetition — that's the point. Now let's fix it.

Exercises

Exercise 1 — Spot the 1NF violations in `bad_student_courses`

Look at the student_phones column. One of the rows has three phone numbers jammed into it separated by commas.

Deliverable: write down (as SQL comments) all the 1NF violations you can find in this table. At minimum:

What is the repeating / multi-valued column?
Why is that a problem? (Think about: how would you write a query for "find every student whose phone starts with 555-03"?)
Is there any other column or concept that feels repetitive across rows?

Try it: try to run the query "find every student with phone number 555-0101". How ugly is it?

-- This is gross:
SELECT DISTINCT student_id, student_name
FROM bad_student_courses
WHERE student_phones LIKE '%555-0101%';

Question: what if there were a phone number 555-010? Would your LIKE query incorrectly match rows containing 555-0101? (Yes. This is why 1NF matters.)

Exercise 2 — Rewrite `bad_student_courses` to 1NF

Create a new table (or set of tables) that captures the same data without any multi-valued columns. Keep all the other flaws for now — you're only fixing 1NF.

Hint: put phones in their own table: student_phone_1nf(student_id, phone). The main table becomes student_courses_1nf with student_phones column removed.

Deliverable:

CREATE TABLE student_phone_1nf (
    student_id TEXT NOT NULL,
    phone      TEXT NOT NULL,
    PRIMARY KEY (student_id, phone)
);

CREATE TABLE student_courses_1nf (
    student_id      TEXT,
    student_name    TEXT,
    student_major   TEXT,
    course_code     TEXT,
    course_title    TEXT,
    course_credits  INTEGER,
    instructor_name TEXT,
    instructor_dept TEXT,
    grade           TEXT,
    PRIMARY KEY (student_id, course_code)
);

Now populate both 1NF tables from the original bad_student_courses. You'll have to INSERT the phone numbers by hand — string_to_array(student_phones, ', ') is a handy psql trick.

Verify: count rows. student_phone_1nf should have one row per phone number (not per enrollment). student_courses_1nf should have one row per (student, course) pair — 7 rows matching the original data.

Exercise 3 — Spot the 2NF violation in `student_courses_1nf`

The primary key of student_courses_1nf is (student_id, course_code). For every other column, ask yourself: does this column depend on the full key, or only on part of it?

student_name — depends on what?
student_major — depends on what?
course_title — depends on what?
course_credits — depends on what?
instructor_name — depends on what?
grade — depends on what?

Deliverable: mark each column as FULL, PARTIAL (student_id), or PARTIAL (course_code).

Question: which columns are causing the 2NF violation, and what update anomaly can you demonstrate?

Try it: update Alice's name to "Alicia Johnson" for just one of her courses. Now run SELECT DISTINCT student_id, student_name FROM student_courses_1nf;. How many "Alice"s are there now?

UPDATE student_courses_1nf
SET student_name = 'Alicia Johnson'
WHERE student_id = 'S001' AND course_code = 'CS101';

SELECT DISTINCT student_id, student_name FROM student_courses_1nf;
-- Congratulations, Alice now has two names.

Exercise 4 — Rewrite to 2NF

Split student_courses_1nf so that every non-key attribute depends on the full primary key of its table. You should end up with something like:

student_2nf(student_id PK, student_name, student_major)
course_2nf(course_code PK, course_title, course_credits, instructor_name, instructor_dept)
enrollment_2nf(student_id, course_code, grade, PK(student_id, course_code))
student_phone_2nf — keep from Exercise 2

Deliverable: the CREATE TABLE statements and INSERTs to populate them from the 1NF tables.

Verify:

SELECT COUNT(*) FROM student_2nf;     -- 3 students
SELECT COUNT(*) FROM course_2nf;      -- 4 courses
SELECT COUNT(*) FROM enrollment_2nf;  -- 7 enrollments

Question: now that student_name lives in only one place, can the "Alice has two names" anomaly happen? Why not?

Exercise 5 — Spot the 3NF violation in `course_2nf`

Look at course_2nf:

course_code PK | course_title | course_credits | instructor_name | instructor_dept

instructor_dept depends on instructor_name, not on course_code. That's a transitive dependency — course_code → instructor_name → instructor_dept.

Deliverable: state in one sentence why this is a problem. Then demonstrate the anomaly:

-- Prof. Knuth's department gets renamed. Update it on one row.
UPDATE course_2nf
SET instructor_dept = 'School of Computing'
WHERE instructor_name = 'Prof. Knuth' AND course_code = 'CS101';

-- Is Prof. Knuth now in two departments?
SELECT DISTINCT instructor_name, instructor_dept FROM course_2nf;

Exercise 6 — Rewrite `course_2nf` to 3NF

Pull the instructor out into its own table. Final schema:

student_3nf (unchanged from 2NF)
instructor_3nf(instructor_id PK, instructor_name, instructor_dept)
course_3nf(course_code PK, course_title, course_credits, instructor_id FK)
enrollment_3nf(student_id, course_code, grade)
student_phone_3nf

Deliverable: the CREATE TABLE statements with proper foreign keys, plus INSERTs to populate them.

Hint: you'll need to assign instructor IDs yourself. Prof. Knuth = 1, Prof. Dijkstra = 2, Prof. Noether = 3, Prof. Cauchy = 4.

Verify: run this join and confirm you get the same student/course/instructor mapping as the original bad_student_courses:

SELECT s.student_name, c.course_code, c.course_title, i.instructor_name, e.grade
FROM enrollment_3nf e
JOIN student_3nf s    ON s.student_id = e.student_id
JOIN course_3nf c     ON c.course_code = e.course_code
JOIN instructor_3nf i ON i.instructor_id = c.instructor_id
ORDER BY s.student_name, c.course_code;

Exercise 7 — `bad_orders` to 1NF

Look at bad_orders. It already happens to be in 1NF (no multi-valued columns — each line item is its own row). Confirm this by checking every column for atomicity.

Deliverable: one-line comment confirming it's already in 1NF, plus identify the composite primary key. (Hint: what uniquely identifies a row? Order + product.)

Exercise 8 — Spot the 2NF violation in `bad_orders`

With composite PK (order_id, product_id):

customer_name, customer_city, customer_zip, order_date depend on order_id alone
product_name, product_category, product_unit_price depend on product_id alone
quantity and line_total depend on both

Every "part-of-key" dependency is a 2NF violation.

Deliverable: write a query that demonstrates the update anomaly. Hint: change a customer's city on one of their orders and then run SELECT DISTINCT customer_id, customer_city FROM bad_orders;.

Exercise 9 — Rewrite `bad_orders` to 3NF in one jump

Because bad_orders has no transitive dependencies between its non-key columns, once you fix 2NF you're also in 3NF. Split into:

customer_3nf(customer_id PK, customer_name, customer_city, customer_zip)
product_3nf(product_id PK, product_name, product_category, product_unit_price)
order_3nf(order_id PK, customer_id FK, order_date)
order_line_3nf(order_id, product_id, quantity, line_total, PK(order_id, product_id), FKs to both)

Deliverable: full CREATE TABLE statements with constraints and FKs, plus INSERTs populating them from bad_orders.

Verify:

SELECT COUNT(*) FROM customer_3nf;    -- 3
SELECT COUNT(*) FROM product_3nf;     -- 5
SELECT COUNT(*) FROM order_3nf;       -- 4
SELECT COUNT(*) FROM order_line_3nf;  -- 8

Then reconstruct the original denormalized view via join:

SELECT o.order_id, p.product_id, o.order_date,
       c.customer_id, c.customer_name, c.customer_city, c.customer_zip,
       p.product_name, p.product_category, p.product_unit_price,
       ol.quantity, ol.line_total
FROM order_line_3nf ol
JOIN order_3nf o    ON o.order_id = ol.order_id
JOIN customer_3nf c ON c.customer_id = o.customer_id
JOIN product_3nf p  ON p.product_id = ol.product_id
ORDER BY o.order_id, p.product_id;

The result should match the original bad_orders content exactly.

Exercise 10 — Spot the 3NF violation in `bad_employees`

bad_employees already has a PK (employee_id) and no composite key, so 2NF is automatic. But look at the dependency chain:

employee_id → dept_code → dept_name, dept_location, dept_manager

That middle arrow is the transitive dependency. dept_name doesn't depend on employee_id — it depends on dept_code, which depends on employee_id.

Deliverable: list all three anomalies that follow:

Update anomaly: the Engineering department moves from London to Bristol. How many rows must you update?
Insert anomaly: you want to add a new department "Research" in Oxford. Can you? (What do you need in order to INSERT?)
Delete anomaly: Erik (the only Marketing employee) quits. After you DELETE his row, what have you also lost from the database?

Demonstrate at least one of these anomalies by running actual SQL against bad_employees.

Exercise 11 — Rewrite `bad_employees` to 3NF

Split into:

department_3nf(dept_code PK, dept_name, dept_location, dept_manager)
employee_3nf(employee_id PK, first_name, last_name, email, salary, dept_code FK)

Deliverable: CREATE TABLE statements with constraints and FKs, plus INSERTs populating them from bad_employees.

Verify: after populating, test the three anomalies again:

Move Engineering to Bristol:

UPDATE department_3nf SET dept_location = 'Bristol' WHERE dept_code = 'ENG';

How many rows did you change? (1.)

Add a Research department with no employees:

INSERT INTO department_3nf VALUES ('RES', 'Research', 'Oxford', 'New Hire');

(Works now.)

Delete Erik. Does Marketing still exist?

DELETE FROM employee_3nf WHERE first_name = 'Erik';
SELECT * FROM department_3nf WHERE dept_code = 'MKT';

(Yes — the department row is untouched.)

Exercise 12 — Your turn: normalize a new table

Here's a fresh denormalized table. Do NOT run it yet — just read it:

CREATE TABLE bad_movie_rentals (
    rental_id         INTEGER,
    customer_id       INTEGER,
    customer_name     TEXT,
    customer_email    TEXT,
    customer_tier     TEXT,           -- 'BRONZE','SILVER','GOLD'
    tier_discount_pct INTEGER,        -- depends on tier: BRONZE=0, SILVER=5, GOLD=10
    movie_id          INTEGER,
    movie_title       TEXT,
    movie_genre       TEXT,
    movie_director    TEXT,
    director_country  TEXT,           -- depends on director
    rented_on         DATE,
    returned_on       DATE,
    fee               NUMERIC(6,2)
);

Deliverable: identify every normalization issue (1NF, 2NF, 3NF) and produce the final 3NF schema as CREATE TABLE statements. You do not have to run them — this is a pen-and-paper exercise graded against your solutions file.

Hints: the PK of the denormalized table is rental_id, not composite — so 2NF is automatic. The real work is 3NF. Find two transitive dependencies. (Yes, two.)

Exercise 13 — Compare against the reference

Run the reference solution to see one valid normalized form:

psql -U postgres -d sql_exercise \
     -f solutions/03-database-design/03-normalization/solutions.sql

Read the reference and compare with your work. Where did you agree? Where did you split differently? (Different names and slightly different structures are fine — the logical decomposition is what matters.)

Bonus challenges

Bonus 1 — Boyce-Codd Normal Form (BCNF)

3NF is "good enough" for almost every real system. BCNF is a stricter version that closes a loophole where a non-key attribute is itself a determinant. Read the first two paragraphs of the Wikipedia BCNF page and give an example of a table that is 3NF but not BCNF.

Bonus 2 — When NOT to normalize

Normalization reduces redundancy but introduces joins. In a read-heavy reporting system (think: analytics dashboards, OLAP), you may deliberately denormalize for speed. Write a one-paragraph note explaining when denormalization is acceptable and what tradeoff you're making.

Bonus 3 — Functional dependency diagram

For bad_movie_rentals from Exercise 12, draw a functional dependency diagram on paper (arrows from determinants to dependents). Circle the transitive dependencies. This is the "pen and paper" version of what normalization is really doing.

Bonus 4 — Find a 2NF violation in the wild

Open any side project you've worked on. Find a table where you might have accidentally stored something that violates 2NF or 3NF. (Common culprit: a users table with country_code AND country_name.) Write a comment describing what you found.

Cleanup

DROP SCHEMA dd_normalization CASCADE;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Section 03 — Normalization

Learning objectives

The three normal forms — quick reference

How to run these exercises

Exercises

Exercise 1 — Spot the 1NF violations in `bad_student_courses`

Exercise 2 — Rewrite `bad_student_courses` to 1NF

Exercise 3 — Spot the 2NF violation in `student_courses_1nf`

Exercise 4 — Rewrite to 2NF

Exercise 5 — Spot the 3NF violation in `course_2nf`

Exercise 6 — Rewrite `course_2nf` to 3NF

Exercise 7 — `bad_orders` to 1NF

Exercise 8 — Spot the 2NF violation in `bad_orders`

Exercise 9 — Rewrite `bad_orders` to 3NF in one jump

Exercise 10 — Spot the 3NF violation in `bad_employees`

Exercise 11 — Rewrite `bad_employees` to 3NF

Exercise 12 — Your turn: normalize a new table

Exercise 13 — Compare against the reference

Bonus challenges

Bonus 1 — Boyce-Codd Normal Form (BCNF)

Bonus 2 — When NOT to normalize

Bonus 3 — Functional dependency diagram

Bonus 4 — Find a 2NF violation in the wild

Cleanup

FilesExpand file tree

exercises.md

Latest commit

History

exercises.md

File metadata and controls

Section 03 — Normalization

Learning objectives

The three normal forms — quick reference

How to run these exercises

Exercises

Exercise 1 — Spot the 1NF violations in bad_student_courses

Exercise 2 — Rewrite bad_student_courses to 1NF

Exercise 3 — Spot the 2NF violation in student_courses_1nf

Exercise 4 — Rewrite to 2NF

Exercise 5 — Spot the 3NF violation in course_2nf

Exercise 6 — Rewrite course_2nf to 3NF

Exercise 7 — bad_orders to 1NF

Exercise 8 — Spot the 2NF violation in bad_orders

Exercise 9 — Rewrite bad_orders to 3NF in one jump

Exercise 10 — Spot the 3NF violation in bad_employees

Exercise 11 — Rewrite bad_employees to 3NF

Exercise 12 — Your turn: normalize a new table

Exercise 13 — Compare against the reference

Bonus challenges

Bonus 1 — Boyce-Codd Normal Form (BCNF)

Bonus 2 — When NOT to normalize

Bonus 3 — Functional dependency diagram

Bonus 4 — Find a 2NF violation in the wild

Cleanup

Exercise 1 — Spot the 1NF violations in `bad_student_courses`

Exercise 2 — Rewrite `bad_student_courses` to 1NF

Exercise 3 — Spot the 2NF violation in `student_courses_1nf`

Exercise 5 — Spot the 3NF violation in `course_2nf`

Exercise 6 — Rewrite `course_2nf` to 3NF

Exercise 7 — `bad_orders` to 1NF

Exercise 8 — Spot the 2NF violation in `bad_orders`

Exercise 9 — Rewrite `bad_orders` to 3NF in one jump

Exercise 10 — Spot the 3NF violation in `bad_employees`

Exercise 11 — Rewrite `bad_employees` to 3NF