What you'll practice: spotting the three classic anomalies (insert, update, delete) in denormalized tables, then rewriting them step by step through 1NF → 2NF → 3NF. You'll also see transitive dependencies and partial dependencies in real data.
- Define 1NF, 2NF, and 3NF in plain English
- Spot a repeating group and a multi-valued attribute
- Spot a partial dependency (non-key attribute depending on part of a composite PK)
- Spot a transitive dependency (non-key attribute depending on another non-key attribute)
- Decompose a denormalized table into normalized tables without losing information
- Verify your normalized schema with the same sample data
| Form | Rule | In English |
|---|---|---|
| 1NF | Every column holds a single atomic value; no repeating groups | No comma-separated lists; no phone1, phone2, phone3 columns |
| 2NF | Already 1NF, and every non-key attribute depends on the entire primary key | Only matters when the PK is composite. If a column only needs part of the PK, it belongs in a different table. |
| 3NF | Already 2NF, and no non-key attribute depends on another non-key attribute | If employee.dept_code tells you dept_location, then dept_location belongs in a department table, not on employee |
Anomalies are why we care. They come in three flavours:
- Update anomaly — you have to change the same fact in many rows, and if you miss one, the database contradicts itself
- Insert anomaly — you can't record a new fact because there's no row to hang it on (e.g., you can't add a new department without also hiring someone into it)
- Delete anomaly — deleting one row accidentally erases facts that should have survived (e.g., firing the last Marketing employee also deletes the fact that Marketing exists)
Connect to the sql_exercise database:
psql -U postgres -d sql_exerciseLoad the starting (bad) schema and seed data from your shell:
psql -U postgres -d sql_exercise -f 03-database-design/03-normalization/schema.sql
psql -U postgres -d sql_exercise -f 03-database-design/03-normalization/seed.sqlThen inside psql:
SET search_path TO dd_normalization;Take a look at what you've loaded:
SELECT * FROM bad_student_courses;
SELECT * FROM bad_orders;
SELECT * FROM bad_employees;Scan the output. You should feel the repetition — that's the point. Now let's fix it.
Look at the student_phones column. One of the rows has three phone numbers jammed into it separated by commas.
Deliverable: write down (as SQL comments) all the 1NF violations you can find in this table. At minimum:
- What is the repeating / multi-valued column?
- Why is that a problem? (Think about: how would you write a query for "find every student whose phone starts with 555-03"?)
- Is there any other column or concept that feels repetitive across rows?
Try it: try to run the query "find every student with phone number 555-0101". How ugly is it?
-- This is gross:
SELECT DISTINCT student_id, student_name
FROM bad_student_courses
WHERE student_phones LIKE '%555-0101%';Question: what if there were a phone number 555-010? Would your LIKE query incorrectly match rows containing 555-0101? (Yes. This is why 1NF matters.)
Create a new table (or set of tables) that captures the same data without any multi-valued columns. Keep all the other flaws for now — you're only fixing 1NF.
Hint: put phones in their own table: student_phone_1nf(student_id, phone). The main table becomes student_courses_1nf with student_phones column removed.
Deliverable:
CREATE TABLE student_phone_1nf (
student_id TEXT NOT NULL,
phone TEXT NOT NULL,
PRIMARY KEY (student_id, phone)
);
CREATE TABLE student_courses_1nf (
student_id TEXT,
student_name TEXT,
student_major TEXT,
course_code TEXT,
course_title TEXT,
course_credits INTEGER,
instructor_name TEXT,
instructor_dept TEXT,
grade TEXT,
PRIMARY KEY (student_id, course_code)
);Now populate both 1NF tables from the original bad_student_courses. You'll have to INSERT the phone numbers by hand — string_to_array(student_phones, ', ') is a handy psql trick.
Verify: count rows. student_phone_1nf should have one row per phone number (not per enrollment). student_courses_1nf should have one row per (student, course) pair — 7 rows matching the original data.
The primary key of student_courses_1nf is (student_id, course_code). For every other column, ask yourself: does this column depend on the full key, or only on part of it?
student_name— depends on what?student_major— depends on what?course_title— depends on what?course_credits— depends on what?instructor_name— depends on what?grade— depends on what?
Deliverable: mark each column as FULL, PARTIAL (student_id), or PARTIAL (course_code).
Question: which columns are causing the 2NF violation, and what update anomaly can you demonstrate?
Try it: update Alice's name to "Alicia Johnson" for just one of her courses. Now run SELECT DISTINCT student_id, student_name FROM student_courses_1nf;. How many "Alice"s are there now?
UPDATE student_courses_1nf
SET student_name = 'Alicia Johnson'
WHERE student_id = 'S001' AND course_code = 'CS101';
SELECT DISTINCT student_id, student_name FROM student_courses_1nf;
-- Congratulations, Alice now has two names.Split student_courses_1nf so that every non-key attribute depends on the full primary key of its table. You should end up with something like:
student_2nf(student_id PK, student_name, student_major)course_2nf(course_code PK, course_title, course_credits, instructor_name, instructor_dept)enrollment_2nf(student_id, course_code, grade, PK(student_id, course_code))student_phone_2nf— keep from Exercise 2
Deliverable: the CREATE TABLE statements and INSERTs to populate them from the 1NF tables.
Verify:
SELECT COUNT(*) FROM student_2nf; -- 3 students
SELECT COUNT(*) FROM course_2nf; -- 4 courses
SELECT COUNT(*) FROM enrollment_2nf; -- 7 enrollmentsQuestion: now that student_name lives in only one place, can the "Alice has two names" anomaly happen? Why not?
Look at course_2nf:
course_code PK | course_title | course_credits | instructor_name | instructor_dept
instructor_dept depends on instructor_name, not on course_code. That's a transitive dependency — course_code → instructor_name → instructor_dept.
Deliverable: state in one sentence why this is a problem. Then demonstrate the anomaly:
-- Prof. Knuth's department gets renamed. Update it on one row.
UPDATE course_2nf
SET instructor_dept = 'School of Computing'
WHERE instructor_name = 'Prof. Knuth' AND course_code = 'CS101';
-- Is Prof. Knuth now in two departments?
SELECT DISTINCT instructor_name, instructor_dept FROM course_2nf;Pull the instructor out into its own table. Final schema:
student_3nf(unchanged from 2NF)instructor_3nf(instructor_id PK, instructor_name, instructor_dept)course_3nf(course_code PK, course_title, course_credits, instructor_id FK)enrollment_3nf(student_id, course_code, grade)student_phone_3nf
Deliverable: the CREATE TABLE statements with proper foreign keys, plus INSERTs to populate them.
Hint: you'll need to assign instructor IDs yourself. Prof. Knuth = 1, Prof. Dijkstra = 2, Prof. Noether = 3, Prof. Cauchy = 4.
Verify: run this join and confirm you get the same student/course/instructor mapping as the original bad_student_courses:
SELECT s.student_name, c.course_code, c.course_title, i.instructor_name, e.grade
FROM enrollment_3nf e
JOIN student_3nf s ON s.student_id = e.student_id
JOIN course_3nf c ON c.course_code = e.course_code
JOIN instructor_3nf i ON i.instructor_id = c.instructor_id
ORDER BY s.student_name, c.course_code;Look at bad_orders. It already happens to be in 1NF (no multi-valued columns — each line item is its own row). Confirm this by checking every column for atomicity.
Deliverable: one-line comment confirming it's already in 1NF, plus identify the composite primary key. (Hint: what uniquely identifies a row? Order + product.)
With composite PK (order_id, product_id):
customer_name,customer_city,customer_zip,order_datedepend on order_id aloneproduct_name,product_category,product_unit_pricedepend on product_id alonequantityandline_totaldepend on both
Every "part-of-key" dependency is a 2NF violation.
Deliverable: write a query that demonstrates the update anomaly. Hint: change a customer's city on one of their orders and then run SELECT DISTINCT customer_id, customer_city FROM bad_orders;.
Because bad_orders has no transitive dependencies between its non-key columns, once you fix 2NF you're also in 3NF. Split into:
customer_3nf(customer_id PK, customer_name, customer_city, customer_zip)product_3nf(product_id PK, product_name, product_category, product_unit_price)order_3nf(order_id PK, customer_id FK, order_date)order_line_3nf(order_id, product_id, quantity, line_total, PK(order_id, product_id), FKs to both)
Deliverable: full CREATE TABLE statements with constraints and FKs, plus INSERTs populating them from bad_orders.
Verify:
SELECT COUNT(*) FROM customer_3nf; -- 3
SELECT COUNT(*) FROM product_3nf; -- 5
SELECT COUNT(*) FROM order_3nf; -- 4
SELECT COUNT(*) FROM order_line_3nf; -- 8Then reconstruct the original denormalized view via join:
SELECT o.order_id, p.product_id, o.order_date,
c.customer_id, c.customer_name, c.customer_city, c.customer_zip,
p.product_name, p.product_category, p.product_unit_price,
ol.quantity, ol.line_total
FROM order_line_3nf ol
JOIN order_3nf o ON o.order_id = ol.order_id
JOIN customer_3nf c ON c.customer_id = o.customer_id
JOIN product_3nf p ON p.product_id = ol.product_id
ORDER BY o.order_id, p.product_id;The result should match the original bad_orders content exactly.
bad_employees already has a PK (employee_id) and no composite key, so 2NF is automatic. But look at the dependency chain:
employee_id → dept_code → dept_name, dept_location, dept_manager
That middle arrow is the transitive dependency. dept_name doesn't depend on employee_id — it depends on dept_code, which depends on employee_id.
Deliverable: list all three anomalies that follow:
- Update anomaly: the Engineering department moves from London to Bristol. How many rows must you update?
- Insert anomaly: you want to add a new department "Research" in Oxford. Can you? (What do you need in order to
INSERT?) - Delete anomaly: Erik (the only Marketing employee) quits. After you
DELETEhis row, what have you also lost from the database?
Demonstrate at least one of these anomalies by running actual SQL against bad_employees.
Split into:
department_3nf(dept_code PK, dept_name, dept_location, dept_manager)employee_3nf(employee_id PK, first_name, last_name, email, salary, dept_code FK)
Deliverable: CREATE TABLE statements with constraints and FKs, plus INSERTs populating them from bad_employees.
Verify: after populating, test the three anomalies again:
-
Move Engineering to Bristol:
UPDATE department_3nf SET dept_location = 'Bristol' WHERE dept_code = 'ENG';
How many rows did you change? (1.)
-
Add a Research department with no employees:
INSERT INTO department_3nf VALUES ('RES', 'Research', 'Oxford', 'New Hire');
(Works now.)
-
Delete Erik. Does Marketing still exist?
DELETE FROM employee_3nf WHERE first_name = 'Erik'; SELECT * FROM department_3nf WHERE dept_code = 'MKT';
(Yes — the department row is untouched.)
Here's a fresh denormalized table. Do NOT run it yet — just read it:
CREATE TABLE bad_movie_rentals (
rental_id INTEGER,
customer_id INTEGER,
customer_name TEXT,
customer_email TEXT,
customer_tier TEXT, -- 'BRONZE','SILVER','GOLD'
tier_discount_pct INTEGER, -- depends on tier: BRONZE=0, SILVER=5, GOLD=10
movie_id INTEGER,
movie_title TEXT,
movie_genre TEXT,
movie_director TEXT,
director_country TEXT, -- depends on director
rented_on DATE,
returned_on DATE,
fee NUMERIC(6,2)
);Deliverable: identify every normalization issue (1NF, 2NF, 3NF) and produce the final 3NF schema as CREATE TABLE statements. You do not have to run them — this is a pen-and-paper exercise graded against your solutions file.
Hints: the PK of the denormalized table is rental_id, not composite — so 2NF is automatic. The real work is 3NF. Find two transitive dependencies. (Yes, two.)
Run the reference solution to see one valid normalized form:
psql -U postgres -d sql_exercise \
-f solutions/03-database-design/03-normalization/solutions.sqlRead the reference and compare with your work. Where did you agree? Where did you split differently? (Different names and slightly different structures are fine — the logical decomposition is what matters.)
3NF is "good enough" for almost every real system. BCNF is a stricter version that closes a loophole where a non-key attribute is itself a determinant. Read the first two paragraphs of the Wikipedia BCNF page and give an example of a table that is 3NF but not BCNF.
Normalization reduces redundancy but introduces joins. In a read-heavy reporting system (think: analytics dashboards, OLAP), you may deliberately denormalize for speed. Write a one-paragraph note explaining when denormalization is acceptable and what tradeoff you're making.
For bad_movie_rentals from Exercise 12, draw a functional dependency diagram on paper (arrows from determinants to dependents). Circle the transitive dependencies. This is the "pen and paper" version of what normalization is really doing.
Open any side project you've worked on. Find a table where you might have accidentally stored something that violates 2NF or 3NF. (Common culprit: a users table with country_code AND country_name.) Write a comment describing what you found.
DROP SCHEMA dd_normalization CASCADE;