Why can't we just store all our data in one giant table, like an Excel sheet? Because of Database Anomalies. An anomaly is a "glitch" that occurs when a table's design is logically flawed, leading to inconsistent or missing data.
You cannot add a piece of data because another piece of data is missing.
- Example: You have a table
Student_Courses. You cannot record a newCourseuntil you have at least oneStudentenrolled in it. (BecauseStudent_IDis forced to be NOT NULL).
You have to change a piece of data in multiple rows, risking a "Partial Update."
- Example: If a student's
Addressis stored in 50 different rows (one for each course they take), and they move house, you must update 50 rows. If you miss one, the student appears to live in two different places!
Deleting one piece of information accidentally deletes another, unrelated piece of information.
- Example: If you delete the last student enrolled in the "Advanced SQL" course, you accidentally delete the record that the "Advanced SQL" course even exists!
Anomalies are the symptoms of Redundancy (storing the same data twice). Normalization is the "medicine" we use to cure these symptoms by splitting tables and building relationships.
Redundancy consumes disk space, but more importantly, it creates Cognitive Load for the developer.
- In a "Flat" table, the developer must remember to update every occurrence of a value.
- In a "Normalized" table, the value exists in one place only. Changing it once updates it for the whole system.
The "Employee-Dept" Mess:
| Emp_Name | Dept_Name | Dept_Head |
|---|---|---|
| John | IT | Mr. Smith |
| Sara | IT | Mr. Smith |
If Mr. Smith retires, you must update 2 rows. If the IT department has 5,000 employees, you must update 5,000 rows. This is an Update Anomaly.
- Task 1: Name and briefly explain the three types of database anomalies.
- Task 2: If deleting a row for a
Salecauses you to lose the only record of aCustomer's phone number, which anomaly has occurred?