Normalization is the process of reorganizing data in a database so that it meets two basic requirements:
- No redundancy of data: All data is stored in only one place.
- Logical data dependencies: All related data items are stored together.
Need for normalization:
- It eliminates redundant data.
- It reduces chances of data error.
- It is important because it allows the database to take up less disk space.
- It helps increase performance.
- It improves data integrity and consistency.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy:
- Redundant storage: Some information is stored repeatedly.
- Update anomalies: If one copy of such repeated data is updated, inconsistencies are created unless all other copies are similarly updated.
- Insertion anomalies: When inserting new records, repeated information is added to the schema.
- Deletion anomalies: Deleting a record can also delete associated important information, leading to data loss.
-
Redundant storage:
Information aboutDeptID,DeptName, andDeptLocis repeated in the table. -
Update anomalies:
If theDeptLocforDeptID 101is changed from Pune to Chennai, this creates inconsistencies since multiple records will have to be updated. -
Insertion anomalies:
If we want to add a new tuple, such as(5, EEE, 50000)forDeptID 101, the repeated data for(101, XYZ, Pune)will occur. -
Deletion anomalies:
If we delete a record withEmpID 4, information aboutDeptID 102,DeptName PQR, andDeptLoc Mumbaiwill also be deleted. This causes data loss.
First Normal Form (1NF) is the most basic level of database normalization. A database table is said to be in 1NF if it satisfies the following conditions:
Key Rules for 1NF:
- Atomic Values: Each cell of the table must contain a single value (no multiple or composite values in a single cell).
- Unique Columns: Each column must contain values of the same type (homogeneity).
- Uniqueness of Rows: Each row must be unique and identifiable (usually achieved using a primary key).
- No Repeating Groups: There should be no repeating groups or arrays in a table.
Example Before Applying 1NF:
| StudentID | Name | Courses |
|---|---|---|
| 1 | John | Math, Physics |
| 2 | Alice | Chemistry, Biology |
| 3 | Bob | Math, Chemistry |
Issues:
The column "Courses" contains multiple values (not atomic).
This violates the rule of atomicity in 1NF.
Conversion to 1NF:
| StudentID | Name | Course |
|---|---|---|
| 1 | John | Math |
| 1 | John | Physics |
| 2 | Alice | Chemistry |
| 2 | Alice | Biology |
| 3 | Bob | Math |
| 3 | Bob | Chemistry |
Fixes:
Split the "Courses" column into individual rows so that each cell contains a single atomic value.
Maintain a unique combination of rows (e.g., using StudentID and Course).
Benefits of 1NF:
- Prevents ambiguity in data.
- Ensures data is structured properly for relational database design.
- Simplifies querying, as each piece of data is stored in its own field.
Definition: A table is in 2NF if:
- It is in 1NF (data is atomic, no repeating groups).
- It has no partial dependencies (all non-prime attributes depend on the entire primary key, not just a part of it).
Simple Definition:
A table is in 2NF if every non-key column depends on the entire primary key, not just a part of it.
Let’s say your table’s primary key is made of two columns (a composite key). If a non-key column depends on only one of those two — that’s a partial dependency ❌
2NF removes that.
| Student_ID | Course_ID | Student_Name | Course_Name |
|---|---|---|---|
| 1 | C1 | Alice | DBMS |
| 2 | C1 | Bob | DBMS |
| 1 | C2 | Alice | OS |
Composite Primary Key: (Student_ID, Course_ID)
(because one student can take many courses)
| Column | Depends On |
|---|---|
Student_Name |
Student_ID ✅ (only part of the key) |
Course_Name |
Course_ID ✅ (only part of the key) |
So both Student_Name and Course_Name depend on part of the key — not the whole (Student_ID, Course_ID) pair.
➡️ ❌ Not in 2NF (has partial dependencies).
Split the table into 3 smaller ones:
1️⃣ Students Table
Students(Student_ID, Student_Name)| Student_ID | Student_Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
2️⃣ Courses Table
Courses(Course_ID, Course_Name)| Course_ID | Course_Name |
|---|---|
| C1 | DBMS |
| C2 | OS |
3️⃣ Enrollments Table
Enrollments(Student_ID, Course_ID)| Student_ID | Course_ID |
|---|---|
| 1 | C1 |
| 2 | C1 |
| 1 | C2 |
✅ Now:
- Every non-key attribute depends on the whole key (no partial dependencies).
- So this structure is in 2NF.
Would you like me to explain 3NF next (it’s actually easier once you get this)?
Benefits of 2NF:
- No partial dependencies:
EmpName is dependent only on EmpID.
ProjectName is dependent only on ProjectID. - Reduced redundancy:
Employee and project details are stored only once. - Improved consistency: Updates to employee or project names happen in one place, avoiding anomalies.
Definition: A table is in Third Normal Form (3NF) if: It is in Second Normal Form (2NF). It has no transitive dependencies.
Why Normalize to 3NF? To remove transitive dependencies and ensure that non-prime attributes have a direct relationship with the primary key, reducing redundancy and improving data integrity.
Example:
Table in 2NF (but not in 3NF):
Student Table
| StudentID | StudentName | DeptID | DeptName | DeptHead |
|---|---|---|---|---|
| 1 | Alice | D1 | Science | Dr. Smith |
| 2 | Bob | D2 | Arts | Dr. Taylor |
| 3 | Charlie | D1 | Science | Dr. Smith |
Primary Key: StudentID
Non-prime attributes: StudentName, DeptID, DeptName, DeptHead.
Issues:
- Transitive Dependency: DeptName and DeptHead depend on DeptID, which in turn depends on StudentID. StudentID → DeptID → DeptName, DeptHead.
- Redundancy: DeptName and DeptHead repeat for the same DeptID.
Converting to 3NF:
Step 1: Identify Transitive Dependencies
DeptName and DeptHead depend on DeptID, not directly on StudentID.
Step 2: Split the Table to Remove Transitive Dependencies
Student Table:
| StudentID | StudentName | DeptID |
|---|---|---|
| 1 | Alice | D1 |
| 2 | Bob | D2 |
| 3 | Charlie | D1 |
Department Table:
| DeptID | DeptName | DeptHead |
|---|---|---|
| D1 | Science | Dr. Smith |
| D2 | Arts | Dr. Taylor |
Benefits of 3NF:
- No transitive dependency: Non-prime attributes (DeptName, DeptHead) now depend only on the primary key of their respective table (DeptID in Department table).
- Reduced redundancy: Department details are stored only once in the Department table.
- Improved consistency: Updating department information affects only the Department table.
Definition: A table is in BCNF if:
It is in 3NF. For every functional dependency X→Y, X is a superkey (i.e., it uniquely identifies rows).
Example:
Table in 3NF (but not in BCNF):
| CourseID | Instructor | Room |
|---|---|---|
| C101 | Dr. Smith | R1 |
| C102 | Dr. Brown | R2 |
| C101 | Dr. Smith | R2 |
Functional Dependencies:
CourseID → Instructor
Instructor → Room
Issue:
Instructor → Room violates BCNF because Instructor is not a superkey.
Conversion to BCNF:
Course Table:
| CourseID | Instructor |
|---|---|
| C101 | Dr. Smith |
| C102 | Dr. Brown |
Instructor Table:
| Instructor | Room |
|---|---|
| Dr. Smith | R1 |
| Dr. Brown | R2 |
| Normal Form | Requirement | Goal |
|---|---|---|
| 2NF | No partial dependencies | Eliminate redundancy from composite keys. |
| 3NF | No transitive dependencies | Ensure non-prime attributes depend only on the primary key. |
| BCNF | Every determinant is a superkey | Ensure all functional dependencies are preserved. |