Skip to content

Latest commit

 

History

History
291 lines (214 loc) · 9.55 KB

File metadata and controls

291 lines (214 loc) · 9.55 KB

Normalization:

Normalization is the process of reorganizing data in a database so that it meets two basic requirements:

  1. No redundancy of data: All data is stored in only one place.
  2. Logical data dependencies: All related data items are stored together.

Need for normalization:

  1. It eliminates redundant data.
  2. It reduces chances of data error.
  3. It is important because it allows the database to take up less disk space.
  4. It helps increase performance.
  5. It improves data integrity and consistency.

Redundancy and Its Problems

Redundancy is at the root of several problems associated with relational schemas.

Problems caused by redundancy:

  1. Redundant storage: Some information is stored repeatedly.
  2. Update anomalies: If one copy of such repeated data is updated, inconsistencies are created unless all other copies are similarly updated.
  3. Insertion anomalies: When inserting new records, repeated information is added to the schema.
  4. Deletion anomalies: Deleting a record can also delete associated important information, leading to data loss.

Explanation of Anomalies Using Example

  1. Redundant storage:
    Information about DeptID, DeptName, and DeptLoc is repeated in the table.

  2. Update anomalies:
    If the DeptLoc for DeptID 101 is changed from Pune to Chennai, this creates inconsistencies since multiple records will have to be updated.

  3. Insertion anomalies:
    If we want to add a new tuple, such as (5, EEE, 50000) for DeptID 101, the repeated data for (101, XYZ, Pune) will occur.

  4. Deletion anomalies:
    If we delete a record with EmpID 4, information about DeptID 102, DeptName PQR, and DeptLoc Mumbai will also be deleted. This causes data loss.


1NF (First Normal Form)

First Normal Form (1NF) is the most basic level of database normalization. A database table is said to be in 1NF if it satisfies the following conditions:

Key Rules for 1NF:

  • Atomic Values: Each cell of the table must contain a single value (no multiple or composite values in a single cell).
  • Unique Columns: Each column must contain values of the same type (homogeneity).
  • Uniqueness of Rows: Each row must be unique and identifiable (usually achieved using a primary key).
  • No Repeating Groups: There should be no repeating groups or arrays in a table.

Example Before Applying 1NF:

StudentID Name Courses
1 John Math, Physics
2 Alice Chemistry, Biology
3 Bob Math, Chemistry

Issues:
The column "Courses" contains multiple values (not atomic).
This violates the rule of atomicity in 1NF.

Conversion to 1NF:

StudentID Name Course
1 John Math
1 John Physics
2 Alice Chemistry
2 Alice Biology
3 Bob Math
3 Bob Chemistry

Fixes:
Split the "Courses" column into individual rows so that each cell contains a single atomic value. Maintain a unique combination of rows (e.g., using StudentID and Course).

Benefits of 1NF:

  • Prevents ambiguity in data.
  • Ensures data is structured properly for relational database design.
  • Simplifies querying, as each piece of data is stored in its own field.

2NF (Second Normal Form)*

Definition: A table is in 2NF if:

  • It is in 1NF (data is atomic, no repeating groups).
  • It has no partial dependencies (all non-prime attributes depend on the entire primary key, not just a part of it).

🧩 What is Second Normal Form (2NF)?

Simple Definition:

A table is in 2NF if every non-key column depends on the entire primary key, not just a part of it.


🧠 What does that mean?

Let’s say your table’s primary key is made of two columns (a composite key). If a non-key column depends on only one of those two — that’s a partial dependency

2NF removes that.


🧩 Example — Before 2NF

Student_ID Course_ID Student_Name Course_Name
1 C1 Alice DBMS
2 C1 Bob DBMS
1 C2 Alice OS

Composite Primary Key: (Student_ID, Course_ID) (because one student can take many courses)


🔍 Dependencies:

Column Depends On
Student_Name Student_ID ✅ (only part of the key)
Course_Name Course_ID ✅ (only part of the key)

So both Student_Name and Course_Name depend on part of the key — not the whole (Student_ID, Course_ID) pair. ➡️ ❌ Not in 2NF (has partial dependencies).


✅ Convert to 2NF (Remove Partial Dependencies)

Split the table into 3 smaller ones:

1️⃣ Students Table

Students(Student_ID, Student_Name)
Student_ID Student_Name
1 Alice
2 Bob

2️⃣ Courses Table

Courses(Course_ID, Course_Name)
Course_ID Course_Name
C1 DBMS
C2 OS

3️⃣ Enrollments Table

Enrollments(Student_ID, Course_ID)
Student_ID Course_ID
1 C1
2 C1
1 C2

✅ Now:

  • Every non-key attribute depends on the whole key (no partial dependencies).
  • So this structure is in 2NF.

Would you like me to explain 3NF next (it’s actually easier once you get this)?

Benefits of 2NF:

  • No partial dependencies:
    EmpName is dependent only on EmpID.
    ProjectName is dependent only on ProjectID.
  • Reduced redundancy:
    Employee and project details are stored only once.
  • Improved consistency: Updates to employee or project names happen in one place, avoiding anomalies.

3NF (Third Normal Form) Normalization:

Definition: A table is in Third Normal Form (3NF) if: It is in Second Normal Form (2NF). It has no transitive dependencies.

Why Normalize to 3NF? To remove transitive dependencies and ensure that non-prime attributes have a direct relationship with the primary key, reducing redundancy and improving data integrity.

Example: Table in 2NF (but not in 3NF):
Student Table

StudentID StudentName DeptID DeptName DeptHead
1 Alice D1 Science Dr. Smith
2 Bob D2 Arts Dr. Taylor
3 Charlie D1 Science Dr. Smith

Primary Key: StudentID
Non-prime attributes: StudentName, DeptID, DeptName, DeptHead.

Issues:

  • Transitive Dependency: DeptName and DeptHead depend on DeptID, which in turn depends on StudentID. StudentID → DeptID → DeptName, DeptHead.
  • Redundancy: DeptName and DeptHead repeat for the same DeptID.

Converting to 3NF: Step 1: Identify Transitive Dependencies
DeptName and DeptHead depend on DeptID, not directly on StudentID.
Step 2: Split the Table to Remove Transitive Dependencies

Student Table:

StudentID StudentName DeptID
1 Alice D1
2 Bob D2
3 Charlie D1

Department Table:

DeptID DeptName DeptHead
D1 Science Dr. Smith
D2 Arts Dr. Taylor

Benefits of 3NF:

  • No transitive dependency: Non-prime attributes (DeptName, DeptHead) now depend only on the primary key of their respective table (DeptID in Department table).
  • Reduced redundancy: Department details are stored only once in the Department table.
  • Improved consistency: Updating department information affects only the Department table.

BCNF (Boyce-Codd Normal Form)

Definition: A table is in BCNF if:

It is in 3NF. For every functional dependency X→Y, X is a superkey (i.e., it uniquely identifies rows).

Example: Table in 3NF (but not in BCNF):

CourseID Instructor Room
C101 Dr. Smith R1
C102 Dr. Brown R2
C101 Dr. Smith R2

Functional Dependencies:
CourseID → Instructor
Instructor → Room

Issue:
Instructor → Room violates BCNF because Instructor is not a superkey.

Conversion to BCNF: Course Table:

CourseID Instructor
C101 Dr. Smith
C102 Dr. Brown

Instructor Table:

Instructor Room
Dr. Smith R1
Dr. Brown R2

Comparison between Normal Forms:

Normal Form Requirement Goal
2NF No partial dependencies Eliminate redundancy from composite keys.
3NF No transitive dependencies Ensure non-prime attributes depend only on the primary key.
BCNF Every determinant is a superkey Ensure all functional dependencies are preserved.