Normalization In Database

Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships. The purpose of Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

Anomalies which normalization solves:-

  • Update anomalies − when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies − We tried to delete a record, but parts of it remained undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies − We tried to insert data in a record that does not exist at all.

Before we move to normalization part, we should first discuss function dependency. functional dependency tells about the relationship between the attributes within the table.

suppose we have a table for which X →A holds true. it tells that attribute A is derivable from X attribute. for the same values of X , there must be same value of A. A single table holds many functional dependency.

Normalization of table is achieved through 4 step process, which are 1NF, 2NF, 3NF, BCNF.

First Normal Form:-

If a relation contain composite or multi-valued attribute, it violates first normal form or a relation is in first normal form if it does not contain any composite or multi-valued attribute. A relation is in first normal form if every attribute in that relation is singled valued attribute.

+----+------+---------+
| ID | Name | Courses |
+----+------+---------+
| 1 | A | C1, C2 |
| 2 | E | C3 |
| 3 | M | C2, C3 |
+----+------+---------+

after applying 1NF:-

+----+------+--------+
| ID | Name | Course |
+----+------+--------+
| 1 | A | c1 |
| 1 | A | c2 |
| 2 | E | c3 |
| 3 | M | c2 |
| 3 | M | c3 |
+----+------+--------+

Second Normal Form:-

To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency.

Partial Dependency — If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.

+---------+-----------+------------+
| STUD_NO | COURSE_NO | COURSE_FEE |
+---------+-----------+------------+
| 1 | C1 | 1000 |
| 2 | C2 | 1500 |
| 1 | C4 | 2000 |
| 4 | C3 | 1000 |
| 4 | C1 | 1000 |
| 2 | C5 | 2000 |
+---------+-----------+------------+

Here STUD_NO, COURSE_NO are part of candidate key which means they can determine COURSE_FEE independently, which exhibits Partial Dependency.

after 2NF:-

Table 1, where STUD_NO --> COURSE_NO
+---------+-----------+
| STUD_NO | COURSE_NO |
+---------+-----------+
| 1 | C1 |
| 2 | C2 |
| 1 | C4 |
| 4 | C3 |
| 4 | C1 |
| 2 | C5 |
+---------+-----------+

Table 2, Where STUD_NO --> COURSE_FEE
+---------+------------+
| STUD_NO | COURSE_FEE |
+---------+------------+
| 1 | 1000 |
| 2 | 1500 |
| 1 | 2000 |
| 4 | 1000 |
| 4 | 1000 |
| 2 | 2000 |
+---------+------------+

Third Normal Form:-

A relation is in third normal form, if there is no transitive dependency for non-prime attributes as well as it is in second normal form.

Transitive dependency :- if A → B, B →C then A →C. here A is candidate key. this indirect dependence of non prime attribute to the candidate key is called transitive dependency.

+---------+-----------+------------+--------------+
| STUD_NO | STUD_NAME | STUD_STATE | STUD_COUNTRY |
+---------+-----------+------------+--------------+
| 1 | A | HARYANA | INDIA |
| 2 | A | PUNJAB | INDIA |
| 3 | B | PUNJAB | INDIA |
+---------+-----------+------------+--------------+
here STUD_NO --> STUD_NAME, STUD_NO --> STUD_STATE
, STUD_STATE --> STUD_COUNTRY
here transitively STUD_NO --> STUD_COUNTRY

after third normal form:-

Table 1, where STUD_NO --> STUD_NAME, STUD_NO --> STUD_STATE
+---------+-----------+------------+
| STUD_NO | STUD_NAME | STUD_STATE |
+---------+-----------+------------+
| 1 | A | HARYANA |
| 2 | A | PUNJAB |
| 3 | B | PUNJAB |
+---------+-----------+------------+
table 2, where STUD_STATE --> STUD_COUNTRY
+------------+--------------+
| STUD_STATE | STUD_COUNTRY |
+------------+--------------+
| HARYANA | INDIA |
| PUNJAB | INDIA |
| PUNJAB | INDIA |
+------------+--------------+

Boyce-Codd Normal Form (BCNF):-

A relation R is in BCNF if R is in Third Normal Form and for every functional dependency, left hand side attribute is super key. A relation is in BCNF iff in every functional dependency X –> Y, X is a super key.

in previous example, if we look closely after achieving 3NF we have gotten two tables where each table has one single attribute which is determining all the other attributes. Functional dependency of those tables are {STUD_NO →STUD_NAME, STUD_STATE} and {STUD_STATE → STUD_COUNTRY}. Here STUD_NO and STUD_STATE are super key in their respective tables which means these two tables are in BCNF.