Normalization in SQL is the process of organizing a database to reduce redundancy and improve data integrity. It involves dividing a database into two or more tables and defining relationships between the tables. The main objectives of normalization are to:
- Eliminate redundant data (e.g., storing the same data in more than one table).
- Ensure data dependencies make sense (e.g., only storing related data in a table).
Normalization is typically done in stages, called normal forms, each with specific requirements and constraints. The most commonly discussed normal forms are the first three: 1NF, 2NF, and 3NF. There are higher normal forms (BCNF, 4NF, 5NF, etc.), but they are less commonly used.
First Normal Form (1NF)
1NF requires that the values in each column of a table be atomic (indivisible). This means no repeating groups or arrays should be present in any column.
Example
Consider a table that stores customer orders:
OrderID CustomerName Products
1 Alice Apples, Oranges
2 Bob Bananas
3 Charlie Apples, Grapes
To convert this to 1NF, we need to ensure that each column contains only atomic values:
OrderID CustomerName Product
1 Alice Apples
1 Alice Oranges
2 Bob Bananas
3 Charlie Apples
3 Charlie Grapes
Second Normal Form (2NF)
2NF requires that the table be in 1NF and that all non-key attributes be fully functional dependent on the primary key. This means that no partial dependency of any column on the primary key is allowed. Only apply 2NF to tables with composite primary keys.
Example
Consider a table that stores orders with product details:
OrderID ProductID CustomerName ProductName Quantity
1 101 Alice Apples 10
2 102 Bob Bananas 5
3 103 Charlie Grapes 7
Here, OrderID
and ProductID
together form the primary key. To convert to 2NF, we should separate the customer and product details into different tables:
Orders:
OrderID CustomerName
1 Alice
2 Bob
3 Charlie
OrderDetails:
OrderID ProductID Quantity
1 101 10
2 102 5
3 103 7
Products:
ProductID ProductName
101 Apples
102 Bananas
103 Grapes
Third Normal Form (3NF)
3NF requires that the table be in 2NF and that all the attributes are not only fully functionally dependent on the primary key but also non-transitively dependent. This means that no non-primary key attribute should depend on another non-primary key attribute.
Example
Consider a table with transitive dependency:
OrderID CustomerID CustomerName CustomerAddress
1 201 Alice 123 Maple St
2 202 Bob 456 Oak St
3 203 Charlie 789 Pine St
Here, CustomerAddress
depends on CustomerID
, not directly on OrderID
. To convert to 3NF, separate the customer details into a different table:
Orders:
OrderID CustomerID
1 201
2 202
3 203
Customers:
CustomerID CustomerName CustomerAddress
201 Alice 123 Maple St
202 Bob 456 Oak St
203 Charlie 789 Pine St
Normalization improves database design by ensuring the integrity and efficiency of the data:
- 1NF: Eliminate repeating groups and ensure atomicity.
- 2NF: Eliminate partial dependencies on a composite primary key.
- 3NF: Eliminate transitive dependencies to ensure all attributes are directly dependent on the primary key.
By following these normalization steps, you create a well-structured database that minimizes redundancy and dependency issues.