Sunday, November 7, 2010

Database Normalization

What is Normalization?
Normalization is the process of efficiently organizing data in database. There are two goals of the normalization process.
          1. Eliminating redundant data (For example, storing the same data in more than one table).
          2. Ensuring data dependencies make sense (Only storing related data in a table).
The Normal Forms      
The database community has developed series of guidelines for ensuring those databases are normalized. These are referred to as normal forms and numbered for one through five.
          1. First Normal Form (1NF)
          First normal form (1NF) sets the very basic rules for an organized database.                                          
  •  Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns.
For example, if an address attribute is considered atomic for given application, not including the city as part of the address would violate the first normal form.
2. Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data
  •  Meet all the requirements of the first normal form.
  •  Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  •  Create relationships between these new tables and their predecessors through the use of foreign key.
As an example of violating the second normal form, suppose that you define a relation called order that represents information about order and order details. The order relation contains the following attributes: orderid, productid, orderdate, qty, customerid, and companyname. The primary key is defined on ordered and productid.

 Data model before applying 2NF

The second normal for is violated in above figure. Because there are nonkey attributes that depend only on part of a candidate key (the primary key in this example). For example, you can find the orderdate of order, as well as customerid and companyname, based on the ordered alone. To confirm to the second normal form, you would need to split your original relation into tow relation: Orders and OrderDetails. The Orders relation would include the attributes: ordered, orderdate, customerid and companyname. The primary key is defined on orderid. The OrderDetails relation would include

 Data model after applying 2NF and before 3NF

3. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
  •  Meet all the requirements of the second normal form.
  •  Remove columns that are not dependent upon the primary key.
Our Orders and OrderDetails relations now conform to the second normal form. Remember that the Orders relation at this point contains the attributes orderid, orderdate, customerid, and companyname with the primary key defined on orderid. Both customerid and companyname depend on the whole primary key—orderid. For example, you need the whole primary key to find the customerid who placed the order. Similarly, you need the whole primary key to find the company name of the customer who placed the order. However, customerid and companyname are also dependent on each other. To meet the third normal form, you need to add a Customers relation with attributes customerid (primary key) and companyname and remove the companyname attribute from the Orders relation.

Data model after applying 3NF
 
4. Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
  •  Meet all the requirements of the third normal form.
  •  A relation is in 4NF if it has no multi-value dependencies.