normalization

Put Stuff Where it Belongs

Some time ago I came across this while reading a book – SQL Server 2008 Step by Step (Mike Hotek, Microsoft Press) – and to date I think it’s the best description of normalization that I’ve ever seen… it just took me this long to get around to obtaining permission to post the excerpt!

I don’t have much further to say on it, other than many thanks to the O’Reilly permissions people, so without further ado…

Entire books have been written and multi-week courses taught about database design. In all of this material, you will find discussions of first, second, and third normal forms along with building logical and physical data models. You could spend significant amounts of time learning about metadata and data modeling tools. Lost in all of this material is the simple fact that tables have to be created to support an application and the people creating the tables have more important things to worry about than which normal form a database is in or if the remember to build a logical model and render a physical model from the logical model.

A database in the real world is not going to meet any theoretical designs, no matter how you try to force a square peg into a round hole.

Database design is actually a very simple process, once you stop over-thinking what you are doing. The process of designing a database can be summed up in one simple sentence: “Put stuff where it belongs.”

Boiling down these tens of thousands of pages of database design material into a single sentence will certainly have some people turning purple, so let’s investigate this simple assertion a little more closely.

If you were to design a database that will store customers, customer orders, products and the products that a customer ordered, the process of outlining a set of tables is very straightforward. Our customers can have a first name, last name and an address. We now have a table named Customer with three columns of data. However, if you want utilize the address to ship an order to, you will need to break the address into its component parts of a location, city, state or province, and a postal code. If you only allowed one address for a customer, the address information would go into the customer table. However, if you wanted to be able to store multiple addresses for a customer, you now need a second table that might be called CustomerAddress. If a customer is only allowed to place a single order, then the order information would go into the customer table. However, if a customer is allowed to place more than one order, you would want to split the orders into a separate table that might be called Order. If an order can be composed of more than one item, you would want to add a table that might be called OrderDetails to hold the multiple items for a given order. We could follow this logic through all of the pieces of data that you would want to store and in the end, you will have designed a database by applying one simple principle: “Put stuff where it belongs.”

Microsoft® SQL Server® 2008 Step by Step by Mike Hotek published by Microsoft Press, A Division of Microsoft Corporation Copyright © 2009 Mike Hotek. All rights reserved. Used with permission.