Lecture 5: Normalization of Database Tables

Database Systems

J Mwaura

Database Tables & Normalization

Table is the basic building block of database design

Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies

  • the process involves assigning attributes to tables based on the concept of determination in Relational Database Modeling

Simply put - Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company

Need for Normalization

The objective of normalization is to ensure that each table conforms to the concept of well- formed relations or tables

When designing a new database structure

  • Normalization is used to analyze the relationships among the attributes within each entity and determine if the structure can be improved through normalization

When modifying existing data structures

  • Normalization is used to analyze relationships among the attributes or fields in the data structure

Database Design: Normalization

Tables that contain redundant data can suffer from update anomalies, which can introduce inconsistencies into a database

Major aim of relational database design is to group columns into tables to minimize data redundancy and reduce file storage space required by base tables

Database Design: Normalization

Take a look at the following table

normal-forms

Insertion anomaly occurs when extra data beyond the desired data must be added to the database. e.g., to insert a course (CourseNo), it is necessary to know a student (StdSSN) and offering (OfferNo) because the combination of StdSSN and OfferNo is the primary key.

Update anomaly occurs when it is necessary to change multiple rows to modify ONLY a single fact. e.g., if we change the StdClass of student S1 (JUN), two rows, row 1 and 2 must be changed.

Deletion anomaly occurs whenever deleting a row accidentally causes other data to be deleted. e.g., if we delete the enrollment (EnrGrade) of S2 in O3 (third row), we lose the information about offering O3 and course C3 because these values are unique to the table (cell). Furthermore O3 is a primary key

Properties of a Well-Formed Relation

Each table represents a single subject

  • For example, a COURSE table will contain only data that directly pertain to courses. Similarly, a STUDENT table will contain only student data

No data item will be unnecessarily stored in more than one table

  • Ensures that the data is updated in only one place

All nonprime attributes in a table are dependent on the primary key - the entire primary key and nothing but the primary key

  • Ensures that the data is uniquely identifiable by a primary key value

Each table is void of insertion, update, or deletion anomalies

  • Ensures the integrity and consistency of the data

Well-Structured/Formed Relation

A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies

Goal is to avoid (minimize) anomalies

  • Insertion Anomaly - adding new rows forces user to create duplicate data
  • Deletion Anomaly - deleting a row may cause loss of other data representing completely different facts
  • Modification Anomaly -changing data in a row forces changes to other rows because of duplication

General rule of thumb: a table should not pertain to more than one entity type

Normal Forms

normal-forms

Read more on Higher-Level Normal Forms;

  • Boyce-Codd Normal Form (BCNF)
    • A table is in Boyce-Codd normal form (BCNF) when every determinant in the table is a candidate key
  • Fourth Normal Form (4NF)
    • A table is in fourth normal form (4NF) when it is in 3NF and has no multivalued dependencies

Functional Dependencies

Functional Dependency The value of one attribute (the determinant) determines the value of another attribute

  • A->B reads Attribute B is functionally dependent on A
  • A->B means if two rows have same value of A they necessarily have same value of B

FDs are determined by semantics: You can't say that a FD exists just by looking at data. But can say whether it does not exist by looking at data

Functional Dependencies and Keys

Functional Dependency The value of one attribute (the determinant) determines the value of another attribute

Candidate key

  • Attribute that uniquely identifies a row in a relation
  • Could be a combination of (non-redundant) attributes
  • Each non-key field is functionally dependent on every candidate key

Functional Dependencies

Partial & full dependency

normal-forms

Transitive Dependencies

A transitive dependency is when a non-key attribute depends on another non-key attribute

Note: This is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third attribute

Dependencies

Partial, full & transitive

fd td

Sample Original Table

normal-forms

Conversion To 1st Normal Form - 1NF

Steps;

  • Eliminate the repeating groups For example, a car can have multiple colors for its top, interior, bottom, trim, and so on
  • Identify the primary key
  • Identify all dependencies

Properties of 1NF;

  • No repeating groups in the table i.e., each row/column intersection contains one and only one value, not a set of values
  • All of the key attributes are defined
  • All attributes are dependent on the primary key

Table in 1NF

normal-forms

1NF Dependency Diagram

normal-forms

Conversion To 2nd Normal Form - 2NF

Conversion to 2NF occurs only

  • when the 1NF has a composite primary key; If the 1NF has a single-attribute primary key, then the table is automatically in 2NF

Steps;

  • Make new tables to eliminate partial dependencies
  • Re-assign corresponding dependent attributes

Properties of 2NF;

  • It includes no partial dependencies; i.e., no attribute is dependent on only a portion of the primary key

Table in 2NF

normal-forms

2NF Dependency Diagram

normal-forms

Conversion To 3rd Normal Form - 3NF

Steps;

  • Make new tables to eliminate transitive dependencies
  • Reassign corresponding dependent attributes

Properties in 3NF;

  • It is in 2NF
  • It contains no transitive dependencies i.e. no non-key attribute is functionally dependent on another non-key attribute

Table in 3NF

normal-forms

3NF Dependency Diagram

normal-forms

Lets look at an example

😍

Example - not 1NF

For example the telNos column contains multiple values - intersection must be atomic

Example - is 1NF

No multiple values -meaning that intersection is atomic

Example - not 2NF

2NF ONLY applies to tables with composite primary keys (more than one primary key)

Example - is 2NF

2NF is a table that is in 1NF and every non-primary-key column is fully functional dependent on the primary key

Example - not 3NF

3NF is a table that is in 1NF and 2NF and in which all non-primary-key column can be worked out from only the primary key column(s) and no other columns

Example - is 3NF

3NF is a table that is in 1NF and 2NF and in which all non-primary-key column can be worked out from only the primary key column(s) and no other columns

Wrap-up!

normal-forms

Wrap-up!

normal-forms

End of Lecture 5

Database Systems

That's it!

Queries about this Lesson, please send them to: jmwaura@jkuat.ac.ke

*References*

  • Database Systems: Design, Implementation, and Management, 12th ed. Carlos Coronel & Steven Morris
  • Database Modeling and Design; Logical Design, 5th ed. Taby Teorey et.al
  • Fundamentals of database systems, 6th ed. Ramez Elmasri & Shamkant B. Navathe
Courtesy of
Database Systems