DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF)
Tables that contain redundant data can suffer from update anomalies, which can introduce inconsistencies into a database. The rules associated with the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF). The identification of various types of update anomalies such as insertion, deletion, and modification anomalies can be found when tables that break the rules of 1NF, 2NF, and 3NF and they are likely to contain redundant data and suffer from update anomalies. Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company. Major aim of relational database design is to group columns into tables to minimize data redundancy and reduce file storage space required by base tables. Take a look at the following example: StdSSN S1 S1 S2 S2 StdCity SEATTLE SEATTLE BOTHELL BOTHELL StdClass JUN JUN JUN JUN OfferNo O1 O2 O3 O2 OffTerm FALL FALL SPRING FALL OffYear 2006 2006 2007 2006 EnrGrade 3.5 3.3 3.1 3.4 CourseNo C1 C2 C3 C2 CrsDesc DB VB OO VB
The insertion anomaly: Occurs when extra data beyond the desired data must be added to the database. For example, to insert a course (CourseNo), it is necessary to know a student (StdSSN) and offering (OfferNo) because the combination of StdSSN and OfferNo is the primary key. Remember that a row cannot exist with NULL values for part of its primary key. The update anomaly: Occurs when it is necessary to change multiple rows to modify ONLY a single fact. For example, if we change the StdClass of student S1 (JUN), two rows, row 1 and 2 must be changed. If S1 was enrolled in 10 classes, 10 rows must be changed. The deletion anomaly: Occurs whenever deleting a row inadvertently causes other data to be deleted. For example, if we delete the enrollment (EnrGrade) of S2 in O3 (third row), we lose the information about offering O3 and course C3 because these values are unique to the table (cell). Furthermore O3 is a primary key. RECAP Problems associated with data redundancy are illustrated by comparing the Staff and Branch tables with the StaffBranch table. Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. See the following Figure for an example of a table with redundant data called StaffBranch. There are two main types of insertion anomalies, which we illustrate using this table.
Insertion anomalies 1. To insert the details of a new member of staff (staffNo, name, position and salary) located at a given branch into the StaffBranch table, we must also enter the correct details for that branch (branchNo, branchAddress and telNo). For example, to insert the details of a new member of staff at branch B002, we must enter the correct details of branch B002 so that the branch details are consistent with values for branch B002 in other records of the StaffBranch table. The data shown in the StaffBranch table is also shown in the Staff and Branch tables. These tables do have redundant data and do not suffer from this potential inconsistency, because for each staff member we only enter the appropriate branch number into the Staff table. In addition, the details of branch B002 are recorded only once in the database as a single record in the Branch table. 2. To insert details of a new branch that currently has no members of staff into the StaffBranch table, it’s necessary to enter NULLs into the staff-related columns, such as staffNo. However, as staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo violates entity integrity, and is not allowed. The design of the tables shown in Staff and Branch avoids this problem because new branch details are entered into the Branch table separately from the staff details. The details of staff ultimately located at a new branch can be entered into the Staff table at a later...
Please join StudyMode to read the full document