Second Normal Form is one of the stages in the normalization process used in relational database design. A table is said to be in 2NF if it meets the following criteria:
It is in First Normal Form (1NF).
It has no partial dependencies. This means no column depends on just part of any candidate key.
Why was 2NF Invented?
Normalization is the process of organizing data to reduce redundancy and prevent undesirable characteristics like insertion, update, and deletion anomalies. The primary purpose of 2NF is to address partial dependencies. Partial dependencies can introduce unnecessary data duplication and can make the database difficult to maintain. By ensuring a table is in 2NF, you are taking steps to ensure that the data is logically organized and that the relationship between the data is maintained consistently.
Partial Dependency
Partial dependency occurs when an attribute (or column) in a table depends on only a part of the primary key and not on the whole key, especially in tables where the primary key consists of multiple columns (composite key).
In simple terms, if you have a table with a composite key (more than one column as its primary key), and if any non-key attribute is dependent on only one part of this composite key rather than the whole composite key, then it's a partial dependency.
Example:
Let's take a university scenario. Suppose there's a table that records the grades students receive for courses they've taken with specific instructors:
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
InstructorName VARCHAR(255),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID)
);
Here, the combination of StudentID
and CourseID
is the primary key. So for each student-course pairing, there's a unique grade.
However, notice the problem with the column InstructorName
. The instructor of a course isn't dependent on which student took it. Instead, the instructor is only dependent on the course itself (CourseID
). In this scenario, the InstructorName
has a partial dependency on the composite primary key because it's only dependent on CourseID
and not the entire key (StudentID
, CourseID
).
To rectify this and remove the partial dependency, we would typically break this table into two tables:
A
Courses
table that linksCourseID
toInstructorName
.A
StudentGrades
table that records theGrade
for eachStudentID
andCourseID
combination.
By doing this, we've removed the partial dependency and brought the structure closer to being in Second Normal Form (2NF).
What Does 2NF Solve in SQL?
Reduces Redundancy: By ensuring that a table has no partial dependencies, you can reduce the duplication of data in your database.
Prevents Update Anomalies: Without 2NF, you may have to update multiple rows if you want to update a single fact, which increases the risk of inconsistent data.
Simplifies Data Structure: A normalized structure is often simpler to understand and maintain.
Workshop Exercise for 2NF in MySQL:
Objective: To understand and apply 2NF on a relational database.
To create database:
CREATE DATABASE UniversityDB;
GRANT ALL PRIVILEGES ON UniversityDB.* TO 'user'@'localhost';
FLUSH PRIVILEGES;
USE UniversityDB;
Imagine we have a table for recording student grades:
CREATE TABLE StudentGrades (
StudentID INT,
CourseID INT,
InstructorName VARCHAR(255),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID)
);
In the above table, both StudentID
and CourseID
together make the primary key. But there's a problem: The InstructorName
is dependent only on the CourseID
, not on the StudentID
. This is a partial dependency.
Task: Convert the above table into 2NF.
Solution:
Step 1: Create a separate table for courses and their respective instructors.
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
InstructorName VARCHAR(255)
);
Step 2: Modify the StudentGrades
table to remove the InstructorName
.
CREATE TABLE StudentGrades2NF (
StudentID INT,
CourseID INT,
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID)
);
With this new structure, the partial dependency is removed. If an instructor changes for a course, we only need to update it in one place (Courses
table), rather than updating it for every student entry in the old StudentGrades
table.
Exercise:
Populate the
Courses
table with at least 3 courses and instructors.Populate the
StudentGrades2NF
table with student grades, making sure that the courses match those in theCourses
table.Change the instructor for one of the courses in the
Courses
table and observe that the grades inStudentGrades2NF
remain unaffected.
This exercise should help in understanding the benefits and application of 2NF in practice.
Exercise Solutions:
- Populate the
Courses
table with at least 3 courses and instructors.
INSERT INTO Courses (CourseID, InstructorName) VALUES
(101, 'Dr. Smith'),
(102, 'Prof. Johnson'),
(103, 'Dr. Brown');
- Populate the
StudentGrades2NF
table with student grades, ensuring that the courses match those in theCourses
table.
INSERT INTO StudentGrades2NF (StudentID, CourseID, Grade) VALUES
(1, 101, 'A'),
(1, 102, 'B'),
(2, 101, 'B'),
(2, 103, 'A'),
(3, 102, 'A');
- Change the instructor for one of the courses in the
Courses
table and observe that the grades inStudentGrades2NF
remain unaffected.
For example, let's say we want to change the instructor for CourseID
102 to "Dr. White":
UPDATE Courses SET InstructorName = 'Dr. White' WHERE CourseID = 102;
Now, even after updating the instructor for course 102, the grades in StudentGrades2NF
remain unchanged. The new instructor name will only reflect in the Courses
table, thereby showing the advantage of removing the partial dependency.