Second Normal Form 2NF in SQL

Second Normal Form is one of the stages in the normalization process used in relational database design. A table is said to be in 2NF if it meets the following criteria:

  1. It is in First Normal Form (1NF).

  2. It has no partial dependencies. This means no column depends on just part of any candidate key.

Why was 2NF Invented?

Normalization is the process of organizing data to reduce redundancy and prevent undesirable characteristics like insertion, update, and deletion anomalies. The primary purpose of 2NF is to address partial dependencies. Partial dependencies can introduce unnecessary data duplication and can make the database difficult to maintain. By ensuring a table is in 2NF, you are taking steps to ensure that the data is logically organized and that the relationship between the data is maintained consistently.

Partial Dependency

Partial dependency occurs when an attribute (or column) in a table depends on only a part of the primary key and not on the whole key, especially in tables where the primary key consists of multiple columns (composite key).

In simple terms, if you have a table with a composite key (more than one column as its primary key), and if any non-key attribute is dependent on only one part of this composite key rather than the whole composite key, then it's a partial dependency.

Example:

Let's take a university scenario. Suppose there's a table that records the grades students receive for courses they've taken with specific instructors:

CREATE TABLE StudentCourses (
    StudentID INT,
    CourseID INT,
    InstructorName VARCHAR(255),
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);

Here, the combination of StudentID and CourseID is the primary key. So for each student-course pairing, there's a unique grade.

However, notice the problem with the column InstructorName. The instructor of a course isn't dependent on which student took it. Instead, the instructor is only dependent on the course itself (CourseID). In this scenario, the InstructorName has a partial dependency on the composite primary key because it's only dependent on CourseID and not the entire key (StudentID, CourseID).

To rectify this and remove the partial dependency, we would typically break this table into two tables:

  1. A Courses table that links CourseID to InstructorName.

  2. A StudentGrades table that records the Grade for each StudentID and CourseID combination.

By doing this, we've removed the partial dependency and brought the structure closer to being in Second Normal Form (2NF).

What Does 2NF Solve in SQL?

  1. Reduces Redundancy: By ensuring that a table has no partial dependencies, you can reduce the duplication of data in your database.

  2. Prevents Update Anomalies: Without 2NF, you may have to update multiple rows if you want to update a single fact, which increases the risk of inconsistent data.

  3. Simplifies Data Structure: A normalized structure is often simpler to understand and maintain.

Workshop Exercise for 2NF in MySQL:

Objective: To understand and apply 2NF on a relational database.

To create database:

CREATE DATABASE UniversityDB;
GRANT ALL PRIVILEGES ON UniversityDB.* TO 'user'@'localhost';
FLUSH PRIVILEGES;
USE UniversityDB;

Imagine we have a table for recording student grades:

CREATE TABLE StudentGrades (
    StudentID INT,
    CourseID INT,
    InstructorName VARCHAR(255),
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);

In the above table, both StudentID and CourseID together make the primary key. But there's a problem: The InstructorName is dependent only on the CourseID, not on the StudentID. This is a partial dependency.

Task: Convert the above table into 2NF.

Solution:

Step 1: Create a separate table for courses and their respective instructors.

CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    InstructorName VARCHAR(255)
);

Step 2: Modify the StudentGrades table to remove the InstructorName.

CREATE TABLE StudentGrades2NF (
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);

With this new structure, the partial dependency is removed. If an instructor changes for a course, we only need to update it in one place (Courses table), rather than updating it for every student entry in the old StudentGrades table.

Exercise:

  1. Populate the Courses table with at least 3 courses and instructors.

  2. Populate the StudentGrades2NF table with student grades, making sure that the courses match those in the Courses table.

  3. Change the instructor for one of the courses in the Courses table and observe that the grades in StudentGrades2NF remain unaffected.

This exercise should help in understanding the benefits and application of 2NF in practice.

Exercise Solutions:

  1. Populate the Courses table with at least 3 courses and instructors.
INSERT INTO Courses (CourseID, InstructorName) VALUES 
(101, 'Dr. Smith'),
(102, 'Prof. Johnson'),
(103, 'Dr. Brown');

  1. Populate the StudentGrades2NF table with student grades, ensuring that the courses match those in the Courses table.
INSERT INTO StudentGrades2NF (StudentID, CourseID, Grade) VALUES 
(1, 101, 'A'),
(1, 102, 'B'),
(2, 101, 'B'),
(2, 103, 'A'),
(3, 102, 'A');

  1. Change the instructor for one of the courses in the Courses table and observe that the grades in StudentGrades2NF remain unaffected.

For example, let's say we want to change the instructor for CourseID 102 to "Dr. White":

UPDATE Courses SET InstructorName = 'Dr. White' WHERE CourseID = 102;

Now, even after updating the instructor for course 102, the grades in StudentGrades2NF remain unchanged. The new instructor name will only reflect in the Courses table, thereby showing the advantage of removing the partial dependency.