In Git, data is stored as objects. Each object is identified by a unique SHA-1 hash. There are four types of objects in Git: commit, tree, blob, and tag. Among these, the "blob" object type represents the content of a file.
A blob stands for "binary large object" and it's just a chunk of data. Each version of a file in a Git repository corresponds to a blob. The blob holds the file data, but it doesn't contain any metadata about the file (like its name or its path). Metadata is stored in a tree object that references the blob.
In essence, the blob is the most fundamental object in Git: it represents the content of a single version of a file.
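As a minimal sketch of this idea, the snippet below hashes two differently named files with identical content, plus a third file with different content. The file names and the scratch directory are made up for illustration, and the snippet assumes Git and a POSIX shell are available. It uses git hash-object, which is covered in detail later in this article.

```shell
# Work in a throwaway directory (all file names here are hypothetical).
workdir=$(mktemp -d)
cd "$workdir"

printf 'same content\n' > first.txt
printf 'same content\n' > second.txt
printf 'other content\n' > third.txt

# git hash-object prints the ID the blob would get; nothing is stored.
hash_first=$(git hash-object first.txt)
hash_second=$(git hash-object second.txt)
hash_third=$(git hash-object third.txt)

echo "first.txt:  $hash_first"
echo "second.txt: $hash_second"
echo "third.txt:  $hash_third"
```

The first two IDs should be identical because the file name plays no part in the blob, while the third should differ.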
Workshop
Objective:
To understand the relationship between file content and Git blobs and to be able to inspect blobs directly.
Prerequisites:
A basic understanding of Git.
Git installed on your system.
Exercise Steps:
Set up a new Git repository:
mkdir git-blob-exercise
cd git-blob-exercise
git init
Create a new file and inspect its contents:
echo "Hello, Git BLOB!" > hello.txt
cat hello.txt
Add the file to Git and commit:
git add hello.txt
git commit -m "Initial commit"
Find the blob hash for the file:
git ls-tree HEAD
You'll see output like:
100644 blob 1be678f79f1078a680269a9a4f30b69e29624dd7	hello.txt
Note down the hash value, 1be678f79f1078a680269a9a4f30b69e29624dd7 in this example.
Inspect the blob content:
git cat-file -p 1be678f79f1078a680269a9a4f30b69e29624dd7
This should output:
Hello, Git BLOB!
Modify the file and inspect its new blob:
echo "Hello again, Git BLOB!" >> hello.txt
git add hello.txt
git commit -m "Update hello.txt"
git ls-tree HEAD
You'll see a new blob hash, 24fb00014f282b21890906c857d5ef719776efc8, for hello.txt.
Inspect the new blob content:
git cat-file -p 24fb00014f282b21890906c857d5ef719776efc8
This should output the updated content of the file, which now has two lines because >> appends to the file rather than overwriting it.
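After the second commit, the first blob is still in the object database; the new commit simply points at a different one. The sketch below replays the exercise in a scratch directory and compares the two blob IDs. The committer name and email are placeholders, and `git rev-parse <commit>:<path>` is used here as a shortcut for reading the hash out of the git ls-tree output.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# Placeholder identity so the commits work on a machine without Git config.
commit() { git -c user.name='Example' -c user.email='example@example.com' commit -q -m "$1"; }

echo 'Hello, Git BLOB!' > hello.txt
git add hello.txt
commit "Initial commit"

echo 'Hello again, Git BLOB!' >> hello.txt
git add hello.txt
commit "Update hello.txt"

# <commit>:<path> resolves to the blob that commit records for the path.
old_blob=$(git rev-parse HEAD~1:hello.txt)
new_blob=$(git rev-parse HEAD:hello.txt)
echo "old blob: $old_blob"
echo "new blob: $new_blob"

# The old version is still readable by its hash.
old_content=$(git cat-file -p "$old_blob")
new_content=$(git cat-file -p "$new_blob")
echo "$old_content"
```

The two IDs should differ, and the old blob should still print the single original line.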
Discussion:
Blobs represent the content of a file, not its path or name.
Different content will generate a different blob hash, even if the file name remains the same. This is because the blob's hash is generated based on its content.
Trees, another type of Git object, are responsible for holding the filenames and structuring directories. They reference blobs for the actual file content.
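To make the role of trees concrete, here is a small sketch that commits one top-level file and one file inside a subdirectory, then lists the resulting tree. The directory name docs, the file names, and the committer identity are all invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q

mkdir docs
echo 'top-level file' > hello.txt
echo 'nested file' > docs/notes.txt
git add .
git -c user.name='Example' -c user.email='example@example.com' commit -q -m "Add files"

# The root tree lists one blob (hello.txt) and one subtree (docs).
git ls-tree HEAD

# -r recurses into subtrees and prints every blob with its full path.
git ls-tree -r HEAD

# Resolve each entry and ask Git for its object type.
docs_type=$(git cat-file -t HEAD:docs)
notes_type=$(git cat-file -t HEAD:docs/notes.txt)
echo "docs is a $docs_type, docs/notes.txt is a $notes_type"
```

The directory shows up as a tree entry that in turn references the blob for notes.txt, which is how Git represents nested paths without storing any path inside a blob.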
git hash-object and git cat-file commands
Both git hash-object and git cat-file are lower-level Git commands that deal with the internal workings of Git. Let's delve into each of them:
git hash-object
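The git hash-object command takes a file and computes the object ID Git would assign to its content as a blob. The ID is the SHA-1 checksum of a short header (the word blob, the content's size in bytes, and a null byte) followed by the content itself, so it differs from a plain SHA-1 of the file. This ID is what Git uses to uniquely identify objects within its object database. By default the command only prints the ID and stores nothing.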
# Syntax
git hash-object <file>
# Example
git hash-object README.md
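To see what the command computes, the following sketch rebuilds a blob ID by hand and compares it with Git's answer. It assumes the default SHA-1 object format and a sha1sum binary (GNU coreutils; on macOS, `shasum -a 1` is the usual substitute). The file name sample.txt is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Hello, Git BLOB!\n' > sample.txt

# Size of the content in bytes (tr strips padding that some wc versions add).
size=$(wc -c < sample.txt | tr -d ' ')

# Hash the header "blob <size>" plus a null byte, followed by the raw content.
manual_hash=$( { printf 'blob %s\0' "$size"; cat sample.txt; } | sha1sum | cut -d ' ' -f 1 )

# --no-filters skips any end-of-line or attribute conversion.
git_hash=$(git hash-object --no-filters sample.txt)

echo "manual: $manual_hash"
echo "git:    $git_hash"
```

If the assumptions hold, both lines print the same 40-character ID.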
Options:
-w: Writes the object to the object database. This lets you create a blob manually and store it in your Git repository without adding or committing a file.
# Example: Compute hash and write the object to the database
git hash-object -w README.md
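A short sketch of what -w does on disk, run in a scratch repository. It feeds the content through --stdin (a standard git hash-object option) rather than a file, and assumes the object is stored as a loose object, which is the normal case for a freshly written one.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# --stdin hashes standard input instead of a file; -w stores the blob.
hash=$(echo 'test content' | git hash-object -w --stdin)
echo "$hash"

# Loose objects live under .git/objects/<first 2 hex chars>/<remaining chars>.
object_path=".git/objects/$(echo "$hash" | cut -c 1-2)/$(echo "$hash" | cut -c 3-)"
ls -l "$object_path"

# The blob can be read back even though no file or commit refers to it.
content=$(git cat-file -p "$hash")
echo "$content"
```

Without -w, the same command would print the same hash but the file under .git/objects would not exist.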
git cat-file
The git cat-file command is the counterpart of git hash-object. Given an object's hash, it shows the type, size, or content of that object in the Git database.
# Syntax
git cat-file -t <hash> # Show type of object
git cat-file -p <hash> # Show content of object
# Example: Show type of an object
git cat-file -t 5d8265c
# Output: commit
# Example: Show content of an object
git cat-file -p 5d8265c
Most valuable options:
-t: Display the type of the object. This will return one of four types: 'blob', 'tree', 'commit', or 'tag'.
-p: "Pretty-print" the contents of the object. Useful for viewing commits and trees.
-s: Display the size of the object in bytes.
# Example: Show size of an object
git cat-file -s 5d8265c
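The options can be tried on all three everyday object types at once. The sketch below builds a one-commit scratch repository and queries the commit, its root tree, and the file's blob. It relies on git cat-file accepting revision expressions such as HEAD^{tree} and HEAD:hello.txt in place of a raw hash. The committer identity is a placeholder.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
printf 'Hello, Git BLOB!\n' > hello.txt
git add hello.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m "Initial commit"

# HEAD is the commit, HEAD^{tree} its root tree, HEAD:hello.txt the file's blob.
for object in 'HEAD' 'HEAD^{tree}' 'HEAD:hello.txt'; do
    obj_type=$(git cat-file -t "$object")
    obj_size=$(git cat-file -s "$object")
    echo "$object: type=$obj_type size=$obj_size bytes"
done

# Keep the blob's values for a closer look.
blob_type=$(git cat-file -t HEAD:hello.txt)
blob_size=$(git cat-file -s HEAD:hello.txt)
tree_type=$(git cat-file -t 'HEAD^{tree}')
```

For the blob, the reported size is the byte count of the file content: 16 characters plus the trailing newline.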
Working Together
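These commands are often used together for debugging or scripting. For instance, you can calculate the hash for a file using git hash-object, and then use that hash with git cat-file to view the stored content and confirm it matches the file. Note that git cat-file can only read objects that already exist in the object database, so the file must have been added or committed in its current form, or hashed with -w.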
# Find the hash of README.md (add -w if its current content is not yet stored in Git)
hash=$(git hash-object README.md)
# Use that hash to retrieve the object's type and content
git cat-file -t "$hash"
git cat-file -p "$hash"
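One scripting use of this pairing is checking whether a working-tree file still matches what the last commit recorded. The sketch below is one possible approach, not a standard Git feature: matches_head is a hypothetical helper name, and the repository, file content, and committer identity are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
echo 'first version' > README.md
git add README.md
git -c user.name='Example' -c user.email='example@example.com' commit -q -m "Add README"

# Hypothetical helper: succeeds when the file on disk matches the committed blob.
matches_head() {
    [ "$(git hash-object "$1")" = "$(git rev-parse "HEAD:$1")" ]
}

if matches_head README.md; then before='unchanged'; else before='modified'; fi

# Change the file without committing, then check again.
echo 'local edit' >> README.md
if matches_head README.md; then after='unchanged'; else after='modified'; fi

echo "before edit: $before"
echo "after edit:  $after"
```

Comparing hashes avoids reading the committed content at all; in everyday use, git status or git diff gives the same answer with more context.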
In summary, git hash-object and git cat-file provide a way to interact with Git's internal object model. They are particularly useful for debugging, scripting, or deep diving into how Git works.