Git workshop: BLOB object type

In Git, data is stored as objects. Each object is identified by a unique SHA-1 hash. There are several types of objects in Git: commit, tree, blob, and tag. Among these, the "blob" object type represents the content of a file.

"Blob" stands for "binary large object"; it is just a chunk of data. Each version of a file in a Git repository corresponds to a blob. The blob holds the file data, but it doesn't contain any metadata about the file (like its name or its path). That metadata (name and file mode) is stored in a tree object that references the blob.

In essence, the blob is the most fundamental object in Git: it represents the content of a single version of a file.
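
A quick way to see that a blob carries content but no name is to hash two differently named files with identical bytes. The sketch below assumes Git is on the PATH; the scratch directory and the file names first.txt and second.txt are made up for illustration.

```shell
# Scratch directory so the example does not touch an existing repository.
workdir=$(mktemp -d)
cd "$workdir"

# Two files with different names but byte-identical content.
printf 'same content\n' > first.txt
printf 'same content\n' > second.txt

# Without -w, hash-object only computes the ID and stores nothing.
id_first=$(git hash-object first.txt)
id_second=$(git hash-object second.txt)

echo "first.txt:  $id_first"
echo "second.txt: $id_second"

# The IDs match because the file name is not part of the blob.
if [ "$id_first" = "$id_second" ]; then
    echo "same blob"
fi
```

If both files were added to a repository, Git would store their content only once and point two tree entries at the same blob.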

Workshop

Objective:

To understand the relationship between file content and Git blobs and to be able to inspect blobs directly.

Prerequisites:

  1. A basic understanding of Git.

  2. Git installed on your system.

Exercise Steps:

  1. Set up a new Git repository:

     mkdir git-blob-exercise
     cd git-blob-exercise
     git init
    
  2. Create a new file and inspect its contents:

     echo "Hello, Git BLOB!" > hello.txt
     cat hello.txt
    
  3. Add the file to Git and commit:

     git add hello.txt
     git commit -m "Initial commit"
    
  4. Find the blob hash for the file:

     git ls-tree HEAD
    

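    You'll see output like:

     100644 blob 1be678f79f1078a680269a9a4f30b69e29624dd7	hello.txt

    The columns are the file mode, the object type, the object hash, and the file name. Note down the hash (1be678f79f1078a680269a9a4f30b69e29624dd7 in this example).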

  5. Inspect the blob content:

     git cat-file -p 1be678f79f1078a680269a9a4f30b69e29624dd7
    

    This should output: Hello, Git BLOB!

  6. Modify the file and inspect its new blob:

     echo "Hello again, Git BLOB!" >> hello.txt
     git add hello.txt
     git commit -m "Update hello.txt"
     git ls-tree HEAD
    

    You'll see a new blob hash 24fb00014f282b21890906c857d5ef719776efc8 for hello.txt.

  7. Inspect the new blob content:

     git cat-file -p 24fb00014f282b21890906c857d5ef719776efc8
    

    This should output the updated content of the file.

Discussion:

  • Blobs represent the content of a file, not its path or name.

  • Different content will generate a different blob hash, even if the file name remains the same. This is because the blob's hash is generated based on its content.

  • Trees, another type of Git object, are responsible for holding the filenames and structuring directories. They reference blobs for the actual file content.
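
The split between trees and blobs can be checked without making a commit. The following is a minimal sketch, assuming Git is installed; the scratch repository is created only for this illustration, and git write-tree is used to turn the staged index into a tree object.

```shell
# Scratch repository; nothing here touches your existing work.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'Hello, Git BLOB!\n' > hello.txt
git add hello.txt

# write-tree turns the staged index into a tree object and prints its ID.
tree_id=$(git write-tree)

# The blob ID computed from the file content alone.
blob_id=$(git hash-object hello.txt)

# Each tree entry reads: <mode> <type> <object ID><TAB><file name>.
tree_listing=$(git cat-file -p "$tree_id")
echo "$tree_listing"
```

The file name appears only in the tree listing; the blob ID next to it is the same value that git hash-object derives from the content.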

git hash-object and git cat-file commands

Both git hash-object and git cat-file are low-level ("plumbing") Git commands that operate directly on the object database. Let's look at each of them:

git hash-object

The git hash-object command reads a file and computes the SHA-1 checksum that Git would assign to its content if it were stored as a blob. This checksum is what Git uses to uniquely identify objects within its object database. By default the command only prints the hash and stores nothing.

# Syntax
git hash-object <file>

# Example
git hash-object README.md

Options:

  • -w: Writes the object to the object database in addition to printing its hash. This lets you store content in the repository manually, without going through git add.

# Example: Compute hash and write the object to the database
git hash-object -w README.md
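
The checksum is not taken over the raw file bytes alone: Git first prepends a short header of the form "blob <size>" followed by a NUL byte, and hashes header and content together. Below is a minimal sketch of that computation without Git. The helper name manual_blob_id is hypothetical, and the sketch assumes a SHA-1 repository and that sha1sum (GNU coreutils) is available.

```shell
# Hypothetical helper: compute a blob ID for a file without calling Git.
# Assumes SHA-1 object IDs and the sha1sum tool from GNU coreutils.
manual_blob_id() {
    # Byte count of the content; tr strips padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Hash the header "blob <size>" + NUL byte, followed by the raw content.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

sample=$(mktemp)
printf 'test content\n' > "$sample"
manual_blob_id "$sample"
```

Running git hash-object on the same file should print the same ID, which is a handy cross-check when exploring the object format.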

git cat-file

The git cat-file command is the counterpart of git hash-object. Given an object's hash, it shows the type, size, or content of that object in the Git database.

# Syntax
git cat-file -t <hash>  # Show type of object
git cat-file -p <hash>  # Show content of object

# Example: Show type of an object
git cat-file -t 5d8265c
# Output: commit

# Example: Show content of an object
git cat-file -p 5d8265c

Most useful options:

  • -t: Display the type of the object. This will return one of four types: 'blob', 'tree', 'commit', or 'tag'.

  • -p: "Pretty-print" the contents of the object. Useful for viewing commits and trees.

  • -s: Display the size of the object.

# Example: Show size of an object
git cat-file -s 5d8265c

💡 The reported size is the uncompressed size, in bytes, of the object's content as stored in the Git object database. Note that this is not necessarily the size of the corresponding file in your working directory; it is the size of the internal Git object representing that content. For blobs, it's generally the file size, but for commits and trees, it's the size of the internal data structure that Git uses to represent those objects.
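
For a blob, the relationship between object size and file size is easy to verify. The sketch below assumes Git is installed and uses a scratch repository created only for this illustration.

```shell
# Scratch repository for the size comparison.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'Hello, Git BLOB!\n' > hello.txt

# -w stores the blob so that cat-file can look it up afterwards.
blob_id=$(git hash-object -w hello.txt)

# Size reported by Git versus the byte count of the file on disk.
object_size=$(git cat-file -s "$blob_id")
file_size=$(wc -c < hello.txt | tr -d ' ')

echo "object size: $object_size bytes"
echo "file size:   $file_size bytes"
```

Both values are 17 here: 16 visible characters plus the trailing newline. The two can differ when Git converts content on the way in, for example with line-ending normalization.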

Working Together

These commands are often used together for debugging or in scripts. For instance, you can compute the hash of a file with git hash-object and then pass that hash to git cat-file to read the stored content back and confirm it matches. The lookup only succeeds if the object is already in the database, for example because the file was committed unchanged or was written with -w.

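# Compute the hash of README.md and store the blob (-w),
# so the lookup below works even if the file was never committed
hash=$(git hash-object -w README.md)

# Use that hash to retrieve the object's type and content
git cat-file -t "$hash"
git cat-file -p "$hash"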

In summary, git hash-object and git cat-file provide a way to interact with Git's internal object model. They are particularly useful for debugging, scripting, or deep diving into how Git works.
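
As a closing illustration, here is a minimal round-trip sketch that writes content into the object database from standard input and reads it back. It assumes Git is installed; the scratch repository and the sample text are made up for this example.

```shell
# Scratch repository for the round trip.
repo=$(mktemp -d)
cd "$repo"
git init -q

original='content passed through the object database'

# --stdin hashes standard input; -w also stores the resulting blob.
blob_id=$(printf '%s\n' "$original" | git hash-object -w --stdin)

# cat-file -e exits with status 0 only if the object exists and is valid.
if git cat-file -e "$blob_id"; then
    echo "object $blob_id is stored"
fi

# Read the blob back; command substitution strips the trailing newline.
restored=$(git cat-file -p "$blob_id")

if [ "$restored" = "$original" ]; then
    echo "round trip ok"
fi
```

The same pattern works in scripts that need to stash arbitrary content in a repository without creating a file in the working directory.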
