The Structure of Git Blob Objects

In Git, a blob (short for "binary large object") stores the content of a file. A blob is essentially a snapshot of that content at a given point in time. Unlike trees and commits, blobs are very simple: they hold only the bytes of the file and carry no metadata such as the file name or permissions. That information, along with the directory structure, is maintained by tree objects, which refer to blobs by their hash.

Structure of a Blob Object

A blob object typically consists of:

  1. Header: The header contains the object type ("blob") and the length of the content in bytes, separated by a space, and ending with a null byte (\0).

    • Example: If the content is "Hello, Git!" then the header might look like: blob 11\0.
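  2. Content: This is the actual content of the file, stored byte for byte immediately after the header. Git does not alter the content itself; the header and content are compressed together when the object is written to disk.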

    • Example: The text "Hello, Git!"
  3. SHA-1 Hash: The SHA-1 hash of the blob serves as its identifier. It is generated by hashing the uncompressed header and content together. The hash is not stored inside the object; it becomes the object's name.
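
To make the header-plus-content rule concrete, here is a minimal Python sketch that reproduces the hash computation. The function name blob_object_id is an illustrative choice, not part of Git, and the sketch assumes a repository that uses SHA-1 object names.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give a blob with this content.

    blob_object_id is a hypothetical helper name used for illustration.
    """
    # Header: the object type, a space, the content length in bytes, a null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    # The ID is the SHA-1 of header + content, taken before any compression.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    content = b"Hello, Git!"  # 11 bytes, no trailing newline
    print(blob_object_id(content))
```

Because the file name never enters the computation, two files with identical content map to the same blob.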

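When first written, the blob object is stored as a "loose" object: the header and content are compressed with zlib and saved under the .git/objects/ directory. The object's location comes from its 40-character hexadecimal SHA-1 hash. The first two characters of the hash are used as the name of a subdirectory inside .git/objects/, and the remaining 38 characters serve as the filename within that directory. Git may later move loose objects into packfiles, so an object is not guaranteed to stay at this path.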
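
The storage layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Git's actual implementation: loose_object is a hypothetical helper, it only computes the path and compressed bytes without touching the disk, and the compressed bytes may differ from what Git writes because the compression level is a configuration detail.

```python
import hashlib
import zlib
from pathlib import PurePosixPath


def loose_object(content: bytes) -> tuple[PurePosixPath, bytes]:
    """Return (relative path, compressed bytes) for a loose blob object.

    loose_object is a hypothetical helper name used for illustration.
    """
    # Uncompressed object: header followed directly by the file content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # First 2 hex characters name the subdirectory, the other 38 name the file.
    path = PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]
    # Header and content are compressed together with zlib.
    return path, zlib.compress(store)


if __name__ == "__main__":
    path, data = loose_object(b"Hello, Git!")
    print(path)
    print(len(data), "compressed bytes")
```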

Example

For example, if you have a file called hello.txt whose content is exactly "Hello, Git!" with no trailing newline (a trailing newline would make the length 12 bytes instead of 11), the blob object would have the following structure:

  1. Header: blob 11\0

  2. Content: Hello, Git!

  3. SHA-1 Hash: The hash generated by hashing the header and content together.

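You could use the git hash-object command to compute the SHA-1 hash for the content of hello.txt. By default it only prints the hash; add the -w option if you also want the blob written to the object database: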

git hash-object hello.txt

Once the blob exists in the repository (for example after git add or git hash-object -w), you could use the git cat-file command to inspect it:

git cat-file -p <blob_hash>

This will output Hello, Git!, the content of the file.
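
For intuition about what inspecting a blob involves, the following Python sketch reverses the storage steps on the raw bytes of a loose object: decompress, split at the null byte, and check the declared length. The name parse_loose_object is hypothetical, and this is a simplified reading of the format, not a description of how git cat-file is implemented.

```python
import zlib


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and return (object type, content).

    parse_loose_object is a hypothetical helper name used for illustration.
    """
    store = zlib.decompress(raw)
    # The header ends at the first null byte; everything after it is content.
    header, _, content = store.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("declared size does not match content length")
    return kind.decode("ascii"), content


if __name__ == "__main__":
    # Build an in-memory stand-in for a loose object file.
    raw = zlib.compress(b"blob 11\0Hello, Git!")
    kind, content = parse_loose_object(raw)
    print(kind, content.decode("utf-8"))
```

In a real repository, raw would be the bytes read from a file under .git/objects/, provided the object has not been moved into a packfile.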

Blob objects are designed to be simple containers for file content, while the complexity of relationships, history, and metadata is managed by tree and commit objects.