Under The Hood Of Git And .git Folder

As developers, we inevitably use some form of version control system. The most popular one is Git. But have you ever stopped to ask how Git actually works under the hood?

This question is more important than it might seem. At some point, you’ll run into an issue that can’t be fixed by copying a command from the internet. Without understanding Git’s internal workings, such problems are very frustrating.

By learning how Git works from first principles, you gain the ability to reason about issues, debug confidently, and truly understand what your tools are doing rather than treating them as black boxes.

What Even Is A Git Repository?

When we run the git init command, Git does more than just print “Initialized empty Git repository.” Behind the scenes, Git creates a hidden folder called .git in your project directory. You can see this folder by running:

ls -a

This .git directory is the heart of your Git repository. Every piece of information that Git needs to track changes (commit history, branches, metadata, and objects) is stored here. In simple terms, creating the .git folder is like telling Git, “Hey Git, this is the directory you should watch. Track everything that changes here.” From this moment onward, your directory officially becomes a Git repository.

If you open the .git folder, you’ll find several files and subdirectories such as:

HEAD
objects/
refs/
index
logs/
config
hooks/

Each of these plays a specific role in how Git functions internally.

There’s a lot to unpack here, and covering everything at once would be overwhelming. So for the scope of this blog, we’ll focus only on the pieces required to understand how a commit works internally.

We’ll start with the most important file in Git’s workflow, the HEAD file.

HEAD

In a Git repo, HEAD is a pointer to the branch you are currently working on. If you are working on the “Sacred Branch”, then HEAD will point to that branch.

It helps Git know where you are currently working. When you open the HEAD file, you will usually see a line such as ref: refs/heads/master, which names the file that holds the latest commit of the current branch.

This naturally brings us to the refs folder. Let us see what’s going on in there.

/refs

The refs directory is where Git stores references, i.e. human-readable names that point to specific commits in your repository. Inside this directory, you’ll find subfolders that represent different kinds of references:

refs/heads/
Stores references to the tips (latest commits) of local branches. Each file corresponds to a branch and contains the SHA-1 (or SHA-256) hash of the commit that branch currently points to. SHA stands for Secure Hash Algorithm. If you add more branches, Git creates a separate file here for each of them.
refs/tags/
Stores references to tags, which usually point to a fixed commit and do not move.

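Now if you open the file named in HEAD using cat, you will see the hash of the latest commit on that branch, like “2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”. A default SHA-1 repository shows 40 hexadecimal characters; a 64-character value like this one is what a SHA-256 repository produces.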

cat .git/refs/heads/master   # the file path named inside the HEAD file
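The same two-step lookup (read HEAD, then read the file it names) can be sketched in Python. This is a minimal sketch, not how Git itself is implemented: the function name resolve_head is made up for illustration, and it only handles refs stored as plain files, not ones Git has moved into .git/packed-refs.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_hash) for the HEAD of a repository.

    Hypothetical helper: only loose refs (plain files under refs/) are read.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()

    if head.startswith("ref: "):
        # Usual case: HEAD names a branch, e.g. "ref: refs/heads/master".
        ref_name = head[len("ref: "):]
        ref_file = git_dir / ref_name
        # A brand-new repository has no commit yet, so the file may be absent.
        commit = ref_file.read_text().strip() if ref_file.exists() else None
        return ref_name, commit

    # Detached HEAD: the file holds a commit hash directly.
    return None, head
```

Calling resolve_head(".git") from the root of a repository should return the branch ref and the same hash that the cat command prints.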

But what does Git do with this hash? To answer this, let us now go into the objects folder.

/objects

The objects directory is where Git stores the actual contents of your repository as a collection of immutable objects. Together, these objects form a snapshot-based history of your codebase. Inside this directory, you’ll notice many subfolders with two-character names. These names correspond to the first two characters of an object’s SHA hash.

For example, consider a Git object with the SHA:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Git stores this object inside the directory:

.git/objects/2c/

The remaining characters of the SHA are used as the filename inside that folder.

Splitting objects into directories based on the first two characters of the hash prevents any single directory from containing too many files, which keeps Git fast and efficient even for large repositories.
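The mapping from hash to file path is simple enough to write down. The helper below is a sketch with a made-up name (loose_object_path); it only builds the path string and does not check whether the object exists.

```python
from pathlib import PurePosixPath


def loose_object_path(object_hash, git_dir=".git"):
    """Map an object hash to the path of its loose object file."""
    folder, filename = object_hash[:2], object_hash[2:]
    return str(PurePosixPath(git_dir) / "objects" / folder / filename)


print(loose_object_path(
    "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
))
# .git/objects/2c/f24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```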

Let us go deeper inside this “2c” folder to see (see what I did there?) what it has.

Viewing the contents of Git objects

Alright, let’s unpack these objects, shall we? Now that we know Git stores data inside the .git/objects directory, let’s see how we can actually inspect what’s inside these two-character-named folders.

Step 1: List the objects directory

ls .git/objects

You’ll see output like:

00 02 2c 3f 7a 9b info pack

Ignore info and pack for now.
Each of the remaining folders represents the first two characters of a Git object’s hash.

Step 2: Look inside one folder

ls .git/objects/2c

Output might look like:

f24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

This filename, combined with the folder name, gives the full hash: “2c” + “f24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”.

Step 3: Why you cannot open these files normally

If you try:

cat .git/objects/2c/f24dba5f...

You’ll see unreadable binary output.

That’s because:

Git compresses objects (with zlib)
Git adds a header (object type + size) in front of the content
You are not meant to read or change these files directly

But we can’t let that stop us, can we? So how do we read them?

We can use the following command:

git cat-file -p 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Depending on the object type, you’ll see different output.

For a blob (file contents):

Hello World

For a tree (directory structure):

100644 blob a3f5...    main.cpp
040000 tree b7c9...    src

For a commit:

tree 8b1d...
parent 3f6a...
author John Doe <john@example.com> 1704622341 +0530
committer John Doe <john@example.com> 1704622341 +0530

Initial commit
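For the curious, the compression and header mentioned in Step 3 can be undone by hand. The sketch below assumes the loose object layout of zlib-compressed bytes containing “type size”, a zero byte, and then the content. The function name parse_loose_object is made up, and the example builds a stand-in object in memory instead of reading a real repository.

```python
import zlib


def parse_loose_object(raw_bytes):
    """Decode the bytes of a loose object file into (type, size, body)."""
    data = zlib.decompress(raw_bytes)
    # The header (e.g. b"blob 11") ends at the first zero byte.
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), int(size), body


# Stand-in for the contents of a file under .git/objects/.
fake_file = zlib.compress(b"blob 11\x00Hello World")
print(parse_loose_object(fake_file))  # ('blob', 11, b'Hello World')
```

For blobs and commits the body is readable as-is. A tree body is binary, which is why git cat-file -p formats it for you.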

Git Objects

Now that we can read the objects in the directory, let us look at what kinds of objects Git uses and why. Everything Git tracks is stored as an object, and each object is identified by the hash of its contents.

The three most important Git objects you need to understand are blobs, trees, and commits. Together, they describe the complete state of your repository at any point in time.

Blob objects

A blob (binary large object) stores the contents of a file, and nothing else: no filename, no permissions. If two files have exactly the same content, even under different names, Git stores only one blob for both. This is one of the reasons Git is so efficient with storage.
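Because a blob’s ID depends only on its content, identical files end up with identical IDs. The sketch below computes the SHA-1 ID of a blob the way a default (SHA-1) repository does, by hashing the header “blob size” plus a zero byte followed by the content. The file names are hypothetical.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object ID of a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical files with different names but identical bytes.
files = {"notes.txt": b"test content\n", "copy_of_notes.txt": b"test content\n"}
ids = {name: blob_hash(data) for name, data in files.items()}

print(ids["notes.txt"])
print(ids["notes.txt"] == ids["copy_of_notes.txt"])  # True: one blob serves both
```

You can compare the result with git hash-object on a file that has the same content.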

Tree objects

A tree object represents a directory. A tree contains:

References to blob objects (files)
References to other tree objects (subdirectories)
Filenames and permissions

In other words, trees define how blobs are organized. You can think of a tree as a snapshot of a folder at a specific moment in time.
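A tree can be hashed in the same way as any other object. The sketch below assumes the raw tree layout, where each entry is the mode, a space, the name, a zero byte, and then the 20 raw bytes of the entry’s SHA-1. The function name tree_hash and the sample entries are made up, and the sorting is simplified, as the comment notes.

```python
import hashlib


def tree_hash(entries):
    """Compute the SHA-1 ID of a tree from (mode, name, hex_id) entries.

    Simplification: entries are sorted by plain name. Git compares
    directory names as if they ended in "/", so results can differ when
    a directory name is a prefix of a sibling's name.
    """
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


print(tree_hash([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)

# Hypothetical directory: one empty file plus one empty subdirectory.
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
print(tree_hash([("100644", "main.cpp", EMPTY_BLOB), ("40000", "src", EMPTY_TREE)]))
```

Note that the raw object stores a directory’s mode as 40000; git cat-file -p pads it to 040000 for display.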

Commit objects

A commit object ties everything together. A commit contains a reference to a tree object, references to its parent commits, author and committer information, and the commit message. A commit does not store file changes. It stores a pointer to a tree, which in turn points to blobs and other trees.
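To make the layout concrete, here is a small sketch that splits the text of a commit object (as printed by git cat-file -p) into its parts. The function name parse_commit is made up, and the sample reuses the abbreviated hashes from the earlier output as-is.

```python
def parse_commit(text):
    """Split the text of a commit object into its headers and message."""
    # A blank line separates the header lines from the commit message.
    header_part, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message.strip()}
    for line in header_part.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


sample = """tree 8b1d...
parent 3f6a...
author John Doe <john@example.com> 1704622341 +0530
committer John Doe <john@example.com> 1704622341 +0530

Initial commit
"""
parsed = parse_commit(sample)
print(parsed["tree"], parsed["parents"], parsed["message"])
```

The first commit in a repository has no parent line at all, so its parents list stays empty.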

What Actually Happens When We Commit?

Consider a folder structure like the one given below.

In terms of Git objects the Repo and Folder 0 will be categorized as trees and File 0 and File 1 as Blobs.

Now say I commit my changes in this state. In terms of Git objects, the repo would look something like this:

You start with commit 0, which points to a tree representing the full repository snapshot at that time.

Now suppose you:

modify file 1
leave folder 0 and everything inside it unchanged
create a new commit

The key question is: will Git store a full copy of the entire repository again? The answer is (drum roll) NO. This is where Git is brilliant. So what will happen exactly?

Even though Git conceptually takes a snapshot of the entire repository, it does not duplicate everything. Instead, Git creates new objects only for the parts that changed. In this case, a new blob is created for the modified file, along with a new root tree that points to it, while the unchanged folder and its files are reused by pointing to the same tree and blob objects from the previous commit.

The result is a new commit, commit 1, that represents a complete snapshot of the repository without wasting space by copying unchanged data. It stores new content only for the changed file; for everything unchanged, it simply points to the objects already recorded by the previous commit.
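This reuse can be simulated with a small in-memory object store. The sketch below is a toy model, not Git’s implementation. It assumes a layout with file 1 at the repository root and file 0 inside folder 0, leaves out the author and committer lines of a real commit, and uses made-up helper names (put, put_tree, put_commit, snapshot).

```python
import hashlib

store = {}  # object ID -> raw object bytes; a stand-in for .git/objects


def put(obj_type, body):
    """Store an object under the SHA-1 of its header plus body."""
    data = obj_type.encode() + b" " + str(len(body)).encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data  # storing identical content again changes nothing
    return oid


def put_tree(entries):
    """entries: list of (mode, name, oid), sorted by name for this sketch."""
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda e: e[1])
    )
    return put("tree", body)


def put_commit(tree, parent, message):
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Author and committer lines are omitted to keep the sketch short.
    return put("commit", ("\n".join(lines) + "\n\n" + message + "\n").encode())


def snapshot(file1_content, parent, message):
    """Commit the assumed layout: file1 at the root, folder0/file0 below it."""
    file0 = put("blob", b"contents of file 0\n")
    folder0 = put_tree([("100644", "file0", file0)])
    file1 = put("blob", file1_content)
    root = put_tree([("40000", "folder0", folder0), ("100644", "file1", file1)])
    return put_commit(root, parent, message)


commit0 = snapshot(b"file 1, version 1\n", None, "commit 0")
objects_after_commit0 = len(store)
commit1 = snapshot(b"file 1, version 2\n", commit0, "commit 1")
print(objects_after_commit0, len(store) - objects_after_commit0)  # 5 3
```

Commit 0 needs five objects (two blobs, two trees, one commit). Commit 1 adds only three: the new blob for file 1, a new root tree, and the commit itself. The blob for file 0 and the tree for folder 0 hash to the same IDs as before, so nothing new is stored for them.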

Conclusion

So there you have it, folks. You now know what to look for when a commit doesn’t quite make sense and how to break down Git-related problems from first principles instead of guessing commands. By understanding how Git stores data internally (commits, trees, blobs, and references), you’re no longer treating Git as a black box. You can reason about what’s happening behind the scenes, debug issues with confidence, and make sense of seemingly “weird” Git behavior in your projects.
