NOTE: A basic understanding of Git and GitHub is recommended to follow this article.
I have been using Git and GitHub to manage my projects for a while. I already knew the basic commands and workflow: initializing a repository, tracking changes, committing them, and viewing the history or status with git log and git status.
But recently, during a lecture on VCS (Version Control System) at college, a friend of mine asked me where Git (a VCS) actually stores the history of our project. I knew that it created the .git directory, hidden inside the directory where it was initialized, but I had never explored it enough to confidently answer his question.
Fortunately, the other day, during a class taught by Piyush Garg and Hitesh Choudhary, we explored the .git directory and dug into some very interesting things about it.
Let me take you through the journey of the .git directory and what I explored.
To follow along, initialize a Git repository and make at least two or three commits.
.git Directory
.git is a hidden directory that Git creates automatically whenever the git init command is run inside a directory. You can see it with the terminal command ls -a (list all) on Linux or macOS and dir /a on Windows.
You can also view it in the code editor itself. In VS Code, open the Command Palette using Ctrl + Shift + P (or Cmd + Shift + P on macOS), type "User Settings", and pick the entry that opens the settings UI (not the JSON one). Search for exclude and remove the **/.git pattern from the Files: Exclude list.
Now that you have access to the .git directory, let’s explore what’s in it.
If we list the contents of .git, we can see various directories like hooks, info, logs, objects, refs, and files such as index and HEAD. Let’s understand them one by one.
NOTE: Based on the stage your project is at, there might be more or fewer files and directories in the .git directory.
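If you prefer to inspect the directory from a script, here is a minimal Python sketch that lists the top-level entries of .git. The function name list_git_dir is made up for this article, and the sketch assumes .git is a regular directory (not the small pointer file used by worktrees and submodules).

```python
from pathlib import Path


def list_git_dir(repo_root="."):
    """Return (name, kind) pairs for the top-level entries of <repo_root>/.git.

    Hypothetical helper for illustration; it only looks one level deep.
    """
    git_dir = Path(repo_root) / ".git"
    entries = []
    for entry in sorted(git_dir.iterdir(), key=lambda p: p.name):
        kind = "dir" if entry.is_dir() else "file"
        entries.append((entry.name, kind))
    return entries


if __name__ == "__main__":
    # Only print something when run from the root of a repository.
    if (Path(".") / ".git").is_dir():
        for name, kind in list_git_dir("."):
            print(f"{kind:4}  {name}")
```

Run it from the root of a repository and compare the output with what ls -a .git shows.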
Index File
The index file, also known as the staging area, records what will go into the next commit. For every tracked file it stores an entry with the path, the file mode, some file-system metadata, and the hash of the corresponding content object, but not the content itself.
If you try to cat (a Linux command) the file or open it in an IDE, you will see mostly unreadable characters, with some of your project's file names scattered in between.
That is because the index is stored in a binary format. When you run the git status command, Git compares the index with the last commit and with your working directory to work out what is staged and what is not. Not everything in .git is binary, though: files like HEAD and config are plain text, while the files in the objects directory are compressed with zlib.
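To see that the file really has a structure, here is a small sketch that reads only the fixed 12-byte header at the start of the index: a DIRC signature, a version number, and the number of entries, all big-endian. The helper name read_index_header is my own, and the sketch deliberately stops before the entries themselves, which are more involved. For a readable view of the full index, git ls-files --stage is the supported route.

```python
import struct


def read_index_header(index_path):
    """Parse the 12-byte header of a Git index file.

    Hypothetical helper: it validates the signature and returns the
    format version and the number of entries, nothing more.
    """
    with open(index_path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError("file is too short to be a Git index")
    # Big-endian layout: 4-byte signature, 4-byte version, 4-byte entry count.
    signature, version, entry_count = struct.unpack(">4sII", header)
    if signature != b"DIRC":
        raise ValueError("not a Git index file (missing DIRC signature)")
    return {"version": version, "entries": entry_count}
```

Calling read_index_header(".git/index") in a repository with staged or committed files should report how many paths the index currently tracks.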
Where is the Info After Staging (i.e., Committed Files History) Stored?
Objects Directory
If you go inside the objects directory located in .git and look at its contents, you will notice directories named with two hexadecimal characters (you may also see info and pack). Each of these names is the first two characters of the hash of the objects stored inside it. Those objects can be commits, but also trees and blobs.
The commit hash is a unique identifier for each commit. By default it is a SHA-1 hash computed over the commit object, which records the root tree (the snapshot of the project), the parent commit(s), the author and committer with their timestamps, and the commit message. Git uses this hash to uniquely identify the state of the project at each commit.
When you look inside each of these directories, you will find files named with the remaining characters of an object hash (38 of them for SHA-1). Each file holds one zlib-compressed object: a commit, a tree (a directory listing), or a blob (file contents). A commit object does not store its own hash; it contains a reference to a tree, references to its parent commit(s), and the author, committer, and message. You can print any object in readable form with git cat-file -p followed by its hash.
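As a rough illustration of what sits inside one of those files, this sketch decompresses a loose object and splits it into its type and body. read_loose_object is a hypothetical name, and the sketch only handles loose objects: once Git packs objects into the pack directory, the file will not exist at this path.

```python
import zlib
from pathlib import Path


def read_loose_object(git_dir, object_hash):
    """Return (type, body) for a loose object stored under <git_dir>/objects.

    Hypothetical helper; packed objects are not handled.
    """
    path = Path(git_dir) / "objects" / object_hash[:2] / object_hash[2:]
    raw = zlib.decompress(path.read_bytes())
    # The decompressed data is "<type> <size>\0<body>".
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body
```

For a commit hash, the body is plain text starting with a tree line, followed by parent, author, and committer lines and the commit message.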
What About the Branches?
HEAD File
The HEAD file at the root level of the .git directory contains a reference to the currently checked-out branch. It typically looks like ref: refs/heads/main, indicating the main branch. If you're in a detached HEAD state (e.g., checking out a specific commit), it will contain a commit hash instead.
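Both cases are easy to tell apart in code. Here is a minimal sketch, with read_head as a made-up helper name, that reads the HEAD file and reports either the branch it points to or the commit hash of a detached HEAD.

```python
from pathlib import Path


def read_head(git_dir):
    """Describe what <git_dir>/HEAD currently points to.

    Hypothetical helper: returns a dict for either the symbolic
    ("ref: ...") form or the detached (raw hash) form.
    """
    text = (Path(git_dir) / "HEAD").read_text().strip()
    if text.startswith("ref: "):
        ref = text[len("ref: "):]
        prefix = "refs/heads/"
        # Strip the prefix to get the short branch name, e.g. "main".
        branch = ref[len(prefix):] if ref.startswith(prefix) else ref
        return {"detached": False, "ref": ref, "branch": branch}
    return {"detached": True, "commit": text}
```

On a repository with main checked out, read_head(".git") would report the ref refs/heads/main and the branch main.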
REFS Directory
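This is the directory where branches and tags are recorded. Each reference is a small text file containing the hash of the object it points to (usually a commit): local branches live under refs/heads, remote-tracking branches under refs/remotes, and tags under refs/tags. Tags are used to mark release points. Learn more about tags.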
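Since each reference is just a small text file, collecting them takes only a few lines. The sketch below walks the refs directory and maps every reference name to the hash stored in it. list_loose_refs is a hypothetical helper, and it only sees loose refs: Git may also keep references in a packed-refs file, which this sketch ignores.

```python
from pathlib import Path


def list_loose_refs(git_dir):
    """Map each loose ref name (e.g. "refs/heads/main") to its stored hash.

    Hypothetical helper; refs stored in packed-refs are not included.
    """
    git_dir = Path(git_dir)
    refs = {}
    for path in sorted((git_dir / "refs").rglob("*")):
        if path.is_file():
            name = path.relative_to(git_dir).as_posix()
            refs[name] = path.read_text().strip()
    return refs
```

The supported way to get the complete list, packed refs included, is git show-ref.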
LOGS Directory
This is where Git keeps the reflogs: a record of every time HEAD or a branch was updated (by a commit, checkout, reset, merge, and so on). The git reflog command reads these files to show that information in the terminal, whereas git log walks the commit objects stored in the objects directory.
Inside it, you might notice a HEAD file and a refs directory containing heads and remotes, which hold the update history of your local and remote-tracking branches respectively.
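The log files are plain text, one update per line: the old hash, the new hash, who made the change, a timestamp with time zone, and, after a tab, a short message. Here is a sketch that parses them; parse_reflog_line and read_reflog are names I made up, and the parser assumes well-formed lines.

```python
from pathlib import Path


def parse_reflog_line(line):
    """Split one reflog line into its fields (assumes a well-formed line)."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old_hash, new_hash, rest = meta.split(" ", 2)
    # `rest` is "Name <email> <unix-timestamp> <timezone>".
    identity, timestamp, tz = rest.rsplit(" ", 2)
    return {
        "old": old_hash,
        "new": new_hash,
        "who": identity,
        "timestamp": int(timestamp),
        "tz": tz,
        "message": message,
    }


def read_reflog(git_dir, ref="HEAD"):
    """Parse <git_dir>/logs/<ref>, oldest entry first (hypothetical helper)."""
    log_file = Path(git_dir) / "logs" / ref
    with open(log_file, encoding="utf-8", errors="replace") as f:
        return [parse_reflog_line(line) for line in f if line.strip()]
```

Passing ref="refs/heads/main" instead of the default reads the history of that branch rather than of HEAD.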
CONFIG File
The config file stores configuration settings for this specific Git repository, including core options, remote repository URLs, branch tracking information, and any other settings (such as user information or merge behavior) that you have set for this repository only.
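The file is plain text in an INI-like format, with sections in square brackets and key = value lines under them. The sketch below is a deliberately simplified reader (parse_git_config is a hypothetical name): it ignores quoting, escapes, line continuations, and includes, so treat it as a way to look at the structure rather than a replacement for git config --local --list.

```python
def parse_git_config(text):
    """Parse Git config text into {section: {key: value}}.

    Simplified sketch: no quoting, escapes, continuations, or includes.
    """
    config = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # e.g. 'core' or 'remote "origin"'
            config.setdefault(section, {})
        elif section is not None:
            key, sep, value = line.partition("=")
            # A key with no "=" is a boolean that Git treats as true.
            config[section][key.strip()] = value.strip() if sep else "true"
    return config
```

To try it, read the contents of .git/config from a repository and pass the text in; a section such as remote "origin" keeps its quotes in the key.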
HOOKS Directory
In a Git workflow, certain tasks like linting, formatting, or running tests can be automated using Git hooks. Hooks are scripts that run at specific points in the Git workflow (e.g., before committing, pushing, or merging) to maintain consistency and quality in the codebase.
The hooks directory contains these scripts, and you can customize them to suit your project's needs. A fresh repository ships with example hooks ending in .sample; a hook only runs once it is saved without that suffix and made executable. Learn more about Git hooks.
If you're a web developer, you might have come across the husky package (which keeps its hooks in a .husky directory), which makes it easier to achieve this.
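To check which hooks in a repository would actually run, you can look for files in the hooks directory that do not end in .sample and are executable. Here is a small sketch of that check; list_hooks is a made-up helper, and it assumes hooks live in the default .git/hooks location rather than a custom hooks path.

```python
import os
from pathlib import Path


def list_hooks(git_dir):
    """Return (file name, is_active) pairs for files in <git_dir>/hooks.

    Hypothetical helper: a hook counts as active when its name has no
    ".sample" suffix and the file is executable.
    """
    hooks_dir = Path(git_dir) / "hooks"
    hooks = []
    for path in sorted(hooks_dir.iterdir()):
        if not path.is_file():
            continue
        is_sample = path.name.endswith(".sample")
        active = (not is_sample) and os.access(path, os.X_OK)
        hooks.append((path.name, active))
    return hooks
```

In a freshly initialized repository, every entry should come back inactive, because only the .sample files exist.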
INFO Directory
Inside the info directory, you will find the exclude file. It contains patterns for files and directories that Git should ignore, using the same syntax as .gitignore, but it applies only to your local copy of the repository and is never committed. That makes it useful for local ignore rules that you don’t want to share with others.
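Reading the file is straightforward, since it is plain text with one pattern per line and # for comments. Here is a minimal sketch (read_exclude_patterns is a hypothetical name) that returns the active patterns. It does not try to interpret them, and it skips the finer points of the syntax such as escaped leading # characters.

```python
from pathlib import Path


def read_exclude_patterns(git_dir):
    """Return the non-comment, non-blank lines of <git_dir>/info/exclude.

    Hypothetical helper; the patterns are returned as-is, not evaluated.
    """
    exclude_file = Path(git_dir) / "info" / "exclude"
    if not exclude_file.exists():
        return []
    patterns = []
    for line in exclude_file.read_text().splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            patterns.append(stripped)
    return patterns
```

In a new repository this usually returns an empty list, because the default file contains only comments.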
Conclusion
There may be additional files and directories inside the .git directory, depending on the stage your project is at, all of which play a role in how Git works. However, editing the contents of .git by hand is not recommended; use Git's CLI commands to interact with the repository instead. That said, understanding its inner workings has been an interesting journey.
If you found this article insightful, feel free to drop a follow and a like.