I’m going to step back from tips over the next few days and cover some fundamentals over the Easter weekend.
There are 4 kinds of objects that Git works with:
- The blob
- The tree
- The commit
- The tag
Although I’ll only cover the blob today, all of these objects share some common characteristics:
- These objects just store contents. For a file, it’s just the contents of the file, not the name or other properties. For trees, the contents are just a list of the objects it contains, not the name of the tree.
- These objects are referenced by an id. This is made by taking the contents of the file (or tree, or whatever), adding a header with its type and size, and taking the SHA-1 hash of the result. The object is then compressed with zlib for storage. For convenience, I refer to these ids as the sha.
- You usually see objects referred to by their 40-character hexadecimal string. Objects are immutable and, for our purposes, unique. Generally, the first 7 or so characters of the sha are enough to reference an object unambiguously. By immutable, I mean that if you edit a file and check it in again, a whole new object is created with a new sha, but the old object still exists under its old sha, so you can usually go back to it. This is what makes it easy in Git to go backwards and forwards in time.
- If the contents of two files are the same, then their shas are the same. This is important. It means that a file I create on my computer has the same sha as a file on your computer with the same contents, regardless of the name of the file.
- These objects live in the .git/objects/ directory of your git repository, but don’t try to look at them directly. If you’re a software developer you know better than to depend on an implementation rather than an interface. There are plenty of commands that let you look at these objects from the command line.
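The id calculation is easy to reproduce by hand. This is only a rough sketch: it assumes a POSIX shell with git and sha1sum available (on Mac OS you would probably swap in shasum), and blob_id is a name I made up for illustration, not a Git command. The hash is taken over the header plus the contents, before any compression.

```shell
# Hypothetical helper: compute a blob id by hand.
# The hashed data is "blob <size>", a NUL byte, then the file contents.
blob_id() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

blob_id "$tmp"           # id computed by hand
git hash-object "$tmp"   # id computed by Git; the two lines should match

rm -f "$tmp"
```

git hash-object is the plumbing command that reports the id Git would give a file, so it is a handy cross-check for the hand-rolled version.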
With this in mind, let’s look at the quantum of Git: the blob.
This is the representation of the contents of a file within git. I cannot say this enough: it represents the contents of the file, not the name, not the permissions, not the location within your project tree. Nothing but the contents of the file.
This is the fundamental building block of the repository. Blobs are collected into trees, trees are collected into a top level tree which represents the top level of your project, and this tree is represented in a commit object which completely represents the stored state of your project at that point. We’ll cover this recursive object graph over the next few days.
But for now, let me prove to you that objects are unique to the contents of the file independently of the machine.
- Create a folder for your project. In my example I’ve used the name blob, but you can call it what you like.
- Go into this directory.
- Create a git repository.
- Create a text file within this directory. I’ve called mine blob.txt but you can call it what you want. In fact, you’re better off calling it something else.
- Use echo "I'm a blob" > blob.txt to put content into your file. You can put your own file name after the >, but the part between quotes should be the same because we want to have the same content in the file.
- Do a cat to see that the contents are the same.
- Add the file to the repository
- Commit the changes with a short commit message.
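Typed out as commands, the steps above look roughly like this. The folder name, file name and commit message are just my choices, and I’m assuming your git name and email are already configured so the commit goes through.

```shell
mkdir blob                     # create a folder for the project
cd blob                        # go into it
git init                       # create a git repository
echo "I'm a blob" > blob.txt   # put the content into a file
cat blob.txt                   # check the contents
git add blob.txt               # add the file to the repository
git commit -m "Add a blob"     # commit with a short message
```

The blob object is written to the repository as soon as the file is added; the commit is what ties it into the project history.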
I need to make a slight diversion to introduce a handy command. It isn’t part of the generally used command set, but it’s useful for spelunking your repositories. In the same way that you can see the contents of files from the command line using the cat command (on Mac OS and Linux at least), the git cat-file command can show the contents of git objects. For now, we just need to know about the -p flag, which “pretty prints” the output.
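Here is a small sketch of git cat-file in action. To keep it self-contained it works in a throwaway repository and stores the blob with git hash-object -w, which is my shortcut for this example rather than part of the walkthrough. The -t and -s flags, which report an object’s type and size, are included only for comparison with -p.

```shell
# Scratch repository so the example does not touch a real project.
cd "$(mktemp -d)"
git init -q

# Store a blob straight from standard input; -w writes it to the object database.
id=$(echo "I'm a blob" | git hash-object -w --stdin)

git cat-file -t "$id"   # the object type
git cat-file -s "$id"   # the size of the contents in bytes
git cat-file -p "$id"   # the contents themselves, pretty printed
```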
Just to show that the same sha is generated on my machine as it will be on yours, enter this in the terminal:
git cat-file -p e80e4f2
Now I know that the sha for the file with the contents “I’m a blob” is e80e4f2 on my machine, so I know that a file with the same contents in your repo will also have the same sha.
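If you would rather check the claim about names than take my word for it, a quick sketch like the one below should do. The second file name is invented for the example, and git hash-object only reports the ids here without storing anything.

```shell
# Work in a scratch directory; no repository is needed just to compute ids.
cd "$(mktemp -d)"

echo "I'm a blob" > blob.txt
echo "I'm a blob" > another-name.txt

# Prints one id per file; both lines should be identical.
# Compare the first 7 characters with the e80e4f2 quoted above.
git hash-object blob.txt another-name.txt
```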
I’ve had to cheat a little, of course. The contents of the file are the same across machines, but all the other details will be different, so our commit shas won’t match, nor the trees. I’ll explain this over the next few days.