We've taken a huge step in SHA-256 support in GitLab: The Gitaly project now fully supports SHA-256 repositories. While there is still some work we need to do in other parts of the GitLab application before SHA-256 repositories can be used, this milestone is important.
What is SHA-256?
SHA-256 is a cryptographic hashing algorithm. Given any input, it produces a fixed-length 256-bit hash, usually written as 64 hexadecimal digits. Git uses hashing algorithms to generate IDs for commits and other Git objects such as blobs, trees, and tags.
Git uses the SHA-1 algorithm by default. If you've ever used Git, you know that
commit IDs are a string of hexadecimal digits. A git log command yields
something like the following:
commit bcd64dba39c90daee2e1e8d9015809b992174e34 (HEAD -> main, origin/main, origin/HEAD)
Author: John Cai <[email protected]>
Date: Wed Jul 26 13:41:34 2023 -0400
Fix README.md
The bcd64dba39c90daee2e1e8d9015809b992174e34 is the ID of the commit and is a
40-character hash generated by using the SHA-1 hashing algorithm.
In SHA-256 repositories, everything is the same except that the ID is
64 characters long instead of 40:
commit e60501431d52f6d06b4749cf205b0dd09141ea0b3155a45b9246df24eee9b97b (HEAD -> master)
Author: John Cai <[email protected]>
Date: Fri Jul 7 12:56:52 2023 -0400
Fix README.md
Why SHA-256?
SHA-1, the algorithm Git has used until now, is insecure. In 2017, Google was
able to produce a hash collision, that is, two different inputs with the same
SHA-1 hash. While the Git project is not yet impacted by these kinds of attacks
due to the way it stores objects, it is only a matter of time until new attacks
on SHA-1 are found that would also impact Git.
Federal guidance, such as the NIST and CISA guidelines that FedRAMP enforces,
sets a due date in 2030 to stop using SHA-1, and encourages agencies to move
away from it sooner if possible.
In addition, SHA-256 support was labeled experimental in the Git project for a long time,
but as of Git 2.42.0, the project has removed the experimental label.
What does this mean for developers?
From a usability perspective, SHA-256 and SHA-1 repositories don't differ
significantly. For personal projects, SHA-1 is probably fine. However,
companies and organizations are likely to switch to using SHA-256 repositories
for security reasons.
See SHA-256 in action
If you have sha256sum(1) installed, you can generate such a hash on the command line:
> printf '%s' "please hash this data" | sha256sum
62f73749b40cc70f453320e1ffc37e405ba50474b5db68ad436e64b61fbb8cf0 -
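To make the fixed-length property concrete, the following sketch hashes inputs of very different sizes and prints the length of each digest. It assumes sha256sum from GNU coreutils is on the PATH; the sample inputs and the digest_lengths variable are arbitrary choices for illustration.

```shell
# Hash three inputs of very different lengths with SHA-256.
# Assumes sha256sum (GNU coreutils) is available; the inputs are arbitrary.
digest_lengths=""
for input in "" "a" "please hash this data"; do
  # sha256sum prints "<digest>  -"; keep only the digest.
  digest=$(printf '%s' "$input" | sha256sum | cut -d' ' -f1)
  printf '%s (%s hex digits)\n' "$digest" "${#digest}"
  # Remember each length so they can be compared afterwards.
  digest_lengths="$digest_lengths ${#digest}"
done
```

Every line should report 64 hex digits, even for the empty input.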
We can also see this in action in a Git repository. Let's create a repository,
add an initial commit, and inspect the contents of the commit object. Note: If
you try this yourself, the commit IDs will be different because the date of the
commit is part of the hash calculation.
> git init test-repo
> cd test-repo
> echo "This is a README" >README.md
> git add .
> git commit -m "README"
[main (root-commit) 328b61f] README
1 file changed, 1 insertion(+)
create mode 100644 README.md
> zlib-flate -uncompress < .git/objects/32/8b61f2449205870f69b5981f58bd8cdbb22f95
commit 159tree 09303be712bd8e923f9b227c8522257fa32ca7dc
author John Cai <[email protected]> 1688748132 -0400
committer John Cai <[email protected]> 1688748132 -0400
README
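In the last step, we uncompress the actual commit file on disk. Git compresses object
files with zlib before storing them on disk. zlib-flate(1) is a utility that comes
packaged with qpdf and uncompresses zlib-compressed files. The output starts with a
header, the object type and the content size in bytes, followed by a NUL byte that
the terminal doesn't display, which is why "commit 159" runs directly into "tree".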
Now, if we feed this data back into the SHA-1 algorithm, we get a predictable result:
> zlib-flate -uncompress < .git/objects/32/8b61f2449205870f69b5981f58bd8cdbb22f95 | sha1sum
328b61f2449205870f69b5981f58bd8cdbb22f95 -
As we can see, the result of this is the commit ID.
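If zlib-flate isn't installed, the same check can be sketched with Git's own plumbing commands, by rebuilding the header and content by hand and hashing them. This is a minimal sketch, assuming Git 2.29 or later (for --object-format) and sha1sum; the temporary repository and the author identity are placeholders made up for this example.

```shell
set -eu

# Create a throwaway SHA-1 repository with a single commit.
# The author name and email are placeholders for this example.
repo=$(mktemp -d)
cd "$repo"
git init -q --object-format=sha1 .
echo "This is a README" >README.md
git add .
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "README"

commit_id=$(git rev-parse HEAD)

# A Git object is hashed as "<type> <size>\0<content>".
# cat-file -s prints the content size; cat-file commit prints the raw content.
size=$(git cat-file -s HEAD)
recomputed=$(
  { printf 'commit %s\0' "$size"; git cat-file commit HEAD; } |
    sha1sum | cut -d' ' -f1
)

echo "commit ID:  $commit_id"
echo "recomputed: $recomputed"
```

The two printed values should be identical. The actual ID will differ from run to run because the commit timestamp is part of the hashed content.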
The recommendation by NIST was to replace SHA-1 with SHA-2 or SHA-3. SHA-256
belongs to the SHA-2 family. The Git project has undertaken this effort,
and the feature is now fully usable in Git and no longer deemed experimental.
In fact, you can create and use repositories with SHA-256 as the hashing algorithm
to see it in action on your local machine:
> git init --object-format=sha256 test-repo
> cd test-repo
> echo "This is a README" >README.md
> git add .
> git commit -m "README"
[main (root-commit) e605014] README
1 file changed, 1 insertion(+)
create mode 100644 README.md
> git log
commit e60501431d52f6d06b4749cf205b0dd09141ea0b3155a45b9246df24eee9b97b (HEAD -> master)
Author: John Cai <[email protected]>
Date: Fri Jul 7 12:56:52 2023 -0400
README
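To compare the two object formats side by side, a sketch like the following hashes the same content as a blob in a SHA-1 repository and in a SHA-256 repository. It assumes Git 2.29 or later; both repositories are throwaway directories, and the lengths variable exists only for this example.

```shell
set -eu

# Hash the same content as a blob in a SHA-1 and a SHA-256 repository.
# Both repositories are throwaway directories created only for this example.
content="This is a README"
lengths=""
for format in sha1 sha256; do
  repo=$(mktemp -d)
  git init -q --object-format="$format" "$repo"
  # hash-object uses the hash algorithm of the repository it runs in.
  blob_id=$(printf '%s\n' "$content" | git -C "$repo" hash-object --stdin)
  # Ask Git which hash algorithm the repository reports.
  reported=$(git -C "$repo" rev-parse --show-object-format)
  echo "$reported: $blob_id (${#blob_id} hex digits)"
  # Remember each ID length so they can be compared afterwards.
  lengths="$lengths $reported=${#blob_id}"
done
```

The content and the commands are identical in both cases; only the length of the resulting ID changes, from 40 to 64 hexadecimal digits.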