Blog Engineering How we prevented security fixes leaking into our public repositories
January 4, 2021
7 min read

How we prevented security fixes leaking into our public repositories

Working in the open makes it difficult to work on security vulnerabilities before they're disclosed, especially when that openness discloses them early!

security-leaks-unlocked.jpg

One of GitLab's core values is "public by default," which means we develop in
the open whenever possible. One notable exception to this is security fixes,
because developing security fixes in public discloses vulnerabilities before a
fix is available, exposing ourselves and our users to attacks.

In order to work on these security issues in private, public GitLab projects
have a security mirror that's accessible only to GitLab engineers. A design flaw in GitLab's mirroring feature would cause commits from the
Security repository to be exposed in the public repository before they were
intended for release.

In this post we'll describe what the problem was and how we finally resolved it.

Mirroring setup

To ensure that developers working on a security fix are working against the
latest code for a project, we utilize GitLab's push mirror feature to mirror
the public ("Canonical") repository to its private Security fork.

On every commit to the Canonical repository, the Security repository receives
the same commit. All of the mirroring is performed by the Gitaly
server, which handles all of the Git calls made by GitLab.

In order to know which Git objects in the source are missing on the destination,
GitLab would fetch the remote and then tell Gitaly to perform the push that
would bring the two in sync, which is where the trouble starts.

By performing a fetch, every Git object in the Security repository was now
known and stored on-disk by the Canonical repository
. If someone knew the SHA
of a commit in the private repository that contained a security fix, they
could view it in the public repository and discover the vulnerability we were
fixing before it had been publicly disclosed.

No guessing necessary

Thankfully, even a truncated Git commit SHA is difficult to guess, so at first
glance this might not look like a high-severity issue.

However, the GitLab help page shows exactly which
commit is currently running, and we always deploy security fixes to GitLab.com
for verification and to protect our users against the latest threats. Here's
what that might look like:

GitLab Enterprise Edition 13.7.0-pre 690e4bbfe94

When a security release was in progress, any logged-in user could click on the
running commit SHA and view the entire source code tree at that point, security
fixes included!

Experimenting with a fix

The mirroring setup was a crucial part of our development and release process,
and the existing fetch-based behavior was itself a crucial piece of what made
the mirroring functionality work. During our initial investigation, there was no
obvious fix. One proposed workaround was to simply remove the SHA from the Help
page, but that would only hide the problem and "security through obscurity"
isn't really security at all.

Another workaround, which we ended up implementing, was to
pause the mirroring as soon as a security fix was merged, and re-enable it
once the security release was published. This prevented the leak because the
fetch was no longer happening, but it would "stop the world" while we worked
on a security release. The Security mirror quickly fell behind public
development, which created a risk of new features causing merge conflicts
with the security fixes, or vice versa.

Staff engineer Jacob Vosmaer, who began the Gitaly project within GitLab,
pointed out that, strangely, we only used this fetch-based behavior for
branches; tags used Git's low-level ls-remote command.

Whereas Git's fetch command creates a local copy of every object from the
remote repository, the ls-remote command only prints the remote's available
references to the terminal. If we used ls-remote for branches like we did for tags, the commits from
the mirror would no longer be persisted on-disk, and thus wouldn't be
available in the public repository.

Because push mirroring is such a critical part of our own workflow as well as
our users', we didn't want to just make the change and hope for the best. We
set up an experiment, where the old functionality stayed exactly as it was,
but when a feature flag was enabled, we'd also gather the same commit
information using ls-remote, and compare the new results to the original,
logging any differences.

The experiment ran on GitLab.com for about a month without major discrepancies.
It looked like we had a solution!

Iterating on the experiment

Considering the experiment a success, but still being wary of breaking a key
piece of functionality, we proceeded with caution. Rather than replacing the old
behavior outright with the new, we split the two paths based on a feature
flag
.

When the flag was disabled the old, tried-and-true behavior would be used. With
the flag enabled, we'd use the new. We shipped this change and left the flag
enabled, watching for errors.

After two weeks without any reported mirroring errors, and with the security
leak no longer occurring, we were satisfied we had found our fix.

First we shipped a self-managed release with the feature flag enabled by
default
, to ensure that if something unexpectedly broke for those
installations it would be easy to revert to the previous behavior. Finally, after no errors reported from self-managed users, we removed the
feature flag along with the old behavior
, and closed out the
confidential issue.

An annoying bug emerges

Shortly after making the new behavior the default, we started getting
complaints from team members. They'd receive an automated email
telling them that a push mirror was broken, only to go check on the mirror and
be told everything was fine.

This went on for about two months due to the transient nature of the errors.
Every time we'd get an email and check to see if it was accurate, the mirroring
reported everything was fine.

As we began to implement a new piece of tooling that depended on
accurate status reporting from push mirroring, the problem became bigger than a
few annoying, seemingly inaccurate emails; it was causing our tooling to behave
erratically as well.

Because we had absolutely no idea what was happening or why, our first step was
to add logging when Gitaly was encountering an error that would mark the
mirror as failed. The logging revealed a weird anomaly where it
appeared that the Security repository – the one receiving updates – appeared
to be ahead of its source:

I, [2020-09-21T10:10:31] Divergent ref due to ancestry -- remote:f73bb2388a6, local:59812e04368
I, [2020-09-21T10:26:39] Divergent ref due to ancestry -- remote:8ddcb3333da, local:f73bb2388a6

In this pair, the first message is saying that the remote – the Security
repository – was showing its latest commit as f73bb2388a6, and that it wasn't
an ancestor of the local 59812e04368 commit, causing the error message. On the
next run, we see that the local repository has "caught up" to the Security
remote from the prior run.

It turned out that due to the number of branches and tags in this repository,
the ls-remote command was taking so long to complete that by the time the data
was returned, the local repository was updated by a new push.

Because we gathered the remote refs after the local ones, a network delay
created a window for new local commits to be written and invalidate our list
of local refs. Luckily there was a nice boring solution: all we had to do
was swap the order in which we gather references.

Wrapping up

As soon as we swapped the order for gathering references, the transient errors
went away and we finally got to close this long-standing issue. We were pleased
with how we were able to modify such a critical piece of functionality safely
and without any negative user impact.

Photo by iMattSmart on Unsplash

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum. Share your feedback

Ready to get started?

See what your team could do with a unified DevSecOps Platform.

Get free trial

New to GitLab and not sure where to start?

Get started guide

Learn about what GitLab can do for your team

Talk to an expert