Update: Why GitLab uses a single codebase for Community and Enterprise editions

In "GitLab might move to a single Rails
codebase", we announced that GitLab
might move to using a single codebase for GitLab Community Edition (CE) and
GitLab Enterprise Edition (EE). Since then we have decided to continue moving
toward a single codebase. In this article, I highlight some of the challenges,
required work, and steps remaining to complete the switch.

What is a codebase?

A codebase (sometimes spelled "code base") is the entire collection of source
code that a program or application needs to function. This can include configuration
files, libraries, and other dependencies, in addition to the application code itself. The codebase is
typically stored in a single location, often a source control repository, where multiple developers
can access and contribute to it.

Keeping a single codebase in a source control repository also helps with backup
and with versioning overlapping code modifications. This can be especially
important for larger projects that require a lot of coordination
and communication between team members. With everyone working from the same codebase, it becomes easier
to ensure that changes are made consistently and in a way that does not break the application.

Why does GitLab use a single codebase?

For years before moving to a single codebase, CE and EE used two different repositories for the Rails application.
By using separate repositories we could separate proprietary code from code that
is free software. On the surface this seems like a good idea for various
reasons (e.g., licensing), but over the years the drawbacks
began to outweigh the benefits.

We mention some of these drawbacks in a previous
article, but more or less they all
come down to the same core problem: It made the development process more complex
than necessary. For example, we ended up with around 150 merge requests spread
across CE and EE for a security release from several months ago. While the
process of merging these merge requests is automated, we ran into a variety of
issues (e.g. failing tests) that required manual intervention. We could have
reduced the number of merge requests by half if we used a single repository,
creating less work for developers and release managers.

Toward the end of 2018, I felt that we were running out of time and had to do
something about the separation of CE and EE. We had always tried to avoid
merging the two repositories due to the complexity and time involved, but it
started to become more and more clear we had no other option. Marin
Jankovski, Delivery engineering manager, and I made a
plan to merge the two repositories. Marin wrote a design
document
that outlined the details of it all. The design document showed what challenges
we faced, and gathered the critical support required for one of the largest engineering
projects at GitLab to date.

What is the difference between a codebase and a repository?

The basic difference between a codebase and a repository is that a codebase is the source code itself, while a repository is where that code and its history are stored.

More specifically:

A codebase is the complete body of source code for a project, public or private, that is actively being iterated on. It is typically stored in a source control repository managed by a version control system.

A source code repository is where the code and the record of changes made to it are kept. It's also a place to house documentation, notes, web pages, and other related items.

Working toward a single codebase

Moving to a single codebase is not something we can do overnight for a project
the size of GitLab. Workflows must be adapted, developers need to adjust to the
new setup, and automation requires extensive changes.

One of the biggest challenges from an engineering perspective was to come up
with a way to transparently remove proprietary code from GitLab when building a
CE release. A naive approach might involve a script that removes known bits of
proprietary code. While this might work for small projects that don't change
often, this was not going to work for a project the size of GitLab.

Ruby provides us with a solution to this problem. In Ruby, you can create a
module and inject it into another module or class. Once injected, the
functionality of the module becomes available to the target module or class.
This is best illustrated with a simple example:

class Person
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

module Greet
  def greet
    "Hello #{name}"
  end
end

Person.include(Greet)

alice = Person.new('Alice')

alice.greet # => "Hello Alice"

Here we define a class Person, followed by a module that is used to create a
message greeting a person. Next, we include it into the Person class, at which
point we can use the module's methods for instances of the Person class. The
result is the message "Hello Alice".

While this example is not exciting, using a setup like this allows us to
move proprietary code to separate modules, and inject these modules when GitLab
EE is used. For GitLab CE, we would remove these modules, and the code injecting
these modules would have to disable itself transparently and automatically.
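The self-disabling injection described above could look roughly like the following. This is a minimal sketch, not GitLab's actual implementation: the helper name `prepend_if_available`, the `OptionalInjection` module, the `License` class, and the `EE::` namespace convention are all assumptions made for illustration.

```ruby
# Hypothetical helper: prepend the named module only if it is defined.
# In a CE build the EE module would be absent, so the call does nothing.
module OptionalInjection
  def prepend_if_available(module_name)
    return false unless Object.const_defined?(module_name)

    prepend(Object.const_get(module_name))
    true
  end
end

# Shared code that is present in both CE and EE (illustrative class).
class License
  extend OptionalInjection

  def plan
    'free'
  end
end

# Pretend this definition lives in the "ee" directory. Deleting it, as a
# CE build would, leaves the License class untouched.
module EE
  module License
    def plan
      "#{super} + enterprise"
    end
  end
end

License.prepend_if_available('EE::License') # => true
License.prepend_if_available('EE::Missing') # => false, silently skipped

License.new.plan # => "free + enterprise"
```

Because the helper checks for the module by name, the shared code never references an EE constant directly, so it keeps working when the "ee" directory is gone.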

GitLab EE has been using this setup since late 2016, with all EE modules residing
in a separate "ee" directory, but only in a limited number of places. This meant that
in some places EE and CE code got mixed together, while in other places the two
were separate. For example, we had code like this:

 def lfs_upload_access?
   return false unless project.lfs_enabled?
   return false unless has_authentication_ability?(:push_code)
+  return false if project.above_size_limit? || objects_exceed_repo_limit?

   lfs_deploy_token? || can?(user, :push_code, project)
 end

Here EE added a line into an existing method without using a separate module,
making it difficult to remove the EE-specific code when building CE.
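To illustrate how a line like this might be pulled out into a module, here is a simplified sketch. It is not the actual GitLab code: the `LfsAccess` class, the `EE::LfsAccess` module, and the `Project` struct are hypothetical stand-ins, and the authentication and permission checks are left out.

```ruby
# Simplified stand-in for a project; the real objects are far richer.
Project = Struct.new(:lfs_enabled, :above_size_limit, keyword_init: true)

# Shared (CE) code: contains no EE-specific checks.
class LfsAccess
  def initialize(project)
    @project = project
  end

  def lfs_upload_access?
    return false unless @project.lfs_enabled

    true # authentication and permission checks omitted in this sketch
  end
end

# EE-only code, which would live under the "ee" directory.
module EE
  module LfsAccess
    def lfs_upload_access?
      return false if @project.above_size_limit

      super # fall through to the shared CE checks
    end
  end
end

# Skipping this line leaves pure CE behavior.
LfsAccess.prepend(EE::LfsAccess)

small = Project.new(lfs_enabled: true, above_size_limit: false)
large = Project.new(lfs_enabled: true, above_size_limit: true)

LfsAccess.new(small).lfs_upload_access? # => true
LfsAccess.new(large).lfs_upload_access? # => false
```

Note that in this sketch the EE check runs before the shared checks rather than after them. That is harmless here because every check only returns false early, but the ordering would need thought in methods with side effects.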

Before we could move to a single codebase, we had to separate EE-specific code from code shared between CE and EE. Due to the amount
of work necessary, we divided the work into two departments: backend and
frontend. For every department we created issues outlining the work to do for
the various parts of the codebase. We even included the exact lines of code
that had to change directly in the created
issues, making it simple
to see what one had to do. Each department also had an engineer assigned as the
lead engineer, responsible for taking on the most difficult challenges. Filipa
Lacerda, senior frontend engineer of Verify (CI)
and Delivery, was in charge of frontend code. As the Delivery backend engineer,
I was in charge of backend code.

Some changes were small and took a short amount of time, while others were big
and took weeks. One of my big challenges was to make sure CE and EE use the same
database schema,
changing just under 24,000 lines of code over a two-month period.

Filipa spent a lot of time creating 168 frontend issues outlining specific tasks
as well as submitting 124 merge requests to address the majority of these
issues. Resolving some of these issues required getting rid of some
technical debt first, such as breaking up large chunks of code into smaller
chunks, and
coming up with a way to create EE-specific Vue.js
templates.

While Filipa and I took on the biggest challenges, in total the work involved 55
different engineers submitting more than 600 merge requests, closing just under
400 issues, and changing nearly 1.5 million lines of code.

Moving toward a single codebase

With most of the work done, we could start looking into what project setup we
would use for a single codebase. We came up with three different approaches:

1. Single codebase: moving all development into gitlab-ce

All code and development is moved into the gitlab-ce repository. The gitlab-ee
repository is archived, and a separate repository is set up as a mirror of
gitlab-ce, called gitlab-foss. Proprietary code is removed from this mirror
automatically.

Since most of GitLab's development takes place in the current gitlab-ce
repository, this setup would reduce the number of issues to move as well as merge requests to close. A downside of this approach is that clones of
the gitlab-ce repository will include proprietary code.

2. Single codebase: moving all development into gitlab-ee

All code and development is moved into the gitlab-ee repository. The gitlab-ce
repository remains as is in terms of code, and will become a mirror of gitlab-ee. Like
the first option, proprietary code is removed from this mirror automatically.

This setup means that users cloning gitlab-ce don't end up with proprietary code
in their copy of gitlab-ce.
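The mirroring step these options rely on, where proprietary code is removed automatically, becomes simple once EE-specific code is confined to one directory. The following is a rough sketch only, not GitLab's actual tooling: the function name `build_foss_copy` is hypothetical, and it assumes that deleting a top-level "ee" directory is all that is needed.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copy a source tree, then drop the proprietary
# directory (assumed to be a top-level "ee" directory) from the copy.
def build_foss_copy(source_dir, target_dir, proprietary_dir: 'ee')
  FileUtils.mkdir_p(target_dir)
  # The trailing "/." copies the directory's contents rather than the
  # directory itself.
  FileUtils.cp_r(File.join(source_dir, '.'), target_dir)
  FileUtils.rm_rf(File.join(target_dir, proprietary_dir))
  target_dir
end

# Usage with a throwaway directory tree.
Dir.mktmpdir do |root|
  source = File.join(root, 'gitlab')
  FileUtils.mkdir_p(File.join(source, 'app'))
  FileUtils.mkdir_p(File.join(source, 'ee', 'app'))
  File.write(File.join(source, 'app', 'person.rb'), "class Person; end\n")
  File.write(File.join(source, 'ee', 'app', 'license.rb'), "module EE; end\n")

  target = build_foss_copy(source, File.join(root, 'gitlab-foss'))

  puts Dir.exist?(File.join(target, 'ee'))                # prints "false"
  puts File.exist?(File.join(target, 'app', 'person.rb')) # prints "true"
end
```

This only works if no shared code depends on anything inside the removed directory, which is why the separation work described earlier had to come first.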

3. Single codebase: moving all development into a new repository

We set up an entirely new repository called "gitlab," and move all code and
development into this repository. The gitlab-ce and gitlab-ee repositories will
become read-only. A mirror is set up (called "gitlab-foss") that mirrors the new
"gitlab" repository, without including proprietary code.

Deciding which single codebase approach to take

Having evaluated all the benefits and
drawbacks, we decided to go with
option two: move development into gitlab-ee. This approach has several benefits:

- The code of the gitlab-ce repository remains as is, and won't include any
  proprietary code.
- We do not need a separate mirror repository that does not include proprietary
  code. Instead, we rename the gitlab-ce repository to "gitlab-foss." We are
  renaming the repository since having "gitlab" and "gitlab-ce" as project
  names could be confusing.
- Users building CE from source don't end up with proprietary code in their
  copy of the gitlab-ce repository.
- We keep the Git logs of both gitlab-ce and gitlab-ee, instead of losing the
  logs (this depends a bit on how we'd move repositories around).
- It requires the least amount of changes to our workflow and tooling.
- Using a single project and issue tracker for both CE and EE makes it easier
  to search for issues.

Issues created in the gitlab-ce project will move to the gitlab-ee project,
which we will rename to just "gitlab" (or "gitlab-org/gitlab" if you include the
group name). This project then becomes the single source of truth, and is used
for creating issues for both the CE and EE distributions.

Moving merge requests across projects is not possible, so we will close any open
merge requests. Authors of these merge requests will have to resubmit them to
the "gitlab" (called "gitlab-ee" before the rename) project.

When moving issues or closing merge requests, a bot will also post a comment
explaining why this is done, what steps the author of a merge request has to
take, and where one might find more information about these procedures.

Prior to the single codebase setup, GitLab community contributions would be submitted
to the gitlab-ce repository. In the single codebase, contributions are instead
submitted to the new gitlab repository ("gitlab-org/gitlab"). EE-specific code
resides in an "ee" directory in the repository. Code outside of this directory
will be free and open source software, using the same license as the gitlab-ce
repository currently uses. This means that as long as you do not change anything
in this "ee" directory, the only change for GitLab community contributions is the use
of a different repository.

Our current plan is to have a single codebase the first week of September. GitLab 12.3 will be the first release based on a single codebase.

Users that clone GitLab EE and/or GitLab CE from source should update their Git
remote URLs after the projects are renamed. This is not strictly necessary as
GitLab will redirect Git operations to the new repository. For users of our
Omnibus packages and Docker images nothing changes.

Those interested in learning more about what went on behind the scenes can refer
to the following resources:
