The GitLab Rails application runs on Puma, a multi-threaded Rack application server written in Ruby.
We recently updated Puma to major version 5, which introduced a number of important changes,
including support for compaction, a technique to reduce memory fragmentation in the
Ruby heap.
In this post we will describe what Puma's "nakayoshi fork" does, what compaction is,
and some of the challenges we faced when first deploying it.
Nakayoshi: A friendlier fork
Puma 5 added a new configuration switch: nakayoshi_fork. This switch affects Puma's behavior when
forking new workers from the primary process. It is largely based on a Ruby gem of the same name
but adds new functionality. More specifically, enabling nakayoshi_fork in Puma will result in two additional
steps prior to forking into new workers:
- Tenuring objects. By running several minor garbage collection cycles ahead of a fork, Ruby can promote survivors
from the young to the old generation (referred to as "tenuring"). These objects are often classes, modules, or long-lived
constants that are unlikely to change.
This process makes forking copy-on-write friendly because tagging an object as "old" implies a write
to the underlying heap page. Doing this prior to forking means the OS won't have
to copy this page from the parent to the worker process later. We won't be discussing copy-on-write in detail, but
this blog post offers a good introduction to the topic and how it relates to Ruby and pre-fork servers.
- Heap compaction. Ruby 2.7 added a new method, GC.compact, which
reorganizes the Ruby heap to pack objects closer together when invoked. GC.compact reduces Ruby heap fragmentation and
potentially frees up Ruby heap pages so that the physical memory they consume can be reclaimed by the OS.
This step only happens when GC.compact is available in the version of Ruby in use (for MRI, 2.7 or newer).
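The two preparation steps can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Puma's implementation: the helper names (tenure_objects, compact_heap, friendly_fork), the cap of four minor GC rounds and the 10% young-object cutoff are hypothetical choices for the example, and it assumes a platform where Process.fork is available.

```ruby
# Sketch of "friendly fork" preparation in the spirit of Puma 5's
# nakayoshi_fork. NOT Puma's code: the helper names, the 4-round limit
# and the 10% young-object cutoff are illustrative assumptions.

# Tenuring: run minor GCs so that young survivors get promoted to the old
# generation before forking. Returns how many minor GCs were run.
def tenure_objects(max_rounds: 4)
  rounds = 0
  max_rounds.times do
    stat = GC.stat
    live = stat[:heap_live_slots]
    young = live - stat[:old_objects]
    break if young < live / 10 # most live objects are already old
    GC.start(full_mark: false)  # minor (young-generation) collection
    rounds += 1
  end
  rounds
end

# Compaction: only when the running Ruby supports GC.compact (MRI 2.7+).
def compact_heap
  return false unless GC.respond_to?(:compact)
  GC.compact
  true
rescue NotImplementedError
  false # defined but unsupported on this platform
end

# Prepare the heap, then fork; the block runs in the child process.
def friendly_fork(&block)
  tenure_objects
  compact_heap
  Process.fork(&block)
end

pid = friendly_fork { exit!(0) }
Process.wait(pid)
```

In a pre-fork server, the primary process would call something like friendly_fork { run_worker } once per worker, where run_worker is a hypothetical stand-in for the worker's request loop. The GC.respond_to?(:compact) check mirrors the behavior described above: on Rubies older than 2.7, only the tenuring step runs.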
In the remainder of this post, we will look at:
- How GC.compact works and its potential benefits.
- Why using C-extensions can be problematic when using compaction.
- How we resolved a production incident that crashed GitLab.
- What to look out for before enabling compaction in your app, via nakayoshi_fork or otherwise.
How compacting garbage collection works
The primary goal of a compacting garbage collector (GC) is to use allocated memory more
effectively, which increases the likelihood of the application using less memory over time.
Compaction is especially important when processes can share memory, as is the case with Ruby pre-fork
servers such as Puma or Unicorn. But how does Ruby accomplish this?
Ruby manages its own object heap by allocating chunks of memory, called pages, from the operating system
(a confusing term, since Ruby heap pages are distinct from the smaller memory pages managed by the OS itself).
When an application asks to create a new object, Ruby will try to find a free object slot in one of these
pages and fill it. As objects are allocated and deallocated over the lifetime of the application,
this can lead to fragmentation, with pages being neither entirely full nor entirely empty. This is the
primary cause of Ruby's infamous runaway memory problem: since the available space isn't used optimally,
pages rarely become entirely empty ("tomb pages"), yet a page must be entirely empty before it can be deallocated.
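To make the fragmentation argument concrete, here is a toy model. It is deliberately simplified and is not how MRI's allocator works: the page size (SLOTS_PER_PAGE = 8) and the helper names are hypothetical. It shows that freeing half of all objects can leave no entirely empty page when the survivors are scattered, while freeing the same number of objects in one contiguous run empties whole pages.

```ruby
# Toy model of a paged object heap (NOT MRI's real allocator).
# Each page holds a fixed number of slots; nil marks a free slot.
SLOTS_PER_PAGE = 8 # hypothetical; real Ruby heap pages hold far more slots

# Build a heap of fully occupied pages, numbering objects 1, 2, 3, ...
def allocate_heap(pages, slots: SLOTS_PER_PAGE)
  id = 0
  Array.new(pages) { Array.new(slots) { id += 1 } }
end

# Free (set to nil) every live object for which the block returns true.
def free_objects!(heap)
  heap.each do |page|
    page.map! { |obj| obj && yield(obj) ? nil : obj }
  end
  heap
end

# A page can only be handed back to the OS once it is entirely empty.
def tomb_pages(heap)
  heap.count { |page| page.all?(&:nil?) }
end

def live_objects(heap)
  heap.sum { |page| page.count { |obj| !obj.nil? } }
end

# Scattered frees: half the objects die, but every page keeps survivors.
scattered = free_objects!(allocate_heap(4)) { |id| id.odd? }
puts live_objects(scattered) # => 16
puts tomb_pages(scattered)   # => 0

# Contiguous frees: the same number of objects die, two pages become empty.
contiguous = free_objects!(allocate_heap(4)) { |id| id <= 16 }
puts tomb_pages(contiguous)  # => 2
```

With four pages of eight slots, freeing every odd-numbered object leaves four live objects on every page, so no page can be released even though the 16 survivors would fit in two pages.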
Ruby 2.7 added a new method, GC.compact, which aims to address this problem by walking the entire
Ruby heap space and moving objects around to obtain tightly packed pages. This process will ideally make
some pages unused, and unused memory can be reclaimed by the OS. Watch this video from RubyConf 2019, in which Aaron Patterson, the author of this feature, gives a good introduction to compacting GC.
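Packing live objects tightly is classically done with a two-finger compactor, and Ruby 2.7's compactor is commonly described as a variant of it. The toy below is a simplified sketch, not MRI's code: the flat slot array, the pinned set and the helper names are hypothetical. One finger scans from the start for free slots, the other scans from the end for live objects, and each move leaves a forwarding entry so that references the GC knows about can be updated. A reference that is never updated, for instance one held by native code the GC cannot see, still points at the old, now empty slot.

```ruby
require "set"

# Toy two-finger compaction over a flat array of slots (nil = free slot).
# NOT MRI's implementation; it only illustrates packing live objects toward
# the start of the heap and recording forwarding addresses.
#
# pinned: indices of objects that must stay where they are.
# Returns [compacted_slots, forwarding], where forwarding maps old -> new index.
def two_finger_compact(slots, pinned: Set.new)
  slots = slots.dup
  forwarding = {}
  free = 0
  scan = slots.size - 1
  loop do
    # Free finger: walk right to the first empty slot.
    free += 1 while free < scan && !slots[free].nil?
    # Scan finger: walk left to the last live object that may move.
    scan -= 1 while scan > free && (slots[scan].nil? || pinned.include?(scan))
    break if free >= scan

    slots[free] = slots[scan]
    slots[scan] = nil
    forwarding[scan] = free
  end
  [slots, forwarding]
end

# References the GC knows about are rewritten through the forwarding table.
def update_reference(index, forwarding)
  forwarding.fetch(index, index)
end

heap = [:a, nil, :b, nil, nil, :c, nil, :d]
compacted, forwarding = two_finger_compact(heap)
p compacted                                  # => [:a, :d, :b, :c, nil, nil, nil, nil]
p compacted[update_reference(7, forwarding)] # => :d (updated reference)
p compacted[7]                               # => nil (stale reference to the old slot)
```

Pinning index 7, as in two_finger_compact(heap, pinned: Set[7]), leaves :d in place and moves only :c; objects the collector cannot safely relocate have to be treated this way.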
Compaction is a fairly expensive task, since Ruby needs to stop the world for a complete heap reorganization, so
it's best to perform this task before forking a new worker process, which is why Puma 5 includes this step when performing nakayoshi_fork. Moreover, running compaction before forking
into worker processes increases the chance of workers being able to share memory.
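Because compaction stops the world, it can be worth measuring its cost and effect in your own process before enabling it. A rough sketch, assuming MRI 2.7 or newer: the helper names are hypothetical, the GC.stat keys are ones MRI reports (a missing key comes back as nil), and the numbers vary from run to run, so treat them as a signal rather than a benchmark.

```ruby
# Rough sketch for gauging what GC.compact costs and changes in this process.
# Helper names are hypothetical; returns nil where compaction is unsupported.
STAT_KEYS = %i[heap_allocated_pages heap_live_slots heap_free_slots].freeze

def heap_stats
  stat = GC.stat
  STAT_KEYS.each_with_object({}) { |key, out| out[key] = stat[key] }
end

def measure_compaction
  return nil unless GC.respond_to?(:compact)

  before = heap_stats
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.compact # full GC plus compaction; the program is paused meanwhile
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { seconds: elapsed, before: before, after: heap_stats }
rescue NotImplementedError
  nil
end

# Fragment the heap a little: allocate many objects, keep every other one.
kept = Array.new(200_000) { Object.new }.select.with_index { |_, i| i.even? }

result = measure_compaction
if result
  puts format("GC.compact took %.3fs (%d objects kept)", result[:seconds], kept.size)
  puts "pages before: #{result[:before][:heap_allocated_pages]}, " \
       "after: #{result[:after][:heap_allocated_pages]}"
else
  puts "GC.compact is not available here"
end
```

Comparing these counters across builds with and without compaction gives a first indication of whether the pause is paying off; actual memory sharing between forked workers still has to be measured at the OS level.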
We were eager to enable this feature on GitLab to see if it would reduce memory consumption, but things didn't entirely go as planned.
Inside the incident
After extensive testing via our automated performance test suite and in preproduction
environments, we felt ready to explore compaction on production nodes. We kept a
detailed, public record of what happened
during this production incident, but the key details are summarized below:
- The deployment passed the canary stage, meaning workers who had their heaps compacted were serving traffic
successfully at this point.
- Sometime during the full fleet rollout, problems emerged: Error rates started spiking, but not
across the entire fleet. This phenomenon is odd because errors tend to spread across all workers due to load balancing.
- The error messages surfacing in Sentry were mysterious at best:
ActionView::Template::Error uninitialized constant #<Class:#GrapePathHelpers::DecoratedRoute:0x00007f95f10ea5b8>::UNDERSCORE
Remember this error message for later.
- We discovered the affected workers were segfaulting in hamlit,
a high-performance HAML compiler. Hamlit uses a C-extension to achieve better performance. The segfaulting and the fact
that we were rolling out an optimization that touches GC-internal structures were a tell-tale sign that
compaction was likely to be the cause.
- We rolled back the change to quickly recover from the outage.
How we diagnosed the problem
We were disappointed by this setback and wanted to understand why the outage occurred. Fortunately,
Ruby provides detailed stack traces when crashing in C-extensions. The most effective way
to quickly analyze these is to look for transitions where a C-extension calls into the Ruby VM
or vice versa. These lines therefore caught our attention:
...
/opt/gitlab/embedded/lib/libruby.so.2.7(sigsegv+0x52) [0x7f9601adb932] signal.c:946
/lib/x86_64-linux-gnu/libc.so.6(0x7f960154c4c0) [0x7f960154c4c0]
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_id_table_lookup+0x1) [0x7f9601b15e11] id_table.c:227
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_const_lookup+0x1e) [0x7f9601b4861e] variable.c:3357
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_const_get+0x39) [0x7f9601b4a049] variable.c:2339
# ^