August 29, 2019

How we plan to build more observability tools on GitLab monitoring

Get the scoop on our plan to close the DevOps loop.


The product team at GitLab is working to close the DevOps loop by accelerating development
on new monitoring products that will offer more observability into application performance and
the health of your deployments.

Where does monitoring fit into the DevOps lifecycle?

Monitoring is the final Ops stage of the DevOps loop, coming up after the
production environment is configured and the application deployed. No developer should really
ship code and forget it. Monitoring is essential to proactively respond to simple and complex
problems, and helps GitLab customers uphold the expectations outlined in their service
level objectives (SLOs) with their users.

Our vision for monitoring at GitLab

We outlined big plans for building out our Ops capabilities in our 2018 GitLab product vision:
“A big milestone for GitLab will be when operations people log into GitLab every day and consider
it their main interface for getting work done.”

Since then, GitLab has been working diligently to build out our monitoring products to close the
DevOps loop. The goal is to build instrumentation that allows developers to proactively identify
SLO degradation and observe the impacts of code changes across multiple deployments in real time.
The "North Stars" that guide product development in the monitoring stage include:

  • Instrument with ease: GitLab is set up so teams have generic observability into their
    application performance.
  • Resolve like a pro: GitLab correlates incoming observability data with CI/CD events and
    source code information so troubleshooting is easy.
  • Gain insights seamlessly: Our use of container-based deployments makes it simpler to
    continuously collect insights into production SLOs, incidents, and observability sources across
    complex projects and multiple applications.

One of our core principles at GitLab is to dogfood everything. After all, if it doesn’t work for
us, how can it work for our customers? We are beginning by setting up our own infrastructure team
at GitLab.com to use the incident management system we’re developing, and by building out GitLab
self-monitoring so administrators can monitor their self-managed GitLab instance the same way
their developers use GitLab to monitor their applications.

We are also committed to closing the DevOps loop by prioritizing cloud native first
and building tooling designed to provide more insight into application performance and the
health of deployments for Ops professionals.

Kenny Johnston, director of product (Ops) at GitLab, gave me an
overview of some of the new products the monitoring team is working on to help make this
vision a reality. Watch the full video of our conversation below and check out
the monitoring product roadmap
for an in-depth look at our goals and timeline.

Building an observability suite to close out the DevOps loop

The top priority for the monitoring team is to close the DevOps feedback loop for GitLab customers.
This means that if SLOs are degraded in any way, an alert is triggered and an incident is created
in GitLab allowing for an immediate response.
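The loop described above (an SLO is defined, a breach is detected, an incident is opened) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GitLab's implementation: the `SLO` and `Incident` classes and the `evaluate_slo` function are hypothetical names invented for this example, and the SLO is assumed to be a simple request success ratio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SLO:
    """Hypothetical objective: the share of requests that must succeed."""
    name: str
    target: float  # required success ratio, between 0 and 1


@dataclass
class Incident:
    """Hypothetical incident record, for illustration only."""
    title: str
    description: str


def evaluate_slo(slo: SLO, total_requests: int, failed_requests: int) -> Optional[Incident]:
    """Return an Incident when the observed success ratio is below the SLO target."""
    if total_requests == 0:
        return None  # no traffic in the window, nothing to evaluate
    success_ratio = (total_requests - failed_requests) / total_requests
    if success_ratio >= slo.target:
        return None  # SLO is being met
    return Incident(
        title=f"SLO degraded: {slo.name}",
        description=(
            f"Observed success ratio {success_ratio:.4f} "
            f"is below target {slo.target:.4f}"
        ),
    )
```

In this sketch, a caller would run `evaluate_slo` on each evaluation window and hand any returned `Incident` to whatever system pages the on-call responder.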

Kenny says our priority product categories at this stage are metrics, cluster monitoring, and
incident management.

“First I want to make sure that we can provide our customers with the instrumentation so that they
can define an SLO, and when their application exceeds or fails to achieve that SLO, that they can
respond in an instant,” says Kenny. “Once we have them doing that, we'll get a lot of good
feedback, and immediate feedback from users about what tools they need for diagnostic purposes.”

Measure your performance with enhanced metrics

We already have a successful integration with the open source metrics tool Prometheus, which we
use to collect and display performance metrics
for applications deployed on Kubernetes. The integration is sophisticated enough that developers
do not have to leave GitLab to collect important information on the impact of a merge request or
to monitor production systems. Our product category for metrics is “viable,” meaning customers
are using the instrumentation we’ve developed to solve real problems, bringing us a step closer to
closing out the DevOps loop.

Diagnostic tooling for application performance monitoring (APM), in product categories such as
logging, tracing, and error tracking, is currently at the MVC stage, though the team plans to
accelerate development on logging in future GitLab releases.

Kenny notes that our observability suite is one of the primary ways GitLab provides value for
operators who are thinking of making the move to cloud native.

“GitLab out-of-the-box keeps up with new cloud native technologies because we're constantly
adopting the newest versions, and our whole convention of configuration means we don't
leave it to you to figure it out, we've figured it out for you as a default,” explains Kenny.

Simplify Kubernetes management using GitLab

There is quite a bit of overlap between the metrics and cluster monitoring product categories at this
stage, as Prometheus is used to collect metrics on applications deployed using Kubernetes.
By offering out-of-the-box cluster monitoring on Kubernetes, we make it possible for operators
to monitor the health of their deployed environments all in one place.

One of the high-value cluster monitoring features we’ve set up on GitLab is monitoring of memory
usage and CPU capacity metrics, so users can be automatically alerted if either number is out of
bounds on their deployed environments.

“We'd like to start adding capabilities for
cluster cost optimization, so
informing users not just when they're hitting capacity but when they're significantly under
capacity and should probably size down,” says Kenny. “That helps users who've configured a
Kubernetes cluster to not end up wasting it because it's being underutilized and not end up wasting money.”
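The two checks discussed here, alerting when a resource is out of bounds and flagging a cluster that is well under capacity, come down to comparing a utilization ratio against thresholds. The Python sketch below is illustrative only: `classify_utilization` and its default thresholds (90% and 20%) are assumptions made for this example, not GitLab settings.

```python
def classify_utilization(used: float, capacity: float,
                         high: float = 0.9, low: float = 0.2) -> str:
    """Classify resource usage as "over", "under", or "ok".

    `high` and `low` are hypothetical thresholds on the used/capacity ratio.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ratio = used / capacity
    if ratio >= high:
        return "over"   # near or at capacity: alert so someone can respond
    if ratio <= low:
        return "under"  # significantly underutilized: consider sizing down
    return "ok"
```

For example, a node using 2 GiB of 16 GiB of memory would be classified as "under" with these defaults, which is the case Kenny describes as a candidate for sizing down.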

Cluster monitoring reached the “viable” stage in earlier GitLab releases as we transitioned to
Kubernetes, and the product team is building out alerting
and other cluster monitoring features in upcoming releases.

Dogfooding our new incident management system on GitLab

Creating an incident management system is key to a robust observability suite for monitoring:
“The features we've prioritized are oriented towards getting the right person the right information
to enable them to restore the services they are responsible for as quickly as possible,” according to
the category vision for an incident management system.

Because we recognize the urgency of building a functional incident management system,
GitLab is leveraging issues
as the base for creating a viable platform. The goal is to stress the capacity of our existing
tooling by focusing on integrations with communications tools such as Slack, Zoom, etc., so we can
accelerate time-to-market and iterate as we go, while also focusing on building out new functionality.

The infrastructure team on GitLab.com is dogfooding the incident management system
so we can put the tooling through its paces, making improvements as we go.

Outside the loop: Getting GitLab administrators to monitor GitLab using GitLab

Kenny says the product team has a strategy for creating more exposure to the monitoring capabilities
GitLab has in development: putting our monitoring capabilities front and center
for administrators of the GitLab self-managed instance.

“Today you can create a project for your application that's an e-commerce app, and get the
instrumentation to know whether the Kubernetes cluster is experiencing pain, whether SLOs that
you custom define have alerts and respond to that with incidents,” says Kenny. “We'd like you to have
that exact same experience, or expose you to that same experience with your GitLab self-managed
instance, so that as an administrator you're using the same tools to monitor and respond to
the GitLab instance as your developers would use to monitor and respond to their applications.”

By essentially setting up administrators to dogfood the monitoring features we are providing to
developers for application management, we can ensure that they're battle-tested on a larger application.

The core challenge of the observability suite

While the product team at GitLab has a vision and roadmap for building a comprehensive suite of
observability instrumentation, there isn’t a clear consensus among monitoring experts as to what
is required for a robust observability suite in this new, cloud native world.

“There's varied opinion in the new world that's Kubernetes-based about what an observability
system looks like,” says Kenny. “There's a legacy view that seems to be evolving. So, we need to keep up
with that and of the industry's evolution of what we consider required. We as a company just
need to stay focused on what our users are asking for, and that's why I think
completing that DevOps loop is important first, because then we'll start getting immediate user feedback.”

Keep an eye out for these new monitoring updates in our 12.2 and 12.3 releases.

Cover photo by Glen . on Unsplash.
