September 8, 2021

How to write and continuously test vulnerability detection rules for SAST

Interns with the Google Summer of Code helped GitLab transition from our old SAST tools to Semgrep.


In summer 2021, the Vulnerability Research and Static Analysis
teams launched the Google Summer of Code (GSoC) project: Write vulnerability detection rules for SAST.

For this project, we built and implemented a framework that helps GitLab transition away from our current SAST tools to Semgrep. Semgrep is a language-agnostic SAST tool that is gaining popularity in CI/CD environments.
Before replacing an analyzer with the corresponding Semgrep configuration (called rule-sets), we need to ensure that they are equivalent – in that they yield the same set of findings.

For this purpose, we built a testing framework that helps us assess the quality of a Semgrep rule-set. This framework has been used to guide the replacement of flawfinder, a C/C++ analyzer with a corresponding Semgrep rule-set. This new testing framework leverages the power of GitLab CI/CD.

Preliminaries

GitLab and the Google Summer Of Code (GSoC)

The Google Summer of Code (GSoC) is a 10-week program that enlists student interns to work on an open source project in collaboration with open source organizations. For GSoC 2021, GitLab offered four projects to the GSoC interns. The interns completed each project under the guidance of a GitLab team member who served as their mentor and provided regular feedback and assistance when needed.

[Read reflections from the Google Summer of Code interns about what it was like working with GitLab]

About Semgrep

Semgrep is a language-agnostic static-analysis (SAST) tool that is powered by tree-sitter. Tree-sitter is a robust parser-generator tool that supports parsing a variety of languages.

Semgrep supports a rule-syntax which can be used to formulate detection rules in a configuration-as-code YAML format. A Semgrep rule determines the findings that Semgrep is supposed to detect. These rules are combined together to create a rule-set.
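To make the rule format concrete, here is a minimal sketch of the fields a single rule carries, written as the Python dict a YAML loader would produce. The required keys (id, message, severity, languages, plus one pattern operator) follow Semgrep's rule syntax. The rule content itself and the helper `missing_fields` are hypothetical and only serve as an illustration.

```python
# Keys a Semgrep rule is expected to carry, plus at least one pattern operator.
REQUIRED_KEYS = {"id", "message", "severity", "languages"}
PATTERN_OPERATORS = {"pattern", "patterns", "pattern-either", "pattern-regex"}

# Hypothetical rule, shown as the dict a YAML loader would produce.
example_rule = {
    "id": "c-buffer-strcpy",
    "languages": ["c"],
    "severity": "ERROR",
    "message": "strcpy does not check for buffer overflows",
    "pattern": "strcpy(...)",
}


def missing_fields(rule):
    """Return a list of problems; the list is empty when the rule looks complete."""
    problems = sorted(REQUIRED_KEYS - set(rule))
    if not PATTERN_OPERATORS & set(rule):
        problems.append("<pattern operator>")
    return problems


# A rule-set is a collection of such rules under a top-level "rules" key.
rule_set = {"rules": [example_rule]}
```

A check like this could run before a rule is accepted into a rule-set, so that incomplete rules are caught early.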

About GitLab SAST

GitLab is a complete DevSecOps platform and integrates a variety of static analysis tools that help developers find vulnerabilities as early as possible in the software development lifecycle (SDLC).

Since all the integrated SAST tools are very different in terms of implementation as well as tech stack they depend on, the SAST tools are all wrapped in Docker images. The wrappers translate the native vulnerability reports to a generic, common report format which is made available by means of the gl-sast-report.json artifact. This generic report is GitLab's common interface between analyzers and the GitLab Rails backend.
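Because every analyzer emits the same report format, one small reader can serve all of them. The sketch below assumes a trimmed report with the `vulnerabilities`, `location` and `identifiers` fields of the GitLab security report format. The sample data and the `summarize` helper are hypothetical.

```python
import json

# Trimmed, hypothetical report; field names follow the GitLab security report format.
SAMPLE_REPORT = """
{
  "version": "14.0.0",
  "vulnerabilities": [
    {
      "category": "sast",
      "message": "Use of assert detected",
      "severity": "Low",
      "scanner": {"id": "bandit", "name": "Bandit"},
      "location": {"file": "python/assert/test-assert.py", "start_line": 3},
      "identifiers": [
        {"type": "bandit_test_id", "name": "Bandit Test ID B101", "value": "B101"}
      ]
    }
  ]
}
"""


def summarize(report_text):
    """Yield (file, start_line, identifier value) for each reported finding."""
    report = json.loads(report_text)
    for vuln in report.get("vulnerabilities", []):
        location = vuln["location"]
        for identifier in vuln.get("identifiers", []):
            yield location["file"], location["start_line"], identifier["value"]


findings = list(summarize(SAMPLE_REPORT))
```

Reducing each finding to a file, a line and an identifier is one possible way to make reports from different analyzers comparable.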

Write vulnerability detection rules

Some background on our SAST tools

Over time, the growing number of integrated SAST tools has become a maintenance burden for GitLab due to two major contributing factors.

  1. Integration cost: All SAST tools have different release cycles – new releases have to be pulled in immediately so that our users can benefit from them. Given the large number of integrated SAST tools, monitoring them for new releases, integrating those releases, and testing them is expensive in terms of engineering effort and time.

  2. Inflexibility: Adapting or modifying a SAST tool's behavior is non-trivial because each tool is based on different technologies. Also, upstream contributions to the original analyzer repositories are not guaranteed to be accepted by the maintainers. In these cases, we have to fork the project, which is not a scalable solution with regard to maintenance effort.

GitLab is in the process of replacing various SAST tools with a single, language-agnostic SAST tool, called Semgrep, to fix these problems. Semgrep is configured by means of rules that define what Semgrep is supposed to find. These rules are provided as YAML configuration files, so it is fairly easy to adapt the behavior of Semgrep to different use cases.
Semgrep's configuration-as-code approach paired with its language support enables us to replace multiple analyzers, which effectively reduces the maintenance burden.

However, the SAST tool replacement itself is a challenging process. For the majority of use cases we have to assume that there is already a large amount of historic vulnerability data recorded and acted upon using GitLab's vulnerability management features. Users may also have grown accustomed to working with certain analyzers and may even have a certain level of expectation with regards to the findings produced by the analyzer.

A smooth transition from a language-specific analyzer to a corresponding Semgrep rule-set must be guaranteed by meeting a certain level of quality assurance. A rule-set should produce results that are at least as good as those of the original analyzer, a property also known as parity. Parity, in turn, requires that we build test suites to measure the gap (in terms of rule coverage) between the original analyzer and the rule-set that is to replace it. A good-quality rule-set is expected to perform at least as well as the SAST tool it aims to replace (zero gap, full parity).

There are cases where the original SAST tool may falsely report vulnerabilities. In these situations, we aim to improve our rule-set in a controlled manner by explicitly documenting our improvements. However, before improving a rule-set, we want to start from a position of complete parity so that we have a holistic view of the impact incurred by single rule improvements. This documentation of applied improvements is important so we can justify changes with regard to reported findings to the customer.

There are three challenges we tried to address with this project:

  1. Rule management: Provide a central rule repository to store, distribute and track changes applied to rules as well as test-cases.
  2. Rule testing: Every change applied to a rule in the rule repository triggers an automated gap analysis that measures the quality of the rules in comparison to the original analyzers.
  3. Analyzer replacement: Replace at least one SAST tool (in our case flawfinder) with a corresponding rule-set – use the testing framework to ensure that the rule-set is on par with the original SAST tool.

We unpack each of these challenges in the next section.

How we approached these challenges

The architecture of the rule-testing framework is depicted in the diagram below. All the Semgrep rules and the corresponding test-cases are stored in a central rule repository. Changes that are applied to the rules trigger the execution of our rule testing framework, which uses the rules and test-cases to perform an automated gap analysis.

flowchart LR

crr[GitLab Rule Repository]

bandit(GitLab bandit)
bx[gl-sast-report.json]
sbx[gl-sast-report.json]
breport[bandit gap analysis report]

subgraph bandit comparison
   banditsemgrep(GitLab Semgrep)
   banditcompare(compare)
   bandit --> |run analyzer on test-cases| bx;
   banditsemgrep --> |run analyzer on test-cases| sbx;
   bx --> banditcompare
   sbx --> banditcompare
end
crr -->|bandit rules + rule id mappings| banditsemgrep;
banditcompare --> breport

fx[gl-sast-report.json]
fbx[gl-sast-report.json]
freport[flawfinder gap analysis report]
flawfinder(GitLab flawfinder)

subgraph flawfinder comparison
   flawfindersemgrep(GitLab Semgrep)
   flawfindercompare(compare)
   flawfinder --> |run analyzer on test-cases| fx;
   flawfindersemgrep --> |run analyzer on test-cases| fbx;
   fx --> flawfindercompare
   fbx --> flawfindercompare
end
crr -->|flawfinder rules + rule id mappings| flawfindersemgrep;
flawfindercompare --> freport

The rule testing framework is a compass that guides us through the rule development process by automatically measuring the efficacy of rules that are stored in the central rule (git) repository. This measurement happens during a comparison step that validates the findings reported by the original analyzer against the corresponding Semgrep rule-set. For the comparisons we cross-validate the SAST reports (gl-sast-report.json) that adhere to the GitLab security report format. Since the main goal is to achieve parity between the original analyzer and our corresponding Semgrep rules, we treat the original analyzer as the baseline. The diagram above depicts two example comparison steps for bandit and flawfinder. The gap analysis is explained in more detail in the "rule testing" section below.

Using a central rule git repository allows us to manage and easily track changes that are applied to rules and their corresponding test-cases in a central location. By means of GitLab CI/CD, we have a mechanism to automatically run tests that enforce constraints and guidelines on the rules and test-cases. Upon rule changes, we automatically trigger the rule-testing framework which enables us to spot gaps in our rules instantly. The structure of the central rule repository is detailed in the "rule management" section below.

How we addressed rule management challenges

The central rule repository is used to store rules and test-cases for a variety of languages and to keep track of the changes applied to them. By having a separate rule repository we can add CI jobs to test, verify, and enforce syntax guidelines.

The central rule repository follows the layout <language>/<ruleclass>/{rule-<rulename>.yml, test-<rulename>.*}, depicted below, where <language> denotes the target programming language, <ruleclass> is a descriptive name for the class of issues the rule aims to detect, and <rulename> is a descriptive name for the actual rule. Rule files are prefixed with rule- and each contains a single Semgrep rule; test cases are prefixed with test-, and a rule can have multiple test cases.

.
├── mappings
│   └── analyzer.yml
├── c
│   ├── buffer
│   │   ├── rule-strcpy.yml
│   │   ├── test-strcpy.c
│   │   ├── rule-memcpy.yml
│   │   └── test-memcpy.c
│   └── ...
├── javascript
│   └── ...
├── python
│   ├── assert
│   │   ├── rule-assert.yml
│   │   └── test-assert.py
│   ├── exec
│   │   ├── rule-exec.yml
│   │   ├── test-exec.yml
│   │   ├── rule-something.yml
│   │   └── test-something.yml
│   ├── permission
│   │   ├── rule-chmod.yml
│   │   └── test-chmod.py
│   └── ...
└── ...
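A naming convention like this lends itself to an automated check in CI. The following is a minimal sketch of such a check, not the job we actually run. The helper names `parse_rule_path` and `rules_without_tests` and the allowed character set for rule names are assumptions.

```python
import re
from pathlib import PurePosixPath

# Matches rule-<rulename>.yml and test-<rulename>.<ext> file names.
FILE_PATTERN = re.compile(r"^(?P<kind>rule|test)-(?P<name>[A-Za-z0-9_]+)\.(?P<ext>\w+)$")


def parse_rule_path(path):
    """Split a path into (language, ruleclass, kind, rulename), or return None."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3:
        return None  # not of the form <language>/<ruleclass>/<file>
    language, ruleclass, filename = parts
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    if match.group("kind") == "rule" and match.group("ext") != "yml":
        return None  # rule files must be YAML
    return language, ruleclass, match.group("kind"), match.group("name")


def rules_without_tests(paths):
    """Return (language, ruleclass, rulename) for rules lacking any test file."""
    rules, tested = set(), set()
    for path in paths:
        parsed = parse_rule_path(path)
        if parsed is None:
            continue
        language, ruleclass, kind, name = parsed
        (rules if kind == "rule" else tested).add((language, ruleclass, name))
    return sorted(rules - tested)
```

Running `rules_without_tests` over the repository's file list would flag any rule that was added without a matching test-case.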

In addition to the rules, we also store mapping files in the mappings subdirectory. These YAML files map native analyzer IDs to the corresponding Semgrep rules. An analyzer ID uniquely identifies the type of finding. The information in the mapping files helps us correlate the findings from the original analyzer with their corresponding Semgrep findings and vice versa.

The mapping files are digested by the testing framework to perform an automated gap analysis. The goal of this analysis is to check if there is an unexpected deviation between Semgrep (with the rules in this repository) and a given analyzer.

A mapping file groups distinct rules into rule-sets and, thus, can be used to bundle different rules based on a certain domain. An excerpt from a mapping file is depicted below – it maps bandit rules (identified by bandit IDs) to Semgrep rules from the central rule repository.

bandit:
  - id: "B101"
    rules:
      - "python/assert/rule-assert_used"
  - id: "B102"
    rules:
      - "python/exec/rule-exec_used"
  - id: "B103"
    rules:
      - "python/file_permissions/rule-general_bad_permission"
  - id: "B104"
    rules:
      - "python/bind_all_interfaces/rule-general_bindall_interfaces"
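Correlating findings in both directions needs a forward and a reverse lookup. The sketch below builds both from the excerpt above, written as the nested structure a YAML loader would return. The helper `build_indexes` is hypothetical, and it assumes that each Semgrep rule maps back to exactly one native ID.

```python
# The mapping excerpt as the nested structure a YAML loader would return.
MAPPING = {
    "bandit": [
        {"id": "B101", "rules": ["python/assert/rule-assert_used"]},
        {"id": "B102", "rules": ["python/exec/rule-exec_used"]},
        {"id": "B103", "rules": ["python/file_permissions/rule-general_bad_permission"]},
        {"id": "B104", "rules": ["python/bind_all_interfaces/rule-general_bindall_interfaces"]},
    ]
}


def build_indexes(mapping, analyzer):
    """Return (native ID -> Semgrep rules, Semgrep rule -> native ID)."""
    native_to_rules = {}
    rule_to_native = {}
    for entry in mapping[analyzer]:
        native_to_rules[entry["id"]] = list(entry["rules"])
        for rule in entry["rules"]:
            # Assumption: a Semgrep rule belongs to exactly one native ID.
            rule_to_native[rule] = entry["id"]
    return native_to_rules, rule_to_native


native_to_rules, rule_to_native = build_indexes(MAPPING, "bandit")
```

The reverse index is what lets a Semgrep finding be translated back into the native ID of the baseline analyzer.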

How the rule testing framework works

The test-oracle/baseline is provided by the original analyzer when executed on the test-files. The rules in the central rule repository are compared and evaluated against this baseline. The execution of the testing framework is triggered by any change applied to the rule repository.

We run all analyzers (flawfinder, bandit, etc.) and their corresponding Semgrep rule-sets (as defined by the mapping files) on the test-files from the GitLab rule repository. The resulting gl-sast-report.json reports that are produced by the original analyzer and by the Semgrep analyzer are then compared in a pairwise manner. To identify identical findings in both reports, we leverage the information from the mapping files that map the rule-ids of the baseline analyzer to the corresponding Semgrep rule-ids for the rules stored in the central rule repository.
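The pairwise comparison can be pictured as translating Semgrep rule-ids into native IDs and then intersecting the two sets of findings. This is a minimal sketch that assumes two findings are identical when they share file, start line, and native ID. That matching criterion and the helper names are assumptions, not a description of the framework's internals.

```python
def normalize(findings, rule_to_native=None):
    """Turn (file, line, rule_id) triples into a set keyed by the native ID."""
    keys = set()
    for file, line, rule_id in findings:
        # Baseline findings already carry native IDs; Semgrep findings get translated.
        native_id = rule_id if rule_to_native is None else rule_to_native.get(rule_id)
        if native_id is not None:  # findings from unmapped rules are skipped
            keys.add((file, line, native_id))
    return keys


def coverage(baseline_findings, semgrep_findings, rule_to_native):
    """Return the matched findings and the share of the baseline they cover."""
    baseline = normalize(baseline_findings)
    candidate = normalize(semgrep_findings, rule_to_native)
    matched = baseline & candidate
    ratio = len(matched) / len(baseline) if baseline else 1.0
    return matched, ratio
```

A ratio of 1.0 would mean that every baseline finding has a Semgrep counterpart.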

As output, we produce a gap analysis report (in markdown format). The gap analysis lists all the findings that have been reported by the original analyzers and groups them into different tables (based on the native rule-ids). The screenshot below shows a single table from the gap analysis report.

Gap Analysis Report
An example table from the gap analysis report.

The X symbols indicate whether the analyzers (in the example, flawfinder and Semgrep) were able to detect a given finding. The concrete findings as well as the rule files are linked in the table. To reach full coverage, flawfinder as well as Semgrep have to cover the same findings for all the rules that are reported by the baseline.
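A table of this kind can be produced with a few lines of string formatting. The sketch below is illustrative only: the column layout, the heading style, and the `render_table` helper are assumptions rather than the report's actual format.

```python
def render_table(native_id, baseline, candidate, names=("flawfinder", "Semgrep")):
    """Render one markdown table; an X marks that an analyzer reported the finding."""
    rows = [
        f"### {native_id}",
        "",
        f"| Finding | {names[0]} | {names[1]} |",
        "| --- | --- | --- |",
    ]
    # Both arguments are sets of (file, line) pairs for a single native rule-id.
    for file, line in sorted(baseline | candidate):
        mark_baseline = "X" if (file, line) in baseline else ""
        mark_candidate = "X" if (file, line) in candidate else ""
        rows.append(f"| {file}:{line} | {mark_baseline} | {mark_candidate} |")
    return "\n".join(rows)


table = render_table(
    "strcpy",
    {("c/buffer/test-strcpy.c", 4), ("c/buffer/test-strcpy.c", 9)},
    {("c/buffer/test-strcpy.c", 4)},
)
```

In this example the second row has an X only in the flawfinder column, which is the kind of gap the report is meant to expose.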

The analyzer replacement

To build a Semgrep rule-set that is on par with the capabilities of the original/baseline analyzer we leveraged the newly created testing framework. Flawfinder, a C/C++ analyzer, was the first analyzer we fully migrated to Semgrep using the testing framework as a compass.

First, we checked the flawfinder implementation to identify the implemented rules. Given that flawfinder is a Python script and that the rules are essentially stored in a dictionary/hash data-structure, we were able to semi-automatically extract the rules and generate the corresponding Semgrep rule files. We were also able to source the test-files from the flawfinder source code repository.
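The semi-automatic extraction can be pictured as a loop over a dictionary of function names that emits one Semgrep rule per entry. The dictionary below is a hypothetical, simplified stand-in for flawfinder's rule table, which carries more fields. The rule ID scheme and the severity threshold are also assumptions.

```python
# Hypothetical, simplified stand-in for flawfinder's rule dictionary:
# function name -> (risk level, warning text).
FLAWFINDER_LIKE_RULES = {
    "strcpy": (4, "Does not check for buffer overflows when copying to destination"),
    "gets": (5, "Does not check for buffer overflows"),
    "memcpy": (2, "Does not check for buffer overflows when copying to destination"),
}


def to_semgrep_rule(function_name, level, warning):
    """Build a Semgrep rule (as a dict) that flags every call to function_name."""
    return {
        "id": f"flawfinder-{function_name}",  # hypothetical naming scheme
        "languages": ["c"],
        "severity": "ERROR" if level >= 4 else "WARNING",  # assumed threshold
        "message": warning,
        "pattern": f"{function_name}(...)",
    }


generated = [
    to_semgrep_rule(name, level, warning)
    for name, (level, warning) in sorted(FLAWFINDER_LIKE_RULES.items())
]
```

Rules generated this way are only a starting point; each pattern still has to be checked against the baseline findings.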

After the initial import of the first set of rule files and test-cases, we used the information provided by the testing framework to see which rules needed refinement.

We responded to the information provided by our testing framework in the following way:

  1. Findings covered by Baseline and covered by our rule-set: Nothing to be done.
  2. Findings covered by Baseline but not covered by our rule-set: This denotes an incomplete rule-set. In this case we extended the rule-file by providing additional pattern entries.
  3. Findings not covered by Baseline but covered by our rule-set: This usually denotes that some rules are too vaguely formulated. In this case, we refined our rules by using exclusions, e.g., by using pattern-not or by adding more detail to an already existing pattern.
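The three cases above amount to a small decision table over two sets of findings. Here is a minimal sketch of that triage; the `triage` helper and the action labels are hypothetical, and findings are assumed to be hashable keys such as (file, line, native ID).

```python
def triage(baseline, rule_set):
    """Map every finding to the follow-up action suggested by the three cases."""
    actions = {}
    for finding in baseline | rule_set:
        in_baseline = finding in baseline
        in_rule_set = finding in rule_set
        if in_baseline and in_rule_set:
            actions[finding] = "nothing to do"
        elif in_baseline:
            # Case 2: the rule-set is incomplete; add pattern entries.
            actions[finding] = "extend rule"
        else:
            # Case 3: the rule is too vague; add exclusions such as pattern-not.
            actions[finding] = "refine rule"
    return actions
```

Parity is reached once every finding maps to "nothing to do".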

The rule design was an iterative process: using the testing framework as an oracle, we closed the gaps between our Semgrep rule-set and the flawfinder baseline step by step until we achieved 100% parity.

How the GSoC project helped GitLab

In this GSoC project we successfully built an automated rule/configuration testing framework that is driven by GitLab CI/CD capabilities and that provided the data we needed to replace flawfinder reliably and quickly with a corresponding Semgrep rule-set.

If you are interested in finding out more information about this GSoC project, please check out the following repositories:
