April 21, 2021
15 min read

Tips for productive DevOps workflows: JSON formatting with jq and CI/CD linting automation

Learn how to filter in JSON data structures and interact with the REST API. Use the GitLab API to lint your CI/CD configuration and dive into Git hooks speeding up your workflows.


What is JSON linting?

To understand JSON linting, let’s quickly break down the two concepts of JSON and linting.

JSON is an acronym for JavaScript Object Notation, a lightweight, text-based, open standard format for representing structured data, based on JavaScript object syntax. It is most commonly used for transmitting data in web applications. It is typically faster to parse than XML and is easy for humans to read and write.

Linting is a process that automatically checks and analyzes static source code for programming and stylistic errors, bugs and suspicious constructs.

JSON has become popular because it is human-readable and doesn’t require a complete markup structure like XML. It is easy to parse into its logical components, especially in JavaScript, and JSON libraries exist for most programming languages.

Benefits of JSON linting

Finding an error in JSON code can be challenging and time-consuming. The best way to find and correct errors while saving time is to use a linting tool. When JSON code is pasted into a linting editor, the tool validates and reformats it. Such editors are easy to use and support a wide range of browsers, so applications developed with JSON need little extra effort to be browser-compatible.

JSON linting is an efficient way to reduce errors and improve the overall quality of the JSON code. This can help accelerate development and reduce costs because errors are discovered earlier.

Some common JSON linting errors

In instances where a JSON transaction fails, the error information is conveyed to the user by the API gateway. By default, the API gateway returns a very basic fault to the client when a message filter has failed.

One common JSON linting error is a parsing error. A “parse: unexpected character” error occurs when a value that is not a valid JSON string, for example a native JavaScript object, is passed to the JSON.parse method. To solve the error, make sure to pass only valid JSON strings to JSON.parse.

Other common errors are NULL or inaccurate data, using the wrong data type per column or the wrong extension for JSON files, and rows in the JSON table that are not in JSON format.

How to fix JSON linting errors

If you encounter a NULL or inaccurate data error in parsing, the first step is to make sure you use the right data type per column. For example, in the case of “age,” use 12 instead of twelve.

Also make sure you are using the right extension for JSON files. A compressed JSON file must end with “.json” followed by the extension of the compression format, such as “.gz”.

Next, make sure the JSON format is used for every row in the JSON table. Create a table with a delimiter that is not in the input files. Then, run a query that returns the file name, the row details and the file path for the null JSON rows.

Sometimes you may find files that are not your source code files, but ones generated by the system when compiling your project. When such a file has a .js extension, ESLint needs to exclude it when searching for errors. One way to do this is to add ‘ignorePatterns’ to the .eslintrc.json file, either before or after the “rules” key.

"ignorePatterns": ["temp.js", "**/vendor/*.js"],
"rules": {

Alternatively, you can create a separate file named ‘.eslintignore’ and list the files to be excluded, as shown below:

**/*.js

If you opt to correct instead of ignore, look for the error code in the last column. Correct all the errors of that type in one file, rerun ‘npx eslint . >errfile’ and ensure all the errors of that type are cleared. Then look for the next error code and repeat the procedure until all errors are cleared.

Of course, there will be instances when you won’t understand an error, so in that case, open https://eslint.org/docs/user-guide/getting-started and type the error code in the ‘Search’ field on the top of the document. There you will find very detailed instructions as to why that error is raised and how to fix it.

Finally, you can forcibly fix errors automatically while generating the error list using:

npx eslint . --fix

This is not recommended until you are better versed in lint errors and how to fix them. Also, keep a backup of the files you are linting, because automatic fixes may overwrite code, which could cause your program to fail.

JSON linting best practices

Here are some tips for helping your consumers use your output:

First, always enclose keys and string values in double quotes. It may seem convenient to generate JSON with single quotes, but single quotes are not valid JSON and parsers reject them.

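For numerical values, quotes are not required. Keep in mind that a quoted value such as "12" is parsed as a string rather than a number, so enclose numbers in double quotes only when your consumers expect strings.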
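The quoting rules can be checked quickly with Python's standard json module. This is a minimal sketch; the sample strings and the age_quoted key are made up for illustration.

```python
import json

# Single-quoted keys and values are not valid JSON and are rejected.
try:
    json.loads("{'name': 'project1'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

# Double quotes parse fine; a quoted number arrives as a string, not a number.
parsed = json.loads('{"name": "project1", "age": 12, "age_quoted": "12"}')

print(single_quotes_ok)
print(type(parsed["age"]).__name__)
print(type(parsed["age_quoted"]).__name__)
```

The last two lines print the Python type names, which makes the difference between 12 and "12" visible to consumers of the data.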

Next, avoid hyphens in your key names. Hyphenated keys cannot be mapped directly to attribute or field names in languages such as Python and Scala, where the hyphen is read as a minus sign. Use underscores (_) instead.

It’s a good idea to always create a root element, especially when you’re creating a complicated JSON.

Modern web applications come with a REST API which returns JSON. The format needs to be parsed, and often feeds into scripts and service daemons polling the API for automation.

Starting with a new REST API and its endpoints can often be overwhelming. Documentation may suggest looking into a set of SDKs and libraries for various languages, or instruct you to use curl or wget on the CLI to send a request. Both CLI tools come with a variety of parameters which help to download and print the response string, for example in JSON format.

The response string retrieved from curl may get long and confusing. It can require parsing the JSON format and filtering for a smaller subset of results. This helps with viewing the results on the CLI, and minimizes the data to process in scripts. The following example retrieves all projects from GitLab and returns a paginated result set with the first 20 projects:

$ curl "https://gitlab.com/api/v4/projects"

Raw JSON as API response

The GitLab REST API documentation guides you through the first steps with error handling and authentication. In this blog post, we will be using the Personal Access Token as the authentication method. Alternatively, you can use project access tokens for automated authentication that avoids the use of personal credentials.

REST API authentication

Not all endpoints allow anonymous access; some require authentication. Try fetching user profile data with this request:

$ curl "https://gitlab.com/api/v4/user"
{"message":"401 Unauthorized"}

The API request against the /user endpoint requires passing the personal access token in the request, for example, as a request header. To avoid exposing credentials on the terminal, you can export the token as a variable in the user's environment. You can automate the variable export with ZSH and the .env plugin in your shell environment. You can also source the .env once in the existing shell environment.

$ vim ~/.env

export GITLAB_TOKEN="..."

$ source ~/.env

Scripts and commands run in your shell environment can reference the $GITLAB_TOKEN variable. Try querying the user API endpoint again, this time adding the authorization header to the request:

$ curl -H "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com/api/v4/user"
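Scripts written in other languages can pick up the same environment variable. The following Python sketch builds the header used in the curl call; the helper name auth_headers and the placeholder token value are assumptions for illustration, not part of any GitLab library.

```python
import os

def auth_headers(env=os.environ):
    """Build the Authorization header from GITLAB_TOKEN, if it is set."""
    token = env.get("GITLAB_TOKEN")
    if not token:
        raise RuntimeError("GITLAB_TOKEN is not set; run 'source ~/.env' first")
    return {"Authorization": f"Bearer {token}"}

# Example with a placeholder value instead of a real token.
print(auth_headers({"GITLAB_TOKEN": "example-token"}))
```

Failing early when the variable is missing avoids sending an unauthenticated request and debugging a 401 response later.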

Remember that only administrators can see the attributes of all users; individual users can only see their own profile. For example, email is hidden from the public.

How to request responses in JSON

The GitLab API provides many resources and URL endpoints. You can manage almost anything with the API that you’d otherwise configure using the graphical user interface.

After sending the API request, the response message contains the body as string, for example as a JSON content type. curl can provide more information about the response headers which is helpful for debugging. Multiple verbose levels enable the full debug output with -vvv:

$ curl -vvv "https://gitlab.com/api/v4/projects"
[...]
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=gitlab.com
*  start date: Jan 21 00:00:00 2021 GMT
*  expire date: May 11 23:59:59 2021 GMT
*  subjectAltName: host "gitlab.com" matched cert's "gitlab.com"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
[...]
> GET /api/v4/projects HTTP/2
> Host: gitlab.com
> User-Agent: curl/7.64.1
> Accept: */*
[...]
< HTTP/2 200
< date: Mon, 19 Apr 2021 11:25:31 GMT
< content-type: application/json
[...]
[{"id":25993690,"description":"project for adding issues","name":"project-for-issues-1e1b6d5f938fb240","name_with_namespace":"gitlab-qa-sandbox-group / qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6 / project-for-issues-1e1b6d5f938fb240","path":"project-for-issues-1e1b6d5f938fb240","path_with_namespace":"gitlab-qa-sandbox-group/qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6/project-for-issues-1e1b6d5f938fb240"

[... JSON content ...]

"avatar_url":null,"web_url":"https://gitlab.com/groups/gitlab-qa-sandbox-group/qa-test-2021-04-19-11-12-56-7f3128bd0e41b92f"}}]
* Closing connection 0

The curl command output provides helpful insights into TLS ciphers and versions, the request lines starting with > and response lines starting with <. The response body string is encoded as JSON.

How to see the structure of the returned JSON

To get a quick look at the structure of the returned JSON file, try these tips:

  • Enclosing square brackets [ … ] identify an array.
  • Enclosing curly brackets { … } identify a dictionary. Dictionaries are also called associative arrays, maps, etc.
  • "key": value indicates a key-value pair in a dictionary, which is identified by the curly brackets enclosing the key-value pairs.

The values in JSON have specific types: a string value is put in double quotes. Booleans (true/false), integers, and floating-point numbers are also available as types. If a key exists but its value is not set, REST APIs often return null.
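To see how these JSON types surface in a script, here is a small Python sketch using the standard json module. The sample document is invented for illustration; only avatar_url and namespace mirror keys from the API response above.

```python
import json

# One key per JSON type: string, boolean, integer, float, null, array, dictionary.
sample = (
    '{"name": "project1", "archived": false, "id": 1, "score": 1.5, '
    '"avatar_url": null, "topics": [], "namespace": {}}'
)
data = json.loads(sample)

for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

JSON null becomes None in Python, arrays become lists, and dictionaries become dict objects.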

Verify the data structure by running "linters". Python's JSON module can parse and lint JSON strings. The example below misses a closing square bracket to showcase the error:

$ echo '[{"key": "broken"}' | python -m json.tool
Expecting object: line 1 column 19 (char 18)
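The same check can be embedded in a script. The sketch below wraps json.loads in a hypothetical helper named lint_json; the exact error wording depends on the Python version, so it is reported as-is rather than matched.

```python
import json

def lint_json(text):
    """Return (True, None) for valid JSON, else (False, error description)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"{err.msg}: line {err.lineno} column {err.colno}"
    return True, None

print(lint_json('[{"key": "ok"}]'))
print(lint_json('[{"key": "broken"}'))
```

Returning a tuple instead of raising keeps the helper easy to call from a loop over many files.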

jq – a lightweight and flexible CLI processor – can be used as a standalone tool to parse and validate JSON data.

$ echo '[{"key": "broken"}' | jq
parse error: Unfinished JSON term at EOF at line 2, column 0

jq is available in the package managers of most operating systems.

$ brew install jq
$ apt install jq
$ dnf install jq
$ zypper in jq
$ pacman -S jq
$ apk add jq

Dive deep into JSON data structures

The true power of jq lies in how it can be used to parse JSON data:

jq is like sed for JSON data. It can be used to slice, filter, map, and transform structured data with the same ease that sed, awk, grep etc., let you manipulate text.

The output below shows how it looks to run the request against the project API again, but this time, the output is piped to jq.

$ curl "https://gitlab.com/api/v4/projects" | jq
[
  {
    "id": 25994891,
    "description": "...",
    "name": "...",

[...]

    "forks_count": 0,
    "star_count": 0,
    "last_activity_at": "2021-04-19T11:50:24.292Z",
    "namespace": {
      "id": 11528141,
      "name": "...",

[...]

    }
  }
]

The first difference is the format of the JSON data structure, which is now pretty-printed. New lines and indents in data structure scopes help your eyes and allow you to identify the inner and outer data structures involved. This view helps you determine which jq filters and methods to apply next.
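If jq is not at hand, pretty-printing can be approximated with Python's json module. This is a sketch with a shortened, made-up input string.

```python
import json

raw = '[{"id":1,"name":"project1","namespace":{"id":2}}]'

# indent=2 puts each key on its own line, indented by nesting level.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```

Pretty-printing only changes whitespace; parsing the output again yields the same data structure.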

About arrays and dictionaries

The set of results from an API often is returned as a list (or "array") of items. An item itself can be a single value or a JSON object. The following example mimics the response from the GitLab API and creates an array of dictionaries as a nested result set.

$ vim result.json
[
  {
    "id": 1,
    "name": "project1"
  },
  {
    "id": 2,
    "name": "project2"
  },
  {
    "id": 3,
    "name": "project-internal-dev",
    "namespace": {
      "name": "🦊"
    }
  }
]

Use cat to print the file content on stdout and pipe it into jq. The outer data structure is an array – use -c .[] to access and print all items.

$ cat result.json | jq -c '.[]'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
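For comparison, iterating the outer array looks like this in Python. The sketch inlines the content of result.json so that it runs on its own; the helper name compact is an assumption.

```python
import json

# Same content as result.json above, inlined so the sketch is self-contained.
RESULT = '''[
  {"id": 1, "name": "project1"},
  {"id": 2, "name": "project2"},
  {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}}
]'''

def compact(item):
    # Mirror jq -c: no spaces after separators, keep non-ASCII characters.
    return json.dumps(item, separators=(",", ":"), ensure_ascii=False)

lines = [compact(item) for item in json.loads(RESULT)]
print("\n".join(lines))
```

Each array item ends up on its own line, which is convenient for further line-based processing.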

How to filter data structures with jq

Filter items by passing | select (...) to jq. The filter takes a lambda callback function as a comparator condition. When the item matches the condition, it is returned to the caller.

Use the dot indexer . to access dictionary keys and their values. Try to filter for all items where the name is project2:

$ cat result.json | jq -c '.[] | select (.name == "project2")'
{"id":2,"name":"project2"}

Practice this example by selecting the id with the value 2 instead of the name.
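The select condition corresponds to a plain filter over the list. As a rough Python equivalent, with the first two items of the example data inlined:

```python
import json

items = json.loads('[{"id": 1, "name": "project1"}, {"id": 2, "name": "project2"}]')

# Keep only the items for which the condition holds, like select(...) in jq.
by_name = [item for item in items if item.get("name") == "project2"]
by_id = [item for item in items if item.get("id") == 2]

print(by_name)
print(by_id)
```

Both filters return the same item. Note that the id is compared against the number 2, not the string "2".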

Filter with matching a string

During tests, you may need to match different patterns instead of knowing the full name. Think of projects that match a specific path or are located in a group where you only know the prefix. Simple string matches can be achieved with the | contains (...) function. It allows you to check whether the given string is inside the target string – which requires the selected attribute to be of the string type.

For a filter with the select chain, the comparison condition needs to be changed from the equal operator == to checking the attribute .name with | contains ("dev").

$ cat result.json | jq -c '.[] | select (.name | contains ("dev") )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
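In Python, the substring check behind contains can be sketched with the in operator. The list below repeats the example names and omits the namespace for brevity.

```python
items = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev"},
]

# "dev" in name is true when the name contains the substring anywhere.
matches = [item for item in items if "dev" in item["name"]]
print(matches)
```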


Filter with matching regular expressions

For advanced string pattern matching, it is recommended to use regular expressions. jq provides the test function for this use case. Try to filter for all projects whose name is project followed by a number, represented by \d+. Note that the backslash \ needs to be escaped as \\ because the pattern is written inside a jq string literal. ^ anchors the match at the beginning of the string, $ at the end.

$ cat result.json | jq -c '.[] | select (.name | test ("^project\\d+$") )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}

Tip: You can test and build the regular expression with regex101 before test-driving it with jq.
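The same pattern can also be tried out in Python's re module before moving it into a jq filter. This is only a sketch: jq uses a different regex engine, so more exotic patterns may behave differently.

```python
import re

names = ["project1", "project2", "project-internal-dev"]

# A raw string keeps the backslash as-is, so no double escaping is needed here.
pattern = re.compile(r"^project\d+$")
matching = [name for name in names if pattern.search(name)]
print(matching)
```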

Access nested values

Key value pairs in a dictionary may have a dictionary or array as a value. jq filters need to take this factor into account when filtering or transforming the result. The example data structure provides project-internal-dev which has the key namespace and a value of a dictionary type.

  {
    "id": 3,
    "name": "project-internal-dev",
    "namespace": {
      "name": "🦊"
    }
  }

jq orders values by type: null sorts before booleans, numbers, strings and arrays, and dictionaries sort last. Comparing the namespace attribute against an empty dictionary {} in a select chain therefore separates the items: >= {} selects items whose namespace is a dictionary, while <= {} selects items where namespace is missing and evaluates to null (or holds any other non-dictionary value).

$ cat result.json | jq -c '.[] | select (.namespace >={} )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}

$ cat result.json | jq -c '.[] | select (.namespace <={} )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}

These methods can be used to access the name attribute of the namespace, but only if the namespace contains values. Tip: You can chain multiple jq calls by piping the result into another jq call. .name is a subkey of the primary .namespace key.

$ cat result.json | jq -c '.[] | select (.namespace >={} )' | jq -c '.namespace.name'
"🦊"

The additional select command with non-empty namespaces ensures that only initialized values for .namespace.name are returned. This is a safety check, and avoids receiving null values in the result you would need to filter again.

$ cat result.json | jq -c '.[]' | jq -c '.namespace.name'
null
null
"🦊"

By using the additional check with | select (.namespace >={} ), you only get the expected results and do not have to filter empty null values.
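The difference between the guarded and unguarded access can be sketched in Python as well. The example data is inlined, and isinstance stands in for the >= {} comparison as an explicit type check.

```python
items = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev", "namespace": {"name": "\U0001F98A"}},
]

# Without a guard, missing namespaces produce None (null in JSON).
unguarded = [(item.get("namespace") or {}).get("name") for item in items]

# With a guard, only items that actually carry a namespace dictionary remain.
guarded = [
    item["namespace"]["name"]
    for item in items
    if isinstance(item.get("namespace"), dict)
]

print(len(unguarded), len(guarded))
```

The unguarded list has one entry per item, including the empty ones, while the guarded list only holds the single real namespace name.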

How to expand the GitLab endpoint response

Save the result from the API projects call and retry the examples above with jq.

$ curl "https://gitlab.com/api/v4/projects" -o result.json >/dev/null 2>&1

Validate CI/CD YAML with jq for Git hooks

While writing this blog post, I learned that you can escape and encode YAML into JSON with jq. This trick comes in handy when automating YAML linting on the CLI, for example as a Git pre-commit hook.

Let’s take a look at the simplest way to test GitLab CI/CD from our community meetup workshops. A common mistake in the first steps is a missing two-space indent or missing whitespace between the dash and the following command. The following examples use .gitlab-ci.error.yml as a filename to showcase errors and .gitlab-ci.main.yml for working examples.

$ vim .gitlab-ci.error.yml

image: alpine:latest

test:
script:
  -exit 1

Committing the change and waiting for the CI/CD pipeline to validate at runtime can be time-consuming. The GitLab API provides a resource endpoint /ci/lint. A POST request with JSON-encoded YAML content will return a linting result faster.

Parse CI/CD YAML into JSON with jq

You can use jq to encode the raw YAML string as a JSON string:

$ jq --raw-input --slurp < .gitlab-ci.error.yml
"image: alpine:latest\n\ntest:\nscript:\n  -exit 1\n"

The /ci/lint API endpoint requires a JSON dictionary with content as key, and the raw YAML string as a value. You can use jq to format the input by passing the YAML with the --arg option:

$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml'
{
  "content": "image: alpine:latest\n\ntest:\nscript:\n  -exit 1"
}
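The encoding step is not specific to jq. As a sketch, the same dictionary can be produced with Python's json module; the helper name lint_payload is an assumption, and the YAML text is the broken example from above.

```python
import json

def lint_payload(yaml_text):
    """Wrap raw YAML text in the {"content": ...} dictionary, JSON-encoded."""
    return json.dumps({"content": yaml_text})

yaml_text = "image: alpine:latest\n\ntest:\nscript:\n  -exit 1"
print(lint_payload(yaml_text))
```

json.dumps escapes the newlines as \n, so the multi-line YAML travels as a single JSON string.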

Send POST request to /ci/lint

The next building block is to send a POST request to the /ci/lint endpoint. The request needs to specify the Content-Type header for the body. Using the pipe | character, the JSON-encoded YAML configuration is fed into the curl command call.

$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @-
{"status":"invalid","errors":["jobs test config should implement a script: or a trigger: keyword","jobs script config should implement a script: or a trigger: keyword","jobs config should contain at least one visible job"],"warnings":[],"merged_yaml":"
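For use in a script, the request and the evaluation of the response can be sketched in Python with the standard library. The URL is taken from the curl call above; the helper names build_lint_request and summarize are assumptions, and the sample response is a shortened stand-in, not a live API result. The sketch prepares the request without sending it.

```python
import json
import urllib.request

# URL taken from the curl call above.
LINT_URL = "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true"

def build_lint_request(yaml_text, url=LINT_URL):
    """Prepare (but do not send) the POST request for the lint endpoint."""
    body = json.dumps({"content": yaml_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def summarize(response_text):
    """Extract status and errors from a lint response body."""
    result = json.loads(response_text)
    return result.get("status"), result.get("errors", [])

request = build_lint_request("image: alpine:latest\n")
print(request.get_method(), request.full_url)

# Shortened sample shaped like the response above (not a live API call).
sample = (
    '{"status": "invalid", '
    '"errors": ["jobs config should contain at least one visible job"], '
    '"warnings": []}'
)
print(summarize(sample))
```

Passing the prepared request to urllib.request.urlopen would send it. Depending on the GitLab instance, the endpoint may also require the authorization header shown earlier.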

