The Art of Metric Relabeling in Prometheus

In this guide we demystify the often misunderstood relabel_configs and metric_relabel_configs in Prometheus so you can better monitor your websites and APIs. We break down these powerful configuration blocks with practical, real-world examples.

Author

Humberto Evans

Prometheus, an open-source systems monitoring and alerting toolkit, has emerged as a popular choice for infrastructure and service monitoring. Prometheus has a clever data model for holding time series data that makes it extremely flexible and scalable. There are many powerful configuration options that, if properly wielded, can result in a very lean monitoring solution for a website that is lightweight, flexible, and relatively easy to operate.

This is not a getting started guide for Prometheus. This post covers advanced configuration after you get a basic setup running. See our guide for getting started monitoring with a minimal Prometheus configuration.

One aspect of Prometheus configuration that often puzzles users is the relabel_configs and metric_relabel_configs blocks. In this guide, we'll demystify these powerful and essential tools for Prometheus configuration.

Prometheus basics

Before we dive deep into relabel_configs and metric_relabel_configs we first need to cover a bit of the basics of the Prometheus data model.

Prometheus stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. This robust and straightforward data model allows for powerful queries across many dimensions, although, as we will see, every additional label value means more time series to store. Let's break this down a bit more:

Targets: In Prometheus, a target is an entity that can be scraped for metrics. It really boils down to a URL to scrape. A target could be a service, an API, a server, or any other entity that exposes Prometheus metrics. Typically, in a modern distributed system, targets are discovered using some sort of service discovery mechanism. Prometheus ships with a bunch of predefined service discovery options for common platforms like Kubernetes, EC2, DigitalOcean, etc.
Metrics: A metric is identified by its name and helps describe a particular system's property. For instance, the metric http_requests_total could be used to track the total number of HTTP requests a server has received.
Metric Labels: These are key-value pairs that provide more context and dimensionality to the metric. For example, a label method="GET" could be attached to the http_requests_total metric to track the number of GET requests specifically. Labels are what make Prometheus data a multi-dimensional time series data model.
Time Series: Each unique combination of a metric and its labels represents a separate time series. A time series is identified by the metric name and a set of labels (key-value pairs). For example, http_requests_total{method="GET", url="/api/books/19", handler="/api/books"} and http_requests_total{method="POST", url="/api/books/19", handler="/api/books"} would be two different time series.

Targets

Targets are just an address to scrape and a set of labels. Targets can be set statically in the static_configs block like this:

# The targets specified by the static config.
targets:
  - myserver1:8000

# Labels assigned to all metrics scraped from the targets.
labels:
  env: production
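
For context, the fragment above lives inside a scrape job. A minimal sketch of the surrounding configuration might look like this; the job name my-service is a placeholder, not something from a real setup:

```yaml
scrape_configs:
  - job_name: my-service        # hypothetical job name
    static_configs:
      - targets:
          - myserver1:8000
        labels:
          env: production
```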

However, in modern systems you will typically have some service discovery mechanism that finds the targets to scrape. This mechanism will likely add several meta labels to every target, which start with two underscores (typically __meta_). See, for example, the Kubernetes service discovery documentation.

In addition to any meta labels added by service discovery, the __address__ label is set to the <host>:<port> address of the target, and __metrics_path__ is set to the configured path for the job (/metrics by default). After relabeling, the instance label is set to the value of __address__ if it was not set during relabeling. Any target labels that start with two underscores get dropped after the relabel step. Any remaining labels are added to every metric scraped from the target.

All this information present in each target can be leveraged via relabel_configs to achieve your desired configuration.
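
As a rough illustration, assuming Kubernetes service discovery with the pod role, the meta labels __meta_kubernetes_namespace and __meta_kubernetes_pod_name can be copied into ordinary labels before they are discarded. The job name is a placeholder:

```yaml
scrape_configs:
  - job_name: kubernetes-pods   # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy meta labels into permanent target labels; without this they
      # disappear once relabeling finishes.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```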

Metrics

Metrics are scraped as plaintext lines from every target in the format:

metric_name{label_name="label_value",...} metric_value [timestamp]

metric_name always exists; there can be zero or more label_name/label_value pairs; metric_value is always a number; and the timestamp is optional.

For example, if you are using Kubernetes service discovery to scrape metrics about a node, each node is a single target.

When you scrape the target endpoint you will see many metrics that look like this:

…
container_memory_working_set_bytes{container="heii-redis",id="/kubepods/burstable/…",image="...",name="...",namespace="heii",pod="heii-redis-statefulset-0"} 1.1132928e+07 1684525306058
container_memory_working_set_bytes{container="heii-web",id="/kubepods/burstable/…",image="...",name="66f82d9c5c62422c…",namespace="heii",pod="heii-5cb694bb8c-59t6c"} 3.76041472e+08 1684525296583
container_memory_working_set_bytes{container="heii-worker",id="/kubepods/burstable/…",image="...",name="9751d47cd41ae7c…",namespace="heii",pod="heii-worker-7b9f9977d7-kctpp"} 2.6681344e+08 1684525295414
…

In the above snippet, we are looking at one metric with three separate time series. Notice how the metric name, container_memory_working_set_bytes, is the same for all three lines. The time series are unique because they have a different set of label_name/label_value combinations. A useful way to think about label_name/label_value combinations is as dimensions across which you might want to filter or aggregate the metric later. If you want the total bytes being consumed you can sum all the time series, or you can view them separately by the container label.

Understanding how time series work in Prometheus is critical to administrators because the resources consumed by Prometheus are proportional to the number of time series it is storing. If you are proficient with relabel_configs and metric_relabel_configs you can dramatically cut down on the number of time series stored, and still be able to pull and display metrics according to any business need that arises.

Understanding this data model is critical as it forms the basis for how Prometheus collects, stores, and allows you to query your metrics.

What are relabel_configs and metric_relabel_configs?

If you have static targets, with complete control of the metric exporters on each target, you can usually get away with simply importing every target manually. However, when you are working in an existing platform like Kubernetes you are likely using service discovery. Once Prometheus starts discovering and scraping metrics automatically the default ingest-all configuration can quickly get out of hand, ingesting and storing hundreds (or even thousands) of time series you don't really need. relabel_configs is an array of configuration settings applied to every target discovered by Prometheus before the target's metrics are scraped.

The relabel_configs and metric_relabel_configs blocks are where most of the magic happens in tailoring Prometheus to your monitoring needs. Together they offer the flexibility to filter, modify, or otherwise manipulate targets before they are scraped, and metrics and their labels before they are stored.

A typical relabel_configs block may look like this:

relabel_configs:
  - source_labels: [label_name,...]
    separator: ;
    regex: (.+)
    target_label: label_name
    replacement: $1
    action: replace

Let's break down what each of the keys in this block means:

source_labels: These are the input labels to the transformation. The values of these labels are concatenated using the separator and matched against the regex.
separator: A string to join multiple source label values together. Default value is ;.
regex: The regular expression against which the source_labels are matched. Default value is (.*), which matches everything.
target_label: The label to be updated or created based on the replacement.
replacement: The value to use to replace the value of the target_label. $1 refers to the first matching group from the regex. Default value is $1.
action: The operation to be performed. There are a few actions. Common ones are replace (default), keep, drop, and labelmap. See the documentation for more options.
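
To make the interplay of source_labels, separator, and regex more concrete, here is a sketch that joins two labels into one. It assumes the Kubernetes meta labels __meta_kubernetes_namespace and __meta_kubernetes_pod_name are present; the target label name workload is made up for illustration:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
    separator: ;            # joined value looks like "heii;heii-redis-statefulset-0"
    regex: (.+);(.+)        # $1 = namespace, $2 = pod name
    target_label: workload  # hypothetical label name
    replacement: $1/$2      # result: "heii/heii-redis-statefulset-0"
    action: replace
```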

Not all of the fields are required, and most of them have a default setting. Often when you see examples in guides you will see the fields that are not changed from a default value simply left out. This can be very confusing to beginners, especially when things like regex and separator are not included, and not explained.

The two sections relabel_configs and metric_relabel_configs have the same syntax, so they are often explained at the same time. However, they cover two different things.

relabel_configs lets you manipulate targets. You can drop specific targets and rearrange a target's labels before it is scraped.

metric_relabel_configs lets you manipulate individual metrics. You can drop or rewrite metrics and metric labels after they are scraped but before they are saved.
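
To show where each block sits, here is a sketch of a single scrape job that uses both. The job name, the staging value, and the go_gc_ pattern are illustrative assumptions:

```yaml
scrape_configs:
  - job_name: my-service            # hypothetical job name
    static_configs:
      - targets: ['myserver1:8000']
    relabel_configs:                # runs per target, before the scrape
      - source_labels: [env]
        regex: staging
        action: drop
    metric_relabel_configs:         # runs per scraped sample, before storage
      - source_labels: [__name__]
        regex: go_gc_.*
        action: drop
```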

Examples with relabel_configs

Now let's go through some common examples, starting with relabel_configs:

Scrape Targets Based on Specific Labels

Suppose you want to scrape only targets that have a specific label. Let's say we want to scrape only targets that have the label env=production. Here's how you can configure it:

relabel_configs:
  - source_labels: [env]
    action: keep
    regex: production

Let's take the same block but fill in the default values:

relabel_configs:
  - source_labels: [env]
    action: keep
    separator: ;
    regex: production
    # target_label and replacement are not used by the keep action

In this example, the keep action is used. It only keeps targets for which the joined source label values match the regex; the regex is anchored, so it has to match the whole value. This means that you could, for example, in your Kubernetes configuration make sure that every target you want to scrape carries the label env=production, and then you can safely ignore other environments.
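
The same pattern works with service discovery meta labels. A sketch, assuming pods opt in through a prometheus.io/scrape: "true" annotation (a widespread convention, not something Prometheus enforces), which Kubernetes service discovery exposes as __meta_kubernetes_pod_annotation_prometheus_io_scrape:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: "true"       # quoted so YAML treats it as a string
    action: keep
```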

Change the Endpoint Scraped Based on Target Labels

Let's say you want to change the endpoint URL path being scraped based on a label on the target. This is very common for service discovery, where you don't necessarily want to scrape the default /metrics endpoint. Here's how you can do it:

relabel_configs:
  - source_labels: [path]
    target_label: __metrics_path__
    replacement: /$1/metrics

Here is the same block with the defaults filled in:

relabel_configs:
  - source_labels: [path]
    action: replace
    separator: ;
    regex: (.*)
    target_label: __metrics_path__
    replacement: /$1/metrics

Here we are grabbing the value of the path label on the target, matching the whole thing with the default regex (.*), and substituting the regex match into /$1/metrics. So if our target had a label path=mypath/scrape, the actual path Prometheus would scrape would be /mypath/scrape/metrics. Note that the replacement already supplies the leading slash, so the label value should not start with one. The power of this configuration is that we can set labels at "runtime" on our infrastructure resources, have service discovery read those as labels, and then use those settings to manipulate the scrape behavior.
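
One thing to watch with the default regex: (.*) also matches an empty value, so a target without a path label would have its metrics path rewritten to //metrics. A hedged variant that only rewrites the path when the label is non-empty uses (.+) explicitly:

```yaml
relabel_configs:
  - source_labels: [path]
    regex: (.+)                 # does not match an empty value
    target_label: __metrics_path__
    replacement: /$1/metrics
    # Targets with no path label fall through untouched and keep the
    # job's configured metrics path.
```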

Renaming a Label

Sometimes you want to standardize naming of certain parts of your infrastructure in Prometheus, so it is often necessary to change a label name. Suppose you have a label instance and you want to rename it to node (very common in Kubernetes). Here's how you can do it:

relabel_configs:
  - source_labels: [instance]
    target_label: node

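Here is the same example with the relevant defaults filled in:

relabel_configs:
  - source_labels: [instance]
    action: replace
    separator: ;
    regex: (.*)
    target_label: node
    replacement: $1

This block takes the value of the label named instance, matches the whole thing with the default regex (.*), and writes the replacement, which in this case is just the entire matched value $1, into the label node. Note that replace copies the value rather than moving it, so the original instance label stays in place. Also keep in mind that instance is only filled in from __address__ after relabel_configs has run, so at this stage it has a value only if the target's own labels or an earlier step set it.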
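
Because the replace action writes to the target label without removing the source label, a true rename needs a second step that deletes the old label. A minimal sketch, assuming a hypothetical target label environment that you would rather call env:

```yaml
relabel_configs:
  # Step 1: copy environment into env (only when environment is non-empty).
  - source_labels: [environment]
    regex: (.+)
    target_label: env
  # Step 2: remove the old label.
  - regex: environment
    action: labeldrop
```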

Examples with metric_relabel_configs

While relabel_configs can help us manipulate targets, metric_relabel_configs help us manipulate individual metrics.

Note: Prometheus creates the label __name__ for every metric with the name of the metric as the value.

Retain Only One Specific Metric

Suppose you want to drop all metrics except for http_requests_total. Here's how you can do it:

metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: http_requests_total

The unlisted defaults are not relevant in this case. Here, we use the keep action again to retain only metrics that match the regex.

Note that relabel configs are applied from top to bottom, and each block only sees what the previous block left behind. This means that once a keep step has reduced the scrape to a single metric, all your subsequent config steps will only see that single metric.

Scrape Only 3 Specific Metrics

If you want to scrape only a specific set of metrics, you can list them in the regex separated by a |. Here's an example for http_requests_total, http_request_duration_seconds, and http_request_size_bytes:

metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: (http_requests_total|http_request_duration_seconds|http_request_size_bytes)

Notice that we put all three metrics into a single keep block. This is required because as discussed before, the manipulated set of metrics gets passed to subsequent blocks. This can cause the keep step to get unwieldy if you are keeping lots of metrics. If you are keeping most metrics in a target scrape consider using the drop action to drop specific metrics.
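
A sketch of that drop-based approach follows. The go_ and process_ prefixes are only example patterns for runtime metrics that many client libraries expose; adjust them to whatever your exporters actually emit:

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: (go_.*|process_.*)   # example patterns; everything else is kept
    action: drop
```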

Scrape a Metric For Specific Labels Only

Sometimes you are only interested in a subset of a certain metric. For example, you may only be interested in scraping http_request_duration_seconds for the /api/books/ URL because you know it is a mission critical URL for your users:

metric_relabel_configs:
  - source_labels: [__name__,url]
    action: keep
    regex: (http_request_duration_seconds;/api/books/)

Here is the same block with the relevant defaults filled in:

metric_relabel_configs:
  - source_labels: [__name__,url]
    action: keep
    regex: (http_request_duration_seconds;/api/books/)
    separator: ;

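This is our first example with more than one item in the source_labels array. Remember, the values of all listed source labels are concatenated with the separator between them. This means that the regex we want is http_request_duration_seconds;/api/books/ to keep only our desired time series. Because the regex has to match the whole joined value, a url such as /api/books/19 would not be kept; use /api/books/.* if you want everything under that prefix.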
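
Keep in mind that a keep step like this also discards every other metric from the target, since nothing else matches the regex. If the goal is to trim only http_request_duration_seconds while leaving other metrics alone, one possible approach is to mark the wanted series with a temporary label and drop the unmarked ones. The label name __tmp_keep is an arbitrary choice for this sketch:

```yaml
metric_relabel_configs:
  # Mark the one series we want to keep.
  - source_labels: [__name__, url]
    regex: http_request_duration_seconds;/api/books/
    target_label: __tmp_keep
    replacement: "true"
  # Drop the metric's remaining series: their marker is empty, so the
  # joined value ends with the separator.
  - source_labels: [__name__, __tmp_keep]
    regex: http_request_duration_seconds;
    action: drop
  # Remove the marker again.
  - regex: __tmp_keep
    action: labeldrop
```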

Drop One or More Labels

Another very powerful (but potentially dangerous) manipulation is dropping labels. This can be extremely useful when you are absolutely sure the time series remain unique after dropping the given label. For example, take our cAdvisor metric from earlier:

container_memory_working_set_bytes{container="heii-worker",id="/kubepods/burstable/…",image="...",name="9751d47cd41ae7c…",namespace="heii",pod="heii-worker-7b9f9977d7-kctpp"} 2.6681344e+08 1684525295414

In this case the image label is completely redundant and unnecessary. The time series are already unique based on the id given by cAdvisor, and we don't care what image id created a particular metric for our alerts. So it's safe to drop it with:

metric_relabel_configs:
  - regex: image
    action: labeldrop

And to drop multiple labels, you can separate them by |. Just make sure the labels that remain still identify every time series uniquely:

metric_relabel_configs:
  - regex: (id|namespace)
    action: labeldrop
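
labeldrop applies to every metric scraped from the target. To remove a label from a single metric only, one option is a replace step that writes an empty value, since a label with an empty value is removed. A sketch using the cAdvisor metric from earlier:

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: container_memory_working_set_bytes
    target_label: image
    replacement: ""     # an empty value removes the label
```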

Conclusion

Both relabel_configs and metric_relabel_configs provide powerful ways to control the behavior of Prometheus. Understanding these configurations allows you to fine-tune your monitoring setup according to your specific needs, and lets you set yourself up for success as your needs grow. They are just a small set of the configuration options so make sure you read through the official documentation carefully.

For getting started with Prometheus, see our guide about website monitoring using a minimal Prometheus configuration.

Happy Monitoring!