In this guide I help demystify the often misunderstood relabel_configs and metric_relabel_configs in Prometheus so you can better monitor your website and APIs. We do a thorough breakdown of these powerful configuration blocks with practical, real-world examples.
Prometheus, an open-source systems monitoring and alerting toolkit, has emerged as a popular choice for infrastructure and service monitoring. Prometheus has a clever data model for holding time series data that makes it extremely flexible and scalable. There are many powerful configuration options that, if properly wielded, can result in a very lean monitoring solution for a website that is lightweight, flexible, and relatively easy to operate.
This is not a getting started guide for Prometheus. This post covers advanced configuration after you get a basic setup running. See our guide for getting started monitoring with a minimal Prometheus configuration.
One aspect of Prometheus configuration that often puzzles users is the relabel_configs and metric_relabel_configs blocks. In this guide, we'll demystify these powerful and essential tools for Prometheus configuration.
Before we dive deep into relabel_configs and metric_relabel_configs, we first need to cover some basics of the Prometheus data model.
Prometheus stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. This simple data model allows for powerful queries that filter and aggregate data along any label. Let's break this down a bit more:
Targets: In Prometheus, a target is an entity that can be scraped for metrics. It really boils down to a URL to scrape. A target could be a service, an API, a server, or any other entity that exposes Prometheus metrics. Typically, in a modern distributed system, targets are discovered using some sort of service discovery mechanism. Prometheus ships with several predefined service discovery options for common platforms like Kubernetes, EC2, DigitalOcean, etc.
Metrics: A metric is identified by its name and describes a particular property of a system. For instance, the metric http_requests_total could be used to track the total number of HTTP requests a server has received.
Metric Labels: These are key-value pairs that provide more context and dimensionality to the metric. For example, a label method="GET" could be attached to the http_requests_total metric to track the number of GET requests specifically. Labels are what make the Prometheus data model a multi-dimensional time series model.
Time Series: Each unique combination of a metric and its labels represents a separate time series. A time series is identified by the metric name and a set of labels (key-value pairs). For example, http_requests_total{method="GET", url="/api/books/19", handler="/api/books"} and http_requests_total{method="POST", url="/api/books/19", handler="/api/books"} would be two different time series.
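As a toy illustration of the "name plus label set" rule (a Python model, not how Prometheus actually stores series), each series can be keyed by its metric name and a frozen set of label pairs. The label values come from the example above; label order does not matter, but any differing value yields a new series.

```python
def series_id(name: str, labels: dict[str, str]) -> tuple[str, frozenset]:
    """Identity of a time series: metric name plus its full label set (order-insensitive)."""
    return (name, frozenset(labels.items()))


get_books = series_id("http_requests_total",
                      {"method": "GET", "url": "/api/books/19", "handler": "/api/books"})
post_books = series_id("http_requests_total",
                       {"method": "POST", "url": "/api/books/19", "handler": "/api/books"})
# The same labels written in a different order describe the same series.
get_books_again = series_id("http_requests_total",
                            {"handler": "/api/books", "url": "/api/books/19", "method": "GET"})

print(get_books == post_books)       # False: the method label differs
print(get_books == get_books_again)  # True: label order is irrelevant
```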
Targets are just an address to scrape and a set of labels. Targets can be set statically in the static_configs block like this:
```yaml
static_configs:
  # The targets specified by the static config.
  - targets:
      - myserver1:8000
    # Labels assigned to all metrics scraped from the targets.
    labels:
      env: production
```
However, in modern systems you will typically have a service discovery mechanism that finds the targets to scrape. This mechanism will usually add several meta labels to every target, prefixed with __meta_ (note the two leading underscores). See, for example, the Kubernetes service discovery documentation.
In addition to any meta labels added by service discovery, the __address__ label is set to the <host>:<port> address of the target, and the __metrics_path__ label is set to the configured path for the job (/metrics by default). After relabeling, the instance label is set to the value of __address__ if it was not set during relabeling. Any target labels that start with two underscores are dropped after the relabel step, and all remaining labels are added to every metric scraped from the target.
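The target label lifecycle above can be sketched in Python. This is a simplified model under stated assumptions, not Prometheus's code: the helper finalize_target is hypothetical, an http scheme default is assumed, other labels Prometheus adds (such as job) are omitted, and __meta_example_zone is a made-up stand-in for a service discovery meta label.

```python
from typing import Callable, Optional


def finalize_target(
    discovered: dict[str, str],
    metrics_path: str = "/metrics",
    relabel: Callable[[dict[str, str]], Optional[dict[str, str]]] = lambda labels: labels,
) -> Optional[tuple[str, dict[str, str]]]:
    """Toy model of a target's labels around relabel_configs (job and other labels omitted)."""
    labels = dict(discovered)
    # Filled in before relabeling unless service discovery already set them.
    labels.setdefault("__scheme__", "http")
    labels.setdefault("__metrics_path__", metrics_path)
    labels = relabel(labels)  # relabel_configs run here; None means the target was dropped
    if labels is None:
        return None
    # After relabeling, instance falls back to __address__ if no rule set it.
    labels.setdefault("instance", labels["__address__"])
    url = f'{labels["__scheme__"]}://{labels["__address__"]}{labels["__metrics_path__"]}'
    # Labels starting with "__" are discarded; the rest go on every scraped series.
    target_labels = {k: v for k, v in labels.items() if not k.startswith("__")}
    return url, target_labels


url, target_labels = finalize_target({"__address__": "myserver1:8000", "env": "production",
                                      "__meta_example_zone": "eu-1"})
print(url)            # http://myserver1:8000/metrics
print(target_labels)  # {'env': 'production', 'instance': 'myserver1:8000'}
```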
All this information present in each target can be leveraged via relabel_configs to achieve your desired configuration.
Metrics are scraped from every target as plaintext lines in the format:

```text
metric_name{label_name="label_value",...} metric_value [timestamp]
```

metric_name always exists, there can be zero or more label_name/label_value pairs, metric_value is always a number, and there can be an optional timestamp (in milliseconds since the Unix epoch).
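To make the line format concrete, here is a deliberately simplified Python parser for a single sample line (parse_sample is a hypothetical helper). It assumes label values contain no escaped quotes or closing braces and does not handle comment lines, so treat it as an illustration of the structure rather than a complete exposition-format parser.

```python
import re
from typing import Optional

# Simplified pattern for one sample line: name, optional {labels}, value, optional timestamp.
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# label_name="label_value" pairs; escaped quotes inside values are not handled.
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float, Optional[int]]:
    m = SAMPLE.match(line)
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL.findall(m.group("labels") or ""))
    ts = int(m.group("ts")) if m.group("ts") else None
    return m.group("name"), labels, float(m.group("value")), ts


name, labels, value, ts = parse_sample(
    'container_memory_working_set_bytes{container="heii-redis",namespace="heii"} '
    '1.1132928e+07 1684525306058'
)
print(name, labels, value, ts)
# container_memory_working_set_bytes {'container': 'heii-redis', 'namespace': 'heii'} 11132928.0 1684525306058
print(parse_sample("up 1"))  # ('up', {}, 1.0, None)
```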
For example, if you are using Kubernetes service discovery to scrape metrics about a node, each node is a single target. When you scrape that target's endpoint, you will see many metrics that look like this:
```text
…
container_memory_working_set_bytes{container="heii-redis",id="/kubepods/burstable/…",image="...",name="...",namespace="heii",pod="heii-redis-statefulset-0"} 1.1132928e+07 1684525306058
container_memory_working_set_bytes{container="heii-web",id="/kubepods/burstable/…",image="...",name="66f82d9c5c62422c…",namespace="heii",pod="heii-5cb694bb8c-59t6c"} 3.76041472e+08 1684525296583
container_memory_working_set_bytes{container="heii-worker",id="/kubepods/burstable/…",image="...",name="9751d47cd41ae7c…",namespace="heii",pod="heii-worker-7b9f9977d7-kctpp"} 2.6681344e+08 1684525295414
…
```
In the above snippet, we are looking at one metric with three separate time series. Notice that the metric name, container_memory_working_set_bytes, is the same on all three lines. The time series are unique because each has a different set of label_name/label_value combinations. A useful way to think about these combinations is as dimensions along which you might later want to filter or aggregate the metric. If you want the total bytes being consumed, you can sum all the time series, or you can view them separately by the container label.
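As a rough illustration of summing versus splitting by a label, the sketch below reuses the three values above (other labels trimmed) and computes approximately what sum(container_memory_working_set_bytes) and sum by (container) (container_memory_working_set_bytes) would return at a single instant. The helpers total and sum_by are hypothetical.

```python
from collections import defaultdict

# The three container_memory_working_set_bytes samples from above, with most labels trimmed.
series = [
    ({"container": "heii-redis", "namespace": "heii"}, 1.1132928e+07),
    ({"container": "heii-web", "namespace": "heii"}, 3.76041472e+08),
    ({"container": "heii-worker", "namespace": "heii"}, 2.6681344e+08),
]


def total(series: list[tuple[dict[str, str], float]]) -> float:
    """Roughly sum(metric) at one instant: add every series together."""
    return sum(value for _, value in series)


def sum_by(series: list[tuple[dict[str, str], float]], label: str) -> dict[str, float]:
    """Roughly sum by (label) (metric): one total per distinct value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


print(total(series))                # 653987840.0
print(sum_by(series, "container"))  # {'heii-redis': 11132928.0, 'heii-web': 376041472.0, 'heii-worker': 266813440.0}
```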
Understanding how time series work in Prometheus is critical for administrators, because the resources Prometheus consumes grow with the number of time series it stores. If you are proficient with relabel_configs and metric_relabel_configs, you can dramatically cut down the number of time series stored and still pull and display metrics according to any business need that arises. This data model is the basis for how Prometheus collects, stores, and lets you query your metrics.
If you have static targets and complete control of the metric exporters on each target, you can usually get away with simply listing every target manually. However, when you are working in an existing platform like Kubernetes, you are likely using service discovery. Once Prometheus starts discovering and scraping metrics automatically, the default ingest-everything configuration can quickly get out of hand, ingesting and storing hundreds (or even thousands) of time series you don't really need. relabel_configs is a list of rules applied to every target discovered by Prometheus before the target's metrics are scraped.
The relabel_configs and metric_relabel_configs blocks are where most of the magic happens in tailoring Prometheus to your monitoring needs. They offer the flexibility to filter, modify, or otherwise manipulate targets and labels, either before a target is scraped (relabel_configs) or after scraping but before the series are stored (metric_relabel_configs).
A typical relabel_configs block may look like this:

```yaml
relabel_configs:
  - source_labels: [label_name,...]
    separator: ;
    regex: (.+)
    target_label: label_name
    replacement: $1
    action: replace
```
Let's break down what each of the keys in this block means:
source_labels: These are the input labels to the transformation. Their values are concatenated using the separator and matched against the regex.
separator: A string used to join multiple source label values together. The default value is ;.
regex: The regular expression that the concatenated source_labels value is matched against. Prometheus anchors it at both ends, so it must match the entire value. The default value is (.*), which matches everything.
target_label: The label to be updated or created from the replacement.
replacement: The value written to the target_label. $1 refers to the first capture group of the regex. The default value is $1.
action: The operation to be performed. Common actions are replace (the default), keep, drop, and labelmap. See the documentation for more options.
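The field semantics above can be modeled in a few lines of Python. This is a minimal sketch, not Prometheus's implementation: it covers only replace, keep, and drop, uses Python's re module (whose syntax differs slightly from the RE2 syntax Prometheus uses), and simplifies $N expansion. It does reflect three behaviors worth remembering: missing source labels count as empty strings, the regex must match the whole value, and a replace whose result is empty removes the target label. The address in the usage lines is made up.

```python
import re
from typing import Optional


def expand(template: str, m: re.Match) -> str:
    """Replace $N / ${N} with capture group N (empty if absent); a simplified stand-in for Go's Expand."""
    def group(ref: re.Match) -> str:
        i = int(ref.group(1) or ref.group(2))
        return (m.group(i) or "") if i <= m.re.groups else ""
    return re.sub(r"\$(?:\{(\d+)\}|(\d+))", group, template)


def relabel(labels: dict[str, str], rule: dict) -> Optional[dict[str, str]]:
    """Apply one rule; returns the new labels, or None if the target/series is dropped."""
    action = rule.get("action", "replace")
    separator = rule.get("separator", ";")
    regex = rule.get("regex", "(.*)")
    replacement = rule.get("replacement", "$1")
    # Missing source labels contribute an empty string.
    value = separator.join(labels.get(name, "") for name in rule.get("source_labels", []))
    m = re.fullmatch(regex, value)  # anchored at both ends, like Prometheus
    if action == "keep":
        return labels if m else None
    if action == "drop":
        return None if m else labels
    if action == "replace":
        if m is None:
            return labels  # no match: nothing changes
        out = dict(labels)
        result = expand(replacement, m)
        if result:
            out[rule["target_label"]] = result
        else:
            out.pop(rule["target_label"], None)  # an empty result deletes the target label
        return out
    raise ValueError(f"action not modeled in this sketch: {action}")


labels = {"__address__": "10.0.0.5:8080", "env": "production"}
print(relabel(labels, {"source_labels": ["env"], "action": "keep", "regex": "production"}))
# {'__address__': '10.0.0.5:8080', 'env': 'production'}
print(relabel(labels, {"source_labels": ["env"], "action": "keep", "regex": "prod"}))
# None (the regex must match the whole value)
print(relabel(labels, {"source_labels": ["missing"], "target_label": "env"}))
# {'__address__': '10.0.0.5:8080'} (empty result deletes env)
```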
Not all of the fields are required, and most of them have a default setting. Examples in guides often simply omit the fields that are left at their default value. This can be very confusing to beginners, especially when things like regex and separator are neither included nor explained.
The two sections relabel_configs and metric_relabel_configs have the same syntax, so they are often explained together. However, they operate on two different things.
relabel_configs lets you manipulate targets. You can drop specific targets and rearrange a target's labels before the target is scraped.
metric_relabel_configs lets you manipulate individual scraped series: you can drop series, or rewrite and drop their labels, after scraping and before they are stored.
Now let's go through some common examples, starting with relabel_configs:
Suppose you want to scrape only targets that have a specific label. Let's say we want to scrape only targets that have the label env=production. Here's how you can configure it:

```yaml
relabel_configs:
  - source_labels: [env]
    action: keep
    regex: production
```
Here is the same block with the default values filled in (target_label has no default and is not used by keep, so it is left out):

```yaml
relabel_configs:
  - source_labels: [env]
    action: keep
    separator: ;
    regex: production
    replacement: $1
```
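In this example, the keep action is used. It keeps only the targets whose source label value matches the regex; every other target is dropped and never scraped. Because the regex is anchored, production matches only the exact value production, not a value such as production-eu. This means you could, for example, make sure in your Kubernetes configuration that every target you want to scrape is tagged with the production label, and then safely ignore other environments. (With Kubernetes service discovery, a pod label named env appears on the target as __meta_kubernetes_pod_label_env, so that is the name you would list in source_labels.)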
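Here is the keep rule applied to a few hypothetical targets in a small Python sketch (the addresses and env values are made up for illustration, and Python's re stands in for RE2). Only the target whose env value fully matches production survives; a target with no env label is treated as having an empty value and is dropped too.

```python
import re

# Hypothetical discovered targets; only the env label matters for this rule.
targets = [
    {"__address__": "web-1:8000", "env": "production"},
    {"__address__": "web-2:8000", "env": "staging"},
    {"__address__": "web-3:8000"},                          # no env label at all
    {"__address__": "web-4:8000", "env": "production-eu"},  # only a partial match
]


def keep(targets: list[dict[str, str]], source_labels: list[str], regex: str,
         separator: str = ";") -> list[dict[str, str]]:
    """action: keep: retain targets whose joined source values fully match regex."""
    kept = []
    for target in targets:
        value = separator.join(target.get(name, "") for name in source_labels)
        if re.fullmatch(regex, value):  # anchored, like Prometheus
            kept.append(target)
    return kept


scraped = keep(targets, ["env"], "production")
print([t["__address__"] for t in scraped])  # ['web-1:8000']
```

Widening the regex to production.* would also keep web-4, which is usually the point where an explicit pattern is clearer than relying on exact values.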
Let's say you want to change the URL path being scraped based on a label on the target. This is very common with service discovery, where you don't necessarily want to scrape the default /metrics endpoint. Here's how you can do it:

```yaml
relabel_configs:
  - source_labels: [path]
    target_label: __metrics_path__
    replacement: /$1/metrics
```
Here is the same block with the defaults filled in:

```yaml
relabel_configs:
  - source_labels: [path]
    action: replace
    regex: (.*)
    target_label: __metrics_path__
    replacement: /$1/metrics
```

Here we grab the value of the path label on the target, match the whole thing with (.*), and substitute the match into /$1/metrics. So if our target had a label path=mypath/scrape, the actual path Prometheus would scrape would be /mypath/scrape/metrics (a value with a leading slash, such as /mypath/scrape, would produce //mypath/scrape/metrics). Because (.*) also matches an empty value, a target without a path label would be scraped at //metrics; setting regex: (.+) explicitly leaves such targets on the default path. The power of this configuration is that we can set labels at "runtime" on our infrastructure resources, have service discovery read those as labels, and then use those settings to manipulate the scrape behavior.
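The path rewrite can be checked with a small Python model of replace (replace and scrape_path are hypothetical helpers; $1 expansion is simplified and Python's re stands in for RE2). The usage lines show the normal case, the double slash caused by a leading slash in the label value, and the difference between the default (.*) and an explicit (.+) for a target that has no path label.

```python
import re


def replace(labels: dict[str, str], source_labels: list[str], target_label: str,
            replacement: str = "$1", regex: str = "(.*)", separator: str = ";") -> dict[str, str]:
    """action: replace with Prometheus-style defaults; $N expansion is simplified."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    m = re.fullmatch(regex, value)
    if m is None:
        return dict(labels)  # no match: labels are left unchanged
    # Substitute $N with capture group N (assumes the referenced groups exist).
    result = re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))) or "", replacement)
    out = dict(labels)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes the target label
    return out


def scrape_path(target: dict[str, str], regex: str = "(.*)") -> str:
    rewritten = replace(target, ["path"], "__metrics_path__", "/$1/metrics", regex=regex)
    return rewritten["__metrics_path__"]


print(scrape_path({"__metrics_path__": "/metrics", "path": "mypath/scrape"}))   # /mypath/scrape/metrics
print(scrape_path({"__metrics_path__": "/metrics", "path": "/mypath/scrape"}))  # //mypath/scrape/metrics
print(scrape_path({"__metrics_path__": "/metrics"}))                            # //metrics
print(scrape_path({"__metrics_path__": "/metrics"}, regex="(.+)"))              # /metrics
```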
Sometimes you want to standardize the naming of certain parts of your infrastructure in Prometheus, so it is often necessary to change a label name. Suppose you have a label instance and you want its value available as node instead (very common in Kubernetes). Here's how you can do it:

```yaml
relabel_configs:
  - source_labels: [instance]
    target_label: node
```

Here is the same example with the relevant defaults filled in:

```yaml
relabel_configs:
  - source_labels: [instance]
    action: replace
    regex: (.*)
    target_label: node
    replacement: $1
```

This block takes the value of the instance label, matches the whole thing with the regex (.*), and sets the node label to the replacement, which in this case is just the entire matched value, $1. Note that replace copies the value rather than renaming the label, so instance stays in place. Also, Prometheus only fills in instance from __address__ after relabeling, so this rule sees a value only if the target already carries an instance label; otherwise $1 is empty and no node label is written.
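A quick Python sketch of this rule (copy_label is a hypothetical helper; the addresses are made up) shows that replace writes node without removing instance, and that a target with no instance label gets no node label, because the empty match produces an empty replacement.

```python
import re


def copy_label(labels: dict[str, str], source: str, target: str, regex: str = "(.*)") -> dict[str, str]:
    """replace with replacement $1: copy the fully matched source value into target."""
    m = re.fullmatch(regex, labels.get(source, ""))  # a missing label reads as ""
    if m is None:
        return dict(labels)
    out = dict(labels)
    if m.group(1):  # assumes regex has one capture group, as (.*) does
        out[target] = m.group(1)
    else:
        out.pop(target, None)  # empty replacement: target label is removed, not set to ""
    return out


with_instance = copy_label({"instance": "node-a:10250", "__address__": "10.0.0.7:10250"},
                           "instance", "node")
print(with_instance)
# {'instance': 'node-a:10250', '__address__': '10.0.0.7:10250', 'node': 'node-a:10250'}

without_instance = copy_label({"__address__": "10.0.0.7:10250"}, "instance", "node")
print(without_instance)  # {'__address__': '10.0.0.7:10250'}
```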
While relabel_configs helps us manipulate targets, metric_relabel_configs helps us manipulate individual metrics.
Note: Prometheus creates the label __name__ for every metric, with the name of the metric as its value.
Suppose you want to drop all metrics except for http_requests_total. Here's how you can do it:

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: http_requests_total
```
The unlisted defaults are not relevant in this case. Here, we use the keep action again to retain only metrics whose name matches the regex.
Note that relabel rules are applied in order from top to bottom, and each rule only sees the result of the rules before it. If a keep rule discards a series (or a target), no later rule sees it; if a rule such as labeldrop removes labels, later rules can no longer use them.
If you want to keep only a specific set of metrics, you can list them in the regex separated by |. Here's an example for http_requests_total, http_request_duration_seconds, and http_request_size_bytes:

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: (http_requests_total|http_request_duration_seconds|http_request_size_bytes)
```
Notice that we put all three metrics into a single keep rule. This is required because each rule only receives what the previous rules let through: a keep rule for http_requests_total followed by a keep rule for http_request_duration_seconds would leave nothing at all. This can make the keep rule unwieldy if you are keeping lots of metrics. If you are keeping most of the metrics from a target, consider using the drop action to drop specific metrics instead.
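Why the alternation has to live in one rule is easy to see in a small Python model that applies keep and drop rules on __name__ in order (apply_keeps and apply_drop are hypothetical helpers; the metric names come from the example, plus an extra go_goroutines series to drop).

```python
import re


def apply_keeps(series: list[dict[str, str]], regexes: list[str]) -> list[dict[str, str]]:
    """Apply keep rules on __name__ in order; each rule only sees what the previous one kept."""
    for regex in regexes:
        series = [s for s in series if re.fullmatch(regex, s["__name__"])]
    return series


def apply_drop(series: list[dict[str, str]], regex: str) -> list[dict[str, str]]:
    """A single drop rule on __name__: discard matching series, keep the rest."""
    return [s for s in series if not re.fullmatch(regex, s["__name__"])]


names = ["http_requests_total", "http_request_duration_seconds",
         "http_request_size_bytes", "go_goroutines"]
series = [{"__name__": n} for n in names]

# Three separate keep rules: the second rule removes everything the first one kept.
separate = apply_keeps(series, ["http_requests_total", "http_request_duration_seconds",
                                "http_request_size_bytes"])
# One keep rule with an alternation keeps all three metrics.
combined = apply_keeps(series, ["(http_requests_total|http_request_duration_seconds|http_request_size_bytes)"])
# Dropping the one unwanted metric instead.
dropped = apply_drop(series, "go_goroutines")

print(len(separate), len(combined), len(dropped))  # 0 3 3
```

One more consequence of the anchored match: if http_request_duration_seconds is a histogram, its series are exposed as http_request_duration_seconds_bucket, _sum, and _count, so the bare name matches none of them, and a pattern such as http_request_duration_seconds(_bucket|_sum|_count)? would be needed.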
Sometimes you are only interested in a subset of a certain metric. For example, you may only want to keep http_request_duration_seconds for the /api/books/ URL because you know it is a mission-critical URL for your users:

```yaml
metric_relabel_configs:
  - source_labels: [__name__, url]
    action: keep
    regex: (http_request_duration_seconds;/api/books/)
```
Here is the same block with the relevant defaults filled in:

```yaml
metric_relabel_configs:
  - source_labels: [__name__, url]
    action: keep
    separator: ;
    regex: (http_request_duration_seconds;/api/books/)
```
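This is our first example with more than one item in the source_labels array. Remember, the values of all listed source labels are concatenated with the separator between them. This means the regex we want is http_request_duration_seconds;/api/books/ to keep only our desired time series. Because the regex is anchored, a series with url="/api/books/19" would not match; you would need something like http_request_duration_seconds;/api/books/.* for that. Also keep in mind that this rule drops every other metric from the target, since their names never match.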
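The concatenation step is easy to get wrong, so here is a small Python sketch (hypothetical helpers joined and keep; made-up series) that prints the joined value a rule actually sees and applies the anchored regex to each series.

```python
import re

# Hypothetical scraped series for the rule above.
series = [
    {"__name__": "http_request_duration_seconds", "url": "/api/books/"},
    {"__name__": "http_request_duration_seconds", "url": "/api/books/19"},
    {"__name__": "http_request_duration_seconds", "url": "/login"},
    {"__name__": "http_requests_total", "url": "/api/books/"},
]


def joined(labels: dict[str, str], source_labels: list[str], separator: str = ";") -> str:
    """The single string a rule's regex is matched against."""
    return separator.join(labels.get(name, "") for name in source_labels)


def keep(series: list[dict[str, str]], source_labels: list[str], regex: str) -> list[dict[str, str]]:
    return [s for s in series if re.fullmatch(regex, joined(s, source_labels))]


print(joined(series[0], ["__name__", "url"]))  # http_request_duration_seconds;/api/books/
exact = keep(series, ["__name__", "url"], "(http_request_duration_seconds;/api/books/)")
prefix = keep(series, ["__name__", "url"], "http_request_duration_seconds;/api/books/.*")
print(len(exact), len(prefix))  # 1 2
```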
Another very powerful (but potentially dangerous) manipulation is adding or dropping labels. Dropping a label is safe only when you are absolutely sure every series remains unique without it. For example, consider our cadvisor metric from earlier:
```text
container_memory_working_set_bytes{container="heii-worker",id="/kubepods/burstable/…",image="...",name="9751d47cd41ae7c…",namespace="heii",pod="heii-worker-7b9f9977d7-kctpp"} 2.6681344e+08 1684525295414
```
In this case the image label is redundant. The series are already unique based on the id given by cadvisor, and we don't care which image created a particular series for our alerts. So it's safe to drop it with:
```yaml
metric_relabel_configs:
  - regex: image
    action: labeldrop
```
To drop multiple labels, separate them with | (again, only if the remaining labels still keep every series unique):

```yaml
metric_relabel_configs:
  - regex: (id|namespace)
    action: labeldrop
```
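Before dropping labels, it can be worth checking that series stay distinct. This Python sketch (hypothetical helpers labeldrop and collisions; the id and image values and the second heii-worker pod are made up) drops labels by name, as labeldrop does, and reports any label set that more than one series would share afterwards.

```python
import re
from collections import Counter


def labeldrop(labels: dict[str, str], regex: str) -> dict[str, str]:
    """action: labeldrop: remove every label whose *name* fully matches regex."""
    return {k: v for k, v in labels.items() if not re.fullmatch(regex, k)}


def collisions(series: list[dict[str, str]], regex: str) -> list[dict[str, str]]:
    """Label sets shared by more than one series after labeldrop (they would become indistinguishable)."""
    counts = Counter(frozenset(labeldrop(s, regex).items()) for s in series)
    return [dict(key) for key, n in counts.items() if n > 1]


series = [
    {"container": "heii-worker", "pod": "heii-worker-7b9f9977d7-kctpp", "id": "id-1", "image": "image-x"},
    {"container": "heii-worker", "pod": "heii-worker-replica-2", "id": "id-2", "image": "image-x"},
    {"container": "heii-web", "pod": "heii-5cb694bb8c-59t6c", "id": "id-3", "image": "image-y"},
]

print(collisions(series, "image"))     # []
print(collisions(series, "(id|pod)"))  # one label set, shared by the two heii-worker series
```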
Both relabel_configs and metric_relabel_configs provide powerful ways to control the behavior of Prometheus. Understanding them lets you fine-tune your monitoring setup to your specific needs and sets you up for success as those needs grow. They are only a small part of Prometheus's configuration options, so make sure you read through the official documentation carefully.
For getting started with Prometheus, see our guide about website monitoring using a minimal Prometheus configuration.
Happy Monitoring!