Prometheus and Grafana Integration with Heii On-Call

In this step-by-step guide I cover how to integrate Heii On-Call with a Prometheus Alertmanager instance. Integration takes a matter of minutes, and you can start leveraging Prometheus's extensive set of features in your website monitoring and on-call rotations.

Author
Humberto Evans

Prometheus is a popular open source monitoring solution that lets you aggregate metrics exposed by your infrastructure and applications. If you are willing to put in the work to configure and maintain it, it can be a great alternative to expensive Application Performance Monitoring solutions like Datadog and Better Uptime. At Heii On-Call we like to keep our operations as lean as possible, so we are big fans of Prometheus and Grafana for monitoring. If you are new to Prometheus, I suggest you follow our guide to a minimal Prometheus setup.

Heii On-Call natively integrates with Prometheus Alertmanager through Alertmanager's webhook receiver. Once it is set up, you can route Prometheus alerts to a Heii On-Call trigger. The trigger will even resolve automatically: when the alerting condition clears (for example, a metric drops back below its alerting threshold), Alertmanager sends a "resolved" notification.
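Here is a minimal Prometheus alerting rule that shows the auto-resolve behavior. The metric `http_requests_total`, its `status` label, the 5% threshold, and the group and alert names are all hypothetical, so substitute your own. While the expression returns a result for the whole `for:` duration, the alert fires and Alertmanager forwards it to Heii On-Call. Once the error ratio drops back to 5% or below, the expression returns nothing and Prometheus marks the alert resolved. Alertmanager then sends a resolved notification through the webhook, because webhook receivers send resolved notifications by default.

```yaml
groups:
  - name: example-availability        # hypothetical rule group name
    rules:
      - alert: HighHTTPErrorRate      # hypothetical alert name
        # Assumes a counter http_requests_total with a "status" label.
        # Fires while more than 5% of requests return a 5xx status.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "More than 5% of HTTP requests are failing"
```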

Set up Heii On-Call

The first step is to set up the Heii On-Call trigger that will receive a webhook from your Prometheus Alertmanager instance. Create a new trigger and choose Prometheus from the mechanism dropdown.

[Screenshot: creating a Prometheus trigger in Heii On-Call]

We also need to create an API key so that Prometheus Alertmanager can authenticate with Heii On-Call. Click API Keys on your organization's home page and create a new API key for Prometheus.

[Screenshot: the newly created Prometheus API key in Heii On-Call]

Note the API key and the Trigger ID. We will use both of these in the Prometheus configuration coming up.

Create a new receiver in Prometheus Alertmanager

Now head over to where you keep your Alertmanager configuration (commonly a file named alertmanager.yml). If you are using Kubernetes, this is likely a ConfigMap definition. If you run Alertmanager directly, it is a file on the Alertmanager host. Somewhere in the configuration file you should find a top-level receivers: key containing the list of receivers that alerts can be routed to. Add a new receiver that looks like this:

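receivers:
  - name: heii-on-call
    webhook_configs:
      # Replace YOUR-TRIGGER-ID-HERE with the Trigger ID from Heii On-Call.
      - url: "https://api.heiioncall.com/triggers/YOUR-TRIGGER-ID-HERE/prometheus"
        # Also send "resolved" notifications (already the default for webhooks).
        send_resolved: true
        http_config:
          follow_redirects: false
          authorization:
            # Sent as a Bearer token in the Authorization header.
            credentials: "YOUR-HEII-ON-CALL-API-KEY-HERE"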

The block follows the specification for a webhook_config in Alertmanager. See the Alertmanager documentation for further customization. If your configuration lives in version control, we recommend using credentials_file instead of credentials. Alertmanager then reads the Heii On-Call API key from a separate file, so the key itself is never committed.
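A version of the receiver that reads the key from a file might look like the following. The path `/etc/alertmanager/secrets/heii-on-call-api-key` is a placeholder. On Kubernetes you could mount a Secret at a path like this. `credentials` and `credentials_file` are mutually exclusive, so set only one of them.

```yaml
receivers:
  - name: heii-on-call
    webhook_configs:
      - url: "https://api.heiioncall.com/triggers/YOUR-TRIGGER-ID-HERE/prometheus"
        http_config:
          follow_redirects: false
          authorization:
            # Placeholder path: the file should contain only the API key.
            # Alertmanager reads the key from this file instead of the config.
            credentials_file: /etc/alertmanager/secrets/heii-on-call-api-key
```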

Route an alert to your new receiver

Now we need a route: entry that sends alerts to the receiver we just created. Alertmanager organizes routes as a tree. Every alert enters at the root route and descends into the first child route whose matchers it satisfies, repeating the process at each level. Matching stops at the first match unless that route sets continue: true. An alert that matches no child route is handled by its parent route's receiver. Like many configuration options in Prometheus, this is often overkill for small to medium-sized teams. In the example below, we point the root route at our heii-on-call receiver so that every alert is sent to the heii-on-call trigger.

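route:
  # Root route: every alert is delivered to the heii-on-call receiver.
  receiver: heii-on-call
  # Wait 10s to batch alerts from the same group into one notification.
  group_wait: 10s
  # Wait at least 30m before re-sending a notification for a still-firing alert.
  repeat_interval: 30m
  # No child routes, so nothing overrides the root receiver.
  routes: []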

If you had multiple Heii On-Call triggers, you could define one receiver per trigger and add a child route for each. Each child route would match a label on the alert (such as a team label) to send the alert to the right team.
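Here is a setup with two triggers. `BACKEND-TRIGGER-ID` and `FRONTEND-TRIGGER-ID` are placeholders for two Heii On-Call triggers. The receiver names and the `team` label are hypothetical, and the sketch assumes your alerting rules attach a `team` label to each alert. It also assumes that one organization-level API key can authenticate against both triggers.

```yaml
receivers:
  - name: heii-on-call-backend
    webhook_configs:
      - url: "https://api.heiioncall.com/triggers/BACKEND-TRIGGER-ID/prometheus"
        http_config:
          follow_redirects: false
          authorization:
            credentials: "YOUR-HEII-ON-CALL-API-KEY-HERE"
  - name: heii-on-call-frontend
    webhook_configs:
      - url: "https://api.heiioncall.com/triggers/FRONTEND-TRIGGER-ID/prometheus"
        http_config:
          follow_redirects: false
          authorization:
            credentials: "YOUR-HEII-ON-CALL-API-KEY-HERE"

route:
  # Fallback: alerts that match no child route go to the backend trigger.
  receiver: heii-on-call-backend
  group_wait: 10s
  repeat_interval: 30m
  routes:
    # The first child route whose matchers all match handles the alert.
    - receiver: heii-on-call-frontend
      matchers:
        - team="frontend"
    - receiver: heii-on-call-backend
      matchers:
        - team="backend"
```

Alerts without a team label, or with an unexpected value, fall through to the root route's receiver. This way no alert is silently dropped.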

Done

That's it! With this simple configuration alerts will be routed to the individual currently on call in the Service you set up in Heii On-Call. If you have a more complicated use case that our current integration does not cover, don't hesitate to reach out to us at heii@heiioncall.com. We will be happy to help.

Remember that monitoring through metrics is only part of a comprehensive uptime and monitoring plan. In addition to gathering metrics from your infrastructure you should also be monitoring uptime from an external source.

Happy Monitoring!