Prometheus Operator – Interactions Between the kube-prometheus-stack Kubernetes Resources

The aim of Prometheus Operator is to provide Kubernetes-native deployment and management of Prometheus and related monitoring components (official GitHub repo). The kube-prometheus-stack helm chart (formerly named prometheus-operator) needs just one helm command to set everything up. After going through the guide and deploying it on minikube, I learnt that a lot of heavy lifting goes on behind the scenes to make the install that effortless.

The Prometheus Operator’s “Getting Started” guide gives a high-level view of its architecture and internal components so that readers can quickly understand how things work. However, it leaves out the details of how the various Kubernetes workload resources and Prometheus monitoring components work together to stand up a Prometheus monitoring stack. In this post, I’ll take a deeper look at what happens under the hood when the kube-prometheus-stack helm chart is installed in a cluster.

Setup

Here is my sandbox configuration:

  • Host OS: Ubuntu Server 20.04
  • Docker Server/Client: Docker CE 20.10.5
  • kubectl: v1.20.4
  • minikube: v1.18.1
  • kube-prometheus-stack helm chart version: kube-prometheus-stack-14.0.1 (github repo tag)

Prerequisites

  • You are familiar with Prometheus and its components (e.g. AlertManager, Exporters)
  • You have read the Prometheus Operator’s Getting Started guide

Step 1: Understanding the Kubernetes Workload Resources Deployed

After doing a vanilla install of the kube-prometheus-stack helm chart, you can view the workload resources deployed as follows:

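$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install myprometheus prometheus-community/kube-prometheus-stack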
...

$ kubectl get all
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/alertmanager-myprometheus-kube-promethe-alertmanager-0   2/2     Running   0          94m
pod/myprometheus-grafana-65c7f7fdbc-vwwzj                    2/2     Running   0          95m
pod/myprometheus-kube-promethe-operator-74586c46f-5nztj      1/1     Running   0          95m
pod/myprometheus-kube-state-metrics-6c44b87757-r6v6q         1/1     Running   0          95m
pod/myprometheus-prometheus-node-exporter-s6qw7              1/1     Running   0          95m
pod/prometheus-myprometheus-kube-promethe-prometheus-0       2/2     Running   1          94m

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   94m
service/kubernetes                                ClusterIP   10.96.0.1        <none>        443/TCP                      13h
service/myprometheus-grafana                      ClusterIP   10.108.166.249   <none>        80/TCP                       95m
service/myprometheus-kube-promethe-alertmanager   ClusterIP   10.107.235.48    <none>        9093/TCP                     95m
service/myprometheus-kube-promethe-operator       ClusterIP   10.97.26.166     <none>        443/TCP                      95m
service/myprometheus-kube-promethe-prometheus     ClusterIP   10.97.242.84     <none>        9090/TCP                     95m
service/myprometheus-kube-state-metrics           ClusterIP   10.98.143.217    <none>        8080/TCP                     95m
service/myprometheus-prometheus-node-exporter     ClusterIP   10.101.198.247   <none>        9100/TCP                     95m
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     94m

NAME                                                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/myprometheus-prometheus-node-exporter   1         1         1       1            1           <none>          95m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/myprometheus-grafana                  1/1     1            1           95m
deployment.apps/myprometheus-kube-promethe-operator   1/1     1            1           95m
deployment.apps/myprometheus-kube-state-metrics       1/1     1            1           95m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/myprometheus-grafana-65c7f7fdbc                 1         1         1       95m
replicaset.apps/myprometheus-kube-promethe-operator-74586c46f   1         1         1       95m
replicaset.apps/myprometheus-kube-state-metrics-6c44b87757      1         1         1       95m

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-myprometheus-kube-promethe-alertmanager   1/1     94m
statefulset.apps/prometheus-myprometheus-kube-promethe-prometheus       1/1     94m

From the output, we can see that there are several components installed:

  • AlertManager – handles alerts sent by Prometheus server
  • Grafana – visualization dashboard for data in Prometheus
  • Prometheus Operator – custom controller logic that deploys and manages the Prometheus server/cluster and AlertManager
  • kube-state-metrics – a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects for Prometheus to scrape
  • prometheus-node-exporter – Prometheus exporter for hardware and OS metrics of each Kubernetes node (just the single minikube node in this sandbox)
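
To see which controller stands behind each pod in the listing above, one option is to print every pod's owner reference. The sketch below wraps the query in a helper function; `pod_owners` is a name made up for this post, and it assumes kubectl points at the cluster and namespace where the chart was installed.

```shell
# pod_owners: print each pod together with the kind and name of its owning controller.
# (hypothetical helper name; assumes the chart lives in the current namespace)
pod_owners() {
  kubectl get pods \
    -o custom-columns='POD:.metadata.name,OWNER_KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name'
}
```

Calling `pod_owners` should show the Prometheus and AlertManager pods owned by StatefulSets, the node exporter pod by a DaemonSet, and the remaining pods by ReplicaSets, which matches the workload resources listed by `kubectl get all`.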

Step 2: Discovering All Kubernetes Resources Deployed by kube-prometheus-stack Helm Chart

In addition to the workload resources, the chart creates many more resources: ConfigMaps, ServiceMonitors, Roles, ServiceAccounts and RoleBindings, just to name a few. To see the full list (caution: approximately 40K lines), use the following command:

$ helm template prometheus-community/kube-prometheus-stack
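
Because the rendered output is so long, it can help to summarize it by resource kind first. This is a minimal sketch, assuming that an unindented `kind:` line identifies each rendered object; `count_kinds` is a hypothetical helper, not part of helm.

```shell
# count_kinds: read rendered manifests on stdin and print how many objects of each kind they contain.
# Only top-level "kind:" lines (no indentation) are counted, so nested fields such as
# roleRef.kind are ignored.
count_kinds() {
  grep '^kind:' | awk '{print $2}' | sort | uniq -c | sort -rn
}

# Usage against the chart (needs helm and the prometheus-community repo):
# helm template myprometheus prometheus-community/kube-prometheus-stack | count_kinds
```

Passing the release name `myprometheus` makes the rendered object names line up with the ones from Step 1. As far as I can tell, `helm template` skips the chart's CRDs unless `--include-crds` is added, so the count may not include them by default.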

Step 3: The Interactions Between the kube-prometheus-stack Resources

Based on the workload resources in Step 1, we get a glimpse of the various software components installed. Coupled with the Prometheus Operator’s Getting Started guide, we can gain a surface understanding of how things work. However, some pieces of the puzzle are still missing. For example, where do the StatefulSets come from, given that the kube-prometheus-stack helm chart does not contain them, and who uses the Custom Resource Definitions (CRDs) that were created?

By taking a closer look at the outputs of the previous two steps, I came up with the following interaction diagram between the key resources of the kube-prometheus-stack chart:

Figure: Interaction diagram between the key resources of the kube-prometheus-stack chart

Note: There are other resources (like ConfigMaps and Secrets) that are created but not shown in the above diagram.

I believe the Prometheus Operator’s code to create the StatefulSets based on the AlertManager and Prometheus CRDs can be found here and here respectively.
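
One way to check this relationship on a running cluster is to list the operator's CRDs and custom resources, then look at who owns the StatefulSets. The sketch below assumes the operator records ownership through owner references on the StatefulSets it creates; the function names are made up for illustration.

```shell
# list_monitoring_crds: show the CRDs in the operator's API group.
list_monitoring_crds() {
  kubectl get crds -o name | grep 'monitoring.coreos.com'
}

# list_monitoring_objects: show custom resources created from those CRDs in the current namespace.
list_monitoring_objects() {
  kubectl get prometheuses,alertmanagers,servicemonitors,prometheusrules
}

# statefulset_owner NAME: print the kind and name of the object that owns StatefulSet NAME.
statefulset_owner() {
  kubectl get statefulset "$1" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'
}
```

For example, `statefulset_owner prometheus-myprometheus-kube-promethe-prometheus` should name a Prometheus custom resource rather than anything rendered directly by the helm chart, if the assumption above holds.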

Step 4: Making Changes to Prometheus and AlertManager

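To make changes to the Prometheus and AlertManager components, we edit the respective custom resources (the Prometheus and Alertmanager objects created from the CRDs), not the CRDs themselves. The Prometheus Operator will pick up the changes and modify the spec of the corresponding workload resource (shown in blue in the diagram) accordingly.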
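
As an illustration of such a change, the sketch below sets the retention period on the Prometheus custom resource. The object name is inferred from the StatefulSet name in Step 1 (the StatefulSet carries an extra `prometheus-` prefix), and `15d` is an arbitrary example value.

```shell
# Name of the Prometheus custom resource, inferred from the Step 1 output (an assumption).
PROM_CR="myprometheus-kube-promethe-prometheus"

# JSON merge patch that sets spec.retention; "15d" is only an example value.
RETENTION_PATCH='{"spec":{"retention":"15d"}}'

# set_retention: apply the patch to the Prometheus custom resource.
set_retention() {
  kubectl patch prometheus "$PROM_CR" --type merge -p "$RETENTION_PATCH"
}

# show_retention_arg: print the retention flag the operator rendered into the StatefulSet.
show_retention_arg() {
  kubectl get statefulset "prometheus-$PROM_CR" -o yaml | grep 'storage.tsdb.retention'
}
```

Since helm also manages the Prometheus object, a later `helm upgrade` may overwrite a manual patch. Setting `prometheus.prometheusSpec.retention` in the chart values is probably the more durable route.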

Step 5: Add Monitoring For Your Applications

Based on the image above, we now know that Prometheus uses the ServiceMonitors for discovering and configuring the endpoints to scrape. If you want Prometheus to include your application in its monitoring scope, you would need to:

  1. Deploy your application in the Kubernetes cluster
  2. If your application does not expose Prometheus metrics itself, deploy a Prometheus Exporter for it (or write one if none exists)
  3. Create a Service that sits in front of the metrics endpoint (be it from #1 or #2)
  4. Create a ServiceMonitor that points at the Service from the previous step
  5. Prometheus will automatically pick up the new endpoint and begin scraping it
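
Steps 3 and 4 can be sketched as a single manifest. Everything about the application here is hypothetical: the name `my-app`, the `app: my-app` label, and the metrics port 8080 are placeholders. The `release: myprometheus` label is an assumption based on my understanding that a default install only selects ServiceMonitors labelled with the helm release name.

```shell
# Write a Service and a ServiceMonitor for a hypothetical application "my-app"
# that serves Prometheus metrics on container port 8080 at /metrics.
cat > my-app-monitoring.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    # Assumption: a default install only selects ServiceMonitors carrying the release label.
    release: myprometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
EOF

# Apply it once the application itself is deployed:
# kubectl apply -f my-app-monitoring.yaml
```

The ServiceMonitor finds the Service through `spec.selector.matchLabels` and refers to the port by its name (`metrics`), so the Service port needs a name.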

As every application and Prometheus stack is different, I will not be able to cover all combinations in this post. However, you can find more information from the following links:

Summary

I hope this post gives a deeper insight into the workings behind the “magical” setup of the kube-prometheus-stack chart.
