Note: If you have missed my previous articles on Docker and Kubernetes, you can find them here:
Application deployment models evolution.
Getting started with Docker.
Dockerfile and images.
Publishing images to Docker Hub and re-using them.
Docker- Find out what's going on.
Docker Networking- Part 1.
Docker Networking- Part 2.
Docker Swarm- Multi-Host container Cluster.
Docker Networking- Part 3 (Overlay Driver).
Introduction to Kubernetes.
Kubernetes- Diving in (Part 1)- Installing Kubernetes multi-node cluster.
Kubernetes- Diving in (Part 2)- Services.
Kubernetes- Infrastructure As Code with Yaml (part 1).
Kubernetes- Infrastructure As Code Part 2- Creating PODs with YAML.
Kubernetes Infrastructure-as-Code part 3- Replicasets with YAML.
Kubernetes Infrastructure-as-Code part 4 - Deployments and Services with YAML.
Deploying a microservices APP with Kubernetes.
Kubernetes- Time based scaling of deployments with python client.
Kubernetes Networking - The Flannel network explained.
Kubernetes- Installing and using kubectl top for monitoring nodes and PoDs
Kubernetes Administration- Scheduling
Kubernetes Administration- Storage
Kubernetes Administration- Users
Kubernetes Administration - Network Policies with Calico network plugin
Kubernetes Administration - Managing Kubernetes Clusters with Rancher
Kubernetes Administration - Package Management with Helm
Monitoring, health checks, and reporting are essential for maintaining service availability in any infrastructure, and they have become even more important in the microservices era driven by the popularity of containers and Kubernetes. Container-based infrastructures change on the fly (think of a replica crashing and a new one coming up automatically), and traditional monitoring methods do not cope well with that. Add to this the decentralized debugging involved in CI/CD: developers need log access to production deployments for faster bug fixes and merges. Prometheus, which addresses these issues, has become the de facto monitoring tool for Kubernetes.
Prometheus Overview
From the Prometheus docs:
"
a multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language to leverage this dimensionality
no reliance on distributed storage; single server nodes are autonomous
time series collection happens via a pull model over HTTP
pushing time series is supported via an intermediary gateway
targets are discovered via service discovery or static configuration
multiple modes of graphing and dashboarding support
"
The above means Prometheus's data model is based on key/value pairs, which is similar to how Kubernetes manages its infrastructure (using labels). Further, data can be queried with PromQL (or custom Python code) and/or exported to various other tools like Grafana and Elasticsearch; a couple of example queries are sketched below the diagram. Here is the basic architecture of Prometheus:
Image Src: https://prometheus.io/
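To make the label-based data model concrete, here are a couple of hedged PromQL examples (http_requests_total is the canonical example metric from the Prometheus docs; the label names are illustrative and depend on what your exporters actually expose):

# Select time series by metric name plus key/value label matchers,
# much like selecting Kubernetes objects by label:
http_requests_total{namespace="default", job="my-webservice"}

# Aggregate away a label dimension: per-namespace request rate over 5 minutes
sum by (namespace) (rate(http_requests_total[5m]))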
Installing Prometheus
Prometheus can be installed in various ways; the simplest is to use Helm. If you haven't installed Helm or want a quick intro, refer to my post on Helm.
# Add Prometheus repo to helm
root@sathish-vm2:/home/sathish# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
root@sathish-vm2:/home/sathish# helm repo add stable https://charts.helm.sh/stable
"stable" has been added to your repositories
root@sathish-vm2:/home/sathish# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
# Install the chart
helm install --generate-name prometheus-community/prometheus
NAME: prometheus-1615095264
LAST DEPLOYED: Sun Mar 7 05:34:27 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-1615095264-server.default.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090
The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-1615095264-alertmanager.default.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9093
#################################################################################
###### WARNING: Pod Security Policy has been moved to a global property. #####
###### use .Values.podSecurityPolicy.enabled with pod-based #####
###### annotations #####
###### (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
#################################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-1615095264-pushgateway.default.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
Things wouldn't be exciting if this worked out of the box :)
root@sathish-vm2:/home/sathish/metrics-server/manifests/release# kubectl get pods
NAME READY STATUS RESTARTS AGE
prometheus-1615095264-alertmanager-865587bb4d-fwf9f 0/2 Pending 0 11m
prometheus-1615095264-kube-state-metrics-7b5b599dd9-mk9zt 1/1 Running 0 11m
prometheus-1615095264-node-exporter-c9wgj 1/1 Running 0 11m
prometheus-1615095264-node-exporter-jxglg 1/1 Running 0 11m
prometheus-1615095264-pushgateway-d4f7d7df4-9kwh5 1/1 Running 0 11m
prometheus-1615095264-server-6f58f9dff5-jqskr 0/2 Pending 0 6m15s
As we can see, the server and alertmanager pods are stuck in the Pending state. Let's find out why:
root@sathish-vm2:~# kubectl get pod -l app=prometheus,component=server -o jsonpath='{.items[].status.conditions[].message}'
0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims.
root@sathish-vm2:~# kubectl get pod -l app=prometheus,component=alertmanager -o jsonpath='{.items[].status.conditions[].message}'
0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims
Both pods use PersistentVolumeClaims and hence need PersistentVolumes.
root@sathish-vm2:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-1615095264-alertmanager Pending 17m
prometheus-1615095264-server Pending 17m
stable-prometheus-alertmanager Pending 20m
stable-prometheus-server Pending 20m
root@sathish-vm2:~# kubectl describe pvc prometheus-1615095264-alertmanager
Name: prometheus-1615095264-alertmanager
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=prometheus
app.kubernetes.io/managed-by=Helm
chart=prometheus-13.4.0
component=alertmanager
heritage=Helm
release=prometheus-1615095264
Annotations: meta.helm.sh/release-name: prometheus-1615095264
meta.helm.sh/release-namespace: default
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Mounted By: prometheus-1615095264-alertmanager-865587bb4d-fwf9f
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 2m20s (x63 over 17m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Let's create host directories and PersistentVolumes
# Create directories (-p stands for "parents", i.e. create parent dirs if they don't exist)
root@sathish-vm2:~# mkdir -p /mnt/kubernetes/pv1
root@sathish-vm2:~# mkdir -p /mnt/kubernetes/pv2
# Create PVs
root@sathish-vm2:~# kubectl create -f - <<EOF
> kind: PersistentVolume
> apiVersion: v1
> metadata:
> name: pv1
> spec:
> storageClassName:
> capacity:
> storage: 2Gi
> accessModes:
> - ReadWriteOnce
> hostPath:
> path: "/mnt/kubernetes/pv1"
> ---
> kind: PersistentVolume
> apiVersion: v1
> metadata:
> name: pv2
> spec:
> storageClassName:
> capacity:
> storage: 8Gi
> accessModes:
> - ReadWriteOnce
> hostPath:
> path: "/mnt/kubernetes/pv2"
> EOF
persistentvolume/pv1 created
persistentvolume/pv2 created
root@sathish-vm2:~# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv1 2Gi RWO Retain Bound default/prometheus-1615095264-alertmanager 9m23s
pv2 8Gi RWO Retain Bound default/prometheus-1615095264-server 9m23s
root@sathish-vm2:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-1615095264-alertmanager Bound pv1 2Gi RWO 39m
prometheus-1615095264-server Bound pv2 8Gi RWO 39m
Let's check the pods:
root@sathish-vm2:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
prometheus-1615095264-alertmanager-865587bb4d-fwf9f 2/2 Running 0 39m
prometheus-1615095264-kube-state-metrics-7b5b599dd9-mk9zt 1/1 Running 0 39m
prometheus-1615095264-node-exporter-c9wgj 1/1 Running 0 39m
prometheus-1615095264-node-exporter-jxglg 1/1 Running 0 39m
prometheus-1615095264-pushgateway-d4f7d7df4-9kwh5 1/1 Running 0 39m
prometheus-1615095264-server-6f58f9dff5-8fs4j 1/2 CrashLoopBackOff 6 8m
How exciting :) What's going on?
root@sathish-vm2:~# kubectl logs prometheus-1615095264-server-6f58f9dff5-8fs4j prometheus-server
level=info ts=2021-03-07T06:12:23.131Z caller=main.go:364 msg="Starting Prometheus" version="(version=2.24.0, branch=HEAD, revision=02e92236a8bad3503ff5eec3e04ac205a3b8e4fe)"
level=info ts=2021-03-07T06:12:23.132Z caller=main.go:369 build_context="(go=go1.15.6, user=root@d9f90f0b1f76, date=20210106-13:48:37)"
level=info ts=2021-03-07T06:12:23.132Z caller=main.go:370 host_details="(Linux 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 prometheus-1615095264-server-6f58f9dff5-8fs4j (none))"
level=info ts=2021-03-07T06:12:23.132Z caller=main.go:371 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-03-07T06:12:23.132Z caller=main.go:372 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-03-07T06:12:23.132Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff33b6f1cf, 0x5, 0x14, 0x31b88e0, 0xc0008c04b0, 0x31b88e0)
/app/promql/query_logger.go:117 +0x4cf
main.main()
/app/cmd/prometheus/main.go:400 +0x53ec
Seems like some kind of permission issue. Let's dump the pod's YAML:
root@sathish-vm2:~# kubectl get pod prometheus-1615095264-server-6f58f9dff5-8fs4j -o yaml > prometheous-server.yaml
In the dump, I found the following section:
securityContext:
fsGroup: 65534
runAsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
Aha... UID/GID 65534 is the conventional "nobody" user, so the pod is probably not able to access the PersistentVolume directories on the host. I will just be lazy and change the directory ownership. The correct fix would be to configure this through the chart's values.yaml; a sketch of that follows the pod listing below.
root@sathish-vm2:~# chown -R 65534:65534 /mnt/kubernetes/pv1
root@sathish-vm2:~# chown -R 65534:65534 /mnt/kubernetes/pv2
#
root@sathish-vm2:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
prometheus-1615095264-alertmanager-865587bb4d-fwf9f 2/2 Running 0 54m
prometheus-1615095264-kube-state-metrics-7b5b599dd9-mk9zt 1/1 Running 0 54m
prometheus-1615095264-node-exporter-c9wgj 1/1 Running 0 54m
prometheus-1615095264-node-exporter-jxglg 1/1 Running 0 54m
prometheus-1615095264-pushgateway-d4f7d7df4-9kwh5 1/1 Running 0 54m
prometheus-1615095264-server-6f58f9dff5-qhqss 1/2 Running 0 16s
Quick and dirty fixes always work... that is, till the next time :)
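For reference, the proper fix mentioned above would drive storage through the chart's values instead of hand-patching directory permissions. A sketch, assuming the prometheus-community/prometheus chart exposes the persistentVolume keys (verify with helm show values prometheus-community/prometheus, as key names can change between chart versions; my-storage-class is a hypothetical StorageClass):

# Re-render the release with an explicit storage class for both
# the server and the alertmanager persistent volumes
helm upgrade prometheus-1615095264 prometheus-community/prometheus \
  --set server.persistentVolume.storageClass=my-storage-class \
  --set alertmanager.persistentVolume.storageClass=my-storage-class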
Accessing the Prometheus web interface from other hosts
By default, the Prometheus web interface is deployed as a ClusterIP service, reachable only from within the cluster:
root@sathish-vm2:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 143d
my-webservice NodePort 10.103.164.119 <none> 80:30007/TCP 78d
prometheus-1615095264-alertmanager ClusterIP 10.97.79.106 <none> 80/TCP 57m
prometheus-1615095264-kube-state-metrics ClusterIP 10.103.137.189 <none> 8080/TCP 57m
prometheus-1615095264-node-exporter ClusterIP None <none> 9100/TCP 58m
prometheus-1615095264-pushgateway ClusterIP 10.103.54.127 <none> 9091/TCP 57m
prometheus-1615095264-server ClusterIP 10.103.99.247 <none> 80/TCP 58m
stable-kube-state-metrics ClusterIP 10.102.109.184 <none> 8080/TCP 61m
stable-prometheus-node-exporter ClusterIP None <none> 9100/TCP 61m
stable-prometheus-server ClusterIP 10.106.23.124 <none> 80/TCP 61m
I will delete this service and re-expose the deployment as a NodePort service:
root@sathish-vm2:~# kubectl expose deployment prometheus-1615095264-server --type=NodePort
Error from server (AlreadyExists): services "prometheus-1615095264-server" already exists
root@sathish-vm2:~# kubectl delete svc prometheus-1615095264-server
service "prometheus-1615095264-server" deleted
root@sathish-vm2:~# kubectl expose deployment prometheus-1615095264-server --type=NodePort
service/prometheus-1615095264-server exposed
root@sathish-vm2:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 143d
my-webservice NodePort 10.103.164.119 <none> 80:30007/TCP 79d
prometheus-1615095264-alertmanager ClusterIP 10.97.79.106 <none> 80/TCP 69m
prometheus-1615095264-kube-state-metrics ClusterIP 10.103.137.189 <none> 8080/TCP 69m
prometheus-1615095264-node-exporter ClusterIP None <none> 9100/TCP 69m
prometheus-1615095264-pushgateway ClusterIP 10.103.54.127 <none> 9091/TCP 69m
prometheus-1615095264-server NodePort 10.111.254.84 <none> 9090:31449/TCP 6s
stable-kube-state-metrics ClusterIP 10.102.109.184 <none> 8080/TCP 73m
stable-prometheus-node-exporter ClusterIP None <none> 9100/TCP 73m
stable-prometheus-server ClusterIP 10.106.23.124 <none> 80/TCP 73m
And it works!!
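If you prefer a repeatable, declarative setup over kubectl expose, the equivalent NodePort service can be written in YAML. A minimal sketch: the selector labels match the ones we used earlier to find the server pod (plus the release label seen in the PVC description), and nodePort 30090 is an arbitrary pick from the 30000-32767 NodePort range:

kind: Service
apiVersion: v1
metadata:
  name: prometheus-1615095264-server
spec:
  type: NodePort
  selector:
    app: prometheus
    component: server
    release: prometheus-1615095264
  ports:
    - port: 9090        # port exposed inside the cluster
      targetPort: 9090  # port the Prometheus server container listens on
      nodePort: 30090   # arbitrary; must fall in the 30000-32767 range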
Monitoring Kubernetes Nodes
To monitor nodes, we can use prometheus-node-exporter, which can also be installed with Helm.
root@sathish-vm2:~# helm install --generate-name prometheus-community/prometheus-node-exporter
NAME: prometheus-node-exporter-1615100641
LAST DEPLOYED: Sun Mar 7 07:04:03 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus-node-exporter,release=prometheus-node-exporter-1615100641" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:9100 to use your application"
kubectl port-forward --namespace default $POD_NAME 9100
Once installed, node stats can be queried from the Prometheus UI: just start typing in the search box and it will suggest the available metrics. A few examples follow.
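Here are a few node-exporter queries to try (these metric names are standard node-exporter metrics; the exact set exposed depends on your nodes and the exporter version):

# Available memory per node, in bytes
node_memory_MemAvailable_bytes

# Per-node CPU utilization: 1 minus the idle fraction over the last 5 minutes
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Root filesystem usage, as a percentage
100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})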
That's it for today, folks. I wanted to cover Grafana today but couldn't; I will cover it in a later post. Till then, ciao, and have a good weekend :)