
Kubernetes Administration- Storage


Note: If you have missed my previous articles on Docker and Kubernetes, you can find them here. 
Application deployment models evolution.
Getting started with Docker.
Docker file and images.
Publishing images to Docker Hub and re-using them.
Docker- Find out what's going on.
Docker Networking- Part 1.
Docker Networking- Part 2.
Docker Swarm-Multi-Host container Cluster.
Docker Networking- Part 3 (Overlay Driver).
Introduction to Kubernetes.
Kubernetes- Diving in (Part 1)-Installing Kubernetes multi-node cluster.
Kubernetes-Diving in (Part2)- Services.
Kubernetes- Infrastructure As Code with Yaml (part 1).
Kubernetes- Infrastructure As Code Part 2- Creating PODs with YAML.
Kubernetes Infrastructure-as-Code part 3- Replicasets with YAML.
Kubernetes Infrastructure-as-Code part 4 - Deployments and Services with YAML.
Deploying a microservices APP with Kubernetes.
Kubernetes- Time based scaling of deployments with python client.
Kubernetes Networking - The Flannel network explained.
Kubernetes- Installing and using kubectl top for monitoring nodes and PoDs
Kubernetes Administration- Scheduling

Compute, storage, networking, and users (administrators/developers) are the important pieces of IT infrastructure, and any orchestration system should provide ways to provision, manage, and monitor them. From a Kubernetes perspective, the PoD is the most fundamental unit of deployment (PoDs are abstracted by ReplicaSets and Deployments). That said:


  • PoDs consume CPU/memory resources and should be scheduled on nodes that satisfy their requirements: in Kubernetes Administration- Scheduling, I gave a brief overview of how PoDs can be scheduled.

  • PoDs need to communicate externally and with other PoDs and nodes in the cluster. The Container Network Interface (CNI) specifies the requirements of such a solution. One such solution is Flannel, and I have attempted to give an overview of it here: Kubernetes Networking - The Flannel network explained

  • The third piece of infrastructure is storage: a PoD runs on a cluster node and utilizes storage. In this article, I hope to give an overview of storage administration in Kubernetes.

  • I will cover User administration in a future article.

Volumes


To explain the concept of volumes, I am going to first create an httpd deployment with 2 replicas.



root@sathish-vm2:/home/sathish# kubectl create deployment mywebserver --image=httpd --replicas=2
deployment.apps/mywebserver created

root@sathish-vm2:/home/sathish# kubectl get deployment --show-labels
NAME          READY   UP-TO-DATE   AVAILABLE   AGE     LABELS
mywebserver   2/2     2            2           8m41s   app=mywebserver

root@sathish-vm2:/home/sathish# kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
sathish-vm1   Ready    <none>   61d   v1.19.0   172.28.147.44   <none>        Ubuntu 20.04.1 LTS   5.4.0-47-generic   docker://19.3.11
sathish-vm2   Ready    master   61d   v1.19.0   172.28.147.38   <none>        Ubuntu 20.04.1 LTS   5.4.0-47-generic   docker://19.3.11

root@sathish-vm2:/home/sathish# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
mywebserver-66465b594f-jjvbg   1/1     Running   0          13m   10.244.0.7    sathish-vm2   <none>           <none>
mywebserver-66465b594f-pztcj   1/1     Running   0          14m   10.244.1.18   sathish-vm1   <none>           <none>


Now let's create a NodePort service to expose this deployment, using the following YAML file:


apiVersion: v1
kind: Service
metadata:
  name: my-webservice
spec:
  type: NodePort
  selector:
    app: mywebserver
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30007
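
Assuming the YAML above is saved as my-webservice.yaml (a filename I am choosing here), the service can be created with:

root@sathish-vm2:/home/sathish# kubectl create -f my-webservice.yaml
service/my-webservice created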

Accessing the server shows that it works:

root@sathish-vm2:/home/sathish# curl http://172.28.147.38:30007
<html><body><h1>It works!</h1></body></html>

"It works" is the default HTML page from apache.


Let's say I have a custom webpage somewhere on my local storage, and I want to display it when the webserver is accessed. The HTML for the webpage is given below:


<html>
<body>
<p>Hi from Sathish</p>
</body>
</html>

How can I make sure the container serves this webpage instead of the default "it works" page?

Let's look at the Docker Hub page for Apache (httpd).


One of the options listed there is this:

docker run -dit --name my-apache-app -p 8080:80 -v "$PWD":/usr/local/apache2/htdocs/ httpd:2.4

This essentially mounts the contents of the host's current directory at /usr/local/apache2/htdocs/ inside the container. So, as long as I have my custom file somewhere on local storage, it is possible to serve it from the container.


So let's recreate the deployment to make this happen.



root@sathish-vm2:/home/sathish/mypage# kubectl  get deployment mywebserver -o yaml > mywebserver.yaml
#Deleting existing deployment
root@sathish-vm2:/home/sathish/mypage# kubectl delete deployment mywebserver
deployment.apps "mywebserver" deleted
#Create skeletal YAML file
kubectl create deployment mywebserver --image=httpd --replicas=2 --dry-run=client -o yaml  > mywebserver.yaml
#Updated content of mywebserver.yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mywebserver
  name: mywebserver
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mywebserver
  strategy: {}
  template:
    metadata:
      labels:
        app: mywebserver
    spec:
      containers:
      - image: httpd
        name: httpd
        volumeMounts:
        - name: htmlpath
          mountPath: /usr/local/apache2/htdocs/index.html
      volumes:
      - name: htmlpath
        hostPath:
          path: /home/sathish/mypage/index.html
          type: File
#Creating the deployment
root@sathish-vm2:/home/sathish# kubectl create -f mywebserver.yaml
deployment.apps/mywebserver created

Let's take a closer look at the volume-related sections:

volumeMounts:
- name: htmlpath
  mountPath: /usr/local/apache2/htdocs/index.html
volumes:
- name: htmlpath
  hostPath:
    path: /home/sathish/mypage/index.html
    type: File


  • The volumeMounts entry under the container section tells Kubernetes to look for a volume with a specific name ("htmlpath" in this case). The contents of the volume are mounted at "mountPath", which here is /usr/local/apache2/htdocs/index.html, the default page Apache serves.

  • The volumes section specifies the parameters of the volume, which in this case is a hostPath volume: "/home/sathish/mypage/index.html" on the node will be mounted inside the container (a quick check follows below).
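
To double-check the mount from inside a running PoD, something like this works (substitute one of the mywebserver pod names from "kubectl get pods"):

root@sathish-vm2:/home/sathish# kubectl exec <pod-name> -- cat /usr/local/apache2/htdocs/index.html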

Now when I try to access the webservice, it displays my custom webpage:


root@sathish-vm2:/home/sathish# curl http://172.28.147.38:30007
<html><body><p>Hi from Sathish</p></body></html>

There is a problem with this: I have to ensure that the HTML file exists at the hostPath location on every node where replicas can run. This can be solved by having a network mount on all nodes and placing the file there. The network mount could be NFS or any of the cloud file services like Amazon EFS, as sketched below.
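
As a minimal sketch, an NFS-backed volume in the PoD spec might look like this (the server address and export path below are hypothetical placeholders; NFS client utilities must also be installed on every node):

volumes:
- name: htmlpath
  nfs:
    # Hypothetical NFS server and export - replace with your own
    server: 172.28.147.100
    path: /exports/mypage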


The above example is that of a "consumer" PoD that consumes something from the file system and serves it to users.


Let's talk about a different requirement: what about PoDs that need storage for storing and retrieving data (say, dynamic data created by server-side scripts)? We could create volume mounts on nodes and ensure that PoDs run on nodes with sufficient storage. However, this approach becomes cumbersome when managing hundreds or thousands of PoDs. Each PoD might need a specific amount of storage (say 100 MB) with certain access criteria. Would it be possible to carve a bunch of volumes out of the available storage (say, ten 100 MB volumes) and claim them dynamically as required? This is possible with PersistentVolumes and PersistentVolumeClaims.


Persistent Volumes and Persistent Volume Claims


Persistent volumes and claims are described here


A persistent volume (PV) is a certain amount of physical storage provisioned on a cluster by the administrator. Persistent volume claims (PVCs) can "claim" these volumes, and PoDs can then use the claims.


Let's create a PersistentVolume. Here is the YAML file for it:




apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-volume
spec:
  capacity:
    storage: 100Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/sathish/mypage"
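
Assuming the file is saved as pv.yaml (my naming), the PersistentVolume is created like this:

root@sathish-vm2:/home/sathish# kubectl create -f pv.yaml
persistentvolume/my-volume created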



root@sathish-vm2:/home/sathish# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
my-volume   100Mi      RWO            Retain           Available                                   6m5s

"Available" indicates that this volume is free and can be claimed. The different accessModes are described here. As we are using hostPath, ReadWriteOnce is the only supported mode; if you use NFS or cloud storage, other options such as ReadOnlyMany and ReadWriteMany are supported.


Let's create a PersistentVolumeClaim object and "claim" this volume.


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi

root@sathish-vm2:/home/sathish# kubectl create -f pvc.yaml
persistentvolumeclaim/my-pv-claim created

root@sathish-vm2:/home/sathish# kubectl get  pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
my-volume   100Mi      RWO            Retain           Bound    default/my-pv-claim                           5m1s

root@sathish-vm2:/home/sathish# kubectl get  pvc
NAME          STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-pv-claim   Bound    my-volume   100Mi      RWO                           17s

A few things to note:

  • Even though the claim requested only 50 MB, there were no 50 MB volumes available. So the claim was bound to a volume that satisfied its storage requirement, even though the volume itself is larger (100 MB).

  • The access mode between PV and PVC should match (ReadWriteOnce in this case).

Let's recreate the earlier deployment, but this time have it use the PVC. Here is the YAML file




apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mywebserver
  name: mywebserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mywebserver
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mywebserver
    spec:
      volumes:
        - name: webcontent
          persistentVolumeClaim:
            claimName: my-pv-claim
      containers:
      - image: httpd
        name: httpd
        volumeMounts:
          - mountPath: "/usr/local/apache2/htdocs/"
            name: webcontent
      nodeName: sathish-vm2
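
Assuming this is saved as mywebserver-pvc.yaml (a filename I picked), delete the old deployment and create the new one:

root@sathish-vm2:/home/sathish# kubectl delete deployment mywebserver
deployment.apps "mywebserver" deleted
root@sathish-vm2:/home/sathish# kubectl create -f mywebserver-pvc.yaml
deployment.apps/mywebserver created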

Notes:
The "nodeName" field is used to pin the PoD to the node where I have created the directory backing the PV. In my case, this happens to be the master node. I have also labelled my master node like so:

root@sathish-vm2:/home/sathish# kubectl label node sathish-vm2 name=sathish-vm2
node/sathish-vm2 labeled

By default, PoDs cannot be scheduled on the master node due to the presence of the NoSchedule taint. This can be removed with:

root@sathish-vm2:/home/sathish# kubectl taint node sathish-vm2  node-role.kubernetes.io/master:NoSchedule-
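
To confirm the taint is gone (an optional check):

root@sathish-vm2:/home/sathish# kubectl describe node sathish-vm2 | grep Taints
Taints:             <none>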

Let's check things out


#Rewriting index.html webpage with custom page
root@sathish-vm2:/home/sathish/mypage# kubectl exec mywebserver-6d6c47b597-jqtfc -- sh -c "echo 'Hi from sathish-vm2 pvc' > /usr/local/apache2/htdocs/index.html"

#Checking access
root@sathish-vm2:/home/sathish/mypage# curl http://172.28.147.38:30007
Hi from sathish-vm2 pvc

#The webpage is created on the local storage referenced by the PV
root@sathish-vm2:/home/sathish/mypage# pwd
/home/sathish/mypage
root@sathish-vm2:/home/sathish/mypage# cat index.html
Hi from sathish-vm2 pvc
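
As a quick sketch of why this is "persistent": deleting the PoD makes the deployment spin up a replacement, and the new PoD serves the same content because it is backed by the same PV (the pod name below is from my cluster and will differ in yours):

#Delete the pod; the deployment controller recreates it
root@sathish-vm2:/home/sathish/mypage# kubectl delete pod mywebserver-6d6c47b597-jqtfc
pod "mywebserver-6d6c47b597-jqtfc" deleted
#The custom page survives, as it lives on the PV and not inside the container
root@sathish-vm2:/home/sathish/mypage# curl http://172.28.147.38:30007
Hi from sathish-vm2 pvc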

That's it for today folks. Hope this was useful. Thanks for your time and have a good weekend.



