Note: If you have missed my previous articles on Docker, you can find them here.
Application deployment models evolution.
Getting started with Docker.
Docker file and images.
Publishing images to Docker Hub and re-using them
Docker- Find out what's going on
Docker Networking- Part 1
Docker Networking- Part 2
Docker Swarm-Multi-Host container Cluster
In the previous article, I gave an introduction to Docker Swarm. Docker Swarm lets users run microservices (containers) across multiple hosts, providing redundancy and protection against hardware failure. The manager nodes in a swarm cluster ensure that the required number of replicas is always running: if a host reboots, the managers spin up the missing replicas on hosts that are healthy.
So, how do containers running on different swarm hosts communicate with each other? In this article, I will try to answer this question.
Nodes in a swarm cluster need not share the same subnet; they only need IP connectivity to one another. To become a member of the swarm cluster, all a node needs is the manager node's IP address and a join token. Given this, if we were to deploy a web service in a cluster, the following network requirements come into play:
1. The web service should be accessible to the external world through the host IP address of any swarm node. This can be used for load balancing.
2. Containers on different cluster nodes should be able to communicate with each other.
Requirement 1 is taken care of by the Docker Swarm implementation. When a service is created with a published port, swarm automatically opens that port on all swarm nodes, so the service can be accessed through any swarm node's IP address. Further, Docker implements a "routing mesh" along with a load balancer, which internally balances traffic between the containers running the service on different nodes.
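As a rough mental model of the behaviour described above, here is a minimal Python sketch. It is not Docker's implementation: the class name, the node names (node-a, node-b) and the round-robin choice are all assumptions made for illustration. It only shows the idea that a request can enter through any node and still be served by a task on a different node.

```python
from itertools import cycle


class RoutingMeshModel:
    """Toy model of a routing mesh (hypothetical, not Docker's code)."""

    def __init__(self, nodes, tasks):
        # nodes: every swarm node listens on the published port
        # tasks: (task_name, node) pairs describing where replicas run
        self.nodes = set(nodes)
        # Round-robin selection is an assumption of this sketch.
        self._tasks = cycle(tasks)

    def handle(self, entry_node):
        """Accept a request on any node and pick the task that serves it."""
        if entry_node not in self.nodes:
            raise ValueError(f"{entry_node} is not a swarm node")
        return next(self._tasks)


mesh = RoutingMeshModel(
    nodes=["node-a", "node-b"],
    tasks=[("web.1", "node-a"), ("web.2", "node-b")],
)

# All requests enter through node-a, yet they are spread over both tasks.
for _ in range(4):
    print(mesh.handle("node-a"))
```

In this model the entry node and the serving node are independent, which is why any node IP works as an entry point.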
Docker provides a network driver called overlay, which satisfies requirement 2.
Before creating an overlay network, we need to ensure the docker swarm cluster is created. You can look at this article for a brief overview of how to create a swarm cluster.
I have already created a 2-node cluster
root@sathish-vm1:/home/sathish# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
dtbd1uwpw2q958e9jkf5nnd52 * sathish-vm1 Ready Active Leader 19.03.12
9ea38z6jayedmyh813ekzoo80 sathish-vm2 Ready Active 19.03.12
Let's create the overlay network
root@sathish-vm1:/home/sathish# docker network create web --driver overlay
s7uhj2h4m6talob0z6yxl0l4t
root@sathish-vm1:/home/sathish# docker network ls
NETWORK ID NAME DRIVER SCOPE
01f45c79f691 bridge bridge local
bf422f628443 docker_gwbridge bridge local
a2eeae8a2473 host host local
szmt721xjbw5 ingress overlay swarm
62ca27e1244c none null local
s7uhj2h4m6ta web overlay swarm
The --driver overlay option in the above command creates an overlay network, and as we can see, its scope is swarm.
Let's create 6 web replicas in the swarm cluster and attach them to the newly created web overlay network
root@sathish-vm1:/home/sathish# docker service create --replicas 6 --network web --name web -p 80:80 httpd
thb128sv9vx5u5ln26bwiohoy
overall progress: 0 out of 6 tasks
overall progress: 6 out of 6 tasks
1/6: running [==================================================>]
2/6: running [==================================================>]
3/6: running [==================================================>]
4/6: running [==================================================>]
5/6: running [==================================================>]
6/6: running [==================================================>]
verify: Service converged
root@sathish-vm1:/home/sathish# docker service ps web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jaliyp3uy95b web.1 httpd:latest sathish-vm1 Running Running about an hour ago
mwhpczmpiy6n web.2 httpd:latest sathish-vm2 Running Running about an hour ago
jnayg3jrhvmi web.3 httpd:latest sathish-vm1 Running Running about an hour ago
93j50f81fp2d web.4 httpd:latest sathish-vm2 Running Running about an hour ago
h1ovrklo2ras web.5 httpd:latest sathish-vm1 Running Running about an hour ago
ntp9u8ljk9ij web.6 httpd:latest sathish-vm2 Running Running about an hour ago
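To double-check how the replicas were spread, the task lines above can be tallied per node. The following is a small sketch that parses the text copied from the output; the function name tasks_per_node is hypothetical, and the column positions are assumed from the layout shown here.

```python
from collections import Counter

# Task lines copied from the `docker service ps web` output above.
SERVICE_PS_OUTPUT = """\
jaliyp3uy95b web.1 httpd:latest sathish-vm1 Running Running about an hour ago
mwhpczmpiy6n web.2 httpd:latest sathish-vm2 Running Running about an hour ago
jnayg3jrhvmi web.3 httpd:latest sathish-vm1 Running Running about an hour ago
93j50f81fp2d web.4 httpd:latest sathish-vm2 Running Running about an hour ago
h1ovrklo2ras web.5 httpd:latest sathish-vm1 Running Running about an hour ago
ntp9u8ljk9ij web.6 httpd:latest sathish-vm2 Running Running about an hour ago
"""


def tasks_per_node(output):
    """Count tasks per node; the node is assumed to be the 4th column."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[3]] += 1
    return counts


print(dict(tasks_per_node(SERVICE_PS_OUTPUT)))
# {'sathish-vm1': 3, 'sathish-vm2': 3}
```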
Now that 6 instances are created and distributed across VM1 and VM2, let's check how they are attached to the "web" network with docker network inspect.
On VM1
root@sathish-vm1:/home/sathish# docker network inspect web
[
{
"Name": "web",
"Id": "uecxc14ur67nvolfqtsimuodz",
"Created": "2020-08-26T05:10:35.625292416Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.1.0/24",
"Gateway": "10.0.1.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"57720052786d5181f833898285c50ec95df57a55bb74814b30eb2ca16ba0f01a": {
"Name": "web.5.h1ovrklo2ras0chpvku37qbe8",
"EndpointID": "ea9a043b4c6e27cad92be2542a66b7779ba70a9d7ad5a1e39d7f751c52a8aba3",
"MacAddress": "02:42:0a:00:01:04",
"IPv4Address": "10.0.1.4/24",
"IPv6Address": ""
},
"613d727ca1069fb557dd96f3b8a5aec50ba6891890221a331e2e9e3fe09bea3b": {
"Name": "web.1.jaliyp3uy95bae70y07rrcsjw",
"EndpointID": "9153d6dcd59b832d40630e2d71f1eef2ad150cc2d274fd25130bd98a35047d1f",
"MacAddress": "02:42:0a:00:01:06",
"IPv4Address": "10.0.1.6/24",
"IPv6Address": ""
},
"effff3309e01f9cf7d93964fd9e9c7353a3667c701db24d3c91e552e61708aee": {
"Name": "web.3.jnayg3jrhvmio4t7q25ksv5yp",
"EndpointID": "16cad75311929bf0d19d9ba1ad529dc8b0ee6ff3fdb86812f5c521ccac4339c1",
"MacAddress": "02:42:0a:00:01:08",
"IPv4Address": "10.0.1.8/24",
"IPv6Address": ""
},
"lb-web": {
"Name": "web-endpoint",
"EndpointID": "c05a2798b05513f6195a93b28eec2ce7c3b12e8866210a3ab45005baa7aa56bb",
"MacAddress": "02:42:0a:00:01:0a",
"IPv4Address": "10.0.1.10/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "1fc3633ec5ac",
"IP": "192.168.68.109"
},
{
"Name": "7147a50520de",
"IP": "192.168.68.110"
}
]
}
]
and on VM2
root@sathish-vm2:/home/sathish# docker network inspect web
[
{
"Name": "web",
"Id": "uecxc14ur67nvolfqtsimuodz",
"Created": "2020-08-26T05:10:35.621205724Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.1.0/24",
"Gateway": "10.0.1.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"5f8954fd3645e29252d77b6324290ad5c43d449a53d12a8f300e1ac85680a0eb": {
"Name": "web.4.93j50f81fp2deru56arz0uhjx",
"EndpointID": "2af0d83e92cb33bcb167120973f0838602785287ca1a7d6cb0d4cdb7696c64e6",
"MacAddress": "02:42:0a:00:01:03",
"IPv4Address": "10.0.1.3/24",
"IPv6Address": ""
},
"857e10dde48fe938f466d581d732ed6282723c5cc53f0a96272acb7dc4d2f61d": {
"Name": "web.6.ntp9u8ljk9ijj0ld96382k627",
"EndpointID": "b31559d19d6de83534730b2b3fa33d557d308d9ed74d0eddb4b644d619877217",
"MacAddress": "02:42:0a:00:01:05",
"IPv4Address": "10.0.1.5/24",
"IPv6Address": ""
},
"9a7e34cca22ad78ec9850970cda50ac0f93b364fbd0b4340ca391f8d69c5f285": {
"Name": "web.2.mwhpczmpiy6nvrqdhrgxcgwr4",
"EndpointID": "e339bb448daaf5c5efb5040c7db481122e7b48c811cc51510a5a66aac83fc4a4",
"MacAddress": "02:42:0a:00:01:07",
"IPv4Address": "10.0.1.7/24",
"IPv6Address": ""
},
"lb-web": {
"Name": "web-endpoint",
"EndpointID": "b1115a371c6141db627677fada4b6f1104bc616a5b3722f44131d39e0fef0c5a",
"MacAddress": "02:42:0a:00:01:09",
"IPv4Address": "10.0.1.9/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "7147a50520de",
"IP": "192.168.68.110"
},
{
"Name": "1fc3633ec5ac",
"IP": "192.168.68.109"
}
]
}
]
From the output we can see that there are 2 nodes in the swarm cluster: 192.168.68.109 and 192.168.68.110. Each container has a name and an IP address. For example, web.2.mwhpczmpiy6nvrqdhrgxcgwr4 running on VM2 has an IP of 10.0.1.7. Containers running on VM1 can talk to the web.2 container using either its IP address or its name.
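The name-to-IP mapping can be pulled out of the inspect JSON with a few lines of Python. The sketch below uses an excerpt of the VM2 output, with container IDs shortened to 12 characters and only three fields kept; the function names are hypothetical. It also checks a pattern visible in the output: each MAC address looks like the prefix 02:42 followed by the four IPv4 octets in hex. Treat that as an observation about this output, not a documented guarantee.

```python
import ipaddress
import json

# Excerpt of `docker network inspect web` on VM2 (IDs shortened, fields trimmed).
INSPECT_EXCERPT = json.loads("""
{
  "Containers": {
    "5f8954fd3645": {"Name": "web.4.93j50f81fp2deru56arz0uhjx",
                     "MacAddress": "02:42:0a:00:01:03",
                     "IPv4Address": "10.0.1.3/24"},
    "9a7e34cca22a": {"Name": "web.2.mwhpczmpiy6nvrqdhrgxcgwr4",
                     "MacAddress": "02:42:0a:00:01:07",
                     "IPv4Address": "10.0.1.7/24"},
    "lb-web":       {"Name": "web-endpoint",
                     "MacAddress": "02:42:0a:00:01:09",
                     "IPv4Address": "10.0.1.9/24"}
  }
}
""")


def endpoints(network):
    """Map each endpoint name to its IPv4 address (prefix length dropped)."""
    return {
        entry["Name"]: str(ipaddress.ip_interface(entry["IPv4Address"]).ip)
        for entry in network["Containers"].values()
    }


def mac_from_ipv4(ip):
    """Build the MAC pattern seen in the output: 02:42 + IPv4 octets in hex."""
    octets = ipaddress.IPv4Address(ip).packed
    return "02:42:" + ":".join(f"{octet:02x}" for octet in octets)


for name, ip in endpoints(INSPECT_EXCERPT).items():
    print(name, ip, mac_from_ipv4(ip))
```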
Name resolution is possible because docker runs an internal name resolution service.
root@sathish-vm1:/home/sathish# cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.11
options edns0
A resolv.conf file that lives on the Docker host's filesystem is bind-mounted into the container, as the mount output below shows.
Let's get a shell inside the container and check this out
root@sathish-vm1:/home/sathish# docker container exec -it effff3309e01 /bin/bash
root@effff3309e01:/usr/local/apache2# cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
root@effff3309e01:/usr/local/apache2# mount | grep resolv.conf
/dev/mapper/ubuntu--vg-ubuntu--lv on /etc/resolv.conf type ext4 (rw,relatime)
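The container's resolv.conf is short enough to read by eye, but the same check can be scripted. Here is a minimal sketch of a parser (the function name parse_resolv_conf is hypothetical, and only the nameserver and options keywords are handled). The address 127.0.0.11 is where Docker's embedded DNS server listens for containers on user-defined networks.

```python
# Contents of /etc/resolv.conf as seen inside the container above.
CONTAINER_RESOLV_CONF = """\
nameserver 127.0.0.11
options ndots:0
"""


def parse_resolv_conf(text):
    """Collect nameserver addresses and resolver options from resolv.conf text."""
    nameservers, options = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "options":
            options.extend(values)
    return {"nameservers": nameservers, "options": options}


print(parse_resolv_conf(CONTAINER_RESOLV_CONF))
# {'nameservers': ['127.0.0.11'], 'options': ['ndots:0']}
```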
Now let's test our deployment
The web service can be accessed with any docker swarm host's IP address.
root@sathish-vm1:/home/sathish# curl 192.168.68.109
<html><body><h1>It works!</h1></body></html>
root@sathish-vm1:/home/sathish# curl 192.168.68.110
<html><body><h1>It works!</h1></body></html>
root@sathish-vm1:/home/sathish#
Let's get a shell inside the container and try to ping containers running on the other host. Note that you must first install ping with apt-get from the container shell.
root@sathish-vm1:/home/sathish# docker container exec -it effff3309e01 /bin/bash
root@effff3309e01:/usr/local/apache2# apt-get update
....output deleted for clarity........
root@effff3309e01:/usr/local/apache2# apt-get install iputils-ping
....output deleted for clarity........
Now let's ping the web.4 container running on VM2
root@effff3309e01:/usr/local/apache2# ping web.4.93j50f81fp2deru56arz0uhjx
PING web.4.93j50f81fp2deru56arz0uhjx (10.0.1.3) 56(84) bytes of data.
64 bytes from web.4.93j50f81fp2deru56arz0uhjx.web (10.0.1.3): icmp_seq=1 ttl=64 time=0.428 ms
64 bytes from web.4.93j50f81fp2deru56arz0uhjx.web (10.0.1.3): icmp_seq=2 ttl=64 time=3.59 ms
As we can see, name resolution works because resolv.conf points to the DNS service that Docker runs on the host. But how does a ping from one container reach a container on another host?
Traffic between containers on different swarm hosts uses VxLAN. VxLAN is an encapsulation protocol that carries Layer 2 Ethernet frames inside UDP datagrams over an IP network. Each VxLAN segment is identified by a 24-bit identifier called the VNI (VXLAN Network Identifier). The VNI used for the "web" network is 4097, as indicated by the "com.docker.network.driver.overlay.vxlanid_list": "4097" line in the docker network inspect web output.
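To make the 24-bit VNI concrete, here is a short Python sketch that packs and unpacks the 8-byte VxLAN header as laid out in RFC 7348: 8 flag bits (with the "I" flag, 0x08, marking the VNI as valid), 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are hypothetical; this illustrates the header format only and is not how Docker builds packets.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
MAX_VNI = 2**24 - 1          # the VNI field is 24 bits wide


def pack_vxlan_header(vni):
    """Return the 8-byte VxLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VxLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


header = pack_vxlan_header(4097)  # VNI of the "web" network
print(header.hex())               # 0800000000100100
print(unpack_vni(header))         # 4097
```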
With the ping running on the container in VM1, let's capture packets on VM2 and examine their contents.
root@sathish-vm2:/home/sathish# tcpdump -eni eth0 -w vxlan.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C1571 packets captured
1574 packets received by filter
0 packets dropped by kernel
Examining the capture shows that the ICMP request from the container (10.0.1.8) was encapsulated with a VxLAN header carrying a VNI of 4097. The outer IP header uses the IP addresses of the VMs as source and destination. The destination host decapsulates the packet and hands it over to the container with the IP address 10.0.1.3. The ping response (ICMP reply) follows a similar path back to 10.0.1.8.
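The encapsulation and decapsulation steps can be modelled in a few lines of Python. This is a toy model with hypothetical class and function names, not real packet handling. It also assumes that VM1 is 192.168.68.109 and VM2 is 192.168.68.110; the article does not state which host has which address, so swap them if your setup differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerPacket:
    src: str      # container IP on the overlay network
    dst: str
    payload: str


@dataclass(frozen=True)
class VxlanPacket:
    outer_src: str  # host (VM) IP
    outer_dst: str
    vni: int
    inner: InnerPacket


# Container IP -> host IP. The host assignment is an assumption (see above).
CONTAINER_HOST = {
    "10.0.1.8": "192.168.68.109",  # web.3 on VM1
    "10.0.1.3": "192.168.68.110",  # web.4 on VM2
}


def encapsulate(inner, vni):
    """Wrap the inner packet with the host addresses and the VNI."""
    return VxlanPacket(CONTAINER_HOST[inner.src], CONTAINER_HOST[inner.dst], vni, inner)


def decapsulate(packet, expected_vni):
    """Strip the outer header if the VNI matches the local overlay network."""
    if packet.vni != expected_vni:
        raise ValueError(f"unexpected VNI {packet.vni}")
    return packet.inner


request = InnerPacket("10.0.1.8", "10.0.1.3", "ICMP echo request")
on_the_wire = encapsulate(request, vni=4097)
print(on_the_wire.outer_src, "->", on_the_wire.outer_dst, "VNI", on_the_wire.vni)
print(decapsulate(on_the_wire, expected_vni=4097))
```

The inner packet is untouched from end to end; only the outer addresses and the VNI are added on the sending host and removed on the receiving host.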
Note: I was running the docker swarm hosts as VMs that share the same subnet (Hyper-V V-Switch). This is a requirement of neither docker swarm nor VxLAN. You can run a swarm cluster with an overlay network across different IP subnets.
That's all for today folks, thanks for your time.