If you have read Docker Swarm documentation (https://docs.docker.com/swarm/), you will know that Docker Swarm is used to perform clustering across multiple Docker Hosts. This provides the user a single virtual Docker Host to interact with the cluster.
There are two ways to use Docker Swarm – the first is to use the official Swarm image, while the second is to run the Swarm binary on the host OS. For the former method, it is possible to use Docker Machine/Toolbox as a way to quickly set up and deploy clusters on VMs locally and/or in the cloud.
For this walk-through:
- I have three machines on the same network that can communicate with each other (e.g. via ping), named host-one, host-two, and host-three
- I will be using the official Swarm image (at time of publication, tag 1.2.3)
- I will not be using Docker Machine/Toolbox
At a minimum, you should have read the Docker Swarm documentation (linked above) and have a rough understanding of what it does.
Part 1: Understanding Docker Swarm Architecture
If you are familiar with terms like the Discovery Service, Swarm Manager, and Swarm Agent, then you can skip this part.
The Swarm Manager is responsible for receiving all cluster commands (like spinning up new containers), and distributing the work across the nodes. Potentially, this can be a single point of failure, so Docker recommends setting it up in a High Availability (HA) configuration.
The Swarm Agent is responsible for communicating with the Swarm Manager, as well as executing cluster commands on the local Docker Engine.
The Discovery Service maintains an updated list of cluster members and shares that list with the Swarm manager(s) that connect to it. The Swarm Manager uses this list of nodes to distribute cluster workloads across the nodes (via the Swarm Agent).
This section provides a high-level overview of what each component does. For more details, please see the official documentation here: https://docs.docker.com/swarm/swarm_at_scale/deploy-infra/
Part 2: Set Docker Daemon to Allow TCP connections
The Swarm Manager communicates with the Swarm Agents via TCP. Therefore, in this part we will configure the Docker daemon on every node to accept connections on a TCP port. There are two methods to do so:
Method #1: Configuring the Docker Daemon via Command Line Arguments
First, stop any running Docker daemon with the following command on each node:
$ sudo service docker stop
Then, run the Docker Daemon with the following commands:
$ sudo /usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
This command instructs the Docker daemon to accept connections via 1) TCP on port 2375 on all of the host’s network interfaces (0.0.0.0), and 2) the local unix socket.
Method #2: Setting the DOCKER_OPTS Environment Variable
The advantage of this method is that even if the Docker Host restarts, the new Docker Engine daemon will still be configured to allow connections via the TCP port specified.
Open the /etc/default/docker file with your favourite text editor, and add the following line:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
Save and close the file.
On distros that use Upstart or SysVinit to manage the Docker Engine daemon (e.g. Ubuntu 14.04), the daemon reads the custom variables from the file at /etc/default/docker. So just restart the Docker Engine daemon:
$ sudo service docker restart
If you are using a Linux distro that uses systemd to manage the Docker Engine daemon (e.g. CentOS 7, or recent Debian/Ubuntu releases), /etc/default/docker is ignored, and you need to add a systemd drop-in file instead. (More information can be found at https://docs.docker.com/engine/admin/systemd/)
Create the folder for the drop-in file:
$ sudo mkdir /lib/systemd/system/docker.service.d
Create/open the config file with your favourite editor. In the example below, I am using vi with sudo:
$ sudo vi /lib/systemd/system/docker.service.d/docker.conf
Add the following text to the file:
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
Save and close the file. Reload the systemd state by running the following command:
$ sudo systemctl daemon-reload
Restart the Docker Engine daemon:
$ sudo service docker restart
Your Docker Engine daemon should now be accepting connections to it via TCP at the port specified.
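To double-check, you can look for a listener on the chosen port, or query the daemon over its TCP endpoint. The tools and loopback address below are my choice (netstat works in place of ss):

```shell
# Expect a LISTEN entry bound to 0.0.0.0:2375.
sudo ss -tlnp | grep 2375

# Or ask the daemon directly over the TCP endpoint.
docker -H tcp://127.0.0.1:2375 version
```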
Part 3: Start the Discovery Service
While Docker Swarm manages the cluster via communication between its Swarm Manager(s) and Agent(s), it needs a way to keep track of cluster state (such as which nodes are in the cluster, and the state of each node). To achieve this, we use a key-value store to hold information about the network state, which includes discovery, networks, endpoints, IP addresses, and more. Docker Swarm supports the Consul, Etcd, and ZooKeeper key-value stores. This example uses Consul.
For my setup, I have selected host-one to run the consul container. On host-one, execute the following command to: 1) pull the latest image of progrium/consul if you don’t already have it, and 2) start a consul container listening on port 8500
$ docker run -d --name consul -h consul -p 8500:8500 --restart=unless-stopped progrium/consul -server -bootstrap
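To confirm Consul came up, you can query its HTTP API on port 8500; the catalog endpoint below is part of Consul's standard API. The IP address is a placeholder for host-one's address:

```shell
# Hypothetical address for host-one; substitute your own.
CONSUL_IP=192.168.1.101

# List the members of the Consul datacenter; a healthy single-server
# bootstrap returns one node.
curl http://${CONSUL_IP}:8500/v1/catalog/nodes
```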
Part 4: Start the Swarm Managers
In this step, we will start the Swarm Managers, which communicate with the consul container to get the list of nodes in the cluster, and manage the cluster for us. If you are not setting up the Swarm Managers in a HA configuration, you can omit the "--replication" flag. The template of the command is as follows:
$ docker run -d --name <manager name> -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <current host IP>:4000 consul://<consul host IP>:8500
For my setup, I have selected host-one and host-two to run the Swarm Managers.
user@host-one$ docker run -d --name swarm-manager-hostone -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <host 1 IP>:4000 consul://<host 1 IP>:8500
user@host-two$ docker run -d --name swarm-manager-hosttwo -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <host 2 IP>:4000 consul://<host 1 IP>:8500
To check that the managers have been set up successfully, run the following command on both host-one and host-two:
$ docker -H :4000 info
This prints some information about the cluster. One of the managers should be the primary (Role: primary) and the other should be the secondary (Role: replica).
Part 5: Start the Swarm Agents on each Node
For a Docker host to join the cluster, a Swarm Agent must be started on it. The template of the command is as follows:
$ docker run -d --name <agent name> swarm join --advertise <current host IP>:2375 consul://<consul host IP>:8500
For my setup, I want all three machines to participate in the cluster. So on each of them, I will run the above command.
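Filling in the template with the IP addresses used later in this walkthrough (192.168.1.101 for host-one, 192.168.1.102 for host-two; 192.168.1.103 for host-three is my assumption), the join commands would look like this. I have also added --restart=unless-stopped for consistency with the earlier commands:

```shell
# On host-one (also running consul):
docker run -d --name swarm-agent-hostone --restart=unless-stopped \
  swarm join --advertise 192.168.1.101:2375 consul://192.168.1.101:8500

# On host-two:
docker run -d --name swarm-agent-hosttwo --restart=unless-stopped \
  swarm join --advertise 192.168.1.102:2375 consul://192.168.1.101:8500

# On host-three:
docker run -d --name swarm-agent-hostthree --restart=unless-stopped \
  swarm join --advertise 192.168.1.103:2375 consul://192.168.1.101:8500
```

Note that each agent advertises its own Docker daemon's TCP endpoint (port 2375 from Part 2), while all of them point at the same Consul instance on host-one.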
Part 6: Check Swarm Cluster Status
Before we continue, let’s check the status of our cluster. The output should look similar to the one below (note: I have changed some of the values)
$ docker -H :4000 info
Containers: 11
 Running: 6
 Paused: 0
 Stopped: 5
Images: 6
Server Version: swarm/1.2.3
Role: replica
Primary: 192.168.1.101:4000
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 3
 host-one: 192.168.1.101:2375
  └ ID: ABC3:AB4C:C7AD:QWEE:ASDD:FJKT:WAV2:NH5Q:PGAT:YE6J:PMC2:TAS2
  └ Status: Healthy
  └ Containers: 8
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 1024 MiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
  └ UpdatedAt: 2016-06-01T03:29:09Z
  └ ServerVersion: 1.11.1
 host-two: 192.168.1.102:2375
  └ ID: XYZ3:XY4C:C7AD:XYZ6:XYZZ:FLOT:WXYZ:MH6Q:TAGG:YF7J:ABCD:TAS3
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 1024 MiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
  └ UpdatedAt: 2016-06-01T03:29:26Z
  └ ServerVersion: 1.11.1
Plugins:
 Volume:
 Network:
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: linux
Architecture: amd64
CPUs: 16
Total Memory: 2048 MiB
Name: 3cf825a8217d
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
Part 7: Running Container(s) on the Cluster
To run container(s) on the cluster, you would pass in commands via the Swarm Manager.
For my setup, I will run the following commands on either host-one or host-two:
$ docker -H :4000 run hello-world
$ docker -H :4000 ps -a
From the ps output, you can see that the hello-world container was started (and has since exited) on one of the swarm cluster nodes. You can repeat the above two commands to see whether the Swarm Manager assigns a different cluster node to run the container.
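To see the scheduler's spread strategy more clearly, you can launch a handful of containers in a loop and check which node each one landed on (the container names here are illustrative):

```shell
# Run five short-lived containers through the Swarm Manager.
for i in 1 2 3 4 5; do
  docker -H :4000 run -d --name "hello-$i" hello-world
done

# In classic Swarm, the NAMES column of ps -a prefixes each container
# with the node it ran on, e.g. "host-two/hello-3".
docker -H :4000 ps -a
```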
Part 8: Observing the Swarm Manager HA In Action
To test the Swarm Manager HA configuration, stop the primary Swarm Manager container and print the cluster info of the secondary Swarm Manager. The output should show that it is now the primary Swarm Manager.
You may need to wait a while before the failover mechanism promotes a secondary Swarm Manager to be the primary one.
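A concrete way to exercise the failover, using the container names from Part 4:

```shell
# On host-one: stop the primary manager.
docker stop swarm-manager-hostone

# On host-two: after a short wait, the replica should report Role: primary.
docker -H :4000 info | grep Role

# Restart the old manager when done; it rejoins as a replica.
docker start swarm-manager-hostone
```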