Setting Up a Cluster with Docker Swarm

Background

If you have read the Docker Swarm documentation (https://docs.docker.com/swarm/), you will know that Docker Swarm is used to perform clustering across multiple Docker Hosts. It presents the cluster to the user as a single virtual Docker Host.

There are two ways to use Docker Swarm: use the official Swarm image, or run the Swarm binary directly on the host OS. For the former method, you can use Docker Machine/Toolbox to quickly set up and deploy clusters on VMs locally and/or in the cloud.

Assumptions

For this walk-through:

  • I have three machines, named host-one, host-two, and host-three, on the same network that are able to communicate with each other (e.g. via ping)
  • I will be using the official Swarm image (at time of publication, tag 1.2.3)
  • I will not be using Docker Machine/Toolbox


Pre-Requisite

At a minimum, you should have read the Docker Swarm documentation and have a rough understanding of what it does. The documentation can be found here: https://docs.docker.com/swarm/

Part 1: Understanding Docker Swarm Architecture

If you are familiar with terms like the Discovery Service, Swarm Manager, and Swarm Agent, then you can skip this part.

The Swarm Manager is responsible for receiving all cluster commands (like spinning up new containers) and distributing the work across the nodes. This is potentially a single point of failure, so Docker recommends setting it up in a High Availability (HA) configuration.

The Swarm Agent is responsible for communicating with the Swarm Manager, as well as executing cluster commands on the local Docker Engine.

The Discovery Service maintains an updated list of cluster members and shares that list with the Swarm manager(s) that connect to it. The Swarm Manager uses this list of nodes to distribute cluster workloads across the nodes (via the Swarm Agent).

This section provides a high-level overview of what each component does. For more details, see the official documentation: https://docs.docker.com/swarm/swarm_at_scale/deploy-infra/

Part 2: Set Docker Daemon to Allow TCP connections

In this part, we will configure all the Docker daemons to allow connections via a particular TCP port. This is necessary because the Swarm Manager communicates with the nodes in the cluster over TCP. There are two methods to do so:

Method #1: Configuring the Docker Daemon via Command Line Arguments

First, stop any running Docker daemon with the following command on each node:

$ sudo service docker stop

Then, run the Docker daemon with the following command:

$ sudo /usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock

This command instructs the Docker daemon to allow connections to it via 1) TCP port 2375 on any of the host’s network interfaces (0.0.0.0), and 2) the local Unix socket.
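
As a quick sanity check (assuming the daemon was started with the flags above), you can point the Docker client at the TCP port from the same host:

$ docker -H tcp://127.0.0.1:2375 version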

Method #2: Setting the DOCKER_OPTS Environment Variable

The advantage of this method is that the configuration persists across restarts: if the Docker Host reboots, the Docker Engine daemon will come back up still configured to allow connections via the specified TCP port.

Open the /etc/default/docker file with your favourite text editor, and add the following line:

DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"

Save and close the file.
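
Alternatively, if you prefer a one-liner to opening an editor, the same line can be appended from the shell (this assumes DOCKER_OPTS is not already defined in the file):

$ echo 'DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"' | sudo tee -a /etc/default/docker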

On distros that manage the Docker Engine daemon with Upstart or SysVinit (e.g. Ubuntu 14.04), the init script reads the custom variables from the file at /etc/default/docker. So just restart the Docker Engine daemon:

$ sudo service docker restart

If you are using a Linux distro that uses systemd to manage the Docker Engine daemon (e.g. CentOS 7, or recent Debian/Ubuntu releases), /etc/default/docker is ignored by default, and you need to override the systemd unit with a drop-in file instead. (More information can be found at https://docs.docker.com/engine/admin/systemd/)

Create the folder for the drop-in file:

$ sudo mkdir /lib/systemd/system/docker.service.d

Create/open the config file with your favourite editor. In the example below, I am using vi with sudo:

$ sudo vi /lib/systemd/system/docker.service.d/docker.conf

Add the following text to the file:

[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock

Save and close the file. (The empty ExecStart= line is intentional: it clears the default start command so that the new one can replace it; without it, systemd would refuse to start the service with two ExecStart values.) Reload the systemd state by running the following command:

$ sudo systemctl daemon-reload

Restart the Docker Engine daemon:

$ sudo systemctl restart docker

Your Docker Engine daemon should now be accepting connections via TCP on the specified port.
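
To confirm that the port is reachable over the network (and not just locally), you can query one node's daemon from another node; <host 1 IP> is a placeholder for the target host's address, as in the later parts:

user@host-two$ docker -H tcp://<host 1 IP>:2375 info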


Part 3: Start the Discovery Service

While Docker Swarm manages the cluster via communication between its Swarm Manager(s) and Agent(s), it needs a way to keep track of the state of the cluster (such as which nodes are in the cluster, and the state of each node). To achieve this, we use a key-value store, which holds information about the network state, including discovery data, networks, endpoints, IP addresses, and more. Docker supports the Consul, etcd, and ZooKeeper key-value stores. This example uses Consul.

For my setup, I have selected host-one to run the Consul container. On host-one, execute the following command to: 1) pull the latest progrium/consul image if you don’t already have it, and 2) start a Consul container listening on port 8500:

$ docker run -d --name consul -h consul -p 8500:8500 --restart=unless-stopped progrium/consul -server -bootstrap
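
To check that Consul is up, you can query its HTTP API; the /v1/status/leader endpoint returns the address of the current leader, and a non-empty response means the server has bootstrapped successfully:

$ curl http://<consul host IP>:8500/v1/status/leader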


Part 4: Start the Swarm Managers

In this step, we will start the Swarm Managers, which will communicate with the Consul container to get the list of nodes in the cluster, and manage the cluster for us. If you are not setting up the Swarm Managers in a HA configuration, you can omit the --replication flag. The template of the command is as follows:

$ docker run -d --name <manager name> -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <current host IP>:4000 consul://<consul host IP>:8500

For my setup, I have selected host-one and host-two to run the Swarm Managers.

user@host-one$ docker run -d --name swarm-manager-hostone -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <host 1 IP>:4000 consul://<host 1 IP>:8500

user@host-two$ docker run -d --name swarm-manager-hosttwo -p 4000:4000 --restart=unless-stopped swarm manage -H :4000 --replication --advertise <host 2 IP>:4000 consul://<host 1 IP>:8500

To check that the managers have been set up successfully, run the following command on both host-one and host-two:

$ docker -H :4000 info

This prints some information about the cluster. One of the managers should be the primary (Role: primary) and the other should be the secondary (Role: replica).
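
If you only want to see the HA-related fields, you can filter the output. On the secondary manager, you should see something like the following (addresses are illustrative):

$ docker -H :4000 info | grep -E 'Role|Primary'
Role: replica
Primary: 192.168.1.101:4000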


Part 5: Start the Swarm Agents on each Node

For a Docker host to join the cluster, a Swarm Agent must be started on it. The template of the command is as follows:

$ docker run -d --name <agent name> swarm join --advertise <current host IP>:2375 consul://<consul host IP>:8500

For my setup, I want all three machines to participate in the cluster, so I will run the above command on each of them, as shown below.
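
Concretely, the commands for my three hosts look like the following (the agent container names are my own choice, and Consul is running on host-one):

user@host-one$ docker run -d --name swarm-agent-hostone swarm join --advertise <host 1 IP>:2375 consul://<host 1 IP>:8500

user@host-two$ docker run -d --name swarm-agent-hosttwo swarm join --advertise <host 2 IP>:2375 consul://<host 1 IP>:8500

user@host-three$ docker run -d --name swarm-agent-hostthree swarm join --advertise <host 3 IP>:2375 consul://<host 1 IP>:8500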


Part 6: Check Swarm Cluster Status

Before we continue, let’s check the status of our cluster. It should look similar to the output below (note: I have changed some of the values):

$ docker -H :4000 info 
Containers: 11
 Running: 6
 Paused: 0
 Stopped: 5
Images: 6
Server Version: swarm/1.2.3
Role: replica
Primary: 192.168.1.101:4000
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 3
host-one: 192.168.1.101:2375
 └ ID: ABC3:AB4C:C7AD:QWEE:ASDD:FJKT:WAV2:NH5Q:PGAT:YE6J:PMC2:TAS2
 └ Status: Healthy
 └ Containers: 8
 └ Reserved CPUs: 0 / 8
 └ Reserved Memory: 0 B / 1024 MiB
 └ Labels: executiondriver=, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
 └ UpdatedAt: 2016-06-01T03:29:09Z
 └ ServerVersion: 1.11.1
host-two: 192.168.1.102:2375
 └ ID: XYZ3:XY4C:C7AD:XYZ6:XYZZ:FLOT:WXYZ:MH6Q:TAGG:YF7J:ABCD:TAS3
 └ Status: Healthy
 └ Containers: 3
 └ Reserved CPUs: 0 / 8
 └ Reserved Memory: 0 B / 1024 MiB
 └ Labels: executiondriver=, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
 └ UpdatedAt: 2016-06-01T03:29:26Z
 └ ServerVersion: 1.11.1
Plugins:
 Volume:
 Network:
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: linux
Architecture: amd64
CPUs: 16
Total Memory: 2048 MiB
Name: 3cf825a8217d
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support


Part 7: Running Container(s) on the Cluster

To run container(s) on the cluster, you pass commands in via the Swarm Manager, just as you would with a single Docker Host.

For my setup, I will run the following commands on either host-one or host-two:

$ docker -H :4000 run hello-world
$ docker -H :4000 ps -a

From the ps output, you can see that the hello-world container has been started (and stopped) on one of the swarm cluster nodes. You can repeat the two commands above to see whether the Swarm Manager assigns a different cluster node to run the container.
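
Classic Swarm prefixes container names with the hostname of the node they were scheduled on, so the node is visible in the NAMES column. The output will look something like this (the ID and the auto-generated name are illustrative):

$ docker -H :4000 ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS                          PORTS               NAMES
0a1b2c3d4e5f        hello-world         "/hello"            About a minute ago   Exited (0) About a minute ago                       host-two/stoic_morse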


Part 8: Observing the Swarm Manager HA In Action

To test the Swarm Manager HA configuration, stop the primary Swarm Manager container and print the cluster info of the secondary Swarm Manager. The output should show that it is now the primary Swarm Manager.

You may need to wait a while before the failover mechanism promotes a secondary Swarm Manager to be the primary one.
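
For my setup, assuming host-one currently holds the primary role, the test looks like this (swarm-manager-hostone and swarm-manager-hosttwo are the container names chosen in Part 4):

user@host-one$ docker stop swarm-manager-hostone

user@host-two$ docker -H :4000 info | grep Role
Role: primary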
