Difficulty: intermediate
Estimated Time: 30 minutes

Docker Swarm

Docker Swarm is Docker's built-in orchestration framework. It has been around since 2015 and powers many production applications. It is a feature-rich, declarative system that simplifies distributed application and cluster management, and it offers simple primitives for integrating with your CI pipelines.

Docker Swarm is how you will primarily deploy your distributed applications. Next, we'll cover how to store images with Docker registries.

Docker Swarm Architecture

Docker Swarm Lab Step #1 - Architecture

Docker Swarm simplifies distributed application and cluster management. A Swarm has two types of nodes:

Managers form the management plane and maintain the distributed state store.
Workers execute workloads.

In general, Managers should have 2 CPU cores and 8 GB of memory for QA, or 4 cores and 16 GB of memory for production. Workers typically start out with 4 cores and 16 GB of memory, and are then resized to meet the specific needs of the application.

Some organizations instead provision a few extremely powerful hosts to reduce the number of nodes in each cluster. This typically degrades high availability and failover, because each host failure removes a larger share of the cluster's capacity, and it can lead to networking issues as well.

The largest production hosts we've seen work well have had 24 cores (hyperthreaded) and 128 GB of memory. Beyond that, you will improve stability by scaling your cluster horizontally rather than scaling node resources vertically.

Creating your Swarm is easy. Just execute:

docker swarm init

You'll then get output that looks like the following:

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-5fezd9ja545ab5bqjbecahxb381zsurdq44pn1uvnnuocrv2ry-45iiu51y1q8p6gyjxvfy67zvp 172.17.0.9:2377

To join a node to your Docker Swarm as a worker, run that command on the node. The SWMTKN-... value passed to --token is called a join-token.

If you lose your join-token, you can always get it again with docker swarm join-token worker or docker swarm join-token manager.
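For scripting, the -q (--quiet) flag of docker swarm join-token prints only the token. The sketch below uses it to rebuild the full join command. The function name print_join_command and its arguments are illustrative assumptions, not part of the lab, and it has to run on a manager.

```shell
# Hypothetical helper: print the join command for a given role.
# $1 = role (worker or manager)
# $2 = manager address, e.g. the 172.17.0.9:2377 shown by `docker swarm init`
print_join_command() {
    role="$1"
    manager_addr="$2"
    # -q (--quiet) prints only the token, without the surrounding help text.
    token="$(docker swarm join-token -q "$role")" || return 1
    echo "docker swarm join --token $token $manager_addr"
}
```

For example, print_join_command worker 172.17.0.9:2377 prints a command you can paste on the node you want to add.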

Now, ssh to the other node, and execute the command to join it to your swarm:

ssh host02

Now run the docker swarm join command from the output above.

Docker Swarm allows you to promote worker nodes to become manager nodes in your Swarm cluster. You can promote/demote individual nodes with:

docker node promote [node] and docker node demote [node]

Finally, you generally shouldn't run Docker workloads on Managers. We will here, because our Swarm is small and isn't in production. To keep workloads off a specific node, you can "drain" it by executing docker node update --availability drain <NODE-ID>. Workloads can no longer be scheduled on a drained node, and the Swarm moves any service tasks already running there to other active nodes.
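As a minimal sketch of that workflow, the two helpers below wrap docker node update --availability to drain a node and later make it schedulable again. The function names drain_node and activate_node are hypothetical; only the docker commands inside them come from the Docker CLI.

```shell
# Hypothetical helpers around `docker node update --availability`.
# Run them on a manager; $1 is a node ID or hostname.

# Stop scheduling workloads on the node.
drain_node() {
    docker node update --availability drain "$1"
}

# Make the node schedulable again.
activate_node() {
    docker node update --availability active "$1"
}
```

After drain_node host01, the AVAILABILITY column of docker node ls should show Drain for that node; activate_node host01 sets it back to Active.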

The first thing we'll do is take a look at our nodes:

docker node ls

$ docker node ls
Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager.

In general, commands that view or modify cluster state must be run from a Manager. From a security standpoint, this is a good thing: if workloads run only on workers and a container is taken over by a bad actor, they cannot easily influence the Swarm.
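A script can check which kind of node it is on before calling cluster commands. The sketch below reads the Swarm.ControlAvailable field from docker info, which is true on managers. The function names is_manager and list_nodes_if_manager are assumptions made for illustration.

```shell
# Hypothetical guard: succeed only when the local node is a Swarm manager.
# `docker info` reports Swarm state; ControlAvailable is "true" on managers.
is_manager() {
    [ "$(docker info --format '{{ .Swarm.ControlAvailable }}' 2>/dev/null)" = "true" ]
}

# Run `docker node ls` only where it can work, otherwise print a hint.
list_nodes_if_manager() {
    if is_manager; then
        docker node ls
    else
        echo "Not a manager: run this on a manager node." >&2
        return 1
    fi
}
```

On host02 in this lab, list_nodes_if_manager would print the hint instead of the daemon error shown above.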

Exit from host02 and run:

docker node ls

$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
jm994tbqxvugacxhug84m82oa *   host01              Ready               Active              Leader              18.09.5
hylu5ui6fgx99un1y4ke6n6ip     host02              Ready               Active                                  18.09.5
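On larger Swarms the listing gets long, and docker node ls accepts --filter and --format to narrow it down. The helper below is a small sketch that prints only the hostnames of nodes with a given role; the name nodes_by_role is hypothetical.

```shell
# Hypothetical helper: print the hostnames of nodes with a given role.
# $1 = role (manager or worker). Run on a manager.
nodes_by_role() {
    docker node ls --filter "role=$1" --format '{{ .Hostname }}'
}
```

With the two-node Swarm listed above, nodes_by_role manager should print host01 and nodes_by_role worker should print host02.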

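Next, we can inspect nodes. Your output will differ from the sample below, whose ID, role, and engine version do not match the listing above: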

docker node inspect host01

$ docker node inspect host01
[
    {
        "ID": "yxmhbnqr4xfemfjzwid7idfyu",
        "Version": {
            "Index": 15
        },
        "CreatedAt": "2019-04-03T16:30:07.973951559Z",
        "UpdatedAt": "2019-04-03T16:30:08.156145496Z",
        "Spec": {
            "Labels": {},
            "Role": "worker",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "host01",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 1000000000,
                "MemoryBytes": 773480448
            },
            "Engine": {
                "EngineVersion": "18.03.0-ce",
                "Plugins": [
                    {
                        "Type": "Log",
                        "Name": "awslogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "fluentd"
                    },
                    {
                        "Type": "Log",
                        "Name": "gcplogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "gelf"
                    },
                    {
                        "Type": "Log",
                        "Name": "journald"
                    },
                    {
                        "Type": "Log",
                        "Name": "json-file"
                    },
                    {
                        "Type": "Log",
                        "Name": "logentries"
                    },
                    {
                        "Type": "Log",
                        "Name": "splunk"
                    },
                    {
                        "Type": "Log",
                        "Name": "syslog"
                    },
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "macvlan"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            },
            "TLSInfo": {
                "TrustRoot": "-----BEGIN CERTIFICATE-----\nMIIBajCCARCgAwIBAgIUYE037l2Px1ELkezvbgyUu5Xi1PAwCgYIKoZIzj0EAwIw\nEzERMA8GA1UEAxMIc3dhcm0tY2EwHhcNMTkwNDAzMTYyNTAwWhcNMzkwMzI5MTYy\nNTAwWjATMREwDwYDVQQDEwhzd2FybS1jYTBZMBMGByqGSM49AgEGCCqGSM49AwEH\nA0IABKeYfsawjBITI8MUdpa1FbdTVyigdoX3A14+qNWCjbE5vdQkOlWZByqKPsj8\nOD6911F9MErIKjFWU/Htt8qFwG+jQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMB\nAf8EBTADAQH/MB0GA1UdDgQWBBR7UDjgW5wo52Tnc5/6CR1w15wDJjAKBggqhkjO\nPQQDAgNIADBFAiButN2ZFSWY6KtWnznfDCg7QxoQvCokYZ/UubqhHx+G7QIhAKIL\nzWqBsHsbItqaRXzDSC0XEG3d67VkOav/NsMA3Pzv\n-----END CERTIFICATE-----\n",
                "CertIssuerSubject": "MBMxETAPBgNVBAMTCHN3YXJtLWNh",
                "CertIssuerPublicKey": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEp5h+xrCMEhMjwxR2lrUVt1NXKKB2hfcDXj6o1YKNsTm91CQ6VZkHKoo+yPw4Pr3XUX0wSsgqMVZT8e23yoXAbw=="
            }
        },
        "Status": {
            "State": "ready",
            "Addr": "172.17.0.30"
        }
    }
]

Notice that there is an empty section for labels in the output above. Labels are a powerful and easy way to differentiate individual nodes and their characteristics.

Let's add a label specifying that node host01 has a fast disk:

docker node update --label-add disk=fast host01

This concept is powerful because you can specify scheduling constraints for your workloads in Docker Swarm. Using the disk label added above, for example, we could require that disk-intensive applications run only on nodes labeled disk=fast.
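A constraint like that can be sketched with the --constraint flag of docker service create. The helper names below, and the service name and image you pass to them, are placeholders chosen for illustration; the lab itself does not create this service.

```shell
# Hypothetical helper: create a service restricted to nodes labeled disk=fast.
# $1 = service name, $2 = image (both chosen by the caller).
create_fast_disk_service() {
    docker service create \
        --name "$1" \
        --constraint 'node.labels.disk==fast' \
        "$2"
}

# Print the labels currently set on a node ($1 = node ID or hostname).
show_node_labels() {
    docker node inspect --format '{{ .Spec.Labels }}' "$1"
}
```

Running show_node_labels host01 first is a quick way to confirm the label was applied. A call such as create_fast_disk_service db postgres would then place tasks only on nodes carrying disk=fast; if no node has the label, the tasks stay pending.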

And with that, we've got a functional Swarm!