PostgreSQL 11 Cold Start Failover for High Availability in Docker

This post is part of a series on PostgreSQL Standby Failover in Docker.

The PostgreSQL documentation has a high-level overview of how to set up various failover, replication, and load balancing solutions (link). I found the documentation rather helpful and wanted to see whether I could implement the same solutions with a Docker twist.

In this post, I will document how to implement these solutions via Docker. I am using the following setup:

  • CentOS 7 (minimal installation)
  • Docker Community Edition 18.06.1-ce
  • Docker Compose 1.24.0
  • PostgreSQL 11

Note: This post assumes setup on a private lab environment, where most of the heavy lifting would need to be done by the reader. If this is done on a cloud provider, there might exist solutions that would do the job for us, such as via AWS Lambda and AWS Step Functions, or even full services like Azure Database for PostgreSQL.

Solution #1 – Shared Disk Failover

Overview

In this solution, the entire database is stored on the same physical medium, like a NAS with RAID 5 or 6 configuration that is mounted on the host(s). The advantage is that database recovery in the event of a PostgreSQL database server crash is quick, as there are no additional steps except for restarting the server process. The disadvantage is that the physical storage medium becomes a single point of failure.

This setup is likely suitable for small teams or research labs, where both “horizontal” and “vertical” data backup strategies are in place (not within the scope of this post). Critical systems should not use this setup, as the high IOPS increases the chance of disk failure, and with no incremental backups in place, disaster recovery (DR) would fall back to the last database dump file, with all intermediate transactions lost.

Setup and Testing

The setup would look something like this:

PostgreSQL Server 1  <---> Database Files on Storage Medium

PostgreSQL Server N-th (server inactive)

The moment the “PostgreSQL Server 1” container experiences an abnormal termination of the server process, a new PostgreSQL server container starts and takes its place. There will only be one active instance of the database server at any point in time. This post on the PostgreSQL forum explains the traditional failover process.

For this post’s implementation purposes, I will assume that the NAS has been mounted onto the host at /volumes/postgres-data-dir.
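As a rough sketch (the NAS hostname and export path below are placeholders, not my actual environment), an NFS mount onto the host could look like this:

Create the mount point and mount the NAS export
Terminal 1 $ sudo mkdir -p /volumes/postgres-data-dir
Terminal 1 $ sudo mount -t nfs my-nas:/export/postgres /volumes/postgres-data-dir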

To achieve this, my docker compose YAML file looks like this:

version: "3.5"
services:
  db-one:
    container_name: db-one
    image: postgres:latest
    restart: on-failure
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir
        target: /postgres-data-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir
    deploy:
      mode: replicated
      replicas: 1
networks:
  postgres-db-network:
    driver: overlay
    name: app-db-network
    attachable: true

Note that:

  • Docker Compose file format version 3.5 is needed because I gave the network an explicit name
  • the replicas: 1 setting tells Docker that the desired state is exactly 1 running instance of this container
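One caveat worth noting: when deploying with docker stack deploy, the service-level restart: on-failure option is ignored (Swarm’s default restart policy already restarts failed tasks). If you want to make the behaviour explicit, the Swarm-mode equivalent lives under deploy – a sketch:

    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure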

Set up the application stack as follows:

Create a single node Swarm cluster
Terminal 1 $ docker swarm init --advertise-addr eth0
Swarm initialized: current node (...) is now a manager.
Deploy the application 
Terminal 1 $ docker stack deploy -c docker-compose.yml postgres-example-1
Creating network app-db-network 
Creating service postgres-example-1_db-one 
Verify that the stack and its services have been deployed successfully
Terminal 1 $ docker stack ps postgres-example-1
Terminal 1 $ docker stack services postgres-example-1

Now that the stack and the PostgreSQL database service are set up, let’s test the failover:

Because we use Swarm, the container_name key is not respected. 
Do 'docker ps' to get the dynamic container id
Terminal 1 $ docker ps
(Observe the container ID for db-one)
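If you prefer not to eyeball the output, a quick way to grab the ID (illustrative, using docker ps filters) is:

Terminal 1 $ docker ps --filter "name=db-one" --format "{{.ID}}"
(Prints only the container ID of the db-one task)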

Connect to the database via psql and login with password
Terminal 1 $ docker run -it --rm --network app-db-network postgres:latest psql -h (db-one container ID) -U postgres

Show the list of databases. Make a change by adding a new database.
postgres=# \l
...
(3 rows)
postgres=# CREATE DATABASE my_database;
postgres=# \l
...
(4 rows)
postgres=# exit

Forcefully remove the running database container to simulate a crash. Swarm will spin up a new instance as the desired state is 1
Terminal 1 $ docker rm -f (db-one container ID)

After a while, notice that failover has happened. Get the new container id 
Terminal 1 $ docker ps 
(Observe the new container ID for db-one)

Connect to the database via psql
Terminal 1 $ docker run -it --rm --network app-db-network postgres:latest psql -h (new db-one container ID) -U postgres

Show the list of databases. Observe that our newly added database is still there in the new container. State has persisted.
postgres=# \l
...
(4 rows)

What Did We Do

We created a new database instance, made some changes, and a “crash” happened. Docker Swarm (the failover mechanism) kicked in and created a new database container that contains the changes we made. The changes persisted between database server instances because they all read the database from the same shared disk.

Due to the shared disk, only one database instance can be active at any time (otherwise we would have a race condition on the database files). The advantage of this solution is that we can make use of Docker Swarm to handle the failover process.

By now, we have successfully set up a Shared Disk Failover, and used Docker Swarm as the failover mechanism.

Now that you are done, let’s clean up after ourselves:

Terminal 1 $ docker stack rm postgres-example-1
Removing service postgres-example-1_db-one
Removing network app-db-network

Solution #2 – File System (Block Device) Replication

Overview

In this solution, we are trying to avoid the issue of the physical storage medium being the single point of failure. As the heading suggests, solution #2 is achieved through file system replication.

As we start to increase the number of nodes in our cluster, the issue of data synchronization becomes important. But since this is a cold standby scenario, only one instance will be running and we do not need to be overly concerned about real-time synchronization of data across the various database instances.

The advantage of this approach is that we no longer have a single point of failure. However, data synchronization may cause the following problems if we are not careful:

  • a database with a high volume of writes may lose some data if the application stack crashes between data synchronization calls
  • the failover database folder may also become corrupted if the synchronization mechanism replicates the primary database’s corrupted files onto it

Setup and Testing

With multiple disks, the setup would look something like this:

PostgreSQL Server 1    <--->  Database 1's Files*
PostgreSQL Server 2    <--->  Database 2's Files*
...
PostgreSQL Server N-th <--->  Database N-th's Files*

* All copies are kept in sync via some file system replication mechanism  

To keep all copies in sync, the PostgreSQL documentation suggests using the Distributed Replicated Block Device (DRBD) software. However, DRBD operates at the host and block-device level, requiring setup outside the Docker environment, so a more Docker-friendly solution is needed.

Of the Docker-friendly solutions I considered, I eventually settled on lsyncd because it seemed simple enough not to require another blog post to describe its setup. I used a community container image that had everything set up; I only needed to bind-mount the source and target directories onto predetermined folder paths within the container.
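For reference, the behaviour I am relying on boils down to an lsyncd configuration roughly like the one below. This is an illustrative sketch of what such an image sets up, not the exact file it ships with:

settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
}

-- continuously mirror /source onto /target using rsync
sync {
    default.rsync,
    source = "/source",
    target = "/target",
    delay  = 1,
}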

I have two docker compose YAML files, and assume that the primary and failover database disks are mounted at  /volumes/postgres-data-dir and /volumes/postgres-data-dir-two respectively.
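Because both containers will run as UID 999 (more on that below), the bind-mounted directories on the host must be readable and writable by that UID. On my lab host this is a one-liner (adjust to your own permission scheme):

Terminal 1 $ sudo chown -R 999:999 /volumes/postgres-data-dir /volumes/postgres-data-dir-two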

My primary database docker compose file is as follows:

version: "3.5"
services:
  db-one:
    container_name: db-one
    image: postgres:latest
    restart: on-failure
    user: '999'
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir
        target: /postgres-data-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir
  db-lsyncd:
    container_name: db-lsyncd
    image: chauffer/lsyncd:latest
    restart: on-failure
    user: '999'
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir
        target: /source
      - type: bind
        source: /volumes/postgres-data-dir-two
        target: /target
    depends_on:
      - db-one
networks:
  postgres-db-network: 
    driver: overlay 
    name: app-db-network 
    attachable: true

Notice the additional fields:

  • user: ‘999’
    • The PostgreSQL process runs as the postgres user with UID 999 – it is important that both containers run as 999 so that lsyncd does not copy the new/modified database files with the default root ownership.
    • If that happens, the failover database server will not be able to start, as it will not have read/write access to the required files
  • depends_on
    • the lsyncd container should only be started after db-one has been created (as per the Docker documentation, this means the container has been created, not that the PostgreSQL server is ready to accept connections)
    • note that docker stack deploy ignores depends_on, so under Swarm this ordering is not actually enforced; if readiness matters, a healthcheck helps – see the sketch below
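As a sketch (this is not part of the original compose file), a healthcheck on db-one makes it visible when PostgreSQL is actually ready to accept connections, using the pg_isready utility that ships with the postgres image:

  db-one:
    # ... existing configuration ...
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5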

And for the failover server, my docker compose file is as follows:

version: "3.5"
services:
  db-two:
    container_name: db-two
    image: postgres:latest
    restart: on-failure
    user: '999'
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir-two
        target: /postgres-data-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir
networks:
  postgres-db-network: 
    driver: overlay 
    name: app-db-network 
    attachable: true

Although possible, the above example does not include an lsyncd chain to the next failover database instance.
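If you did want to chain further, the failover stack could carry its own lsyncd service that mirrors its data directory onto the next standby’s disk. A rough sketch, assuming a third disk mounted at /volumes/postgres-data-dir-three (a path I am inventing for illustration):

  db-lsyncd-two:
    container_name: db-lsyncd-two
    image: chauffer/lsyncd:latest
    restart: on-failure
    user: '999'
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir-two
        target: /source
      - type: bind
        source: /volumes/postgres-data-dir-three
        target: /target
    depends_on:
      - db-two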

If you created the single node Swarm cluster earlier, you can skip that step. Set up the primary database application stack as follows:

Create a single node Swarm cluster
Terminal 1 $ docker swarm init --advertise-addr eth0
Swarm initialized: current node (...) is now a manager.
Deploy the application 
Terminal 1 $ docker stack deploy -c docker-compose.yml postgres-example-2
Creating network app-db-network 
Creating service postgres-example-2_db-one 
Creating service postgres-example-2_db-lsyncd 

Verify that the stack and its services have been deployed successfully
Terminal 1 $ docker stack ps postgres-example-2
Terminal 1 $ docker stack services postgres-example-2

Now that the primary database stack has been deployed, let’s test the failover:

Because we use Swarm, the container_name key is not respected. 
Do 'docker ps' to get the dynamic container id
Terminal 1 $ docker ps
(Observe the container ID for db-one)

Connect to the database via psql and login with password
Terminal 1 $ docker run -it --rm --network app-db-network postgres:latest psql -h (db-one container ID) -U postgres

Show the list of databases. Make a change by adding a new database.
postgres=# \l
...
(3 rows)
postgres=# CREATE DATABASE my_database;
postgres=# \l
...
(4 rows)
postgres=# exit

Forcefully unload the stack to simulate a crash.
Terminal 1 $ docker stack rm postgres-example-2

Deploy the failover application stack
Terminal 1 $ docker stack deploy -c docker-compose-two.yml postgres-example-2
Creating network app-db-network
Creating service postgres-example-2_db-two

After the failover stack is up, get the container id for db-two
Terminal 1 $ docker ps
(Observe the container ID for db-two)
Connect to the database via psql
Terminal 1 $ docker run -it --rm --network app-db-network postgres:latest psql -h (new db-two container ID) -U postgres

Show the list of databases. Observe that our newly added database is still there in the new container. State has persisted.
postgres=# \l
...
(4 rows)

What Did We Do

We deployed the primary database’s application stack, comprising one PostgreSQL container and an lsyncd container. We applied changes to the database, which modified its underlying files and triggered lsyncd to replicate those changes to the failover database folder.

When the primary database’s application stack “crashed”, we easily spun up a new database application stack that read off the failover database folder, which contained our earlier changes.

While the above failover mechanism is manual, a simple shell script wrapper can be used to automatically deploy the failover database application stack once it detects that the primary database is no longer available.
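As an illustration of what such a wrapper might look like (a rough sketch only – the probe uses pg_isready from the postgres image, and the stack, service, and file names follow this post’s examples):

#!/bin/bash
# Hypothetical watchdog: probe the primary service and deploy the failover
# stack once it stops responding.
PRIMARY="postgres-example-2_db-one"   # service name resolvable on app-db-network

while true; do
  if ! docker run --rm --network app-db-network postgres:latest \
       pg_isready -h "$PRIMARY" -U postgres > /dev/null 2>&1; then
    echo "Primary unreachable - deploying failover stack"
    docker stack rm postgres-example-2
    docker stack deploy -c docker-compose-two.yml postgres-example-2
    break
  fi
  sleep 10
done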

By now, we have successfully set up the File System Replication Failover, using a manual failover process.

Now that you are done, let’s clean up after ourselves:

Terminal 1 $ docker stack rm postgres-example-2
Removing service postgres-example-2_db-two
Removing network app-db-network

Summary

In this post, we have explored how to set up PostgreSQL server failovers (as per the PostgreSQL documentation) via cold standbys with a Docker twist. There are many ways to implement this, and I have only covered two possibilities.

The solutions above vary slightly between the use of a single shared disk, or multiple disks with synchronization between them.

In a follow up post, I hope to explore PostgreSQL failovers via warm standbys in Docker. Thanks for reading until the end 🙂
