PostgreSQL 11 Warm Standby Failover (Log Shipping) for High Availability in Docker

This post is part of a series on PostgreSQL Standby Failover in Docker:

The PostgreSQL documentation has a high-level overview of how to set up various failover, replication, and load balancing solutions (link). I found the documentation rather helpful and wanted to see whether I could implement the same solutions with a Docker twist.

In case you have not read my earlier post about Postgres cold start failover, I am using the following setup:

  • CentOS 7 (minimal installation)
  • Docker Community Edition 18.06.1-ce
  • Docker Compose 1.24.0
  • PostgreSQL 11

Note: This post assumes setup on a private lab environment, where most of the heavy lifting would need to be done by the reader. If this is done on a SaaS provider, there might exist solutions that would do the job for us, such as via AWS Lambda and AWS Step Functions, or even full services like Azure Database for PostgreSQL.

Solution #3 – Write-Ahead Log (WAL) Shipping

Overview

In Solution #2 (of the previous post), when the primary server fails, the failover process included starting the cold standby server. Depending on the complexity of the database, the startup duration can be non-trivial and may be unacceptable for some applications.

To reduce the failover time, we can run a second server process concurrently with the primary server, so that it can quickly take over when the primary fails (i.e. a standby server). However, because both processes are running, we cannot use file system (block device) replication, as the two processes would cause write-write conflicts on the database files.

Postgres offers several solutions to this problem, and depending on the solution, the standby Postgres server is considered either a warm or a hot standby. In this post, I focus on the WAL Log Shipping approach, whereby the standby is considered warm.

Architecture and Setup

The setup would look something like this:

PostgreSQL Primary Server ---> WAL Archive <--- PostgreSQL Standby Server

The primary server will run with archive_mode='on'. This means that each time a WAL segment is completed, the server runs the command defined by archive_command to copy the new WAL file into the WAL archive.

The standby server will run in recovery mode until it is promoted via the pg_ctl promote command, or until a file appears at the path defined by trigger_file. The presence of a recovery.conf file within the data directory (PGDATA) indicates that the standby server is running in recovery mode.

For this post’s implementation purposes, I will assume that appropriate file storage has been mounted onto the host at /volumes/postgres-data-dir (primary), /volumes/postgres-data-dir-two (standby), and /volumes/postgres-wal-dir (the shared WAL archive), and that the folders are empty to begin with. When running a swarm cluster with multiple physical nodes, these mount points should be identical on every host, as the server processes may sit on different nodes.
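
If you need to create these folders yourself, the preparation might look something like this (a sketch, assuming the container’s default postgres UID/GID of 999, as noted later in Step 4):

$ sudo mkdir -p /volumes/postgres-data-dir /volumes/postgres-data-dir-two /volumes/postgres-wal-dir
$ sudo chown 999:999 /volumes/postgres-data-dir /volumes/postgres-data-dir-two /volumes/postgres-wal-dir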

My Docker Compose .yml files are as follows:

Primary PostgreSQL Server
[~/postgres-stage-3]$ cat docker-compose-primary.yml
version: "3.5"
services:
  db-primary:
    image: postgres:latest
    restart: on-failure
    user: '999'
    command: ["postgres",
              "-carchive_mode=on",
              "-cwal_level=replica",
              "-carchive_command=test ! -f /postgres-wal-dir/%f && cp %p /postgres-wal-dir/%f"]
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir
        target: /postgres-data-dir
      - type: bind
        source: /volumes/postgres-wal-dir
        target: /postgres-wal-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir

networks:
  postgres-db-network:
    driver: overlay
    name: app-db-network
    attachable: true

Do note the arguments passed to the postgres command:

  • archive_mode – enables archiving
  • wal_level – defines the level of details within the WAL
  • archive_command – command to use to archive a logfile segment

Standby PostgreSQL Server
[~/postgres-stage-3]$ cat docker-compose-standby.yml
version: "3.5"
services:
  db-standby:
    image: postgres:latest
    restart: on-failure
    user: '999'
    command: ["postgres"]
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir-two
        target: /postgres-data-dir
      - type: bind
        source: /volumes/postgres-wal-dir
        target: /postgres-wal-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir

networks:
  postgres-db-network:
    driver: overlay
    name: app-db-network
    attachable: true

Note that because my project folder is named “postgres-stage-3”, my containers will have it as a prefix.

Prepare and Test the PostgreSQL Warm Standby Server

Step 1: Start the Primary PostgreSQL Server

$ docker-compose -f docker-compose-primary.yml up -d
Creating postgres-stage-3_db-primary_1 ... done

Ensure that the database has been created successfully:

$ docker logs postgres-stage-3_db-primary_1
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...
PostgreSQL init process complete; ready for start up.
2019-06-01 23:49:56.232 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-06-01 23:49:56.232 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2019-06-01 23:49:56.235 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-06-01 23:49:56.250 UTC [39] LOG:  database system was shut down at 2019-06-01 23:49:56 UTC
2019-06-01 23:49:56.255 UTC [1] LOG:  database system is ready to accept connections
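
As a quick sanity check, we can confirm that the archiving parameters from the compose file have taken effect (a minimal check; the expected output below follows from our configuration):

$ docker exec -it postgres-stage-3_db-primary_1 /bin/bash -c 'psql -U postgres -c "SHOW archive_mode;"'
 archive_mode
--------------
 on
(1 row)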

Step 2: Take a Base Backup of Primary PostgreSQL Server

There are several ways to do this, but essentially we want to take a base backup of the Primary PostgreSQL Server and use it as the starting point for our Standby PostgreSQL Server.

Here, I use the pg_basebackup client application to make “a binary copy of the database cluster files, while making sure the system is put in and out of backup mode automatically.” I reused the WAL archive directory as the output folder for the backup.

$ docker exec -it postgres-stage-3_db-primary_1 /bin/bash -c 'pg_basebackup -h localhost -p 5432 -U postgres -D /postgres-wal-dir/base_backup'

pg_basebackup will create the destination folder for you if it does not exist. However, if the folder exists and is not empty, you will see the following error message:

  • pg_basebackup: directory “/postgres-wal-dir/base_backup” exists but is not empty

Step 3: Copy the Primary’s Base Backup into the Standby’s Data Directory

Make sure that your current user has read/write access to the folders:

$ sudo cp -rp /volumes/postgres-wal-dir/base_backup/* /volumes/postgres-data-dir-two/

Step 4: Create the recovery.conf and Put It into the Standby PostgreSQL Server’s Data Directory

Make sure your current user has write permissions to the intended postgres data folder, and that the recovery.conf file belongs to the postgres user (the container’s default postgres UID/GID is 999):

$ sudo vi /volumes/postgres-data-dir-two/recovery.conf
standby_mode = 'on'
restore_command = 'cp /postgres-wal-dir/%f %p'
archive_cleanup_command = 'pg_archivecleanup -d /postgres-wal-dir %r'
recovery_target_timeline = 'latest'

$ sudo chown 999:999 /volumes/postgres-data-dir-two/recovery.conf
$ sudo chmod 600 /volumes/postgres-data-dir-two/recovery.conf

Step 5: Start the Standby PostgreSQL Server

$ docker-compose -f docker-compose-standby.yml up -d

To see what is happening, we can omit the “-d” option and watch the logs:

$ docker-compose -f docker-compose-standby.yml up
Creating postgres-stage-3_db-standby_1 ... done
Attaching to postgres-stage-3_db-standby_1
db-standby_1  | 2019-06-01 23:56:28.125 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db-standby_1  | 2019-06-01 23:56:28.125 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db-standby_1  | 2019-06-01 23:56:28.129 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-standby_1  | 2019-06-01 23:56:28.144 UTC [11] LOG:  database system was interrupted; last known up at 2019-06-01 23:53:29 UTC
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
db-standby_1  | 2019-06-01 23:56:28.389 UTC [11] LOG:  entering standby mode
db-standby_1  | 2019-06-01 23:56:28.399 UTC [11] LOG:  restored log file "000000010000000000000002" from archive
db-standby_1  | 2019-06-01 23:56:28.445 UTC [11] LOG:  redo starts at 0/2000028
db-standby_1  | 2019-06-01 23:56:28.448 UTC [11] LOG:  consistent recovery state reached at 0/2000130
db-standby_1  | 2019-06-01 23:56:28.449 UTC [1] LOG:  database system is ready to accept read only connections
db-standby_1  | cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
...

The line “consistent recovery state reached at 0/2000130” indicates that the Standby PostgreSQL Server:

  • has started successfully,
  • has restored all the outstanding WALs in the WAL archive directory (filename: 000000010000000000000002), and
  • will continue to poll and restore new WALs until the server is stopped or promoted to become the primary server
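
We can also confirm from within the standby that it is in recovery mode, using the built-in pg_is_in_recovery() function (it returns true while the server is replaying WALs):

$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres -c "SELECT pg_is_in_recovery();"'
 pg_is_in_recovery
-------------------
 t
(1 row)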

Step 6: Make Changes to the Primary PostgreSQL Server Database

In this step, I will make some changes to the default postgres database, namely:

  • Create a new table
  • Add two rows into the new table

$ docker exec -it postgres-stage-3_db-primary_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# CREATE TABLE my_table(col1 varchar, col2 varchar);
CREATE TABLE
postgres=# INSERT INTO my_table VALUES ('row1', 'this is row one'), ('row2', 'this is row2');
INSERT 0 2
postgres=# \q

Step 7: Check the Standby PostgreSQL Server (and Understand Why It’s a Warm Standby Setup)

$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# \dt
Did not find any relations.

Here, we see why the WAL Log Shipping approach is considered a warm standby: changes in the primary database are not immediately reflected in the standby database. The standby is only updated when a new WAL file arrives in the WAL archive directory.

In the event of a crash of the primary server, the standby server can be behind by up to one WAL segment’s worth of transactions, since the current, not-yet-archived segment is lost. This may be acceptable for applications that are not critical.
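
To put an upper bound on this lag, PostgreSQL offers the archive_timeout setting, which forces a switch to a new WAL segment after the given number of seconds even if the current segment is not full. A sketch of how the primary’s command could be extended (the 60-second value is purely illustrative; note that force-switched segments still occupy the full 16 MB on disk, so very short timeouts will bloat the archive):

    command: ["postgres",
              "-carchive_mode=on",
              "-cwal_level=replica",
              "-carchive_timeout=60",
              "-carchive_command=test ! -f /postgres-wal-dir/%f && cp %p /postgres-wal-dir/%f"]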

Based on the default config values, a WAL segment is only archived once it reaches its full size of 16 MB. However, we can speed up the process by shutting down the primary server:

$ docker-compose -f docker-compose-primary.yml down
Stopping postgres-stage-3_db-primary_1 ... done
Removing postgres-stage-3_db-primary_1 ... done
Removing network app-db-network ... done

If we check the logs of the standby, we will see something similar to:

$ docker logs postgres-stage-3_db-standby_1
...
cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
2019-06-02 00:14:40.119 UTC [11] LOG: restored log file "000000010000000000000003" from archive
cp: cannot stat '/postgres-wal-dir/000000010000000000000004': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
pg_archivecleanup: keeping WAL file "/postgres-wal-dir/000000010000000000000003" and later
pg_archivecleanup: removing file "/postgres-wal-dir/000000010000000000000001"
pg_archivecleanup: removing file "/postgres-wal-dir/000000010000000000000002"
cp: cannot stat '/postgres-wal-dir/000000010000000000000004': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
...

The standby server had applied the WAL logs up to “000000010000000000000002” and was periodically polling the WAL archive directory for the next one. When we shut down the primary server, it flushed and archived its current WAL segment, which was then picked up and applied on the standby server.

The standby server then removes the older, no-longer-needed WALs from the archive using the archive_cleanup_command defined in recovery.conf.

Now, if we check the standby’s default postgres database, we should see the changes we made in Step 6:

$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# \dt
                List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | my_table | table | postgres
(1 row)

postgres=# SELECT * FROM my_table;
 col1 |      col2
------+-----------------
 row1 | this is row one
 row2 | this is row2
(2 rows)

At this stage, our application is now left without a database to persist its state. We can either recover the primary database, or promote the standby.

Step 8a: Recover the Primary Database

$ docker-compose -f docker-compose-primary.yml up -d

The primary server is brought back online and nothing has changed.

Step 8b: Promote the Standby Postgres Server to be the new Primary 

To promote the standby, we can either use the presence of a trigger file (at the trigger_file path specified in recovery.conf), or the pg_ctl promote command.
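
For illustration, the trigger-file method would look something like this: define trigger_file in recovery.conf before starting the standby (the path below is only an example), then create that file when you want to promote.

# in /volumes/postgres-data-dir-two/recovery.conf
standby_mode = 'on'
trigger_file = '/postgres-data-dir/promote.trigger'

# then, when the time comes to promote:
$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'touch /postgres-data-dir/promote.trigger'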

Since we did not define a trigger_file, I’ll use the second method here:

$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'pg_ctl promote'
waiting for server to promote.... done
server promoted

If we check the logs of the standby server, we can see the following:

$ docker logs postgres-stage-3_db-standby_1
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
cp: cannot stat '/postgres-wal-dir/000000010000000000000005': No such file or directory
2019-06-02 00:29:53.722 UTC [11] LOG: received promote request
2019-06-02 00:29:53.723 UTC [11] LOG: redo done at 0/40000D0
2019-06-02 00:29:53.723 UTC [11] LOG: last completed transaction was at log time 2019-06-02 00:05:24.20666+00
2019-06-02 00:29:53.751 UTC [11] LOG: restored log file "000000010000000000000004" from archive
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
2019-06-02 00:29:53.802 UTC [11] LOG: selected new timeline ID: 2
2019-06-02 00:29:53.860 UTC [11] LOG: archive recovery complete
cp: cannot stat '/postgres-wal-dir/00000001.history': No such file or directory
2019-06-02 00:29:53.871 UTC [1] LOG: database system is ready to accept connections

Congratulations – the Standby PostgreSQL Server has now taken over as the Primary PostgreSQL Server.
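
To verify the promotion, we can check that the server is no longer in recovery and now accepts writes (the INSERT below reuses the table created in Step 6):

$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres postgres'
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

postgres=# INSERT INTO my_table VALUES ('row3', 'written after promotion');
INSERT 0 1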

Improvements

I found the creation of the initial base backup and recovery.conf to be rather un-Docker-friendly. However, the whole process can be automated via a shell script that will improve the sysadmin/devops UX. A rough sketch of such a script follows.
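
Here is a rough sketch of what such a script could look like (the paths and container names match this post’s setup; error handling and input validation are omitted):

#!/bin/bash
# prepare-standby.sh: sketch that automates Steps 2-5 of this post
set -e

# Step 2: take a base backup of the primary into the shared WAL archive
docker exec postgres-stage-3_db-primary_1 \
  pg_basebackup -h localhost -p 5432 -U postgres -D /postgres-wal-dir/base_backup

# Step 3: copy the base backup into the standby's data directory
sudo cp -rp /volumes/postgres-wal-dir/base_backup/* /volumes/postgres-data-dir-two/

# Step 4: create recovery.conf with the right ownership and permissions
cat <<'EOF' | sudo tee /volumes/postgres-data-dir-two/recovery.conf > /dev/null
standby_mode = 'on'
restore_command = 'cp /postgres-wal-dir/%f %p'
archive_cleanup_command = 'pg_archivecleanup -d /postgres-wal-dir %r'
recovery_target_timeline = 'latest'
EOF
sudo chown 999:999 /volumes/postgres-data-dir-two/recovery.conf
sudo chmod 600 /volumes/postgres-data-dir-two/recovery.conf

# Step 5: start the standby
docker-compose -f docker-compose-standby.yml up -d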

In addition, the promotion of the standby server is rather manual. There are two possible ways I can think of to automate this without the use of complex software:

  1. On the Standby PostgreSQL Server, create a simple bash script that polls the Primary PostgreSQL Server’s heartbeat. When the check has been unsuccessful X times in a row, promote the standby (see the sketch after this list).
  2. On the primary server, start the PostgreSQL Server via a wrapper shell script that touches the standby’s trigger_file if the postgres process exits with a non-zero status code. When the Standby PostgreSQL Server sees this file, it will automatically promote itself to be the new primary.
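
A minimal sketch of option 1, intended to run inside the standby container (it assumes the primary is reachable under the service name db-primary on the shared overlay network; the thresholds are arbitrary):

#!/bin/bash
# promote-on-failure.sh: promote the standby after repeated failed heartbeats
PRIMARY_HOST=db-primary   # assumed service name on app-db-network
MAX_FAILURES=5
INTERVAL=10
failures=0

while true; do
  if pg_isready -h "$PRIMARY_HOST" -p 5432 -U postgres > /dev/null 2>&1; then
    failures=0
  else
    failures=$((failures + 1))
    echo "heartbeat check failed ($failures/$MAX_FAILURES)"
  fi

  if [ "$failures" -ge "$MAX_FAILURES" ]; then
    echo "primary considered down; promoting standby"
    pg_ctl promote   # PGDATA is already set in the container environment
    break
  fi

  sleep "$INTERVAL"
done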

Summary

In this post, I walked through the steps to create a primary and a warm standby Postgres server (using WAL Log Shipping) within Docker. After setting them up, I did the following:

  • made changes to the primary server,
  • simulated a primary server crash,
  • observed standby server being updated with the primary’s WALs, and
  • promoted the standby server to be the new primary

There are many other config values that can be customized – some examples include how often a new WAL should be created, the max size of a WAL before a new one is created, the number of WALs to keep, and the level of information within the WALs. These additional configs can help reduce the negative impact of a primary database server crash.

The advantage of using WAL Log Shipping is that it reduces failover time, since the standby server is already running and only needs to be promoted. In addition, no direct connection is needed between the primary and standby servers; an intermediate WAL archive can be used instead (useful when security policy does not allow direct connections).

Lastly, I was able to understand this topic better thanks to the following articles:

Thanks for reading until the end – feel free to leave a comment if you require additional clarification.

In my next post of this series, I will explore a “warmer” setup – using WAL Log Streaming! 🙂

2 thoughts on “PostgreSQL 11 Warm Standby Failover (Log Shipping) for High Availability in Docker”

  1. Couple of followup questions —
    1. “I will assume that an appropriate file storage has been mounted onto the host at /volumes/postgres-data-dir(primary) and /volumes/postgres-data-two-dir (standby)” — If these volumes were on a NAS which mounted onto the host — Could these be smb or nfs mounts?
    2. Could the primary and recovery postgres docker instances be only two different VM’s or two different physical machines?


    1. Hi kevdog

      1. As long as Docker is able to mount and read/write to these volumes, I don’t think there will be any issues

      2. The way I have done it in my post requires both of them to be on the same host. But I think you might be on to something if you can share a mount point between the two hosts. For example, the WAL Archive can be on a NAS that is accessible to both hosts running the primary and recovery containers. Off the top of my head, do look out for case-sensitive naming conventions and permission issues when reading/writing to the shared mount point.

