This post is part of a series on PostgreSQL Standby Failover in Docker:
- Cold Start Failover
- Warm Standby Failover (Log Shipping)
- Warm Standby Failover (Asynchronous Streaming Replication)
- Warm Standby Failover (Synchronous Streaming Replication)
The PostgreSQL documentation has a high-level overview of how to set up various failover, replication, and load balancing solutions (link). I found the documentation rather helpful and wanted to see if I could implement the same solutions with a Docker twist.
In case you have not read my earlier post about Postgres cold start failover, I am using the following setup:
- CentOS 7 (minimal installation)
- Docker Community Edition 18.06.1-ce
- Docker Compose 1.24.0
- PostgreSQL 11
Note: This post assumes setup on a private lab environment, where most of the heavy lifting would need to be done by the reader. If this is done on a SaaS provider, there might exist solutions that would do the job for us, such as via AWS Lambda and AWS Step Functions, or even full services like Azure Database for PostgreSQL.
Solution #3 – Write-Ahead Log (WAL) Shipping
In Solution #2 (of the previous post), when the primary server fails, the failover process included starting the cold standby server. Depending on the complexity of the database, the startup duration can be non-trivial and may be unacceptable for some applications.
To reduce the failover time, we can run the standby server’s process concurrently with the primary server to allow it to quickly take over when the primary has failed (i.e. a standby server). However, because the primary process is running, we cannot use file system (block device) replication as the two processes would cause write-write conflicts on the database server files.
Postgres offers several solutions for this problem, and depending on the type of solution, the standby Postgres server can be considered a warm or a hot standby server. In this post, I focus on the WAL Log Shipping approach, whereby the standby is considered warm.
Architecture and Setup
The setup would look something like this:
PostgreSQL Primary Server ---> WAL Archive <--- PostgreSQL Standby Server
The primary server will run with `archive_mode='on'`, which means it will use the command defined in `archive_command` to update the WAL archive with new WAL files.

The standby server will run in recovery mode until it is promoted via the `pg_ctl promote` command, or until a file appears at the `trigger_file` path. The presence of a `recovery.conf` file within the main PostgreSQL data folder indicates that the standby server is running in recovery mode.
For this post’s implementation purposes, I will assume that an appropriate file storage has been mounted onto the host at `/volumes/postgres-data-dir` (primary) and `/volumes/postgres-data-dir-two` (standby), and that the folders are empty to begin with. When running a swarm cluster with multiple physical nodes, these mount points should be identical on every host, as the server processes may sit on different nodes.
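If the folders have not been provisioned yet, they can be prepared on each host with something like the following sketch. The helper name `prepare_pg_dir` is mine, for illustration; UID/GID 999 matches the postgres user inside the official image:

```shell
# Sketch of one-time host preparation for the bind-mounted folders.
# prepare_pg_dir DIR -- creates DIR, hands it to UID/GID 999 (the
# in-container postgres user), and restricts permissions as Postgres expects.
prepare_pg_dir() {
    local dir="$1"
    mkdir -p "$dir"
    chown 999:999 "$dir" 2>/dev/null || true   # requires root; no-op otherwise
    chmod 700 "$dir"
}
```

Run it (as root) once per folder, e.g. `prepare_pg_dir /volumes/postgres-data-dir`, and likewise for the standby data folder and the WAL archive folder.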
My Docker Compose .yml files are as follows:
Primary PostgreSQL Server
```
[~/postgres-stage-3]$ cat docker-compose-primary.yml
version: "3.5"
services:
  db-primary:
    image: postgres:latest
    restart: on-failure
    user: '999'
    command: ["postgres", "-carchive_mode=on", "-cwal_level=replica", "-carchive_command=test ! -f /postgres-wal-dir/%f && cp %p /postgres-wal-dir/%f"]
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir
        target: /postgres-data-dir
      - type: bind
        source: /volumes/postgres-wal-dir
        target: /postgres-wal-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir
networks:
  postgres-db-network:
    driver: overlay
    name: app-db-network
    attachable: true
```
Do note the arguments passed to the postgres command:
- archive_mode – enables archiving
- wal_level – defines the level of details within the WAL
- archive_command – command to use to archive a logfile segment
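To make the archive_command template concrete: for each completed WAL segment, Postgres substitutes `%p` (path to the segment) and `%f` (its file name) into the command and runs it, counting the segment as archived only if the command exits with status 0. A rough shell stand-in for what the command above does (the function name and the archive-directory argument are my own, for illustration):

```shell
# Illustrative stand-in for the archive_command used above.
# %p -> path to the completed WAL segment, %f -> its file name.
archive_segment() {
    local p="$1"                              # what Postgres passes as %p
    local f="$2"                              # what Postgres passes as %f
    local archive_dir="${3:-/postgres-wal-dir}"
    # `test ! -f` refuses to overwrite an already-archived segment, so a
    # misconfigured second primary cannot silently clobber the archive.
    test ! -f "$archive_dir/$f" && cp "$p" "$archive_dir/$f"
}
```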
Standby PostgreSQL Server
```
[~/postgres-stage-3]$ cat docker-compose-standby.yml
version: "3.5"
services:
  db-standby:
    image: postgres:latest
    restart: on-failure
    user: '999'
    command: ["postgres"]
    networks:
      - postgres-db-network
    volumes:
      - type: bind
        source: /volumes/postgres-data-dir-two
        target: /postgres-data-dir
      - type: bind
        source: /volumes/postgres-wal-dir
        target: /postgres-wal-dir
    environment:
      POSTGRES_PASSWORD: my-secure-password
      PGDATA: /postgres-data-dir
networks:
  postgres-db-network:
    driver: overlay
    name: app-db-network
    attachable: true
```
Note that because my project folder is named “postgres-stage-3”, my containers will have it as a prefix in their names.
Prepare and Test the PostgreSQL Warm Standby Server
Step 1: Start the Primary PostgreSQL Server
```
$ docker-compose -f docker-compose-primary.yml up -d
Creating postgres-stage-3_db-primary_1 ... done
```
Ensure that the database has been created successfully:
```
$ docker logs postgres-stage-3_db-primary_1
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...
PostgreSQL init process complete; ready for start up.

2019-06-01 23:49:56.232 UTC LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-06-01 23:49:56.232 UTC LOG:  listening on IPv6 address "::", port 5432
2019-06-01 23:49:56.235 UTC LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-06-01 23:49:56.250 UTC LOG:  database system was shut down at 2019-06-01 23:49:56 UTC
2019-06-01 23:49:56.255 UTC LOG:  database system is ready to accept connections
```
Step 2: Take a Base Backup of Primary PostgreSQL Server
There are several ways to do this, but essentially we want to take a base backup of the Primary PostgreSQL Server and use it as the starting point for our Standby PostgreSQL Server.
Here, I use the `pg_basebackup` client application to make “a binary copy of the database cluster files, while making sure the system is put in and out of backup mode automatically.” I reused the WAL archive directory as the output folder for the backup.
```
$ docker exec -it postgres-stage-3_db-primary_1 /bin/bash -c 'pg_basebackup -h localhost -p 5432 -U postgres -D /postgres-wal-dir/base_backup'
```
pg_basebackup would create the destination folder for you if it does not exist. However, if it exists and is not empty, you will see the following error message:
- pg_basebackup: directory "/postgres-wal-dir/base_backup" exists but is not empty
Step 3: Copy the Primary’s Base Backup into the Standby’s Data Directory
Make sure that your current user has read/write access to the folders:

```
$ sudo cp -rp /volumes/postgres-wal-dir/base_backup/* /volumes/postgres-data-dir-two/
```
Step 4: Create the recovery.conf and put it into the Standby PostgreSQL Server Folder
Make sure your current user has write permissions to the intended postgres data folder, and that the `recovery.conf` file belongs to the postgres user (the container’s default postgres UID/GID is 999):
```
$ sudo vi /volumes/postgres-data-dir-two/recovery.conf
standby_mode = 'on'
restore_command = 'cp /postgres-wal-dir/%f %p'
archive_cleanup_command = 'pg_archivecleanup -d /postgres-wal-dir %r'
recovery_target_timeline = 'latest'

$ sudo chown 999:999 /volumes/postgres-data-dir-two/recovery.conf
$ sudo chmod 600 /volumes/postgres-data-dir-two/recovery.conf
```
Step 5: Start the Standby PostgreSQL Server
```
$ docker-compose -f docker-compose-standby.yml up -d
```
To explain what is happening, we can omit the “-d” option to see the logs:
```
$ docker-compose -f docker-compose-standby.yml up
Creating postgres-stage-3_db-standby_1 ... done
Attaching to postgres-stage-3_db-standby_1
db-standby_1  | 2019-06-01 23:56:28.125 UTC LOG:  listening on IPv4 address "0.0.0.0", port 5432
db-standby_1  | 2019-06-01 23:56:28.125 UTC LOG:  listening on IPv6 address "::", port 5432
db-standby_1  | 2019-06-01 23:56:28.129 UTC LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-standby_1  | 2019-06-01 23:56:28.144 UTC LOG:  database system was interrupted; last known up at 2019-06-01 23:53:29 UTC
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
db-standby_1  | 2019-06-01 23:56:28.389 UTC LOG:  entering standby mode
db-standby_1  | 2019-06-01 23:56:28.399 UTC LOG:  restored log file "000000010000000000000002" from archive
db-standby_1  | 2019-06-01 23:56:28.445 UTC LOG:  redo starts at 0/2000028
db-standby_1  | 2019-06-01 23:56:28.448 UTC LOG:  consistent recovery state reached at 0/2000130
db-standby_1  | 2019-06-01 23:56:28.449 UTC LOG:  database system is ready to accept read only connections
db-standby_1  | cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
db-standby_1  | cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
...
```
The line “consistent recovery state reached at 0/2000130” indicates that the Standby PostgreSQL Server has:
- started successfully,
- restored all the outstanding WALs in the WAL archive directory (filename: 000000010000000000000002), and
- will continue to poll and restore new WALs until the server is stopped, or has been promoted to become the primary server
Step 6: Make changes to Primary PostgreSQL Server database
In this step, I will make some changes to the default postgres database, namely:
- Create a new table
- Add two rows into the new table
```
$ docker exec -it postgres-stage-3_db-primary_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# CREATE TABLE my_table(col1 varchar, col2 varchar);
CREATE TABLE
postgres=# INSERT INTO my_table VALUES ('row1', 'this is row one'), ('row2', 'this is row2');
INSERT 0 2
postgres=# \q
```
Step 7: Check the Standby PostgreSQL Server (and understand why it’s a warm standby setup)
```
$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# \dt
Did not find any relations.
```
Here, we understand why the WAL Log Shipping approach is considered a warm standby: changes in the primary database are not immediately reflected in the standby database. The standby is only updated when a new WAL file arrives in the WAL archive directory.
In the event of a crash of the primary server, the standby server can therefore be behind by up to one WAL file. This may be acceptable for non-critical applications.

With the default configuration, a WAL file is only archived once the current segment has been filled (16MB by default). However, we can speed up the process here by shutting down the primary server:
```
$ docker-compose -f docker-compose-primary.yml down
Stopping postgres-stage-3_db-primary_1 ... done
Removing postgres-stage-3_db-primary_1 ... done
Removing network app-db-network ... done
```
If we check the logs of the standby, we will see something similar to:
```
$ docker logs postgres-stage-3_db-standby_1
...
cp: cannot stat '/postgres-wal-dir/000000010000000000000003': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
2019-06-02 00:14:40.119 UTC LOG:  restored log file "000000010000000000000003" from archive
cp: cannot stat '/postgres-wal-dir/000000010000000000000004': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
pg_archivecleanup: keeping WAL file "/postgres-wal-dir/000000010000000000000003" and later
pg_archivecleanup: removing file "/postgres-wal-dir/000000010000000000000001"
pg_archivecleanup: removing file "/postgres-wal-dir/000000010000000000000002"
cp: cannot stat '/postgres-wal-dir/000000010000000000000004': No such file or directory
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
...
```
The standby server has applied the WAL files up to “000000010000000000000002” and has been periodically polling the WAL archive directory for the next one. When we shut down the primary server, it wrote the current WAL file from memory to disk; the file was then picked up and applied onto the standby server.
The standby server then removes the older, no-longer-needed WALs using the `archive_cleanup_command` defined in `recovery.conf`.

Now, if we check the standby’s default postgres database, we should be able to see the changes we made in the previous step:
```
$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'psql -U postgres postgres'
psql (11.2 (Debian 11.2-1.pgdg90+1))
Type "help" for help.

postgres=# \dt
        List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | my_table | table | postgres
(1 row)

postgres=# SELECT * FROM my_table;
 col1 |      col2
------+-----------------
 row1 | this is row one
 row2 | this is row2
(2 rows)
```
At this stage, our application is now left without a database to persist its state. We can either recover the primary database, or promote the standby.
Step 8a: Recover the Primary Database
```
$ docker-compose -f docker-compose-primary.yml up -d
```
The primary server is brought back online and nothing has changed.
Step 8b: Promote the Standby Postgres Server to be the new Primary
To promote the standby, we can either use the presence of a trigger file (the `trigger_file` path specified in `recovery.conf`), or the `pg_ctl promote` command. Since we did not define a `trigger_file`, I’ll use the second method here:
```
$ docker exec -it postgres-stage-3_db-standby_1 /bin/bash -c 'pg_ctl promote'
waiting for server to promote.... done
server promoted
```
If we check the logs of the standby server, we can see the following:
```
$ docker logs postgres-stage-3_db-standby_1
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
cp: cannot stat '/postgres-wal-dir/000000010000000000000005': No such file or directory
2019-06-02 00:29:53.722 UTC LOG:  received promote request
2019-06-02 00:29:53.723 UTC LOG:  redo done at 0/40000D0
2019-06-02 00:29:53.723 UTC LOG:  last completed transaction was at log time 2019-06-02 00:05:24.20666+00
2019-06-02 00:29:53.751 UTC LOG:  restored log file "000000010000000000000004" from archive
cp: cannot stat '/postgres-wal-dir/00000002.history': No such file or directory
2019-06-02 00:29:53.802 UTC LOG:  selected new timeline ID: 2
2019-06-02 00:29:53.860 UTC LOG:  archive recovery complete
cp: cannot stat '/postgres-wal-dir/00000001.history': No such file or directory
2019-06-02 00:29:53.871 UTC LOG:  database system is ready to accept connections
```
Congratulations – the Standby PostgreSQL Server has now taken over as the Primary PostgreSQL Server.
I found the creation of the initial base backup and `recovery.conf` to be rather un-Docker-friendly. However, the whole process can be automated via a shell script, which would improve the sysadmin/devops UX.
In addition, the promotion of the standby server is rather manual – there are two possible ways I can think of to automate this without the use of complex software:
- On the Standby PostgreSQL Server, create a simple bash script that polls the Primary PostgreSQL Server’s heartbeat. When it has been unsuccessful for X number of times, promote the standby.
- On the primary server, start the PostgreSQL process via a wrapper shell script that creates the trigger_file (on storage visible to the standby) when the server exits with a non-zero status code. When the Standby PostgreSQL Server sees this file, it will automatically promote itself to be the new primary.
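The first option can be sketched as a small bash script. Everything below is hypothetical: the function name, default commands, and thresholds are placeholders, assuming the script runs where `pg_ctl` and the primary host (here called `db-primary`, the compose service name) are reachable:

```shell
#!/bin/bash
# Sketch of a failover watchdog for the standby -- not a vetted
# implementation. Override CHECK_CMD / PROMOTE_CMD to fit your environment.
watch_and_promote() {
    local check_cmd="${CHECK_CMD:-pg_isready -h db-primary -p 5432}"
    local promote_cmd="${PROMOTE_CMD:-pg_ctl promote}"
    local max_failures="${MAX_FAILURES:-3}"   # X consecutive failed heartbeats
    local interval="${INTERVAL:-5}"           # seconds between heartbeats
    local failures=0

    while true; do
        if $check_cmd > /dev/null 2>&1; then
            failures=0                        # primary is alive; reset counter
        else
            failures=$((failures + 1))
            echo "heartbeat failed ($failures/$max_failures)"
        fi
        if [ "$failures" -ge "$max_failures" ]; then
            echo "primary presumed down - promoting standby"
            $promote_cmd
            return 0
        fi
        sleep "$interval"
    done
}
```

Note that a watchdog like this can false-positive during a network partition and cause a split-brain, so the thresholds deserve care.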
In this post, I walked through the steps to create a primary and a warm standby Postgres server (using WAL Log Shipping) within Docker. After they had been set up, I did the following:
- made changes to the primary server,
- simulated a primary server crash,
- observed the standby server being updated with the primary’s WALs, and
- promoted the standby server to be the new primary
There are many other config values that can be customized – some examples include how often a new WAL should be created, the max size of a WAL before a new one is created, the number of WALs to keep, and the level of information within the WALs. These additional configs can help reduce the negative impact of a primary database server crash.
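For instance, extra settings could be passed the same way as the other flags in the compose file. The values below are purely illustrative additions of mine, not recommendations from the walkthrough above:

```yaml
# Hypothetical additions to the primary's command -- example values only.
command: ["postgres",
          "-carchive_mode=on",
          "-cwal_level=replica",
          "-carchive_timeout=60",      # force a segment switch at least every 60s
          "-cwal_keep_segments=32",    # keep more old segments in pg_wal
          "-carchive_command=test ! -f /postgres-wal-dir/%f && cp %p /postgres-wal-dir/%f"]
```

A shorter `archive_timeout` bounds how far the standby can lag, at the cost of archiving many mostly-empty 16MB segments.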
The advantage of using WAL Log Shipping is that it reduces the time needed for the standby server to take over, since the standby process is already running. In addition, no direct connection is needed between the primary and standby servers – an intermediate WAL archive can be used instead (useful when security policy does not allow direct connections).
Lastly, I was able to understand this topic better thanks to the following articles:
- Official Postgres 11 Documentation
Thanks for reading until the end – feel free to leave a comment if you require additional clarification.
In my next post of this series, I will explore a “warmer” setup – using WAL Log Streaming! 🙂