Backup Before Modifying a Production AWS RDS Database Managed by Terraform

Given time, it is inevitable that modifications will need to be made to production RDS databases. Periodic changes to production cloud resources should be expected as the cloud offers elasticity to scale in/out with demands. Although some changes are riskier than others, the AWS RDS processes for applying (and rolling back) these changes have been battle-tested. Despite this, it is always good for organizations to have their own backup and restore strategies before riskier changes are applied – after all, the data does belong to them.

RDS can create DB snapshots from running database instances. Unfortunately, it is not always a point-and-click affair in the AWS Console, especially in production environments. Many organizations use some sort of Infrastructure as Code (IaC) tool to manage their production resources, and this extra layer adds complexity when deciding how to back up production AWS RDS instances.

In this post, I’ll propose several methods to back up production AWS RDS databases that are managed via Terraform, along with the considerations for each.

Method #1 – Using Automated Daily Backups and Modifying with Maintenance Window

With some smart scheduling of the backup and maintenance windows, it is possible to leverage the RDS automatic daily backup feature to take a snapshot right before changes are applied during the maintenance window. It is important to set the windows to coincide with the database’s off-peak hours (and to check periodically for changes in peak hours), otherwise the backup (which can briefly suspend I/O on a Single-AZ instance) or the modification (which may restart the instance) can disrupt service at a busy time.

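resource "aws_db_instance" "my-database" {
  ...
  # Both windows are in UTC and must not overlap.
  backup_window      = "00:00-03:00"         # daily automated backup
  maintenance_window = "sat:03:01-sat:06:00" # weekly window for pending changes
}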

As the values are defined in UTC, do remember to factor in the time difference between UTC and your local time zone.

To back up before modifying the database instance: make changes to the Terraform resource block and run terraform apply. Do not set the apply_immediately flag (it defaults to false). On the maintenance day defined above, RDS will take an automated backup during the backup_window, and then apply the pending modifications during the maintenance_window.
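For reference, a fuller version of the resource block might look like the sketch below. Only backup_window and maintenance_window come from the example above; the identifier, engine, sizes, retention period and the db_password variable are assumptions made up for illustration. One thing worth keeping in mind is that automated backups only run when backup_retention_period is greater than zero.

```hcl
# Hypothetical variable; supply the real value through a secure mechanism.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "my-database" {
  identifier        = "my-database"  # hypothetical identifier
  engine            = "mysql"        # hypothetical engine
  instance_class    = "db.t3.medium" # example of a change being rolled out
  allocated_storage = 50
  username          = "admin"
  password          = var.db_password

  # Automated backups are disabled when this is 0.
  backup_retention_period = 7

  # Both windows are in UTC and must not overlap.
  backup_window      = "00:00-03:00"
  maintenance_window = "sat:03:01-sat:06:00"

  # Left at the default so changes wait for the maintenance window.
  apply_immediately = false
}
```

Setting apply_immediately to false explicitly is optional, but it documents the intent for the next engineer reading the block.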

Advantages:

  • Less complexity via a hands-off approach to backup database and apply changes
  • Able to schedule changes during off-peak hours with no manual operator step

Disadvantages:

  • Database’s off-peak hours can change over time – additional overhead to periodically check that the windows still coincide with the lull periods
  • Application of changes will not be immediate – may need to wait up to a week
  • Need on-call engineer for off hours escalation

Method #2 – Using the Existing Code Repository to Make a Backup and Modifying with Apply Immediately Flag

Using the same Terraform repository where the source database is defined, create an aws_db_snapshot resource (much like the example from the official repository).

To back up before modifying the database instance: declare an explicit dependency to ensure that the snapshot resource is evaluated/created before the source database resource. To ensure minimal gap between the backup and the database modification, set the apply_immediately flag to true.

Remember to unset the apply_immediately flag, or other engineers working on the repo may unexpectedly bring the database down with a terraform apply on their changes.
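A minimal sketch of this setup for a subsequent run (when the database already exists) is shown below. All names, the engine and the sizes are hypothetical. Note one assumption in particular: the snapshot takes its source identifier from a shared local rather than from the aws_db_instance resource, because a snapshot that references the instance while the instance depends_on the snapshot would form a dependency cycle that Terraform rejects.

```hcl
# Hypothetical variable; supply the real value through a secure mechanism.
variable "db_password" {
  type      = string
  sensitive = true
}

locals {
  db_identifier = "sales-db-prod" # hypothetical identifier
  # Change this label before each risky change to trigger a new snapshot.
  snapshot_label = "pre-resize"
}

resource "aws_db_snapshot" "pre_change" {
  db_instance_identifier = local.db_identifier
  db_snapshot_identifier = "${local.db_identifier}-${local.snapshot_label}"
}

resource "aws_db_instance" "sales_prod" {
  identifier        = local.db_identifier
  engine            = "mysql"
  instance_class    = "db.t3.large" # the modification being applied
  allocated_storage = 50
  username          = "admin"
  password          = var.db_password

  # Apply the modification right after the snapshot completes.
  apply_immediately = true

  # Explicit dependency: the snapshot is created before the instance is modified.
  depends_on = [aws_db_snapshot.pre_change]
}
```

With this layout, changing snapshot_label replaces the snapshot resource, so the previous snapshot is deleted unless it is tracked by a separate resource. On a very first run the snapshot would fail because the database does not exist yet, which is why the dependency has to be toggled.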

Advantages:

  • No added management overheads as everything uses the same repository
  • Will not risk using the wrong/invalid source database identifier as we can reference using the resource type (i.e. aws_db_instance.<RESOURCE_NAME>.id)

Disadvantages:

  • Not intuitive to manage different lifecycles of snapshot and source database resources in the same repo (e.g. if the backup needs to persist longer than the source database upon project completion)
  • Additional step to toggle the explicit dependency between different terraform apply executions
    • For example, on the first run, the snapshot should either not be created, or created after the source database is initialized
    • However, on subsequent runs, the snapshot should be created before any modifications are done to the source database instance. An explicit dependency needs to be defined only for subsequent runs, and new engineers working on the project need to be made aware of this intricacy.
  • If a rollback is required, there will be downtime to restore from the snapshot
  • If terraform destroy is run (e.g. end of project lifespan), it will remove the snapshot resource as well. This may not be ideal for post project data retention, especially if final snapshot is disabled on the source database.
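On the last point, one way to retain data after a terraform destroy is to leave the final snapshot enabled on the source database. The sketch below assumes hypothetical names, engine and sizes; only skip_final_snapshot and final_snapshot_identifier are the point of the example.

```hcl
# Hypothetical variable; supply the real value through a secure mechanism.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "sales_prod" {
  identifier        = "sales-db-prod" # hypothetical identifier
  engine            = "mysql"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "admin"
  password          = var.db_password

  # Take one last snapshot when the instance is deleted.
  skip_final_snapshot       = false
  final_snapshot_identifier = "sales-db-prod-final"
}
```

The final snapshot is created by RDS at deletion time and is not itself a Terraform resource, so it survives the destroy and has to be cleaned up manually later.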

Method #3 – Creating a Separate Code Repository to Manage Backups

Since a database snapshot may persist longer than its source, it might be logical to manage it outside the code repository that contains the database. For example, the manual snapshot resource in another repository would look something like this:

resource "aws_db_snapshot" "latest-sales-prod-snapshot" {
  db_instance_identifier = local.sales_prod_name
  db_snapshot_identifier = "${local.sales_prod_name}-${local.snapshot_timestamp}"
}

locals {
  sales_prod_name = "sales-db-prod"
  # timestamp() changes on every run; colons are replaced because snapshot
  # identifiers only allow letters, digits and hyphens.
  snapshot_timestamp = replace(timestamp(), ":", "-")
}

In the above example, the snapshot will always be recreated due to the timestamp in db_snapshot_identifier. To keep several manual snapshots, define separate terraform resources with a static db_snapshot_identifier.
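As a rough sketch of the static-identifier variant, the backup repository could keep one snapshot per label with for_each. The labels and resource names here are hypothetical. The data source lookup is an addition of mine, not part of the original example: it is meant to make a mistyped source identifier fail early instead of at snapshot creation.

```hcl
locals {
  sales_prod_name = "sales-db-prod"
  # Hypothetical labels; one manual snapshot is kept per label.
  snapshot_labels = toset(["pre-engine-upgrade", "pre-resize"])
}

# Look up the source database so a wrong identifier is caught early.
data "aws_db_instance" "sales_prod" {
  db_instance_identifier = local.sales_prod_name
}

resource "aws_db_snapshot" "sales_prod_manual" {
  for_each = local.snapshot_labels

  db_instance_identifier = data.aws_db_instance.sales_prod.db_instance_identifier
  db_snapshot_identifier = "${local.sales_prod_name}-${each.key}"
}
```

Adding a label creates a new snapshot on the next terraform apply and leaves the existing ones untouched; removing a label deletes the corresponding snapshot.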

To back up before modifying the database instance: execute terraform apply in the backup repository first, before doing it on the source database repository. To ensure minimal gap between the backup and the modification, run the commands in quick succession, with the apply_immediately flag set to true for the source database.

Remember to unset the apply_immediately flag, or other engineers working on the repo may unexpectedly bring the database down with a terraform apply on their changes.

Advantages:

  • Snapshot Terraform resource is managed outside the terraform lifecycle of the source database, reducing complexity in managing dependencies
  • Accidental terraform destroy will not remove manual backups – ideal for disaster recovery plans

Disadvantages:

  • More management overheads arising from an additional code repository
  • Can lead to forgotten snapshots (and unnecessary costs) if source database is removed without knowledge of the manual snapshots taken
  • When dealing with multiple databases in the same account, a bad case of copy-pasta may create a snapshot using the wrong/invalid source database
  • If a rollback is required, there will be downtime to restore from the snapshot

Method #4 – Using and Promoting Database Replicas

By using a replica RDS instance, we can achieve something similar to blue-green deployments, though we need to ensure that the modification is compatible with this approach. For example, a major engine version upgrade may not be: depending on your database engine, it will either break the replication or require that the upgrade be applied to all the database instances.

Assuming your change is compatible with this approach, you can set up an RDS replica by adapting this example Terraform code. Next, apply the change to the source database instance and watch for any negative impact. If the change causes problems, promote the replica and update your application configuration and/or database endpoint.
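A minimal same-region sketch of the source and its replica could look like the following. The identifiers, engine and sizes are hypothetical. It assumes automated backups are enabled on the source (backup_retention_period greater than zero), which RDS requires before a read replica can be created.

```hcl
# Hypothetical variable; supply the real value through a secure mechanism.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "source" {
  identifier        = "sales-db-prod" # hypothetical identifier
  engine            = "mysql"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "admin"
  password          = var.db_password

  # Read replicas require automated backups on the source.
  backup_retention_period = 7
}

resource "aws_db_instance" "replica" {
  identifier     = "sales-db-prod-replica"
  instance_class = "db.t3.medium"

  # Same-region replica: reference the source by its identifier.
  # Removing this argument later promotes the replica to a standalone instance.
  replicate_source_db = aws_db_instance.source.identifier

  skip_final_snapshot = true
}
```

Promotion through Terraform is done by removing replicate_source_db from the replica block and applying, after which the instance no longer follows the source.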

Advantages:

  • If a rollback is required, the downtime will be minimal as there is a hot standby

Disadvantages:

  • Not all changes are suitable for this approach
  • More management overheads arising from managing the replica database instances
  • Higher operating costs

Summary

There are multiple ways to do a backup before modifying an RDS database, and this is not an exhaustive list. Two key factors that affect your decision are 1) the timeliness of the change, and 2) how much added overhead and complexity the team is willing to take on with the implemented solution.

Do share in the comments if you have other methods to back up production RDS databases 🙂
