How Amazon Aurora’s Architecture Solves Scale Out Bottlenecks of Traditional Relational Databases

Note: This is from a presentation I gave earlier this year.

Amazon Aurora – A Primer

Before we dive into the details, let’s see what the official documentation says about Amazon Aurora:

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.

From https://aws.amazon.com/rds/aurora/

It’s quite a mouthful, so let’s distill it:

  • Amazon Aurora is a proprietary, cloud-native relational database offering that is scalable, highly available, and cost-effective
  • It has two “flavors”, namely Aurora MySQL and Aurora PostgreSQL

Why Amazon Aurora?

AWS Relational Database Service (RDS) launched its MySQL and PostgreSQL offerings in 2009 and 2013 respectively. These are great products that bolstered the AWS platform with managed database offerings, allowing users to quickly spin up databases without installation and administration overheads. Five years later, RDS launched Aurora MySQL, followed by Aurora PostgreSQL soon after.

The casual observer would then wonder – if there are existing RDS offerings for both MySQL and PostgreSQL, why would Amazon Aurora be introduced? To understand its unique selling point, and its claims of scalability and cost-effectiveness, we need to look at how traditional relational databases handle scaling out.

Scaling Out Traditional Databases

Traditional Database Architectures

One of the key reasons organizations and enterprises move to the cloud is scalability. However, traditional relational database architectures were not designed for the cloud – they were meant to be run as standalone processes on bare metal, (usually) given the full set of available resources on the host.

This is not a design flaw – when Oracle v2 and MySQL were released in 1979 and 1995 respectively, the concepts of cloud computing and virtualization were not nearly as prevalent as they are now.

In architecture diagrams, databases are often oversimplified as dull grey cylinders with the letters “DB” on them (in case the viewer assumed they were cans of soda). However, databases are actually very complex systems comprising many components – such as the query parser and optimizer, session management, caches, indexes, and the logical and physical data representations.

When talking about scaling up, we are concerned with the resources given to the database’s internal processes (e.g. mysqld, postmaster, caches). When talking about scaling out (the focus of this post), we are concerned with the replication of the physical data on disk.

The act of scaling out a traditional relational database is a moderately complex task. Fortunately, current relational database systems have gotten rather good at it, providing a variety of options for keeping a database and its replicas in sync – for example, block-level replication, log shipping, and asynchronous or synchronous log streaming. In fact, I have previously written several posts about PostgreSQL replication in Docker.
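As a concrete illustration, here is a minimal monitoring sketch – assuming PostgreSQL streaming replication and the psycopg2 driver, with hypothetical connection details – that asks the primary which replicas are attached and how they are keeping up:

    import psycopg2  # PostgreSQL driver for Python

    # Connect to the primary (hypothetical connection details)
    conn = psycopg2.connect(host="primary.example.com", dbname="appdb",
                            user="monitor", password="secret")

    with conn, conn.cursor() as cur:
        # pg_stat_replication lists every replica streaming from this primary
        cur.execute("SELECT client_addr, state, sync_state FROM pg_stat_replication;")
        for client_addr, state, sync_state in cur.fetchall():
            print(f"{client_addr}: state={state}, sync={sync_state}")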

Scaling Out Requirements

Scaling out a database is not done in a vacuum – it is usually done to meet some business need. The most common reasons are disaster recovery and better read and write performance. Other possibilities include executing long-running OLAP (Online Analytical Processing) transactions for management reporting, and supporting database patching and/or upgrades.

The technical complexity of scaling out comes from two sources:

  1. The limitations imposed by business requirements, and 
  2. Maintaining the “C” in its ACID properties (Atomicity, Consistency, Isolation, Durability). 

And depending on the business requirements, the scale-out architecture can become very complex. Take disaster recovery, for example – we have to consider:

  • Expected failover duration (which varies with the opportunity cost of downtime)
  • How “hot” the standby needs to be
  • Replication delay/lag between instances, especially when using asynchronous streaming (see the sketch after this list)
  • State of replica(s)
  • Is a single secondary database sufficient, or should we have more?
  • How should multiple secondary databases be arranged? Some possibilities are:
    • Daisy chained for cascading failovers,
    • Clustered for minimal replication lag, or
    • Cascading clustered failovers?!
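To make the replication lag concern concrete, here is a minimal sketch (same assumptions as before: PostgreSQL streaming replication, psycopg2, hypothetical connection details) that estimates how far a replica has fallen behind:

    import psycopg2

    # Connect to the replica, not the primary (hypothetical details)
    conn = psycopg2.connect(host="replica.example.com", dbname="appdb",
                            user="monitor", password="secret")

    with conn, conn.cursor() as cur:
        # Time since the last transaction was replayed on this replica;
        # a rough proxy for replication lag under a steady write load
        cur.execute("SELECT now() - pg_last_xact_replay_timestamp();")
        print("approximate replication lag:", cur.fetchone()[0])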

Bottlenecks of Scaling Out

There are some bottlenecks to scaling out a database, including:

  • Compute and disk I/O load on the secondary database
  • Limited throughput between the primary and secondary databases (i.e. replication lag)
  • Copying data to a new secondary takes time, which leaves new transactions on the primary vulnerable

Scaling up the database instances and/or using a low latency network can alleviate most of these issues. However, the risk of a primary database failure while syncing/setting up the secondary is still not addressed. This is one key issue that has consistently plagued traditional databases in the cloud.

In fact, if we look at the RDS documentation, it states that:

When you create a read replica, Amazon RDS takes a DB snapshot of your source DB instance and begins replication. As a result, you experience a brief I/O suspension on your source DB instance while the DB snapshot occurs. The I/O suspension typically lasts about one minute.

From AWS RDS Documentation

This brief I/O suspension may not be acceptable for critical applications, such as an e-commerce or social media website.
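For reference, this is the operation that triggers that snapshot (and the brief I/O suspension) – a minimal boto3 sketch with hypothetical instance identifiers:

    import boto3  # AWS SDK for Python

    rds = boto3.client("rds", region_name="us-east-1")

    # Creating a read replica snapshots the source instance, which briefly
    # suspends I/O on it (per the RDS documentation quoted above)
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="mydb-replica",        # hypothetical name
        SourceDBInstanceIdentifier="mydb-primary",  # hypothetical name
    )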

The “Secret” Sauce of Amazon Aurora

At the start of this post, I mentioned that Amazon Aurora is a cloud-native database. This means that Amazon Aurora has been re-architected with the intention of being run in the cloud, natively leveraging existing and future cloud infrastructure and services, as well as taking advantage of the operational benefits of running workloads in the cloud.

Architecture of Amazon Aurora

Simplified Overview of Amazon Aurora from AWS Documentation

Amazon Aurora’s architecture (specifically its purpose-built distributed storage layer) is able to overcome the issues of scaling out the storage layer of a traditional relational database.

In traditional databases, when the database receives a DML statement, the redo log record is written first; only then is the database page in main memory modified, and later flushed to disk. During a scale out, the primary database’s state is dumped and used as the starting point for creating new secondary databases. To avoid dumping a corrupted database state, database I/O has to be paused until the process is completed.
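In code terms, the ordering looks roughly like this – a conceptual sketch only; real engines are far more elaborate, but the write-ahead ordering holds:

    # Conceptual sketch of a traditional write path (not any real engine)
    class TraditionalEngine:
        def __init__(self):
            self.redo_log = []     # stands in for the on-disk redo log
            self.buffer_pool = {}  # page_id -> page contents in memory
            self.disk = {}         # page_id -> page contents on disk

        def handle_dml(self, page_id, change):
            self.redo_log.append((page_id, change))  # 1. log the change first
            page = self.buffer_pool.setdefault(page_id, [])
            page.append(change)                      # 2. then modify the page
            # 3. the dirty page is flushed to disk later (e.g. at a checkpoint)

        def checkpoint(self):
            self.disk.update({pid: list(p) for pid, p in self.buffer_pool.items()})

    engine = TraditionalEngine()
    engine.handle_dml("page-1", "UPDATE ...")
    engine.checkpoint()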

In Amazon Aurora, the primary database instance simply passes the redo log to the storage layer for processing. The storage layer is then responsible for processing the logs, creating and storing new page versions, and backing everything up to S3. During a scale out, Amazon Aurora only needs to create more database engine instances and connect them to the existing storage layer. The data replication bottleneck is removed by delegating these tasks to the storage layer, which processes them in parallel using its own resources.
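A matching conceptual sketch of the Aurora-style division of labor – again, an illustration of the idea rather than the actual implementation:

    # Conceptual sketch: the primary ships only redo records; the storage
    # layer materializes page versions in parallel on its own resources
    class StorageLayer:
        def __init__(self):
            self.log_stream = []  # durable stream of redo records

        def receive(self, record):
            self.log_stream.append(record)
            # page materialization and S3 backup happen asynchronously here,
            # on the storage layer's own resources

    class EngineInstance:
        def __init__(self, storage):
            self.storage = storage  # no local copy of the data

        def handle_dml(self, page_id, change):
            self.storage.receive((page_id, change))  # only the log is written

    storage = StorageLayer()
    primary = EngineInstance(storage)
    replica = EngineInstance(storage)  # scale out: new engine, same storage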

This approach also allows the creation of several new features, such as:

  • Instant crash recovery: No need to replay logs since the last checkpoint
  • Fast failovers: No need to worry about which replica database has the latest redo log record, as the storage layer takes care of that
  • Backtracking: Since the storage layer has the stream of redo logs, it is able to “rewind” the database to specific points in time without needing to restore a checkpoint from the S3 backup (see the sketch after this list)
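As an example of backtracking in practice, here is a minimal boto3 sketch (hypothetical cluster name; note that Backtrack must be enabled on the cluster, and is available for Aurora MySQL):

    import datetime
    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # "Rewind" the cluster to 10 minutes ago, without restoring from a backup
    rds.backtrack_db_cluster(
        DBClusterIdentifier="my-aurora-cluster",  # hypothetical name
        BacktrackTo=datetime.datetime.utcnow() - datetime.timedelta(minutes=10),
    )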

I have only managed to scratch the surface of Amazon Aurora’s architecture and its benefits. For more detailed coverage, I recommend this AWS re:Invent 2019 talk about Amazon Aurora storage.

When Should I Choose Amazon Aurora?

Does your project need to: 

  • test against the latest nightly/stable builds
  • use less popular database extensions
  • have extremely fine-grained control over database configurations

Then Amazon Aurora may not be right for you. However, each project is unique, and individual assessment is needed as there is no one-size-fits-all recommendation.

Other database alternatives you can explore are:

  • Self-managed on physical hardware (e.g. laptop, workstation, rack)
  • Self-managed on EC2
  • Managed RDS MySQL / RDS PostgreSQL
  • Managed RDS Aurora
  • Third-party managed database

Summary

Amazon Aurora’s purpose-built storage layer overcomes the scale-out issues that plague traditional databases in the cloud. However, it best serves the niche of operating production workloads at cloud scale.

Every project’s objectives and requirements are different, and the team should always choose the database solution that best aligns with these factors.
