Amazon Aurora

Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. You already know how MySQL and PostgreSQL combine the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. The code, tools, and applications you use today with your existing MySQL and PostgreSQL databases can be used with Aurora. With some workloads, Aurora can deliver up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL without requiring changes to most of your existing applications.

Aurora includes a high-performance storage subsystem. Its MySQL- and PostgreSQL-compatible database engines are customized to take advantage of that fast distributed storage. The underlying storage grows automatically as needed. An Aurora cluster volume can grow to a maximum size of 128 tebibytes (TiB). Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration.

Aurora is part of the managed database service Amazon Relational Database Service (Amazon RDS). Amazon RDS makes it easier to set up, operate, and scale a relational database in the cloud.

Amazon Aurora DB clusters

An Amazon Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances. An Aurora cluster volume is a virtual database storage volume that spans multiple Availability Zones, with each Availability Zone having a copy of the DB cluster data.

Two types of DB instances make up an Aurora DB cluster:

Primary DB instance – Supports read and write operations, and performs all of the data modifications to the cluster volume. Each Aurora DB cluster has one primary DB instance.

Aurora Replica – Connects to the same storage volume as the primary DB instance and supports only read operations. Each Aurora DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance. Maintain high availability by locating Aurora Replicas in separate Availability Zones. Aurora automatically fails over to an Aurora Replica in case the primary DB instance becomes unavailable. You can specify the failover priority for Aurora Replicas. Aurora Replicas can also offload read workloads from the primary DB instance.
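The failover priority mentioned above can be illustrated with a small model. This is a minimal sketch, not Aurora's implementation: it assumes the rule from the Aurora documentation that each replica has a promotion tier from 0 (highest priority) to 15, and that ties within a tier go to the largest instance. The `Replica` class, its field names, and `memory_gib` as a stand-in for instance size are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AURORA_REPLICAS = 15  # limit stated in the text above


@dataclass(frozen=True)
class Replica:
    name: str
    availability_zone: str
    promotion_tier: int  # 0 is the highest priority, 15 the lowest
    memory_gib: int      # hypothetical stand-in for "instance size"


def choose_failover_target(replicas: list[Replica]) -> Optional[Replica]:
    """Pick the replica to promote: lowest tier first, then largest size."""
    if len(replicas) > MAX_AURORA_REPLICAS:
        raise ValueError("an Aurora DB cluster supports at most 15 Aurora Replicas")
    if not replicas:
        return None
    # Remaining ties are arbitrary in Aurora; here min() keeps the first one seen.
    return min(replicas, key=lambda r: (r.promotion_tier, -r.memory_gib))
```

Placing the replicas in different Availability Zones, as recommended above, means that a zone failure still leaves at least one candidate for promotion.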

The following diagram illustrates the relationship between the cluster volume, the primary DB instance, and Aurora Replicas in an Aurora DB cluster.

⁠

Architecture (multi-master clusters)

An Aurora multi-master cluster consists of a set of compute (database) nodes and a shared storage volume.

The storage volume consists of six storage nodes placed in three Availability Zones for high availability and durability of user data.

Every database node in a multi-master cluster is a writer node that can run read and write statements. This differs from the single-writer clusters described above, where only the primary DB instance accepts writes.

There is no single point of failure in the cluster.

Applications can use any writer node for their read/write and DDL needs.

A database change made by a writer node is written to six storage nodes in three Availability Zones, providing data durability and resiliency against storage node and Availability Zone failures.

The writer nodes are all functionally equal, and a failure of one writer node does not affect the availability of the other writer nodes in the cluster.
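To see why six storage nodes across three Availability Zones tolerate these failures, the arithmetic can be sketched as follows. The quorum sizes (4 of 6 for writes, 3 of 6 for reads) are an assumption taken from Aurora's published storage design, not from the text above, and the function names are illustrative.

```python
COPIES_PER_AZ = 2
AVAILABILITY_ZONES = 3
WRITE_QUORUM = 4  # assumption: from Aurora's published design, not this text
READ_QUORUM = 3   # assumption: same source


def surviving_copies(failed_azs: int, extra_failed_nodes: int = 0) -> int:
    """Count the storage nodes still reachable after the given failures."""
    total = COPIES_PER_AZ * AVAILABILITY_ZONES
    lost = failed_azs * COPIES_PER_AZ + extra_failed_nodes
    return max(total - lost, 0)


def can_write(failed_azs: int, extra_failed_nodes: int = 0) -> bool:
    return surviving_copies(failed_azs, extra_failed_nodes) >= WRITE_QUORUM


def can_read(failed_azs: int, extra_failed_nodes: int = 0) -> bool:
    return surviving_copies(failed_azs, extra_failed_nodes) >= READ_QUORUM
```

Under these assumptions, losing a whole Availability Zone leaves four copies, so writes continue; losing a zone plus one more node leaves three, which is still enough to read.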

⁠

Supported features in Amazon Aurora by AWS Region and Aurora DB engine

Aurora MySQL- and PostgreSQL-compatible database engines support several Amazon Aurora and Amazon RDS features and options. The support varies across specific versions of each database engine, and across AWS Regions.

Some of these features are Aurora-only capabilities. For example, Aurora Serverless, Aurora global databases, and support for integration with AWS machine learning services aren't supported by Amazon RDS. Other features, such as Amazon RDS Proxy, are supported by both Amazon Aurora and Amazon RDS.

Supported Regions and DB engines

⁠

Blue/Green Deployments: A blue/green deployment copies a production database environment to a separate, synchronized staging environment. By using Amazon RDS Blue/Green Deployments, you can make changes to the database in the staging environment without affecting the production environment. For example, you can upgrade the major or minor DB engine version, change database parameters, or make schema changes in the staging environment. When you are ready, you can promote the staging environment to be the new production database environment.

Aurora cluster configurations: Amazon Aurora has two DB cluster storage configurations, Aurora I/O-Optimized and Aurora Standard.

Database activity streams: By using database activity streams in Aurora, you can monitor and set alarms for auditing activity in your Aurora database.

Exporting cluster data to Amazon S3: You can export Aurora DB cluster data to an Amazon S3 bucket. After the data is exported, you can analyze the exported data directly through tools like Amazon Athena or Amazon Redshift Spectrum.

Exporting snapshot data to Amazon S3: You can export Aurora DB cluster snapshot data to an Amazon S3 bucket. You can export manual snapshots and automated system snapshots. After the data is exported, you can analyze the exported data directly through tools like Amazon Athena or Amazon Redshift Spectrum.

Aurora global databases: An Aurora global database is a single database that spans multiple AWS Regions, enabling low-latency global reads and disaster recovery from a Region-wide outage. It provides built-in fault tolerance for your deployment because the database relies not on a single AWS Region, but on multiple Regions and different Availability Zones.

IAM database authentication: With IAM database authentication in Aurora, you can authenticate to your DB cluster using AWS Identity and Access Management (IAM). With this authentication method, you don't need to use a password when you connect to a DB cluster. Instead, you use an authentication token.

Kerberos authentication: By using Kerberos authentication with Aurora, you can support external authentication of database users using Kerberos and Microsoft Active Directory. Using Kerberos and Active Directory provides the benefits of single sign-on and centralized authentication of database users. Kerberos and Active Directory are available with AWS Directory Service for Microsoft Active Directory, a feature of AWS Directory Service.

Aurora machine learning: By using Amazon Aurora machine learning, you can integrate your Aurora DB cluster with Amazon Comprehend or Amazon SageMaker, depending on your needs. Amazon Comprehend and SageMaker each support different machine learning use cases.

Performance Insights: Performance Insights expands on existing Amazon RDS monitoring features to illustrate and help you analyze your database performance. With the Performance Insights dashboard, you can visualize the database load on your DB instance and filter the load by waits, SQL statements, hosts, or users.

Zero-ETL integrations: An Amazon Aurora zero-ETL integration with Amazon Redshift is a fully managed solution for making transactional data available in Amazon Redshift after it's written to an Aurora cluster.

RDS Proxy: Amazon RDS Proxy is a fully managed, highly available database proxy that makes applications more scalable by pooling and sharing established database connections.

Secrets Manager integration: With AWS Secrets Manager, you can replace hard-coded credentials in your code, including database passwords, with an API call to Secrets Manager to retrieve the secret programmatically.

Aurora Serverless v2: Aurora Serverless v2 is an on-demand, auto-scaling feature designed to be a cost-effective approach to running intermittent or unpredictable workloads on Amazon Aurora. It automatically scales capacity up or down as needed by your applications. The scaling is faster and more granular than with Aurora Serverless v1. With Aurora Serverless v2, each cluster can contain a writer DB instance and multiple reader DB instances. You can combine Aurora Serverless v2 and traditional provisioned DB instances within the same cluster.

Aurora Serverless v1: Aurora Serverless v1 is an on-demand, auto-scaling feature designed to be a cost-effective approach to running intermittent or unpredictable workloads on Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down, as needed by your applications, using a single DB instance in each cluster.

RDS Data API: RDS Data API (Data API) provides a web-services interface to an Amazon Aurora DB cluster. Instead of managing database connections from client applications, you can run SQL commands against an HTTPS endpoint.

Zero-downtime patching (ZDP): Performing upgrades for Aurora DB clusters involves the possibility of an outage when the database is shut down and while it's being upgraded. By default, if you start the upgrade while the database is busy, you lose all the connections and transactions that the DB cluster is processing. If you wait until the database is idle to perform the upgrade, you might have to wait a long time.

The zero-downtime patching (ZDP) feature attempts, on a best-effort basis, to preserve client connections through an Aurora upgrade. If ZDP completes successfully, application sessions are preserved and the database engine restarts while the upgrade is in progress. The database engine restart can cause a drop in throughput lasting for a few seconds to approximately one minute.
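As an illustration of the RDS Data API entry in the list above, the following sketch runs a SQL statement over HTTPS instead of through a database connection. It assumes a boto3 `rds-data` client is passed in and uses its `execute_statement` call; the helper names `run_sql` and `_unwrap` and the ARNs in the usage comment are hypothetical.

```python
from typing import Any


def _unwrap(field: dict) -> Any:
    """Turn one Data API field such as {"longValue": 7} into a plain value."""
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_sql(client: Any, cluster_arn: str, secret_arn: str,
            database: str, sql: str) -> list[list[Any]]:
    """Run one statement through the Data API and return rows of plain values."""
    response = client.execute_statement(
        resourceArn=cluster_arn,  # ARN of the Aurora DB cluster
        secretArn=secret_arn,     # Secrets Manager secret holding the credentials
        database=database,
        sql=sql,
    )
    return [[_unwrap(f) for f in row] for row in response.get("records", [])]


# Usage (assumes boto3 is installed and the ARNs are replaced with real ones):
#   import boto3
#   client = boto3.client("rds-data")
#   rows = run_sql(client, "arn:aws:rds:...", "arn:aws:secretsmanager:...",
#                  "mydb", "SELECT 1")
```

Because the credentials come from a Secrets Manager secret, this also shows the Secrets Manager integration from the same list: no password appears in the application code.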

⁠

Engine-native features⁠

⁠

Amazon Aurora connection management

Amazon Aurora typically involves a cluster of DB instances instead of a single instance. Each connection is handled by a specific DB instance. When you connect to an Aurora cluster, the host name and port that you specify point to an intermediate handler called an endpoint. Aurora uses the endpoint mechanism to abstract these connections. Thus, you don't have to hardcode all the hostnames or write your own logic for load-balancing and rerouting connections when some DB instances aren't available.

For certain Aurora tasks, different instances or groups of instances perform different roles. For example, the primary instance handles all data definition language (DDL) and data manipulation language (DML) statements. Up to 15 Aurora Replicas handle read-only query traffic.

Using endpoints, you can map each connection to the appropriate instance or group of instances based on your use case. For example, to perform DDL statements you can connect to whichever instance is the primary instance. To perform queries, you can connect to the reader endpoint, with Aurora automatically performing load-balancing among all the Aurora Replicas. For clusters with DB instances of different capacities or configurations, you can connect to custom endpoints associated with different subsets of DB instances. For diagnosis or tuning, you can connect to a specific instance endpoint to examine details about a specific DB instance.
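The mapping from use case to endpoint can be sketched as a small helper. This is an illustration only: the hostnames are placeholders that follow the naming pattern shown in the Aurora documentation (`cluster-`, `cluster-ro-`, and `cluster-custom-` prefixes), and `ClusterEndpoints` and `pick_endpoint` are hypothetical names, not part of any AWS SDK.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ClusterEndpoints:
    writer: str                      # cluster endpoint: always the primary instance
    reader: str                      # reader endpoint: balanced across Aurora Replicas
    custom: dict[str, str] = field(default_factory=dict)  # named custom endpoints


def pick_endpoint(endpoints: ClusterEndpoints, *, read_only: bool,
                  custom_name: Optional[str] = None) -> str:
    """Return the host to connect to for a given kind of work."""
    if custom_name is not None:
        return endpoints.custom[custom_name]
    return endpoints.reader if read_only else endpoints.writer


# Placeholder hostnames; a real cluster reports its own endpoints.
EXAMPLE = ClusterEndpoints(
    writer="mydbcluster.cluster-c7tj4example.us-east-1.rds.amazonaws.com",
    reader="mydbcluster.cluster-ro-c7tj4example.us-east-1.rds.amazonaws.com",
    custom={"analytics": "analytics.cluster-custom-c7tj4example.us-east-1.rds.amazonaws.com"},
)
```

DDL and DML go to `pick_endpoint(EXAMPLE, read_only=False)`, ordinary queries to `pick_endpoint(EXAMPLE, read_only=True)`, and a workload tied to a subset of instances to its custom endpoint.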

⁠

Replication

Replication with Aurora Replicas (Same Region)

When you create a second, third, and so on DB instance in an Aurora provisioned DB cluster, Aurora automatically sets up replication from the writer DB instance to all the other DB instances. These other DB instances are read-only and are known as Aurora Replicas. We also refer to them as reader instances when discussing the ways that you can combine writer and reader DB instances within a cluster.

Aurora Replicas have two main purposes.

You can issue queries to them to scale the read operations for your application. You typically do so by connecting to the reader endpoint of the cluster. That way, Aurora can spread the load for read-only connections across as many Aurora Replicas as you have in the cluster.

Aurora Replicas also help to increase availability. If the writer instance in a cluster becomes unavailable, Aurora automatically promotes one of the reader instances to take its place as the new writer.

An Aurora DB cluster can contain up to 15 Aurora Replicas. The Aurora Replicas can be distributed across the Availability Zones that a DB cluster spans within an AWS Region.

The data in your DB cluster has its own high availability and reliability features, independent of the DB instances in the cluster. If you aren't familiar with Aurora storage features, see Overview of Amazon Aurora storage. The DB cluster volume is physically made up of multiple copies of the data for the DB cluster. The primary instance and the Aurora Replicas in the DB cluster all see the data in the cluster volume as a single logical volume.

Replication with Aurora MySQL (Cross Region)

In addition to Aurora Replicas, you have the following options for replication with Aurora MySQL:

Aurora MySQL DB clusters in different AWS Regions.

You can replicate data across multiple Regions by using an Aurora global database. For details, see High availability across AWS Regions with Aurora global databases.

You can create an Aurora read replica of an Aurora MySQL DB cluster in a different AWS Region, by using MySQL binary log (binlog) replication. Each cluster can have up to five read replicas created this way, each in a different Region.

Two Aurora MySQL DB clusters in the same Region, by using MySQL binary log (binlog) replication.

An RDS for MySQL DB instance as the source of data and an Aurora MySQL DB cluster, by creating an Aurora read replica of an RDS for MySQL DB instance. Typically, you use this approach for migration to Aurora MySQL, rather than for ongoing replication.

Replication with Aurora PostgreSQL

In addition to Aurora Replicas, you have the following options for replication with Aurora PostgreSQL:

An Aurora primary DB cluster in one Region and up to five read-only secondary DB clusters in different Regions by using an Aurora global database. Aurora PostgreSQL doesn't support cross-Region Aurora Replicas. However, you can use Aurora global database to scale your Aurora PostgreSQL DB cluster's read capabilities to more than one AWS Region and to meet availability goals. For more information, see Using Amazon Aurora global databases.

Two Aurora PostgreSQL DB clusters in the same Region, by using PostgreSQL's logical replication feature.

An RDS for PostgreSQL DB instance as the source of data and an Aurora PostgreSQL DB cluster, by creating an Aurora read replica of an RDS for PostgreSQL DB instance. Typically, you use this approach for migration to Aurora PostgreSQL, rather than for ongoing replication.

TL;DR

There are two types of replication:

Aurora replica (up to 15)

MySQL Read Replica (up to 5).

The table below describes the differences between the two replica options:

| Feature | Aurora Replica | MySQL Replica |
| --- | --- | --- |
| Number of replicas | Up to 15 | Up to 5 |
| Replication type | Asynchronous (milliseconds) | Asynchronous (seconds) |
| Performance impact on primary | Low | High |
| Replica location | In-region | Cross-region |
| Act as failover target | Yes (no data loss) | Yes (potentially minutes of data loss) |
| Automated failover | Yes | No |
| Support for user-defined replication delay | No | Yes |
| Support for different data or schema vs. primary | No | Yes |

Cross-region replicas with MySQL

Cross-region read replicas allow you to improve your disaster recovery posture, scale read operations in regions closer to your application users, and easily migrate from one region to another.

Cross-region replicas provide fast local reads to your users.

Each region can have an additional 15 Aurora replicas to further scale local reads.

You can choose between Global Database, which provides the best replication performance, and traditional binlog-based replication.

You can also set up your own binlog replication with external MySQL databases.

You can create read replicas of both encrypted and unencrypted DB clusters. The read replica must be encrypted if the source DB cluster is encrypted.

For each source DB cluster, you can have up to five cross-Region DB clusters that are read replicas.
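The constraints on cross-Region read replicas stated in this document (at most five per source cluster, each in a different Region, and encrypted whenever the source is encrypted) can be collected into a pre-flight check. This is a sketch under those stated rules only; the function name and its parameters are hypothetical and it does not call any AWS API.

```python
MAX_CROSS_REGION_REPLICA_CLUSTERS = 5  # limit stated in the text above


def check_new_replica(
    source_region: str,
    source_encrypted: bool,
    existing_replica_regions: list[str],
    new_region: str,
    new_encrypted: bool,
) -> list[str]:
    """Return the rules a proposed cross-Region read replica would break."""
    problems = []
    if new_region == source_region:
        problems.append("replica must be in a different Region than the source")
    if new_region in existing_replica_regions:
        problems.append("each replica must be in its own Region")
    if len(existing_replica_regions) >= MAX_CROSS_REGION_REPLICA_CLUSTERS:
        problems.append("source already has five cross-Region replicas")
    if source_encrypted and not new_encrypted:
        problems.append("replica of an encrypted source must be encrypted")
    return problems
```

An empty list means the proposal satisfies the rules listed here; the service itself may enforce further requirements that this sketch does not model.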

The following diagram depicts the Cross-Region Read Replica topology:

⁠

Aurora Optimized Reads for Aurora PostgreSQL

⁠

Amazon Aurora Optimized Reads is a new price-performance capability that delivers up to 8x improved query latency and up to 30% cost savings compared to instances without it. It is ideal for applications with large datasets that exceed the memory capacity of a database instance.

Optimized Reads instances use local NVMe-based SSD block-level storage, available on Graviton-based r6gd and Intel-based r6id instances, to improve query latency of applications with data sets exceeding the memory capacity of a database instance. Optimized Reads include performance enhancements such as tiered caching and temporary objects to enable you to make the most of your database instances.

With up to 8x improved query latency, you can effectively run read-heavy, I/O-intensive workloads such as operational dashboards, anomaly detection, and similarity searches with pgvector. Amazon Aurora PostgreSQL Optimized Reads with pgvector increases queries per second for vector search by up to 9x in workloads that exceed available instance memory. Optimized Reads is available for Aurora with PostgreSQL compatibility.

⁠

Aurora Parallel Query for Aurora MySQL

⁠

Amazon Aurora Parallel Query provides faster analytical queries over your current data. It can speed up queries by up to two orders of magnitude while maintaining high throughput for your core transaction workload. By pushing query processing down to the Aurora storage layer, it gains a large amount of computing power while reducing network traffic. Use Parallel Query to run transactional and analytical workloads alongside each other in the same Aurora database. Parallel Query is available for Aurora with MySQL compatibility.

With parallel query, you can run data-intensive analytic queries on Aurora MySQL tables. In many cases, you can get an order-of-magnitude performance improvement over the traditional division of labor for query processing.

Benefits of parallel query include the following:

Improved I/O performance, due to parallelizing physical read requests across multiple storage nodes.

Reduced network traffic. Aurora doesn't transmit entire data pages from storage nodes to the head node and then filter out unnecessary rows and columns afterward. Instead, Aurora transmits compact tuples containing only the column values needed for the result set.