Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance with seamless scalability. It enables developers to build modern, serverless applications that can start small and scale globally to support petabytes of data and tens of millions of read and write requests per second. Amazon DynamoDB also offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data. DynamoDB is designed to run high-performance, internet-scale applications that would overburden traditional relational databases.
- Amazon DynamoDB enables customers to create database tables that can store and retrieve any amount of data and serve any level of request traffic. They can scale up or down their tables’ throughput capacity without downtime or performance degradation.
Amazon DynamoDB Features
Amazon DynamoDB is serverless: there are no servers to provision, patch, or manage, and no software to install, maintain, or operate. Amazon DynamoDB automatically scales tables to adjust for capacity and maintains performance with zero administration. Amazon DynamoDB offers two capacity modes for each table: on-demand and provisioned.
- For workloads that are less predictable, customers can use on-demand capacity mode. Tables using on-demand capacity mode instantly accommodate customers' workloads as they ramp up or down to any previously reached traffic level.
- Tables using provisioned capacity mode require customers to set read and write capacity; this mode is more cost effective for workloads with steady, predictable utilization. For tables using provisioned capacity, Amazon DynamoDB delivers automatic scaling of throughput and storage based on the previously set capacity by monitoring the performance usage of the application.
- DynamoDB integrates with AWS Lambda to provide triggers. Using triggers, clients can automatically execute a custom function when item-level changes in a DynamoDB table are detected. With triggers, they can build applications that react to data modifications in Amazon DynamoDB tables.
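The following is a minimal sketch (not from the source) of the trigger pattern described in the last bullet: an AWS Lambda handler in Python that receives a batch of DynamoDB Streams records. It assumes a stream configured with the NEW_AND_OLD_IMAGES view type, and the print statements stand in for whatever custom logic an application would run.

```python
def lambda_handler(event, context):
    """Invoked by the DynamoDB trigger with a batch of stream records."""
    for record in event.get("Records", []):
        event_name = record["eventName"]            # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"].get("Keys", {})   # primary key of the changed item

        if event_name == "INSERT":
            # Image of the entire new item, including all of its attributes.
            print("Item added:", keys, record["dynamodb"].get("NewImage"))
        elif event_name == "MODIFY":
            # "Before" and "after" images of the modified attributes.
            print("Item updated:", keys,
                  record["dynamodb"].get("OldImage"),
                  record["dynamodb"].get("NewImage"))
        elif event_name == "REMOVE":
            # Image of the entire item as it looked before deletion.
            print("Item deleted:", keys, record["dynamodb"].get("OldImage"))
```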
Amazon DynamoDB is a key-value and document database that can support tables of virtually any size with horizontal scaling. This enables Amazon DynamoDB to scale to more than 10 trillion requests per day with peaks greater than 20 million requests per second, over petabytes of storage.
- Amazon DynamoDB supports both key-value and document data models. This enables DynamoDB to have a flexible schema, so each row can have any number of columns at any point in time. This allows customers to easily adapt the tables as their business requirements change, without having to redefine the table schema as they would in relational databases.
- Amazon DynamoDB Accelerator (DAX) is a fully managed in-memory cache that delivers fast read performance for customers' tables at scale.
- Amazon DynamoDB global tables replicate customers' data automatically across their choice of AWS Regions and automatically scale capacity to accommodate their workloads.
- Amazon DynamoDB Streams capture a time-ordered sequence of item-level modifications in any DynamoDB table and store this information in a log for up to 24 hours.
Amazon DynamoDB is built for mission-critical workloads, including support for ACID transactions for a broad set of applications that require complex business logic. Amazon DynamoDB helps secure clients' data with encryption and continuously backs up their data for protection, with guaranteed reliability through a service level agreement.
- Amazon DynamoDB encrypts all customer data at rest by default. Encryption at rest enhances the security of customers' data by using encryption keys stored in AWS Key Management Service. With encryption at rest, customers can build security-sensitive applications that meet strict encryption compliance and regulatory requirements.
- Point-in-time recovery (PITR) helps protect customers' Amazon DynamoDB tables from accidental write or delete operations. PITR provides continuous backups of their Amazon DynamoDB table data, and they can restore that table to any point in time up to the second during the preceding 35 days.
- On-demand backup and restore allows customers to create full backups of their Amazon DynamoDB tables’ data for data archiving, which can help them meet their corporate and governmental regulatory requirements.
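As a rough illustration of the two protection features just listed, the Python (boto3) sketch below enables point-in-time recovery, restores a table to an earlier state, and takes an on-demand backup. The table name, backup name, and restore time are assumptions for the example, not values from the source.

```python
import boto3
from datetime import datetime, timedelta, timezone

dynamodb = boto3.client("dynamodb")

# Enable point-in-time recovery (continuous backups) on a table.
dynamodb.update_continuous_backups(
    TableName="Orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore the table as it looked one hour ago into a new table. This only works
# once PITR has been active, and the target time must fall within the 35-day window.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="Orders",
    TargetTableName="Orders-restored",
    RestoreDateTime=datetime.now(timezone.utc) - timedelta(hours=1),
)

# Take a full on-demand backup for archival and compliance purposes.
dynamodb.create_backup(TableName="Orders", BackupName="orders-archive-2024-01")
```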
High Availability and Durability:- Amazon DynamoDB automatically spreads the data and traffic for customers' tables over a sufficient number of servers to handle their throughput and storage requirements, while maintaining consistent and fast performance.
- All data is stored on solid-state disks (SSDs) and is automatically replicated across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability.
Amazon DynamoDB global tables provide a managed solution for deploying a multiregion, multi-master database. Global tables let customers specify the AWS Regions where they want the table to be available.
- Amazon DynamoDB performs all of the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.
Amazon DynamoDB transactions simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables. Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB, helping customers to maintain data correctness in their applications.
- AWS customers can use the DynamoDB transactional read and write APIs to manage complex business workflows that require adding, updating, or deleting multiple items as a single, all-or-nothing operation.
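As a hedged illustration of the transactional write API, the sketch below uses the low-level boto3 client to commit two conditional changes as a single all-or-nothing operation. The "Orders" and "Inventory" tables and their attributes are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.transact_write_items(
    TransactItems=[
        {   # Record the order only if it does not already exist.
            "Put": {
                "TableName": "Orders",
                "Item": {"OrderId": {"S": "order-1001"}, "Status": {"S": "PLACED"}},
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
        {   # Decrement stock only if at least one unit is available.
            "Update": {
                "TableName": "Inventory",
                "Key": {"ProductId": {"S": "prod-42"}},
                "UpdateExpression": "SET Stock = Stock - :one",
                "ConditionExpression": "Stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)
# If either condition fails, the whole transaction is canceled and neither
# write is applied (a TransactionCanceledException is raised).
```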
Amazon DynamoDB Components
Tables, items, and attributes are the core components of Amazon DynamoDB. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility. DynamoDB Streams enables users to capture data modification events in Amazon DynamoDB tables.
- Tables:- Amazon DynamoDB stores data in tables, and a table is a collection of data.
- Items – Each table contains zero or more items. An item is a group of attributes that is uniquely identifiable among all of the other items. Items in DynamoDB are similar in many ways to rows, records, or tuples in other database systems. There is no limit to the number of items customers can store in a table.
- Attributes – Each item is composed of one or more attributes. An attribute is a fundamental data element, something that does not need to be broken down any further.
- Attributes in Amazon DynamoDB are similar in many ways to fields or columns in other database systems.
- Primary Key:- The primary key uniquely identifies each item in the table, so that no two items can have the same key. Amazon DynamoDB supports two different kinds of primary keys:
- Partition key:- A simple primary key, composed of one attribute known as the partition key.
- DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
- Each primary key attribute must be a scalar (meaning that it can hold only a single value). The only data types allowed for primary key attributes are string, number, or binary.
- Partition key and sort key:- Referred to as a composite primary key, because it is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
- Amazon DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to Amazon DynamoDB) in which the item will be stored.
- All items with the same partition key value are stored together, in sorted order by sort key value.
- Secondary Index:- A secondary index lets customers query the data in the table using an alternate key, in addition to queries against the primary key. DynamoDB supports two kinds of indexes:
- Global secondary index:- An index with a partition key and sort key that can be different from those on the table.
- Local secondary index:- An index that has the same partition key as the table, but a different sort key.
- Each table in DynamoDB has a default limit of 20 global secondary indexes and 5 local secondary indexes (see the code sketch after this list).
- Amazon DynamoDB Streams:- Amazon DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables. The data about these events appears in the stream in near-real time and in the order that the events occurred, and each event is represented by a stream record. When a stream is enabled on a table, Amazon DynamoDB Streams writes a stream record whenever one of the following events occurs:
- A new item is added to the table: The stream captures an image of the entire item, including all of its attributes.
- An item is updated: The stream captures the “before” and “after” image of any attributes that were modified in the item.
- An item is deleted from the table: The stream captures an image of the entire item before it was deleted.
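The sketch below (Python with boto3, not taken from the source) ties the components above together: it creates a hypothetical "Music" table with a composite primary key, a global secondary index, and a stream enabled, then writes an item and queries through the index.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Music",
    AttributeDefinitions=[
        {"AttributeName": "Artist", "AttributeType": "S"},     # partition key
        {"AttributeName": "SongTitle", "AttributeType": "S"},  # sort key
        {"AttributeName": "Genre", "AttributeType": "S"},      # GSI partition key
    ],
    KeySchema=[
        {"AttributeName": "Artist", "KeyType": "HASH"},
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "GenreIndex",
            "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",                 # on-demand capacity mode
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",    # capture before/after images
    },
)
dynamodb.get_waiter("table_exists").wait(TableName="Music")

# Items are schemaless apart from the key attributes; non-key attributes can vary per item.
dynamodb.put_item(
    TableName="Music",
    Item={
        "Artist": {"S": "No One You Know"},
        "SongTitle": {"S": "Call Me Today"},
        "Genre": {"S": "Country"},
        "Year": {"N": "2021"},
    },
)

# Query through the alternate key exposed by the global secondary index
# (GSI queries are eventually consistent).
response = dynamodb.query(
    TableName="Music",
    IndexName="GenreIndex",
    KeyConditionExpression="Genre = :g",
    ExpressionAttributeValues={":g": {"S": "Country"}},
)
print(response["Items"])
```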
Amazon DynamoDB Schema-less Web Scale
Amazon DynamoDB is schemaless and is well suited to Web-scale applications, including social networks, gaming, media sharing, and Internet of Things (IoT). Every table must have a primary key to uniquely identify each data item, but there are no similar constraints on other non-key attributes. DynamoDB can manage structured or semistructured data, including JSON documents.
Customers can use the AWS Management Console or the AWS CLI to work with Amazon DynamoDB and perform ad hoc tasks. Applications can use the AWS software development kits (SDKs) to work with DynamoDB using object-based, document-centric, or low-level interfaces.
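For example, in the AWS SDK for Python (boto3), the same read can be expressed through the low-level interface, which uses explicit type descriptors, or through the higher-level resource interface, which works with plain Python types. The "Music" table and key values reuse the earlier hypothetical example.

```python
import boto3

# Low-level client interface: attribute values carry explicit type descriptors.
client = boto3.client("dynamodb")
client.get_item(
    TableName="Music",
    Key={"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}},
)

# Higher-level resource interface: plain Python types, no type descriptors.
table = boto3.resource("dynamodb").Table("Music")
table.get_item(Key={"Artist": "No One You Know", "SongTitle": "Call Me Today"})
```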
- Amazon DynamoDB is optimized for compute, so performance is mainly a function of the underlying hardware and network latency. As a managed service, DynamoDB insulates customers and their applications from these details, so that they can focus on designing and building robust, high-performing applications.
Amazon DynamoDB is designed to scale out using distributed clusters of hardware. This design allows increased throughput without increased latency. Customers specify their throughput requirements, and DynamoDB allocates sufficient resources to meet those requirements. There are no upper limits on the number of items per table, nor the total size of that table.
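A small, hedged sketch of that workflow in Python (boto3): create a table with modest provisioned throughput, then raise the capacity later without taking the table offline. The "GameScores" table and the capacity figures are illustrative assumptions.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Start small with a low provisioned throughput setting.
dynamodb.create_table(
    TableName="GameScores",
    AttributeDefinitions=[{"AttributeName": "PlayerId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "PlayerId", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
dynamodb.get_waiter("table_exists").wait(TableName="GameScores")

# Later, scale the provisioned capacity up (or down) without downtime.
dynamodb.update_table(
    TableName="GameScores",
    ProvisionedThroughput={"ReadCapacityUnits": 500, "WriteCapacityUnits": 200},
)
```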
SQL to NoSQL
Web-based applications that have hundreds, thousands, or millions of concurrent users, with terabytes or more of new data generated per day, need a database that can handle tens (or hundreds) of thousands of reads and writes per second. Amazon DynamoDB is well suited for such workloads. Developers can start with a small amount of provisioned throughput and gradually increase it as their application becomes more popular. DynamoDB scales seamlessly to handle very large amounts of data and very large numbers of users.
NoSQL is a term used to describe non-relational database systems that are highly available, scalable, and optimized for high performance. Instead of the relational model, NoSQL databases (like DynamoDB) use alternate models for data management, such as key-value pairs or document storage.
Amazon DynamoDB Global Tables
Amazon DynamoDB global tables provide a fully managed solution for deploying a multiregion, multi-master database, without having to build and maintain replication solutions. With global tables customers can specify the AWS Regions where they want the table to be available. Amazon DynamoDB performs all of the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.
- Amazon DynamoDB global tables are ideal for massively scaled applications with globally dispersed users.
- Global tables provide automatic multi-master replication to AWS Regions worldwide, enabling customers to deliver low-latency data access to their users no matter where they are located.
- Transactional operations provide atomicity, consistency, isolation, and durability (ACID) guarantees only within the region where the write is made originally. Transactions are not supported across regions in global tables.
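As a hedged example (Python with boto3, assuming Version 2019.11.21 of global tables), the sketch below adds a replica Region to an existing table, which turns it into a global table. It assumes the "Music" table already exists in us-east-1 with DynamoDB Streams enabled using the NEW_AND_OLD_IMAGES view type, as replication requires.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Add a replica of the existing table in another Region; DynamoDB creates the
# identical table there and propagates ongoing data changes to it.
dynamodb.update_table(
    TableName="Music",
    ReplicaUpdates=[
        {"Create": {"RegionName": "eu-west-1"}},
    ],
)
```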
NoSQL Workbench
NoSQL Workbench for Amazon DynamoDB is a cross-platform client-side application for modern database development and operations and is available for Windows and macOS. NoSQL Workbench is a unified visual tool that provides data modeling, data visualization, and query development features to help you design, create, query, and manage DynamoDB tables.
- Data Modeling:- With NoSQL Workbench for DynamoDB, you can build new data models from, or design models based on, existing data models that satisfy your application’s data access patterns. You can also import and export the designed data model at the end of the process.
- Data Visualization:- The data model visualizer provides a canvas where you can map queries and visualize the access patterns (facets) of the application without having to write code. Every facet corresponds to a different access pattern in DynamoDB. You can manually add data to your data model or import data from MySQL.
- Operation Building:- NoSQL Workbench provides a rich graphical user interface for you to develop and test queries. You can use the operation builder to view, explore, and query datasets. You can also use the structured operation builder to build and perform data plane operations. It supports projection and condition expression, and lets you generate sample code in multiple languages.
Amazon DynamoDB Accelerator (DAX)
Amazon DynamoDB is designed for scale and performance. In most cases, the DynamoDB response times can be measured in single-digit milliseconds. However, there are certain use cases that require response times in microseconds. For these use cases, Amazon DynamoDB Accelerator (DAX) delivers fast response times for accessing eventually consistent data. DAX is a DynamoDB-compatible caching service that enables you to benefit from fast in-memory performance for demanding applications. DAX addresses three core scenarios:
- As an in-memory cache, DAX reduces the response times of eventually consistent read workloads by an order of magnitude from single-digit milliseconds to microseconds.
- DAX reduces operational and application complexity by providing a managed service that is API-compatible with Amazon DynamoDB. Therefore, it requires only minimal functional changes to use with an existing application.
- For read-heavy or bursty workloads, DAX provides increased throughput and potential operational cost savings by reducing the need to overprovision read capacity units. This is especially beneficial for applications that require repeated reads for individual keys.
- DAX provides access to eventually consistent data from Amazon DynamoDB tables with microsecond latency. It is a good fit for applications that require the fastest possible response time for reads, read a small number of items more frequently than others, are read-intensive but cost-sensitive, or perform repeated reads against a large set of data.
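Because DAX is API-compatible with DynamoDB, swapping the standard client for a DAX client is usually the only application change needed. The sketch below assumes the separately installed amazondax Python package and a hypothetical cluster endpoint; the exact constructor arguments may differ by package version, so treat this as an outline rather than a definitive call.

```python
import botocore.session
from amazondax import AmazonDaxClient  # assumes the amazondax package is installed

session = botocore.session.get_session()
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    # Hypothetical DAX cluster endpoint; substitute the endpoint of a real cluster.
    endpoints=["my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"],
)

# Same call shape as the plain DynamoDB client; eventually consistent reads are
# served from the in-memory cache when possible, at microsecond latency.
response = dax.get_item(
    TableName="Music",
    Key={"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}},
)
print(response.get("Item"))
```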
Amazon Redshift Best Practices
In Amazon Redshift, certain key table design decisions heavily influence overall query performance. The design choices that customers make also have a significant effect on storage requirements, which in turn affects query performance by reducing the number of I/O operations and minimizing the memory required to process queries. Customers should apply the best practices presented by AWS for optimizing query performance. Here are some of them:
- Choose the Best Sort Key:- Amazon Redshift stores customers' data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans.
- Choose the Best Distribution Style:- When customers execute a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is run.
- Define Primary Key and Foreign Key Constraints:- Define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans.
- Use Date/Time Data Types for Date Columns:- Amazon Redshift stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Use the DATE or TIMESTAMP data type, depending on the resolution you need, rather than a character type when storing date/time information.
- Use a COPY Command to Load Data:- The COPY command loads data in parallel from Amazon S3, Amazon EMR, Amazon DynamoDB, or multiple data sources on remote hosts. COPY loads large amounts of data much more efficiently than using INSERT statements, and stores the data more effectively as well.
- Split Load Data into Multiple Files:- The COPY command loads the data in parallel from multiple files, dividing the workload among the nodes in the customer's cluster. The number of files should be a multiple of the number of slices in the cluster.
- Compress Data Files:- Individually compress the load files using gzip, lzop, bzip2, or Zstandard for large datasets.
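As a hedged illustration of the COPY guidance above, the Python sketch below connects to a Redshift cluster with psycopg2 and loads gzip-compressed, split files from Amazon S3 in a single parallel COPY. The endpoint, credentials, IAM role, bucket prefix, and table name are all placeholder assumptions.

```python
import psycopg2  # standard PostgreSQL driver, which also works with Amazon Redshift

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)
with conn, conn.cursor() as cur:
    # One COPY loads every file matching the prefix in parallel across the slices.
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/sales/part-'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        GZIP
        DELIMITER ','
    """)
```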