L2 DynamoDB

1、DynamoDB Overview

  • Fully managed, highly available with replication across 3 AZs
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance (low latency on retrieval)
  • Integrated with IAM for security, authorization and administration
  • Enables event driven programming with DynamoDB Streams
  • Low cost and auto scaling capabilities

1-1 DynamoDB - Basics

  • DynamoDB is made of tables
  • Each table has a primary key (must be decided at creation time)

  • Each table can have an infinite number of items (= rows)

  • Each item has attributes (can be added over time, can be null)
  • Maximum size of an item is 400 KB
  • Data types supported are:
    • Scalar Types: String, Number, Binary, Boolean, Null
    • Document Types: List, Map
    • Set Types: String Set, Number Set, Binary Set

1-2 DynamoDB — Primary Key

  • Option 1: Partition key only (HASH)
  • Partition key must be unique for each item
  • Partition key must be "diverse" so that the data is distributed
  • Example: user id for a users table

1-3 DynamoDB — Primary Keys

  • Option 2: Partition key + Sort Key
  • The combination must be unique
  • Data is grouped by partition key
  • Sort key == range key
  • Example: users-games table
    • user_id for the partition key
    • game_id for the sort key

Partition key and sort key values can individually be duplicated, but the combination must be unique.
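
To make this concrete, here is a minimal boto3 sketch that creates such a table (the users-games name and attribute names follow the example above; the throughput values are arbitrary):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="users-games",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```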

1-4 DynamoDB Partition Keys exercise

  • We're building a movie database
  • What is the best partition key to maximize data distribution?

    • movie_id
    • producer_name
    • lead_actor_name
    • movie_language
  • movie_id has the highest cardinality (most unique values), so it's a good candidate

  • movie_language doesn't take many values and may be skewed towards English, so it's not a great partition key

1-5 DynamoDB in Big Data

Common use cases include:

  • Mobile apps
  • Gaming
  • Digital ad serving
  • Live voting
  • Audience interaction for live events
  • Sensor networks
  • Log ingestion
  • Access control for web-based content
  • Metadata storage for Amazon S3 objects
  • E-commerce shopping carts
  • Web session management

Anti-Patterns

  • Pre-written application tied to a traditional relational database: use RDS instead
  • Joins or complex transactions
  • Binary Large Object (BLOB) data: store data in S3 & metadata in DynamoDB
  • Large data with low I/O rate: use S3 instead

DynamoDB is more for data that is hot and smaller; S3 is for data that is colder but bigger.

2、DynamoDB RCU & WCU

2-1 DynamoDB Provisioned Throughput

  • Table must have provisioned read and write capacity units
  • Read Capacity Units (RCU): throughput for reads
  • Write Capacity Units (WCU): throughput for writes
  • Option to setup auto-scaling of throughput to meet demand
  • Throughput can be exceeded temporarily using "burst credits"
  • If burst credits are empty, you'll get a "ProvisionedThroughputExceededException"
  • It's then advised to do an exponential back-off retry

2-2 DynamoDB — Write Capacity Units

  • One write capacity unit represents one write per second for an item up to 1 KB in size.
  • If the items are larger than 1 KB, more WCU are consumed

  • Example 1: we write 10 objects per second of 2 KB each
We need 2 * 10 = 20 WCU
  • Example 2: we write 6 objects per second of 4.5 KB each
We need 6 * 5 = 30 WCU (4.5 KB gets rounded up to 5 KB)
  • Example 3: we write 120 objects per minute of 2 KB each
We need 120 / 60 * 2 = 4 WCU
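
This rounding rule is easy to encode as a sanity check; a minimal sketch (the helper name is mine, not an AWS API):

```python
import math

def wcu_needed(items_per_second, item_size_kb):
    """WCU = writes per second * item size rounded up to the next 1 KB."""
    return math.ceil(items_per_second * math.ceil(item_size_kb))

print(wcu_needed(10, 2))        # 20
print(wcu_needed(6, 4.5))       # 30 (4.5 KB rounds up to 5 KB)
print(wcu_needed(120 / 60, 2))  # 4
```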

2-3 Strongly Consistent Read vs Eventually Consistent Read

Eventually Consistent Read:

If we read just after a write, it's possible we'll get an unexpected response because of replication

Strongly Consistent Read:

If we read just after a write, we will get the correct data

By default DynamoDB uses Eventually Consistent Reads, but GetItem, Query & Scan provide a "ConsistentRead" parameter you can set to True
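
For example, a minimal boto3 sketch of a strongly consistent read (table and key names are illustrative):

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

response = table.get_item(
    Key={"user_id": "u-123"},
    ConsistentRead=True,  # defaults to False (eventually consistent)
)
item = response.get("Item")
```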

2-4 DynamoDB — Read Capacity Units

  • One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
  • If the items are larger than 4 KB, more RCU are consumed

1 RCU = 1 strongly consistent read/s, or 2 eventually consistent reads/s, for items up to 4 KB (sizes rounded up to the next 4 KB)

  • Example 1: 10 strongly consistent reads per second of 4 KB each
We need 10 * 4 KB / 4 KB = 10 RCU
  • Example 2: 16 eventually consistent reads per second of 12 KB each
We need (16 / 2) * (12 / 4) = 24 RCU
  • Example 3: 10 strongly consistent reads per second of 6 KB each
We need 10 * 8 KB / 4 KB = 20 RCU (we have to round up 6 KB to 8 KB)
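
The same arithmetic as a sanity-check sketch (the helper name is mine, not an AWS API):

```python
import math

def rcu_needed(reads_per_second, item_size_kb, strongly_consistent=True):
    """RCU = reads/s * item size rounded up to the next 4 KB; halved if eventually consistent."""
    rcu = reads_per_second * math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        rcu /= 2
    return math.ceil(rcu)

print(rcu_needed(10, 4))                              # 10
print(rcu_needed(16, 12, strongly_consistent=False))  # 24
print(rcu_needed(10, 6))                              # 20 (6 KB rounds up to 8 KB)
```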

2-5 DynamoDB — Throttling

  • If we exceed our RCU or WCU, we get ProvisionedThroughputExceededExceptions
  • Reasons:

    • Hot keys / partitions: one partition key is being read too many times (popular item for ex)
    • Very large items: remember RCU and WCU depend on the size of items
  • Solutions:

    • Exponential back-off when exception is encountered (already in SDK)
    • Distribute partition keys as much as possible
    • If RCU issue, we can use DynamoDB Accelerator (DAX)
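
The SDKs already implement exponential back-off; the retry behaviour can also be tuned, as in this minimal boto3 sketch (the values are illustrative):

```python
import boto3
from botocore.config import Config

# Retry throttled calls up to 10 times with adaptive client-side rate limiting
config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
dynamodb = boto3.client("dynamodb", config=config)
```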

2-6 On-Demand RCU and WCU

Select on-demand if you want to pay only for the reads and writes you perform, with no capacity planning required. It's significantly more expensive than provisioned capacity.

2-7 Capacity calculator

3、DynamoDB Partitions

3-1 DynamoDB - Partitions Internal

  • You start with one partition
  • Each partition:

    • Max of 3000 RCU / 1000 WCU
    • Max of 10GB
  • To compute the number of partitions:

  • By capacity: (TOTAL RCU / 3000) + (TOTAL WCU / 1000)

  • By size: Total Size / 10 GB
  • Total partitions = CEILING(MAX(Capacity, Size))
  • WCU and RCU are spread evenly between partitions
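
A minimal sketch of that formula (the function name is mine):

```python
import math

def partitions(total_rcu, total_wcu, total_size_gb):
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# e.g. 6000 RCU + 2000 WCU (capacity -> 4) and 25 GB (size -> 2.5)
print(partitions(6000, 2000, 25))  # CEILING(MAX(4, 2.5)) = 4
```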

4、DynamoDB APIs (ADUS)

4-1 DynamoDB — Writing Data

  • PutItem - Write data to DynamoDB (create data or full replace)
    • Consumes WCU
  • UpdateItem — Update data in DynamoDB (partial update of attributes)
    • Possibility to use Atomic Counters and increase them
  • Conditional Writes:
    • Accept a write / update only if conditions are respected, otherwise reject
    • Helps with concurrent access to items
    • No performance impact
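
A minimal boto3 sketch of a conditional write that only creates the item if it doesn't exist yet (table and attribute names are illustrative):

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

table.put_item(
    Item={"user_id": "u-123", "name": "Alice"},
    ConditionExpression="attribute_not_exists(user_id)",  # reject if item exists
)
```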

4-2 DynamoDB — Deleting Data

  • DeleteItem

    • Delete an individual row
    • Ability to perform a conditional delete
  • DeleteTable

    • Delete a whole table and all its items
    • Much quicker deletion than calling DeleteItem on all items

4-3 DynamoDB — Batching Writes

  • BatchWriteItem

    • Up to 25 PutItem and / or DeleteItem in one call
    • Up to 16 MB of data written
    • Up to 400 KB of data per item
  • Batching allows you to save on latency by reducing the number of API calls made against DynamoDB

  • Operations are done in parallel for better efficiency
  • It's possible for part of a batch to fail, in which case we have to retry the failed items (using an exponential back-off algorithm)
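
In boto3, the batch_writer helper wraps BatchWriteItem, buffering puts / deletes into batches of 25 and resending unprocessed items automatically; a minimal sketch (names are illustrative):

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"user_id": f"u-{i}", "name": f"user {i}"})
```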

4-4 DynamoDB — Reading Data

  • GetItem:

    • Read based on Primary key
    • Primary Key = HASH or HASH-RANGE
    • Eventually consistent read by default
    • Option to use strongly consistent reads (more RCU - might take longer)
    • ProjectionExpression can be specified to include only certain attributes
  • BatchGetItem:

    • Up to 100 items
    • Up to 16 MB of data
    • Items are retrieved in parallel to minimize latency
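
A minimal boto3 sketch of BatchGetItem with a ProjectionExpression (table and attribute names are illustrative; "name" needs an alias because it is a reserved word):

```python
import boto3

dynamodb = boto3.resource("dynamodb")

response = dynamodb.batch_get_item(
    RequestItems={
        "users": {
            "Keys": [{"user_id": "u-1"}, {"user_id": "u-2"}],
            "ProjectionExpression": "user_id, #n",
            "ExpressionAttributeNames": {"#n": "name"},
        }
    }
)
items = response["Responses"]["users"]
```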

4-5 DynamoDB — Query

  • Query returns items based on:
    • PartitionKey value (must be = operator)
    • SortKey value (=, <, <=, >, >=, Between, Begins With) — optional
    • FilterExpression to further filter results (applied after the query, so it does not reduce the RCU consumed)

You can query by PartitionKey alone, or by PartitionKey + SortKey.

The only attributes you can query on are the PartitionKey and SortKey; the FilterExpression is applied at the very end, after the query has run.

  • Returns:

    • Up to 1 MB of data
    • Or number of items specified in Limit
  • Able to do pagination on the results

  • Can query a table, a local secondary index, or a global secondary index
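
A minimal boto3 Query sketch against the users-games example (names are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("users-games")

response = table.query(
    # Partition key must use equality; the sort key condition is optional
    KeyConditionExpression=Key("user_id").eq("u-123") & Key("game_id").begins_with("g-"),
    Limit=10,
)
items = response["Items"]
```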

4-6 DynamoDB - Scan

  • Scan the entire table and then filter out data (inefficient)
  • Returns up to 1 MB of data; use pagination to keep on reading
  • Consumes a lot of RCU
  • Limit the impact using Limit, or reduce the size of the results and pause between reads
  • For faster performance, use parallel scans:

    • Multiple instances scan multiple partitions at the same time
    • Increases the throughput and RCU consumed
    • Limit the impact of parallel scans just like you would for Scans
  • Can use a ProjectionExpression + FilterExpression (no change to RCU)
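
A minimal boto3 sketch of a paginated Scan (table name is illustrative); passing Segment / TotalSegments to scan() would turn this into one worker of a parallel scan:

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

# Keep reading 1 MB pages until LastEvaluatedKey is absent
items, kwargs = [], {}
while True:
    response = table.scan(**kwargs)
    items.extend(response["Items"])
    if "LastEvaluatedKey" not in response:
        break
    kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```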

5、DynamoDB Indexes: LSI & GSI

5-1 DynamoDB — LSI (Local Secondary Index)

  • Alternate range key for your table, local to the hash key
  • Up to five local secondary indexes per table.
  • The sort key consists of exactly one scalar attribute.
  • The attribute that you choose must be a scalar String, Number, or Binary
  • LSI must be defined at table creation time
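
A minimal boto3 sketch declaring an LSI at creation time (all names, including the game_ts attribute, are illustrative):

```python
import boto3

boto3.client("dynamodb").create_table(
    TableName="users-games",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
        {"AttributeName": "game_ts", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "game_id", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "user_id-game_ts-index",
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},   # same partition key
            {"AttributeName": "game_ts", "KeyType": "RANGE"},  # alternate sort key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```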

You create the local secondary index with the table's partition key plus an alternate sort key; instead of querying the table, you can then query the index.

5-2 DynamoDB — GSI (Global Secondary Index)

  • To speed up queries on non-key attributes, use a Global Secondary Index
  • GSI = partition key + optional sort key
  • The index is a new "table" and we can project attributes on it

    • The partition key and sort key of the original table are always projected (KEYS_ONLY)
    • Can specify extra attributes to project (INCLUDE)
    • Can use all attributes from main table (ALL)
  • Must define RCU / WCU for the index

  • Possibility to add / modify GSI (not LSI)
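
Since a GSI can be added after creation, a minimal boto3 sketch using UpdateTable (names and throughput values are illustrative):

```python
import boto3

boto3.client("dynamodb").update_table(
    TableName="users-games",
    AttributeDefinitions=[{"AttributeName": "game_id", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "game_id-index",
            "KeySchema": [{"AttributeName": "game_id", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)
```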

The GSI is defined with its own partition key + sort key, and you add RCU / WCU for the index.

6、DynamoDB DAX

  • DAX = DynamoDB Accelerator
  • Seamless cache for DynamoDB, no application re-write
  • Writes go through DAX to DynamoDB
  • Microsecond latency for cached reads & queries
  • Solves the Hot Key problem (too many reads)
  • 5 minutes TTL for cache by default
  • Up to 10 nodes in the cluster
  • Multi AZ (3 nodes minimum recommended for production)
  • Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail...)
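
A minimal sketch using the amazon-dax-client package, following AWS's Python sample; the endpoint is a placeholder, and the constructor may vary by client version:

```python
import botocore.session
from amazondax import AmazonDaxClient  # pip install amazon-dax-client

session = botocore.session.get_session()
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"],
)

# Same API as the low-level DynamoDB client, so no application re-write
response = dax.get_item(TableName="users", Key={"user_id": {"S": "u-123"}})
```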


7、DynamoDB Streams

  • Changes in DynamoDB (Create, Update, Delete) can end up in a DynamoDB Stream
  • This stream can be read by AWS Lambda, and we can then do:

    • React to changes in real time (welcome email to new users)
    • Create derivative tables / views
    • Insert into ElasticSearch
  • Could implement Cross Region Replication using Streams

  • Stream has 24 hours of data retention
  • Configurable batch size (up to 1,000 rows, 6 MB)
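
A minimal sketch of a Lambda handler reading such a stream; the event layout follows the standard DynamoDB Streams record structure, and the reaction is a placeholder:

```python
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            # e.g. send a welcome email to the new user (placeholder)
            print("New item:", new_image)
```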

7-1 DynamoDB Streams Kinesis Adapter

  • Use the KCL library to directly consume from DynamoDB Streams
  • You just need to add a "Kinesis Adapter" library
  • The interface and programming is exactly the same as Kinesis Streams
  • That's the alternative to using AWS Lambda


8、DynamoDB TTL (Time to Live)

  • TTL = automatically delete an item after an expiry date / time
  • TTL is provided at no extra cost, deletions do not use WCU / RCU
  • TTL is a background task operated by the DynamoDB service itself
  • Helps reduce storage and manage the table size over time
  • Helps adhere to regulatory norms
  • TTL is enabled per row (you define a TTL column, and add a date there)
  • DynamoDB typically deletes expired items within 48 hours of expiration
  • Deleted items due to TTL are also deleted in GSI / LSI
  • DynamoDB Streams can help recover expired items

TTL is enabled per row: add an attribute such as expire_on holding an epoch timestamp (see https://www.epochconverter.com/ to convert dates), then enable TTL at the table level on that attribute.
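
A minimal boto3 sketch of both steps (table and attribute names are illustrative):

```python
import time
import boto3

# Enable TTL at the table level on the expire_on attribute
boto3.client("dynamodb").update_time_to_live(
    TableName="sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_on"},
)

# Write an item that expires ~1 hour from now (epoch seconds)
table = boto3.resource("dynamodb").Table("sessions")
table.put_item(Item={"session_id": "s-123", "expire_on": int(time.time()) + 3600})
```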

9、DynamoDB — Security & Other Features

9-1 Security:

  • VPC Endpoints available to access DynamoDB without internet
  • Access fully controlled by IAM
  • Encryption at rest using KMS
  • Encryption in transit using SSL / TLS

9-2 Backup and Restore feature available

  • Point in time restore like RDS
  • No performance impact

9-3 Global Tables

  • Multi region, fully replicated, high performance

AWS Database Migration Service (DMS) can be used to migrate to DynamoDB (from MongoDB, Oracle, MySQL, S3, etc...)

  • You can launch a local DynamoDB on your computer for development purposes

10、DynamoDB: Storing Large Objects

10-1 DynamoDB Storing large objects

  • Max size of an item in DynamoDB = 400 KB
  • For large objects, store them in S3 and reference them in DynamoDB
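
A minimal boto3 sketch of this pattern (bucket, table, and attribute names are illustrative):

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("images-metadata")

image_bytes = open("cat.jpg", "rb").read()  # the large payload

# Store the large object in S3...
s3.put_object(Bucket="my-image-bucket", Key="images/cat.jpg", Body=image_bytes)

# ...and only a small reference + metadata item in DynamoDB
table.put_item(Item={
    "image_id": "img-123",
    "s3_url": "s3://my-image-bucket/images/cat.jpg",
    "size_kb": 300,
})
```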

10-2 DynamoDB - Price comparison for 300 KB

  • Amazon S3 (300KB of storage)

    • $0.0000069 storage per month
    • $0.0000050 initial PUT
    • $0.0000004 per GET
  • DynamoDB (< 1 KB of storage)

    • $0.0006500 for one WCU per month
    • $0.0001300 for one RCU per month
    • $0.00000025 storage per month
  • Assuming 1 write and 100 reads per month:

    • $0.00119215 per month

  • DynamoDB (300 KB of storage)

    • $0.195 for 300 WCU per month
    • $0.004940 for 38 RCU per month
    • $0.000075 storage per month
  • Assuming 1 write and 100 reads per month:

    • Storage is 11x more expensive
    • WCU + RCU are under-used
  • Even for items that fit in DynamoDB, if the provisioned capacity is under-used, S3 + DynamoDB is a solution

11、AWS ElastiCache Overview

  • The same way RDS is to get managed Relational Databases...
  • ElastiCache is to get managed Redis or Memcached
  • Caches are in-memory databases with really high performance, low latency
  • Helps reduce the load on databases for read-intensive workloads
  • Helps make your application stateless
  • Write Scaling using sharding
  • Read Scaling using Read Replicas
  • Multi AZ with Failover Capability
  • AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups

11-1 Redis Overview

  • Redis is an in-memory key-value store
  • Super low latency (sub ms)
  • Cache survives reboots by default (it's called persistence)
  • Great to host
    • User sessions
    • Leaderboard (for gaming)
    • Distributed states
    • Relieve pressure on databases (such as RDS)
    • Pub / Sub capability for messaging
  • Multi AZ with Automatic Failover for disaster recovery if you don't want to lose your cache data
  • Support for Read Replicas
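
A minimal cache-aside sketch with the redis-py client (the host, key scheme, and DB helper are illustrative):

```python
import redis  # pip install redis

r = redis.Redis(host="my-cluster.abc123.cache.amazonaws.com", port=6379)

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return cached                      # cache hit: database untouched
    user = load_user_from_db(user_id)      # placeholder DB call
    r.setex(key, 300, user)                # cache for 5 minutes
    return user
```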

11-2 Memcached Overview

  • Memcached is an in-memory object store (like S3)
  • Cache doesn't survive reboots
  • Use cases:
    • Quick retrieval of objects from memory
    • Cache often accessed objects
  • Overall, Redis has largely grown in popularity and has better feature sets than Memcached.
  • I would personally only use Redis for caching needs.