28 November 2018

Building with AWS Databases: Match Your Workload to the Right Database (DAT301)

by mo


We have recently seen some convergence of different database technologies. Many customers are evaluating heterogeneous migrations as their database needs have evolved or changed. Evaluating the best database to use for a job isn’t as clear as it was ten years ago. We’ll discuss the ideal use cases for relational and nonrelational data services, including Amazon ElastiCache for Redis, Amazon DynamoDB, Amazon Aurora, Amazon Neptune, and Amazon Redshift. This session digs into how to evaluate a new workload for the best managed database option. Please join us for a speaker meet-and-greet following this session at the Speaker Lounge (ARIA East, Level 1, Willow Lounge). The meet-and-greet starts 15 minutes after the session and runs for half an hour.

  • database workload classifications
  • traditional approaches to rdbms
  • how nosql databases compare
  • the flavours of nosql on aws
  • what database to use.

choose a database because it is purpose-built for the workload.

workloads

  • operations: oltp
    • common business process
    • regular and repeatable
    • same thing happens each time data is processed
  • analytics: olap
    • bi
    • reporting
    • more ad hoc access patterns
    • we do not know what questions users will ask when they come to the application.
    • decision support systems, data lakes and data warehouses
  • sizing a database
    • we usually oversize to make sure we can handle spikes, but this means paying for larger instances all the time just to cover short spikes in the workload.
  • scaling an rdbms
    • start with a small box, then slowly scale up to a bigger and bigger box until we run out of bigger boxes.
    • then we have to start sharding the db, and that gets complicated.

nosql

  • leverages a denormalized data model
  • sharded, providing horizontal scaling and unbounded storage capacity.
  • uses partition keys to decide which node data should be distributed to.
    • hash(2) => 48
    • hash(1) => 7b
    • hash(3) => cd
  • shard key, partition key: each nosql db needs this to know which node the data needs to go to.
  • CAP theorem
    • C: consistency - consistent view of data. as soon as the write happens the read matches.
    • A: availability - always able to read and write. if you can read but not write, then it is not available.
    • P: partition tolerance - what happens when network between nodes starts to fail.
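
The partition-key routing described above can be sketched in a few lines of Python. This is only an illustration, not how any particular nosql database implements it: the use of MD5, the node names, and the helpers `key_hash` and `node_for_key` are assumptions made for the example, and the hash values it produces will not match the short values in the notes.

```python
import hashlib

# Hypothetical cluster of three storage nodes (names are made up for this sketch).
NODES = ["node-a", "node-b", "node-c"]


def key_hash(partition_key: str) -> int:
    """Hash a partition key to a large integer (MD5 chosen only for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def node_for_key(partition_key: str, nodes=NODES) -> str:
    """Pick the node that owns a key by taking the hash modulo the node count."""
    return nodes[key_hash(partition_key) % len(nodes)]


if __name__ == "__main__":
    for key in ["1", "2", "3"]:
        print(key, "->", node_for_key(key))
```

The same key always routes to the same node, which is what lets reads find the data that writes stored. Simple modulo routing remaps most keys when the node count changes, so production systems generally use a more careful scheme than this one.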

RDBMS: CA

NoSQL:

  • MongoDB: CP -> document
  • Redis: CP -> key/value
  • memcache: CP -> key/value
  • dynamo db: CP
  • cassandra: AP -> wide column
  • dynamodb: AP -> wide column
  • riak: AP

hype curve

  • innovators
  • early adopters
  • early majority
  • late majority
  • laggards

  • denormalized data modelling is going to become commonplace.

AWS db’s

nosql

  • dynamodb - wide column/document
  • elasticache - indexed key value
  • qldb - ledger
  • neptune - graph
  • timestream - tsdb

rdbms

  • aurora
  • rds

olap

  • redshift
  • emr
  • athena
  • elasticsearch

rds

  • aurora
  • mysql
  • pg
  • mariadb
  • sql server
  • oracle

nosql

  • requires knowing access patterns before storing data.

rdbms

  • reshape data on the way out so you do not need to know the access patterns up front.
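
A small sketch of what "reshape data on the way out" can look like, using Python's built-in `sqlite3` module as a stand-in for any rdbms. The `customers` and `orders` tables, their columns, and the sample rows are all hypothetical; the point is that two different questions are answered from the same normalized tables without planning for either one when the data was stored.

```python
import sqlite3

# In-memory database with two normalized tables (schema invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
    """
)


def orders_for_customer(name):
    """Question 1: list one customer's orders, joined at query time."""
    rows = conn.execute(
        "SELECT o.id, o.total FROM orders o "
        "JOIN customers c ON c.id = o.customer_id "
        "WHERE c.name = ? ORDER BY o.id",
        (name,),
    )
    return rows.fetchall()


def revenue_per_customer():
    """Question 2: aggregate the same tables in a completely different shape."""
    rows = conn.execute(
        "SELECT c.name, SUM(o.total) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(orders_for_customer("alice"))
    print(revenue_per_customer())
```

In a nosql design, each of these questions would normally need its own key design or pre-built denormalized view, which is why the access patterns have to be known first.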

dynamodb

  • wide column / document
  • scalable
  • managed

  • table/collection/keyspace
    • items have attributes. attributes do not need to be the same.
    • partition key
    • sort key
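
To make the item model concrete, here is a toy in-memory table written in plain Python. It does not use the DynamoDB API; the `Table` class, its methods, and the key naming scheme (`customer#1`, `order#...`) are assumptions for illustration. It only shows the idea that items are grouped by partition key, ordered by sort key, and free to carry different attributes.

```python
from collections import defaultdict


class Table:
    """Toy model of a wide-column table: partition key -> sort key -> attributes."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put_item(self, partition_key, sort_key, **attributes):
        # Items in the same table do not need to share the same attributes.
        self._partitions[partition_key][sort_key] = dict(attributes)

    def get_item(self, partition_key, sort_key):
        # Exact lookup by the full (partition key, sort key) pair.
        return self._partitions.get(partition_key, {}).get(sort_key)

    def query(self, partition_key, sort_key_prefix=""):
        # Return one partition's items in sort-key order, optionally by prefix.
        items = self._partitions.get(partition_key, {})
        return [
            (sk, items[sk]) for sk in sorted(items) if sk.startswith(sort_key_prefix)
        ]


if __name__ == "__main__":
    table = Table()
    table.put_item("customer#1", "profile", name="alice")
    table.put_item("customer#1", "order#2018-11-28", total=40, coupon="REINVENT")
    table.put_item("customer#1", "order#2018-11-01", total=25)
    print(table.query("customer#1", "order#"))
```

Storing the profile and the orders under one partition key is a denormalized layout: a single query on that key returns everything about the customer without a join.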

graph query types

  • node query (primary): an rdbms can do this
  • edge query (index): an rdbms can do this
  • hybrid query (traversal) this is where graph databases shine.
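
The three query types can be illustrated with a tiny in-memory graph. This is a sketch in plain Python, not Neptune or any graph query language; the sample nodes, the `follows` edge label, and the function names are invented for the example.

```python
from collections import deque

# Hypothetical graph: node properties plus a list of (source, label, target) edges.
NODES = {"alice": {"city": "Las Vegas"}, "bob": {}, "carol": {}, "dave": {}}
EDGES = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "dave"),
]


def get_node(node_id):
    """Node query: fetch one node by its primary identifier."""
    return NODES.get(node_id)


def edges_from(node_id, label):
    """Edge query: find the direct neighbours over one edge label."""
    return [dst for src, lbl, dst in EDGES if src == node_id and lbl == label]


def reachable(start, label, max_hops):
    """Traversal: breadth-first walk over edges, up to max_hops away from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in edges_from(node, label):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, hops + 1))
    return found


if __name__ == "__main__":
    print(get_node("alice"))
    print(edges_from("alice", "follows"))
    print(reachable("alice", "follows", 2))
```

The first two functions map naturally onto a primary-key lookup and an indexed lookup. The traversal is the part where each extra hop would typically become another join in a relational schema.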

redshift

  • data warehousing
  • fully scalable backend
  • elasticity

timestream database

sql / nosql / graph

  • sql: optimized for storage
  • nosql: optimized for compute, pre-created denormalized views
  • graph: ad hoc entity/relationship aggregations

PIE theorem

  • P: pattern flexibility: supports random access
  • I: infinite scale: can gracefully increase in size and throughput without practical limits
  • E: efficiency: how fast do the results need to come back

  • Amazon DynamoDB: IE
  • Amazon RDS: PE
  • Elasticsearch: PE
  • Neptune: PE
  • Redshift: PI
  • Athena: PI

Purpose built db solutions

  • infra
  • software

zero unplanned downtime

  • 99.999%: global tables on amazon dynamodb (seconds of downtime per year)
  • 99.99%: regional amazon dynamodb
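
The availability figures above translate into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day year and treats the numbers as percentages; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet the availability figure."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.999, 99.99):
        seconds = allowed_downtime_seconds(pct)
        print(f"{pct}% -> {seconds:.1f} s/year ({seconds / 60:.1f} min/year)")
```

Under these assumptions, five nines works out to roughly 315 seconds (a little over five minutes) per year, and four nines to roughly 53 minutes per year.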
