Database options for system design

                 

Scenario: Types of Databases available based on requirements

Solution:

Two types of Database Indexes:
  1. LSM trees + SS Tables
    1. Balanced binary tree in memory. When it gets big its pushed to disk (SS Table - Sorted list of keys).
    2. If there are many SS Table then they are merged.
    3. Fast writes to memory.
    4. For reads, one needs to search many SS Tables.
  2. B trees
    1. Binary trees using pointers on disk.
    2. Pages on disk with range of keys. Write iterates through the binary tree and either update an existing  key value or create a new page on disk and modify pointer to the new page.
    3. Faster reads, as it knows where key is located.
    4. Slow write to disk.
Replication
  1. Single leader
    1. All writes to one master which replicates to others. Read from any.
    2. There are no conflicts.
    3. Can be mitigated with shards and partition.
  2. Multi leader
    1. Go to small subset of leader DBs. Read from any.
    2. Increase in write throughput but write conflicts could occur. 
  3. Leaderless
    1. Write to all, read from any.
    2. Increase in write throughput but write conflicts could occur. 
SQL Databases
  1. Relational/Normalized data.
  2. It requires 2 phase commits
    1. Check each node if it's able to promise to carry out the update
    2. Commit, actually write the data.
    3. If any node is unable to make that promise, then the coordinator tells all nodes to rollback, releasing any locks they have, and the transaction is aborted. 
  3. ACID guarantees
  4. Slow due to #2. Transactions are slow.
  5. Use B trees.
  6. Relational database or RDBMS databases are vertically Scalable.
  7. Expensive.
Mongo DB
  1. Its document DBs. Documents can be nested.
  2. NoSql
  3. Use B trees and supports transactions.
  4. With NoSQL, unstructured & schema less data can be stored in multiple collections and nodes. It does not require fixed table schemas and it allows limited join queries, and can be scaled horizontally.
  5. Relatively cheaper.
  6. No Stored Procedure.
Apache Casandra
  1. No Sql
  2. Wide column data store (like excel spreadsheet)
  3. Shard key and Sort key
  4. Multileader/leaderless
    1. Fast writes. Quorum. Last write wins. 
  5. Based on LSM & SS Tables.
  6. Good for high write volume and consistency not that vital. All write and reads go to same shard. Like chat application.
  7. No transaction. 
Redis and Memcached
  1. Key - value pair in memory
  2. Cache and geo spatial index. etc.

No comments:

Post a Comment

Move Github Sub Repository back to main repo

 -- delete .gitmodules git rm --cached MyProject/Core git commit -m 'Remove myproject_core submodule' rm -rf MyProject/Core git remo...