Deepak Vadgama

Software developer, amateur photographer

In the last 5-10 years there has been a huge wave of startups attempting to reinvent databases. This reinvention is justified by the advent of:

  • Scale - Proliferation of mobile devices. Billions of potential users. Hockey-stick user growth. These require a radical rethink of architectures.
  • Mobile (Web+Apps) - Varied platforms, intermittent connectivity and the lean-startup principle of constant change require a standardised yet flexible format (JSON), simpler synchronization of user data, and offline capabilities built into the client.
  • Analytics - To convert scale into value, massive amounts of data need to be analyzed, which calls for structuring data differently (e.g. a graph DB for social networks, BigTable for storing billions of web pages).
  • Computing - Steadily falling storage (HDD/SSD/RAM) prices and improved seek performance (SSD) have enabled new DB engine architectures (moving away from B-trees and slow-seeking disk blocks).
  • Cloud - Cloud computing has enabled developers to focus on the data itself, offloading complicated aspects like sharding, distributed coordination, consistency etc.

Developers / Startups

For consultants/SaaS startups, the most important features of a DB are:

  • For server - SQL. Reliability. Performance.
  • For Web/Mobile - JSON storage. Dashboard. Offline and Sync.

Thus, I will skip many features that are important but not required by everyone.

Classic Relational DB - Schema-based - SQL

  • MS SQL - By Microsoft. Licensed.
  • OracleDB - By Oracle. Licensed.
  • MySQL - Best free relational DB, though now owned by Oracle.
  • PostgreSQL - Another hugely popular, free relational DB.
  • MariaDB - Fork of MySQL, created after MySQL was bought out by Oracle. Led by the founder of MySQL.
  • WebScaleSQL - Fork of MySQL for large-scale deployments. Joint effort of Facebook, Google, Twitter, Alibaba.
  • CockroachDB - Created by ex-employees of Google, Facebook & Twitter. Scalable, distributed SQL with a strong focus on availability.
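
To make "schema-based" concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is not on the list above; it is used only because it ships with Python. The `users` table, its columns and the `insert_user` helper are made up for illustration. The point is that the schema is declared up front and the database rejects rows that violate it.

```python
import sqlite3

# In-memory SQLite database, chosen only because it needs no setup.
conn = sqlite3.connect(":memory:")

# The schema is declared before any data is stored.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT NOT NULL UNIQUE)"
)

def insert_user(name, email):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )
        return True
    except sqlite3.IntegrityError:
        return False

insert_user("Asha", "asha@example.com")
insert_user("Ravi", "ravi@example.com")

# Plain SQL for querying.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]
```

A duplicate email or a missing name makes `insert_user` return False, which is the kind of guarantee a schemaless store leaves to the application.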

Schemaless DB - NoSQL

Divided into 4 types: document stores, columnar DBs, graph DBs and key-value stores.

Document stores - aka Store JSON objects.

All of these have an API, dashboard, indexing and querying facilities.

  • MongoDB - Document-store pioneer. Losing steam lately?
  • CouchDB - MongoDB’s cousin. Apache project. Offline capabilities.
  • PouchDB - CouchDB’s cousin in the browser. Works across all browsers. Syncs well with CouchDB on the server.
  • RethinkDB - Pushes JSON from the DB to the UI/server.
  • Firebase - Same as RethinkDB, but hosted. More capabilities in cloud section below.
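
As a rough illustration of what "store JSON objects" means, here is a toy in-memory document store in plain Python. It is a sketch only and does not use or mimic the API of any product listed above; the `DocumentStore` class and its `insert`/`find` methods are hypothetical names. Note that the documents do not share a schema.

```python
import json

class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}      # document id -> document (a dict)
        self._next_id = 1

    def insert(self, doc):
        # Round-trip through JSON so only JSON-compatible data is stored
        # and the caller's object is copied rather than shared.
        stored = json.loads(json.dumps(doc))
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = stored
        return doc_id

    def find(self, **criteria):
        # Return every document whose top-level fields match all criteria.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]

store = DocumentStore()
store.insert({"type": "user", "name": "Asha", "tags": ["admin"]})
store.insert({"type": "user", "name": "Ravi"})   # no "tags" field, still accepted
store.insert({"type": "post", "title": "Hello", "author": "Asha"})

users = store.find(type="user")
```

Real document stores add persistence, indexes and richer queries on top of this idea; the sketch scans every document on each `find`.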

Columnar DB - Wide Column DB

  • Cassandra - Stores loose-schema data in the form of rows/columns/column families. Cousin of Google’s BigTable. For large-scale data. You probably don’t need this. Let’s turn back.

Graph DB

  • Neo4j - For graph-based data, e.g. if you want to model the social connections of Facebook or LinkedIn. Again, we don’t need this. Let’s turn back.

Key-Value stores

  • Memcached - In-memory (non-persistent) key-value store.
  • Redis - In-memory key-value store with optional persistence to disk. Not a DB per se. Typically used for caching/task queues. This topic is getting tangential. Let’s head back to DBs.
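
For the caching use case mentioned above, the core idea is a key-value map whose entries expire. Below is a minimal sketch in plain Python. It is not the Memcached or Redis API; `TTLCache` and its methods are hypothetical, and the clock is injectable only so that expiry can be demonstrated without sleeping.

```python
import time

class TTLCache:
    """Toy in-memory key-value cache with expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # function returning the current time in seconds
        self._data = {}       # key -> (value, expiry timestamp)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on read.
            del self._data[key]
            return default
        return value

# Usage with a fake clock so the example is deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("session:42", {"user": "Asha"}, ttl_seconds=30)
fresh = cache.get("session:42")    # still cached
now[0] = 30.0
stale = cache.get("session:42")    # expired, falls back to the default
```

A typical pattern is to check the cache first and hit the real DB only on a miss, then store the result with a short TTL.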

Cloud - Hosted Solutions

Cloud - Object store

  • Amazon S3 - Key-object store. Object = images, videos, files etc. Objects are grouped into buckets. Security rules. API.
  • Google Cloud Storage - Same as above.

Cloud - Archival Storage aka Cold Storage

  • Google Nearline - Archival storage. Store + Retrieve price 1 cent/GB. 3 secs retrieval time.
  • Amazon Glacier - Archival storage. Cheap to store, expensive to retrieve.

Conclusion:

  • Cloud computing - Thanks to cloud computing, developers need not worry about complicated aspects of availability, sharding, load-balancing, replication, backups, performance, cost etc. All these are taken care of by capable cloud providers.

  • Mobile - Most NoSQL (document store) database solutions are (fast) converging onto a similar set of capabilities: dashboard, offline, push, sync, security rules, REST API, cloud hosting etc.

Databases have come quite far and it’s looking bright for developers. I currently use MySQL (and will move to Google Cloud SQL), and plan to give Firebase a try for specific use cases.

