System Design

Databases

February 14, 20263 min read

Why You Need Databases

You need databases because you need to persist data. You can't rely on data sitting on the client or the application server—what if the server reboots, or the user closes their laptop or browser? Databases are the main way we persist data because their job is to store it and retrieve it efficiently. Every application has to store and retrieve data, and we usually do that with databases.

The two main types you'll work with are SQL and NoSQL. Neither is inherently better; each has its use cases. Pick the best one based on the problem.


SQL (Relational)

Traditional relational databases—PostgreSQL, MySQL, MariaDB—store data in tables with rows and columns. Think of it like Excel: structured, tabular. They're usually ACID compliant, meaning transactions are atomic, consistent, isolated, and durable.

Joins

Tables can be linked by keys (e.g., user_id in an orders table pointing to the users table). A join combines rows from two or more tables based on those relationships. For example, "get all orders with their user names" might join the orders table and the users table on user_id. This keeps data normalized (no duplication) but requires joins when you need related data.

Schema and Structure

Data is highly structured and must follow a schema you define upfront. You specify columns, types, and constraints. If you need to change the schema (add a column, change a type), you create migrations—explicit, versioned changes to the database structure.

Scaling

SQL databases are mostly vertically scalable: add more CPU, RAM, or disk to a single machine. Horizontal scaling (adding more machines, sharding) is possible but more challenging and time-consuming. Many production systems do it, but it's not as straightforward as with NoSQL.


NoSQL (Non-Relational)

NoSQL databases are non-relational. They emerged with a focus on scalability and availability for large, distributed systems. Again: not "better" than SQL—different trade-offs.

Characteristics

  • No joins. Data is often denormalized: you duplicate data across documents or records for faster reads. Instead of joining tables, you store everything you need in one place.
  • Less structured. Many NoSQL stores are schemaless—you don't define columns upfront. You can add new fields as you go. Documents are often stored as JSON-like structures.
  • Horizontally scalable. Many NoSQL databases are designed to scale out across many machines (sharding, replication) more easily than traditional SQL.

Types of NoSQL

TypeDescriptionExamples
Key-valueStore value by key; fastest for simple lookupRedis, DynamoDB
DocumentStore JSON-like documents; flexible schemaMongoDB, CouchDB
Column / wide-columnStore by column families; good for analyticsCassandra, HBase
GraphNodes and edges; optimized for relationshipsNeo4j, Neptune

Other Database Types

Beyond SQL and NoSQL, there are specialized databases for specific use cases:

TypeUse caseExamples
GraphModeling relationships—social graphs, recommendations, fraud detectionNeo4j, Amazon Neptune
VectorSimilarity search, embeddings, AI/ML—"find items similar to this"Pinecone, Weaviate, pgvector
Time-seriesMetrics, IoT, events—data indexed by timeInfluxDB, TimescaleDB
Search engineFull-text search, fuzzy matching, faceted searchElasticsearch, OpenSearch
In-memoryCaching, session store, real-time—ultra-fast readsRedis, Memcached

These often sit alongside your primary SQL or NoSQL database for specific workloads. For example: Postgres for transactional data, Redis for cache, Elasticsearch for search.