System Design 101

Explain complex systems using visuals and simple terms.

Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.

Communication protocols
CI/CD
- CI/CD Pipeline Explained in Simple Terms
- Netflix Tech Stack (CI/CD Pipeline)
Architecture patterns
- MVC, MVP, MVVM, MVVM-C, and VIPER
- 18 Key Design Patterns Every Developer Should Know
Database
Cache
Microservice architecture
Payment systems
DevOps
GIT
Cloud Services
- A nice cheat sheet of different cloud services (2023 edition)
- What is cloud native?
Developer productivity tools
- Visualize JSON files
- Automatically turn code into architecture diagrams
Linux
- Linux file system explained
- 18 Most-used Linux Commands You Should Know
Security
Real World Case Studies

Communication protocols

Architecture styles define how different components of an application programming interface (API) interact with one another. As a result, they ensure efficiency, reliability, and ease of integration with other systems by providing a standard approach to designing and building APIs. Here are the most used styles:

SOAP:

Mature, comprehensive, XML-based

Best for enterprise applications
RESTful:

Popular, easy-to-implement, HTTP methods

Ideal for web services
GraphQL:

Query language, request specific data

Reduces network overhead, faster responses
gRPC:

Modern, high-performance, Protocol Buffers

Suitable for microservices architectures
WebSocket:

Real-time, bidirectional, persistent connections

Perfect for low-latency data exchange
Webhook:

Event-driven, HTTP callbacks, asynchronous

Notifies systems when events occur

REST API vs. GraphQL

When it comes to API design, REST and GraphQL each have their own strengths and weaknesses.

The diagram below shows a quick comparison between REST and GraphQL.

REST

Uses standard HTTP methods like GET, POST, PUT, DELETE for CRUD operations.
Works well when you need simple, uniform interfaces between separate services/applications.
Caching strategies are straightforward to implement.
The downside is it may require multiple roundtrips to assemble related data from separate endpoints.

GraphQL

Provides a single endpoint for clients to query for precisely the data they need.
Clients specify the exact fields required in nested queries, and the server returns optimized payloads containing just those fields.
Supports Mutations for modifying data and Subscriptions for real-time notifications.
Great for aggregating data from multiple sources and works well with rapidly evolving frontend requirements.
However, it shifts complexity to the client side and can allow abusive queries if not properly safeguarded
Caching strategies can be more complicated than REST.

The best choice between REST and GraphQL depends on the specific requirements of the application and development team. GraphQL is a good fit for complex or frequently changing frontend needs, while REST suits applications where simple and consistent contracts are preferred.

Neither API approach is a silver bullet. Carefully evaluating requirements and tradeoffs is important to pick the right style. Both REST and GraphQL are valid options for exposing data and powering modern applications.

How does gRPC work?

RPC (Remote Procedure Call) is called “remote” because it enables communications between remote services when services are deployed to different servers under microservice architecture. From the user’s point of view, it acts like a local function call.

The diagram below illustrates the overall data flow for gRPC.

Step 1: A REST call is made from the client. The request body is usually in JSON format.

Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. gRPC encodes the client stub into a binary format and sends it to the low-level transport layer.

Step 5: gRPC sends the packets over the network via HTTP2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.

Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.

Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.

Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.

What is a webhook?

The diagram below shows a comparison between polling and Webhook.

Assume we run an eCommerce website. The clients send orders to the order service via the API gateway, which goes to the payment service for payment transactions. The payment service then talks to an external payment service provider (PSP) to complete the transactions.

There are two ways to handle communications with the external PSP.

1. Short polling

After sending the payment request to the PSP, the payment service keeps asking the PSP about the payment status. After several rounds, the PSP finally returns with the status.

Short polling has two drawbacks:

Constant polling of the status requires resources from the payment service.
The External service communicates directly with the payment service, creating security vulnerabilities.

2. Webhook

We can register a webhook with the external service. It means: call me back at a certain URL when you have updates on the request. When the PSP has completed the processing, it will invoke the HTTP request to update the payment status.

In this way, the programming paradigm is changed, and the payment service doesn’t need to waste resources to poll the payment status anymore.

What if the PSP never calls back? We can set up a housekeeping job to check payment status every hour.

Webhooks are often referred to as reverse APIs or push APIs because the server sends HTTP requests to the client. We need to pay attention to 3 things when using a webhook:

We need to design a proper API for the external service to call.
We need to set up proper rules in the API gateway for security reasons.
We need to register the correct URL at the external service.

How to improve API performance?

The diagram below shows 5 common tricks to improve API performance.

Pagination

This is a common optimization when the size of the result is large. The results are streaming back to the client to improve the service responsiveness.

Asynchronous Logging

Synchronous logging deals with the disk for every call and can slow down the system. Asynchronous logging sends logs to a lock-free buffer first and immediately returns. The logs will be flushed to the disk periodically. This significantly reduces the I/O overhead.

Caching

We can store frequently accessed data into a cache. The client can query the cache first instead of visiting the database directly. If there is a cache miss, the client can query from the database. Caches like Redis store data in memory, so the data access is much faster than the database.

Payload Compression

The requests and responses can be compressed using gzip etc so that the transmitted data size is much smaller. This speeds up the upload and download.

Connection Pool

When accessing resources, we often need to load data from the database. Opening the closing db connections adds significant overhead. So we should connect to the db via a pool of open connections. The connection pool is responsible for managing the connection lifecycle.

HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)

What problem does each generation of HTTP solve?

The diagram below illustrates the key features.

HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.
HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn’t solve the HOL (head-of-line) blocking issue.

HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.
HTTP 2.0 was published in 2015. It addresses HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL still exists at the transport (TCP) layer.

As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP “streams”: an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn’t need to be sent in order.
HTTP 3.0 first draft was published in 2020. It is the proposed successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.

QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.