【 👨🏻💻 YouTube | 📮 Newsletter 】
System Design 101
Explain complex systems using visuals and simple terms.
Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.
Table of Contents
- Communication protocols
- REST API vs. GraphQL
- How does gRPC work?
- What is a webhook?
- How to improve API performance?
- HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)
- SOAP vs REST vs GraphQL vs RPC
- Code First vs. API First
- HTTP status codes
- What does API gateway do?
- How do we design effective and safe APIs?
- TCP/IP encapsulation
- Why is Nginx called a “reverse” proxy?
- What are the common load-balancing algorithms?
- URL, URI, URN - Do you know the differences?
- CI/CD
- Architecture patterns
- Database
- Cache
- Microservice architecture
- Payment systems
- DevOps
- GIT
- Cloud Services
- Developer productivity tools
- Linux
- Security
- How does HTTPS work?
- Oauth 2.0 Explained With Simple Terms.
- Top 4 Forms of Authentication Mechanisms
- Session, cookie, JWT, token, SSO, and OAuth 2.0 - what are they?
- How to store passwords safely in the database and how to validate a password?
- Explaining JSON Web Token (JWT) to a 10 year old Kid
- How does Google Authenticator (or other types of 2-factor authenticators) work?
- Real World Case Studies
- Netflix's Tech Stack
- Twitter Architecture 2022
- Evolution of Airbnb’s microservice architecture over the past 15 years
- Monorepo vs. Microrepo.
- How will you design the Stack Overflow website?
- Why did Amazon Prime Video monitoring move from serverless to monolithic? How can it save 90% cost?
- How does Disney Hotstar capture 5 Billion Emojis during a tournament?
- How Discord Stores Trillions Of Messages
- How do video live streamings work on YouTube, TikTok live, or Twitch?
Communication protocols
Architecture styles define how different components of an application programming interface (API) interact with one another. As a result, they ensure efficiency, reliability, and ease of integration with other systems by providing a standard approach to designing and building APIs. Here are the most used styles:
-
SOAP:
Mature, comprehensive, XML-based
Best for enterprise applications
-
RESTful:
Popular, easy-to-implement, HTTP methods
Ideal for web services
-
GraphQL:
Query language, request specific data
Reduces network overhead, faster responses
-
gRPC:
Modern, high-performance, Protocol Buffers
Suitable for microservices architectures
-
WebSocket:
Real-time, bidirectional, persistent connections
Perfect for low-latency data exchange
-
Webhook:
Event-driven, HTTP callbacks, asynchronous
Notifies systems when events occur
REST API vs. GraphQL
When it comes to API design, REST and GraphQL each have their own strengths and weaknesses.
The diagram below shows a quick comparison between REST and GraphQL.
REST
- Uses standard HTTP methods like GET, POST, PUT, DELETE for CRUD operations.
- Works well when you need simple, uniform interfaces between separate services/applications.
- Caching strategies are straightforward to implement.
- The downside is it may require multiple roundtrips to assemble related data from separate endpoints.
GraphQL
- Provides a single endpoint for clients to query for precisely the data they need.
- Clients specify the exact fields required in nested queries, and the server returns optimized payloads containing just those fields.
- Supports Mutations for modifying data and Subscriptions for real-time notifications.
- Great for aggregating data from multiple sources and works well with rapidly evolving frontend requirements.
- However, it shifts complexity to the client side and can allow abusive queries if not properly safeguarded
- Caching strategies can be more complicated than REST.
The best choice between REST and GraphQL depends on the specific requirements of the application and development team. GraphQL is a good fit for complex or frequently changing frontend needs, while REST suits applications where simple and consistent contracts are preferred.
Neither API approach is a silver bullet. Carefully evaluating requirements and tradeoffs is important to pick the right style. Both REST and GraphQL are valid options for exposing data and powering modern applications.
How does gRPC work?
RPC (Remote Procedure Call) is called “remote” because it enables communications between remote services when services are deployed to different servers under microservice architecture. From the user’s point of view, it acts like a local function call.
The diagram below illustrates the overall data flow for gRPC.
Step 1: A REST call is made from the client. The request body is usually in JSON format.
Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. gRPC encodes the client stub into a binary format and sends it to the low-level transport layer.
Step 5: gRPC sends the packets over the network via HTTP2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.
Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.
Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.
Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.
What is a webhook?
The diagram below shows a comparison between polling and Webhook.
Assume we run an eCommerce website. The clients send orders to the order service via the API gateway, which goes to the payment service for payment transactions. The payment service then talks to an external payment service provider (PSP) to complete the transactions.
There are two ways to handle communications with the external PSP.
1. Short polling
After sending the payment request to the PSP, the payment service keeps asking the PSP about the payment status. After several rounds, the PSP finally returns with the status.
Short polling has two drawbacks:
- Constant polling of the status requires resources from the payment service.
- The External service communicates directly with the payment service, creating security vulnerabilities.
2. Webhook
We can register a webhook with the external service. It means: call me back at a certain URL when you have updates on the request. When the PSP has completed the processing, it will invoke the HTTP request to update the payment status.
In this way, the programming paradigm is changed, and the payment service doesn’t need to waste resources to poll the payment status anymore.
What if the PSP never calls back? We can set up a housekeeping job to check payment status every hour.
Webhooks are often referred to as reverse APIs or push APIs because the server sends HTTP requests to the client. We need to pay attention to 3 things when using a webhook:
- We need to design a proper API for the external service to call.
- We need to set up proper rules in the API gateway for security reasons.
- We need to register the correct URL at the external service.
How to improve API performance?
The diagram below shows 5 common tricks to improve API performance.
Pagination
This is a common optimization when the size of the result is large. The results are streaming back to the client to improve the service responsiveness.
Asynchronous Logging
Synchronous logging deals with the disk for every call and can slow down the system. Asynchronous logging sends logs to a lock-free buffer first and immediately returns. The logs will be flushed to the disk periodically. This significantly reduces the I/O overhead.
Caching
We can store frequently accessed data into a cache. The client can query the cache first instead of visiting the database directly. If there is a cache miss, the client can query from the database. Caches like Redis store data in memory, so the data access is much faster than the database.
Payload Compression
The requests and responses can be compressed using gzip etc so that the transmitted data size is much smaller. This speeds up the upload and download.
Connection Pool
When accessing resources, we often need to load data from the database. Opening the closing db connections adds significant overhead. So we should connect to the db via a pool of open connections. The connection pool is responsible for managing the connection lifecycle.
HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)
What problem does each generation of HTTP solve?
The diagram below illustrates the key features.
-
HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.
-
HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn’t solve the HOL (head-of-line) blocking issue.
HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.
-
HTTP 2.0 was published in 2015. It addresses HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL still exists at the transport (TCP) layer.
As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP “streams”: an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn’t need to be sent in order.
-
HTTP 3.0 first draft was published in 2020. It is the proposed successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.
QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.
SOAP vs REST vs GraphQL vs RPC
The diagram below illustrates the API timeline and API styles comparison.
Over time, different API architectural styles are released. Each of them has its own patterns of standardizing data exchange.
You can check out the use cases of each style in the diagram.
Code First vs. API First
The diagram below shows the differences between code-first development and API-first development. Why do