How I learned to pass architectural sections

The architectural section of an interview, also known as the system design round, makes many people uncertain and anxious: the problem statement is short on details, and it is not clear how to check your answer. At the same time, the ability to pass it is what distinguishes yesterday's graduate from someone who can be trusted to build something more than a binary tree traversal. At some point I decided to prepare for the design section properly, spent about two weeks on it, and developed a systematic approach that I want to share with you.

Plan

It is very important to have one. Even if you make a mistake somewhere or take a wrong turn in your reasoning, an overall structured approach will work in your favor. At the very beginning you can afford to be a little slow and ask the interviewer about the details, but from a certain point (point 3 in my plan below) you should hold the initiative completely, and it is best not to let go of it until the very end. My plan was something like this:

1. Collect a list of key features and write them in the corner of the board. This simple trick will keep you from forgetting an important constraint or assumption.
2. Understand what technical characteristics the system should have: expected RPS, the range of acceptable response times, expectations for consistency and reliability.
3. Build the simplest single-machine solution that works at all. You don't need to start with 20 datacenters around the world; it is much better to get there gradually.
4. Find a single point of failure or a performance bottleneck.
5. Offer one or more options for solving the problem and clearly explain the pros and cons of each.
6. Choose one of the options and go back to step 4 if there is still time; if time is running out, move on to the next step.
7. Estimate the storage size, the number of servers, and the network bandwidth, and carefully write it all down.
8. Bonus: talk about additional features, introducing ML, product metrics, experiments.

Time control is very important. I tried to spend 5-10 minutes on the first two points and 5 minutes on the last two.
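The estimation step of the plan (storage, servers, bandwidth) is plain arithmetic, and it can help to rehearse it in code before doing it on a whiteboard. Below is a minimal sketch in Python. The `Workload` fields, the peak factor, the per-server capacity, and all the example numbers are assumptions made up for illustration, not figures from any real service.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Workload:
    """Hypothetical inputs; every value is an assumption for the example."""
    daily_active_users: int
    writes_per_user_per_day: float
    reads_per_user_per_day: float
    bytes_per_record: int
    retention_days: int
    peak_factor: float = 2.0      # assumed peak-to-average load ratio
    rps_per_server: int = 1_000   # assumed capacity of a single server


def estimate(w: Workload) -> dict:
    """Back-of-the-envelope estimate of RPS, servers, storage and bandwidth."""
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    reads_per_day = w.daily_active_users * w.reads_per_user_per_day

    avg_write_rps = writes_per_day / SECONDS_PER_DAY
    avg_read_rps = reads_per_day / SECONDS_PER_DAY
    peak_rps = (avg_write_rps + avg_read_rps) * w.peak_factor

    # Every write adds one record that is kept for the whole retention period.
    storage_bytes = writes_per_day * w.bytes_per_record * w.retention_days
    # Simplified model: every read returns exactly one record.
    peak_read_bytes_per_s = avg_read_rps * w.peak_factor * w.bytes_per_record

    return {
        "avg_write_rps": avg_write_rps,
        "avg_read_rps": avg_read_rps,
        "peak_rps": peak_rps,
        "servers": math.ceil(peak_rps / w.rps_per_server),
        "storage_tb": storage_bytes / 1e12,
        "peak_read_mb_per_s": peak_read_bytes_per_s / 1e6,
    }


if __name__ == "__main__":
    example = Workload(
        daily_active_users=10_000_000,
        writes_per_user_per_day=2,
        reads_per_user_per_day=100,
        bytes_per_record=1_000,
        retention_days=365,
    )
    for name, value in estimate(example).items():
        print(f"{name}: {value:,.1f}")
```

The model deliberately ignores replication, indexes, and caching. Any of these can be added as one more multiplier once the basic numbers are on the board.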

Tradeoffs

They need to be said out loud, even if they seem obvious. After introducing any new component, it is important to say something like "we added a new element; it solves such-and-such a problem, but here is what we pay for it." Tradeoffs can look like this:

Any new system component, or more instances of an existing one, solves a load or response-time problem but adds support and deployment headaches.
Sharding solves load and storage-space constraints but adds re-sharding problems in the future.
Replicated storage solves load and reliability problems, but with separate read and write replicas it makes you think about stale values and the tension between availability and consistency.
A cache solves the load problem but makes you think about stale values and cache coherency.
Your own solution can be easily modified and optimized for your needs, but you have to write it first.
An existing solution has the advantage of already existing, but you have to learn how it works.
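To make the re-sharding tradeoff concrete, here is a small sketch that measures how many keys change shard when a naive `hash(key) % N` scheme grows from 4 to 5 shards. The key format, the shard counts, and the function names are assumptions chosen for the example. SHA-256 is used only because it gives a stable hash across runs; it is not a recommendation.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() for strings is salted per process,
    so a digest is used to keep the result reproducible.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes after resizing the cluster."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key space for the experiment.
keys = [f"user:{i}" for i in range(100_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With a uniform hash, a key stays in place only when `h % 4 == h % 5`, which holds for 4 out of every 20 hash values, so roughly 80% of the data has to move. This is the cost you pay later for the simplicity of modulo sharding, and schemes such as consistent hashing are a common way to reduce it.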

Numbers

Everyone knows about the latency numbers every programmer should know, but in my opinion the numbers at that link are not structured in the most convenient way, so while preparing I reformatted them to make them easier to memorize.

Ultimately, the following is important:

Know how long it takes to read data from the different levels of processor cache, memory, SSD, HDD, and the network.
Remember round-trip times within a datacenter and around the globe, as well as the minimum latency that a person perceives as lag (~100 ms).
Be able to quickly convert bytes to gigabytes, nanoseconds to seconds, and so on. This skill developed by itself with practice.
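One way to drill these numbers is to keep them in a single unit and convert on demand. The sketch below uses rounded, order-of-magnitude values in the spirit of the commonly circulated "latency numbers" list. Treat them as assumptions, since real values depend on hardware and change over time. The helper names are made up for this example.

```python
# Approximate latencies in nanoseconds. Order-of-magnitude figures only;
# actual values depend on the hardware and the network.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "main memory reference": 100,
    "SSD random read": 150_000,
    "round trip within a datacenter": 500_000,
    "HDD seek": 10_000_000,
    "human-perceptible lag": 100_000_000,
    "round trip around the globe": 150_000_000,
}


def humanize_ns(ns: float) -> str:
    """Convert nanoseconds to the largest unit that keeps the value >= 1."""
    for unit, size in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= size:
            return f"{ns / size:g} {unit}"
    return f"{ns:g} ns"


def humanize_bytes(n: float) -> str:
    """Convert bytes to decimal KB/MB/GB/TB."""
    for unit, size in (("TB", 1e12), ("GB", 1e9), ("MB", 1e6), ("KB", 1e3)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n:g} B"


# Print the table from fastest to slowest.
for name, ns in sorted(LATENCY_NS.items(), key=lambda item: item[1]):
    print(f"{name:<32} {humanize_ns(ns)}")

# How many sequential datacenter round trips fit before a user notices lag?
budget = LATENCY_NS["human-perceptible lag"]
round_trip = LATENCY_NS["round trip within a datacenter"]
print("sequential in-datacenter round trips per 100 ms:", budget // round_trip)
```

The last line shows the kind of question these numbers let you answer on the spot: a 100 ms budget leaves room for a couple of hundred sequential calls inside one datacenter, but not even for one trip around the globe.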

Practice

I bought a whiteboard, took existing services, and tried to figure out how I would build them from scratch. I drew diagrams on the board, estimated the load and the necessary resources, and looked for weak points in my design. I also have great friends with whom I arranged mock sections where we practiced on each other; it was a super rewarding experience. After a practice round, you can go online, look up how the service is actually built, and then try again. After 10-20 rounds with different services, enlightenment sets in, and the recurring building blocks of existing systems become clearly visible. Such building blocks can be, for example:

Search (preferably with the ability to update the index in real time)
File and blob storage (GFS, Haystack)
Key-value storage (Cassandra, Dynamo)
Message queue, pub-sub (Kafka)
News feed (Twitter, Instagram, Facebook)
Messengers and chats (WhatsApp, Telegram, Battle.net)
Video calls and streaming (Skype, Twitch, YouTube)

Grokking the System Design Interview.
System Design Primer.
A large selection from High Scalability.
Well, the most important resource is your friends and acquaintances who know how their systems work and can tell you about them.

Several good videos and channels

1. Scalability

2. Intro to Architecture and Systems Design Interviews

3. Four Distributed Systems Architectural Patterns

4. Dropbox in 2012

5. Slack

6. Twitter

7. Reddit

8. Instagram

9. Youtube in 2007

10. Channel about System Design from a compatriot

11. Another channel

12. And another channel

If you don't have a hard deadline but an interview is already looming on the horizon, the best tactic is to keep reading or watching something about large systems in the background. It's the same with algorithmic puzzles: it is better to solve them regularly and always be in good shape than to try to master all of LeetCode on the weekend before the interview. Still, intensive preparation for the architectural section in a short time made me a much better specialist.
