The law of leaky abstractions



The essay that established the "law of leaky abstractions" was written in 2002. Why translate it almost 20 years later? Because it has not lost its relevance and is still worth reading. TCP has not been replaced by a better alternative, and the law of leaky abstractions has only become more entrenched in developers' lives, to the point of nearly becoming an axiom. Note that I did not recalculate the time spans mentioned in the text, so allow for some "time shift".



There is a key piece of engineering magic in the Internet that we rely on every day. That magic lives in TCP, one of the Internet's fundamental building blocks.



TCP is a way to transfer data reliably. To be clear: when you send a message over the network using TCP, it arrives in its original form.



We use this protocol for a variety of tasks, such as loading web pages and sending email. It is because of TCP's reliability that emails arrive exactly as they were sent, even if they are useless spam.



By comparison, there is another data transfer protocol, called IP, which is unreliable. Nobody guarantees that your data will be delivered. If you send a sequence of messages using IP, do not be surprised if half of them never arrive and the other half arrive in random order. There is even a chance that some messages will turn into pictures of little monkeys or, more likely, into unreadable garbage.



This is where the real magic happens: TCP is built on top of IP. In other words, TCP has to deliver data reliably using only an unreliable tool.



To make it clearer why this is magic, consider a comparable, if somewhat ridiculous, real-world scenario.



Imagine we are in the business of sending actors from Broadway to Hollywood, and our job is to drive them across the country by car. Some cars crash and the actors die. An actor might get drunk on the way and shave his head or get a tattoo on his nose, after which Hollywood will no longer take him. Most importantly, we send the actors in a strictly defined order, but they arrive in random order, since each of them takes his own route.



Now imagine a Hollywood Express service that guarantees that the actors (a) arrive, (b) in the correct order, (c) in perfect condition. The magic is that Hollywood Express has no way of transporting actors other than the unreliable one: by car. Hollywood Express checks every actor who arrives, and if his condition is unsatisfactory, the service calls the home office and asks for the actor's identical twin to be sent instead. If the actors arrive in random order, Hollywood Express restores the original order. Even if a large alien ship on its way to Area 51 crashes and blocks the highway in Nevada, the actors will simply be rerouted through Arizona, and Hollywood Express will not even tell the producers in California about the incident. To the producers, it will simply look as if the actors took a little longer than usual to arrive; they will never hear about the UFO crash.
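
The three guarantees above can be mimicked in a toy simulation. This is only a sketch under loud assumptions: it is not how real TCP is implemented, acknowledgements are assumed to travel back without loss, and every name in it (unreliable_channel, reliable_send, the loss and corruption rates) is invented for illustration. It shows the general idea of numbering packets, verifying a checksum, and resending whatever has not arrived intact.

```python
import random


def checksum(data):
    """Tiny illustrative checksum; real protocols use stronger ones."""
    return sum(data.encode("utf-8")) % 65521


def unreliable_channel(packets, rng, loss=0.3, corrupt=0.1):
    """Models an IP-like link: it may drop, garble, and reorder packets."""
    arrived = []
    for seq, csum, data in packets:
        roll = rng.random()
        if roll < loss:
            continue  # the packet is silently lost
        if roll < loss + corrupt:
            data = data[::-1] + "?"  # garbled: the checksum no longer matches
        arrived.append((seq, csum, data))
    rng.shuffle(arrived)  # arrival order is random
    return arrived


def reliable_send(messages, rng, max_rounds=50):
    """Delivers all messages intact and in order, or raises TimeoutError."""
    pending = dict(enumerate(messages))  # sequence number -> payload
    received = {}
    for _ in range(max_rounds):
        if not pending:
            break
        packets = [(seq, checksum(data), data) for seq, data in pending.items()]
        for seq, csum, data in unreliable_channel(packets, rng):
            if checksum(data) == csum:  # accept only undamaged packets
                received[seq] = data
                pending.pop(seq, None)  # "acknowledged": stop resending
    if pending:
        raise TimeoutError("some messages could not be delivered")
    # Reassemble in the original order using the sequence numbers.
    return [received[seq] for seq in range(len(messages))]
```

Calling reliable_send(["actor one", "actor two"], random.Random(1)) returns the messages in their original order, and the caller never sees how many resends it took.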



TCP works much the same way. It is what computer scientists call an abstraction: a simplification of something much more complicated going on under the hood. Most of programming is about building abstractions. What is a string library? It is a way to make working with strings as easy and convenient as working with numbers. What is a file system? It is a way to think of a hard drive not as a set of rotating magnetic platters storing bits at specific locations, but as a hierarchical structure of directories with files that contain data.



But back to TCP. I simplified a little to make it easier to understand how TCP works, and I realize that such a simplification can drive some people up the wall. I said that TCP guarantees message delivery. Strictly speaking, it does not. If your pet chews through your computer's network cable, IP packets will stop getting through, and no amount of effort on TCP's part will deliver the message. If you were rude to the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through; in that case TCP will work, but extremely slowly.
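
Both failure modes can be sketched in a few lines. This is a hypothetical model, not real networking code: send_reliably, cut_cable and make_overloaded_hub are names made up for this example, and a "channel" is just a function that reports whether one attempt got through. The point is that retrying hides a slow link (at the cost of time) but cannot hide a dead one.

```python
class DeliveryFailed(Exception):
    """Raised when retrying can no longer hide the broken link."""


def send_reliably(payload, channel, max_attempts=5):
    """Retries until the channel accepts the payload.

    Returns the number of attempts that were needed, so the caller can
    see how much slower the "reliable" path became.
    """
    for attempt in range(1, max_attempts + 1):
        if channel(payload):
            return attempt
    raise DeliveryFailed("gave up after %d attempts" % max_attempts)


def cut_cable(payload):
    """A chewed-through cable: nothing ever gets through."""
    return False


def make_overloaded_hub(deliver_every=4):
    """An overloaded hub: only every Nth attempt gets through."""
    attempts = {"count": 0}

    def channel(payload):
        attempts["count"] += 1
        return attempts["count"] % deliver_every == 0

    return channel
```

With make_overloaded_hub(4) the message still arrives, but only on the fourth attempt; with cut_cable the function raises DeliveryFailed, and the caller is forced to deal with the network it was supposed to be shielded from.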



This is what I call a leaky abstraction. TCP tries to shield us from the unreliable network, but sometimes the network still leaks through the abstraction, and you run into things that the abstraction cannot save you from. This is just one example of what I call the Law of Leaky Abstractions:

Any non-trivial abstraction is somewhat leaky.
Abstractions break down. Sometimes a little, sometimes a lot. These failures are the holes, the leaks: something does not go according to plan. It happens everywhere abstractions are used. Here are some examples:



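  • Iterating over a large two-dimensional array: performance can differ radically depending on whether you walk it by rows or by columns. Depending on the "grain of the wood", one direction may cause far more page faults than the other, and page faults are slow. Even assembly programmers are supposed to be able to pretend they have a big flat address space, but virtual memory makes that an abstraction too, and it leaks when a page fault occurs and some memory accesses take far longer than others.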
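  • SQL is meant to abstract away the procedural steps needed to query a database, letting you state only what you want and leaving the database to work out how to get it. But in some cases, certain SQL queries are thousands of times slower than logically equivalent ones. A famous example: some SQL servers are dramatically faster with «where a=b and b=c and a=c» than with «where a=b and b=c», even though the result set is the same. You are supposed to care only about the specification, not the procedure. But sometimes the abstraction leaks and causes terrible performance, and you have to open the query plan analyzer, study what went wrong, and work out how to make the query run faster.
  • Network file systems such as NFS and SMB let you treat files on remote machines as if they were local. But sometimes the connection becomes very slow or goes down, the file stops behaving like a local one, and you have to write code to handle that. The abstraction "a remote file is the same as a local file" leaks. A concrete example for Unix administrators: if users' home directories live on NFS-mounted drives (one abstraction), and users create .forward files to forward their mail elsewhere (another abstraction), and the NFS server goes down while new mail is arriving, the mail will not be forwarded because the .forward file cannot be found. The leak actually causes messages to be lost.
  • String classes in C++ are supposed to let you pretend that strings are first-class data. They try to hide the fact that strings are hard and let you work with them as easily as with integers. Almost every C++ string class overloads the + operator, so you can write s + "bar" to concatenate. But you know what? No matter how hard they try, no C++ string class will let you write "foo" + "bar", because string literals in C++ are always char*, never strings. The abstraction has a leak that the language does not let you plug. (Amusingly, the evolution of C++ can be described as a history of attempts to plug the leaks in the string abstraction. Why a native string type could not simply be added to the language is a separate question.)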
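  • You cannot drive as fast in the rain, even though your car has windshield wipers, headlights, a roof, and a heater, all of which are supposed to protect you from having to care that it is raining. They abstract away the weather. But you still have to worry about hydroplaning, and sometimes the rain is so heavy that you cannot see far ahead, so you slow down. The weather can never be completely abstracted away.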
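
The SQL item in the list can be tried out on a small scale. The sketch below uses Python's standard sqlite3 module with an in-memory database; the table t and the helper run_both_queries are made up for illustration. It only confirms that the two WHERE clauses are logically equivalent and returns whatever plan SQLite reports for each. It makes no claim about which form is faster on any particular server, since that depends entirely on the engine and its optimizer.

```python
import sqlite3


def run_both_queries(rows):
    """Runs two logically equivalent queries and returns rows plus plans."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

    queries = {
        "short": "SELECT a, b, c FROM t WHERE a = b AND b = c ORDER BY a",
        "long": "SELECT a, b, c FROM t WHERE a = b AND b = c AND a = c ORDER BY a",
    }
    results = {}
    for name, sql in queries.items():
        matched = conn.execute(sql).fetchall()
        # The plan is the "procedure" that the abstraction normally hides.
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        results[name] = (matched, plan)
    conn.close()
    return results
```

Both entries in the returned dictionary contain the same matched rows. Comparing the two plans is the kind of work the abstraction was supposed to spare you, and it is exactly what you end up doing when a query is unexpectedly slow.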


It follows from the law of leaky abstractions that abstractions do not simplify our lives as much as we would like. When I teach C++, I would like to avoid talking about char* and pointer arithmetic. It would be great to go straight to the STL, but one day the students will write "foo" + "bar", get a scare, and I will have to explain char* after all. Or one day they will try to call a Windows API function with an OUT LPTSTR argument, and they will still have to learn about char*, pointers, Unicode and wchar_t, as well as TCHAR and everything else that leaks through the abstraction.



When teaching COM (Component Object Model; translator's note), it would be nice to start right away with the Visual Studio wizards and all the magic of code generation. But if anything at all goes wrong, the programmers will not have the slightest idea what happened, where to look for the error, or how to fix it. So I will have to talk about IUnknown, CLSIDs, ProgIDs, and ... oh, the humanity!



When teaching ASP.NET, it would be great to simply teach people to double-click on controls and write code that runs on the server when the user clicks them. In essence, ASP.NET hides the difference between handling a click on a hyperlink (<a>) and handling a click on a button. But here is the problem: in HTML, you cannot submit a form by clicking a hyperlink, and the ASP.NET designers had to hide that. They solved it by generating a few lines of JavaScript in the hyperlink's onclick handler. Nevertheless, this is a hole in the abstraction. If JavaScript is disabled for the end user, the ASP.NET application will not work correctly, and an application programmer who does not know what ASP.NET is abstracting away will not be able to understand what happened.



The Law of Leaky Abstractions means that whenever someone comes up with a great new code-generation tool that is supposed to make us dramatically more efficient, you will hear, "First learn how to do it by hand, and only then use the tool to save time." Code-generation tools rely on abstractions in one way or another, and those abstractions are, of course, full of holes. The only way to deal with the holes is to know how the abstractions work and what exactly they hide. So abstractions save us time working, but not time learning.



It is paradoxical, but every time we invent higher-level tools with better abstractions, it becomes more difficult to become a professional programmer.



During my first internship at Microsoft, I wrote string libraries for the Macintosh. A typical task: write a version of strcat that returns a pointer to the end of the new string. Just a few lines of C code. Everything I did came straight out of K&R, one thin book about the C programming language.



And then I came to work on CityDesk (company closed in 2016). Now I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. These are all high-level tools compared to the K&R material, yet I still need to know all the K&R material.



Ten years ago, we might have imagined that new programming paradigms would have made development easier by now. Indeed, the abstractions we have created over the decades let us handle levels of complexity that we did not have to deal with 10-15 years ago, such as GUI development and network programming. And we now have a lot of great tools, such as modern object-oriented, forms-based languages, that let us get work done incredibly quickly. Until one day we hit a problem where the abstraction leaks, and it takes two weeks to solve. And when you need a programmer to do mostly Visual Basic work, hiring someone who knows only Visual Basic is not good enough, because such a programmer will get stuck every time he stumbles on a hole in the Visual Basic abstraction.



The law of leaky abstractions is dragging us down.


