Dear Google Cloud, your deprecation policy is killing you

Damn it, Google, I didn't want to blog again. I have so much to do. Blogging takes time, energy and creativity that I could put to better use: my books, my music, my games, and so on. But you have pissed me off enough that I have to write this down.



So let's get this over with.



I'll start with a small but instructive story from when I first started working at Google. I know I've said a lot of bad things about Google lately, but it upsets me when my home company regularly makes incompetent business decisions. That said, we must pay tribute: Google's internal infrastructure is truly extraordinary, and we can safely say that there is nothing better today. The founders of Google were far better engineers than I will ever become, and this story only confirms this fact.



First, a little background: Google has a storage technology called Bigtable. It was a remarkable technical achievement, one of the first (if not the first) "infinitely scalable" key-value (K/V) stores: essentially the beginning of NoSQL. Bigtable still holds up well in the rather crowded K/V storage space these days, but at the time (2005) it was amazingly cool.



One funny thing about Bigtable is that it had internal control-plane objects (as part of the implementation) called tablet servers, with large indexes, and at some point they became a bottleneck when scaling the system. The Bigtable engineers were racking their brains over how to make it scale, and suddenly realized that they could replace the tablet servers with other Bigtables. So Bigtable is part of the Bigtable implementation. It is Bigtables all the way down.



Another interesting detail is that for a while Bigtable became popular and ubiquitous inside Google, and every team had its own Bigtable. So at one of the Friday meetings, Larry Page casually asked in passing, "Why do we have more than one Bigtable? Why not just one?" In theory, a single Bigtable should have been enough for all of Google's storage needs. Of course, they never moved to just one for practical engineering reasons (such as the consequences of a potential failure), but the theory was interesting. One store for the entire universe. (By the way, does anyone know whether Amazon did this with its Sable?)



Anyway, here's my story.



At that time, I had been working at Google for a little over two years, and one day I received an email from the Bigtable engineering team that went something like this:



Dear Steve,



Greetings from the Bigtable team. We would like to inform you that you are running a very, very old Bigtable binary in the [datacenter name] data center. This version is no longer supported, and we want to help you upgrade to the latest version.



Please let me know if you can schedule some time to work together on this issue.



All the best,

Bigtable Team


You get a lot of mail at Google, so at first glance I read something like this:



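Dear recipient,

Greetings from some team. We would like to inform you that blah blah blah blah blah. Blah blah blah, and blah blah blah right away.

Please let us know if you can schedule some time to blah blah blah.

All the best,

Some team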


I almost deleted it right away, but at the edge of my consciousness I had a nagging, aching feeling that this did not quite look like a form letter, although they obviously had the wrong recipient, because I did not use Bigtable.



But that was weird.



For the rest of the day, I alternated between thinking about work and about what kind of shark meat to try in the micro-kitchens, at least three of which were close enough to hit from my seat with a well-aimed sponge cake, but the thought of that email never left me, and it came with a growing feeling of slight anxiety.



They had clearly addressed me by name. And the email had been sent to my address, not someone else's, and not as a cc: or bcc:. The tone was very personal and clear. Maybe it was some kind of mistake?



Finally, curiosity got the better of me and I went to take a look at the Borg console in the data center they mentioned.



And sure enough, I had a Bigtable under my control. I'm sorry, what? I looked at its contents, and wow! It was from the Codelab incubator, where I had spent my first week at Google in June 2005. Codelab made you start up a Bigtable so that you could write some values to it, and I apparently never shut it down afterwards. It was still running, although more than two years had passed.



There are several notable aspects to this story. Firstly, a Bigtable was so insignificant at Google's scale that it took two years for anyone to notice the extra instance, and even then only because the version of the binary was outdated. For comparison, I once considered using Bigtable on Google Cloud for my online game. At the time, the service cost roughly $16,000 per year for an empty Bigtable on GCP. I'm not saying they are ripping you off, but in my personal opinion, that is a lot of money for an empty fucking database.



Another notable aspect is that the storage was still working after two years... WTF? Data centers come and go; they experience interruptions, they undergo scheduled maintenance, they change all the time. The hardware is updated, the switches are swapped, everything is constantly being improved. How the hell did they keep my program running for two years with all these changes? This may seem like a modest achievement in 2020, but it was quite impressive in 2005-2007.



And the most wonderful aspect is that an outside engineering team in some other state contacted me, the owner of a tiny, almost empty Bigtable instance that had had zero traffic for the last two years, and offered to help me upgrade it.



I thanked them, deleted the Bigtable, and life went on as usual. But thirteen years later, I still think about that letter. Because sometimes I get emails like this from Google Cloud. They look like this:



Dear Google Cloud User,

We are reminding you that we will be discontinuing [an important service you use] as of August 2020, after which you will not be able to update your instances. We recommend that you upgrade to the latest version, which is in beta testing, has no documentation and no migration path, and which we have kindly deprecated in advance for you.



We are committed to minimizing the impact of this change on all users of the Google Cloud platform.



Best Friends Forever,

Google Cloud Platform


But I hardly read such letters, because in fact they say the following:



Dear recipient,



Fuck you. Fuck you, fuck you, fuck you. Throw away everything you are doing, because it doesn't matter. What matters is our time. We spend time and money supporting our shit, and we are tired of it, so we won't support it anymore. So drop your fucking plans and start digging through our shitty documentation, begging for scraps on the forums, and by the way, our new shit is completely different from the old shit because we messed up this design pretty badly, heh, but that's your problem, not ours.



We continue to work hard to make all your designs unusable within one year.



Please go fuck yourself,

Google Cloud Platform


And the fact is that I receive letters like this about once a month. It happens so often and so consistently that they have inevitably pushed me away from GCP and into the opposing cloud camp. I no longer agree to depend on their proprietary offerings, because in practice it is easier for a devops engineer to maintain an open source system on a bare virtual machine than to try to keep up with Google and its policy of shutting down "outdated" products.



Before I go back to Google Cloud, because I am not even close to done criticizing them, let's take a look at how the company operates in some other areas. Google engineers pride themselves on their software engineering discipline, and this is what actually causes the problems. Pride is a trap for the unwary; it has led many Google employees to think that their decisions are always right and that being right (by some vague, fuzzy definition) is more important than caring about customers.



Here are some arbitrary examples from other large projects outside of Google, and I hope you will see the same pattern everywhere. It is this: backward compatibility keeps systems alive and relevant for decades.



Backward compatibility is a design goal of all successful systems designed for open use, that is, implemented with open source and/or open standards. I feel like I'm saying something so obvious that it makes everyone uncomfortable, but no. This is a political issue, so examples are needed.



The first system I'll pick is the oldest: GNU Emacs, which is sort of a hybrid between Windows Notepad, an OS kernel, and the International Space Station. It's a bit difficult to explain, but in a nutshell, Emacs is a platform created in 1976 (yes, almost half a century ago) for programming to make you more productive, but it disguises itself as a text editor.



I use Emacs every single day. Yes, I also use IntelliJ every day; it has become a powerful tooling platform in its own right. But writing extensions for IntelliJ is much more ambitious and difficult than writing extensions for Emacs. And more importantly, everything written for Emacs lasts forever.



I still use software that I wrote for Emacs back in 1995. And I'm sure someone is using modules written for Emacs in the mid-80s, if not earlier. They may require minor adjustments from time to time, but this is really quite rare. I can't think of anything I have ever written for Emacs (and I have written a lot) that has needed to be re-architected.



Emacs has a function called make-obsolete for deprecated entities. Emacs terminology for fundamental computer concepts (such as what a "window" is) often differs from industry conventions because Emacs introduced them a long time ago. This is a typical danger for those ahead of their time: all your terms end up being wrong. But Emacs does have a concept of deprecation, which in its jargon is called obsolescence.



But in the Emacs world there seems to be a different working definition. Another underlying philosophy, if you will.



In the world of Emacs (and in many other areas that we'll cover below), the deprecated status of an API basically means, "You really shouldn't use this approach, because while it works, it suffers from various drawbacks that we'll list here. But in the end, it's your choice."
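The "it still works, but you have been warned" flavor of obsolescence can be sketched in a few lines of Python. This is only an analogy, not Emacs's make-obsolete: the decorator name `obsolete` and the functions `join_words` and `concat_words` are hypothetical, made up for illustration. The only real API used is the standard library's `warnings.warn` with `DeprecationWarning`.

```python
import functools
import warnings


def obsolete(replacement, since):
    """Mark a function as obsolete without removing or breaking it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Tell the caller about the better option...
            warnings.warn(
                f"{func.__name__} is obsolete since {since}; "
                f"consider {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # ...but keep the old behavior: the caller decides when to migrate.
            return func(*args, **kwargs)
        return wrapper
    return decorate


def join_words(words):
    """Hypothetical newer API."""
    return " ".join(words)


@obsolete(replacement="join_words", since="2.0")
def concat_words(words):
    """Hypothetical older API, kept alive for existing callers."""
    result = ""
    for word in words:
        result = result + " " + word if result else word
    return result
```

Calling `concat_words(["a", "b"])` still returns the same string as `join_words(["a", "b"])`; the only difference is the warning. The cost of keeping the old entry point stays with the platform author rather than with every caller.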



In the Google world, the deprecated status of a product means "We are breaking our commitment to you." It really does. That is what it essentially means. It means that they will force you to do some work on a regular basis, perhaps a lot of work, as a punishment for believing their colorful ads: we have the best software. The fastest! You do everything according to the instructions, launch your application or service, and then bam, after a year or two it breaks.



It's like selling a used car that will definitely break down after 1500 km.



These are two completely different philosophical definitions of "obsolescence". Google's definition smells like planned obsolescence. I don't believe it really is planned obsolescence in the same sense as Apple's. But Google is definitely planning to break your programs, in a roundabout way. I know this because I worked there as a software engineer for over 12 years. They have vague internal guidelines about how much backward compatibility there should be, but ultimately it is up to each individual team or service. There is no company-wide or engineering-wide guidance, and the boldest recommendation in terms of deprecation cycles is "try to give customers 6-12 months to upgrade before breaking the whole system."



The problem is much more serious than they think, and it will persist for years to come because customer care is not in their DNA. More on this below.



For now, I'm going to make the bold assertion that Emacs is successful in large part, and even mostly, because it takes backward compatibility so seriously. Actually, this is the thesis of this article. Successful long-lived open source systems owe their success to the micro-communities that have lived around their extensions and plugins for decades. That is the ecosystem. I've already talked about the essence of platforms and how important they are, and how Google has never, in its entire corporate history, understood what goes into creating a successful open platform, apart from Android and Chrome.



Actually, I have to briefly mention Android, because you probably thought about it.



First, Android is not Google. They have almost nothing to do with each other. Android is a company that Google bought in July 2005; it was allowed to run more or less autonomously and has in fact remained largely intact over the years. Android is a notorious tech stack and an equally notorious, prickly organization. As one Googler put it, "You can't just walk into Android."



In a previous post, I discussed how bad some of the early Android designs were. Heck, when I wrote that article, they were rolling out a piece of shit called "Instant Apps", which are now (surprise!) deprecated, and I sympathize if you were foolish enough to listen to Google and move your content to those instant apps.



But there is a difference, a significant difference, which is that the Android folks really understand how important platforms are, and they try their best to keep old Android apps working. In fact, their efforts to maintain backward compatibility are so extreme that even I, during my brief stint in the Android division a few years ago, found myself trying to convince them to drop support for some of the oldest devices and APIs. (I was wrong, as I have been about many other things, past and present. Sorry, Android folks! Now that I've been to Indonesia, I understand why we need them.)



The Android folks maintain backward compatibility to almost unimaginable extremes, piling up a huge amount of obsolete tech debt in their systems and toolchains. Oh my god, you should have seen some crazy things they have to do in their build system, all in the name of compatibility.



For this, I give Android the coveted You Are Not Google award. They really don't want to become Google, which doesn't know how to build platforms that last, but Android knows how to do it. And so Google is very wise in one respect: it allows people on Android to do things their own way.



However, Android Instant Apps were a pretty dumb idea. Do you know why? Because they required you to rewrite and redesign your application! As if people would just up and rewrite two million apps. I am guessing Instant Apps were some Googler's idea.



But there is a difference here. Backward compatibility is expensive. Android itself bears the burden of these costs, while Google insists that you, the paying customer, bear the burden.



You can see Android's commitment to backward compatibility in its APIs. When you have four or five different subsystems to do literally the same thing, it's a sure sign that a commitment to backward compatibility is at the core. Which in the platform world is synonymous with commitment to your customers and your market.



Google's main problem here is their pride in their engineering hygiene. They don't like it when there are many different ways to do the same thing, with older, less desirable ways sitting next to newer, fancier ways. It steepens the learning curve for newcomers to the system, it increases the burden of maintaining legacy APIs, it slows down the delivery of new features, and, the cardinal sin, it is ugly. Google is like Lady Ascot from Tim Burton's "Alice in Wonderland":



Lady Ascot: Alice, do you know what I'm most afraid of?

Alice: The decline of the aristocracy?

Lady Ascot: I was afraid that I would have ugly grandchildren.



To understand the trade-off between beautiful and practical, let's take a look at the third successful platform (after Emacs and Android) and see how it works: Java itself.



Java has a ton of deprecated APIs. Deprecation is very popular with Java programmers, even more popular than in most programming languages. In Java itself, the core language and libraries, API deprecations happen constantly.



To take just one of thousands of examples, stopping threads is deprecated. It has been deprecated since the release of Java 1.2 in December 1998. That was 22 years ago.



But my real production code still kills threads every day. Is that good? Absolutely! I mean, of course, if I were to rewrite the code today, I would implement it differently. But the code for my game, which has made hundreds of thousands of people happy over the past two decades, is written with a function that kills threads that hang for too long, and I have never had to change it. I know my system best, I have literally 25 years of experience running it in production, and I can say for sure: in my case, killing these specific worker threads is completely harmless. It is not worth the time and effort to rewrite this code, and praise be to Larry Ellison (probably) that Oracle has not forced me to rewrite it.



Probably Oracle also understands platforms. Who knows.



Evidence can be found throughout the core Java APIs, which are riddled with waves of deprecation, like glacier lines in a canyon. You can easily find five or six different keyboard navigation managers (KeyboardFocusManager) in the Java Swing library. It's actually hard to find a Java API that has nothing deprecated in it. But they still work! I think the Java team will only truly remove an API if the interface causes an egregious security issue.



Here's the thing, folks: we software developers are all very busy, and in every area of software we are faced with competing alternatives. At any given time, programmers in language X are looking at language Y as a possible replacement. Oh, you don't believe me? Want me to name Swift? Like, everyone is migrating to Swift and no one is leaving it, right? Wow, how little you know. Companies are counting the cost of running dual mobile development teams (iOS and Android), and they are beginning to realize that those funny-named cross-platform frameworks like Flutter and React Native actually work and could cut their mobile teams in half or, conversely, make them twice as productive. Real money is at stake. Yes, there are trade-offs, but on the other hand, mooooney.



Suppose, hypothetically, that Apple foolishly took the example of Guido van Rossum and announced that Swift 6.0 is backward incompatible with Swift 5.0, much like Python 3 is incompatible with Python 2.



I probably told this story ten years ago, but fifteen years ago I went to O'Reilly's Foo Camp with Guido and sat in a tent with Paul Graham and a bunch of big shots. We sat in the sweltering heat and waited for Larry Page to take off in his personal helicopter, while Guido droned on about Python 3000, which he had named after the number of years it would take everyone to migrate to it. We kept asking him why it broke compatibility, and he replied: "Unicode". And we asked, if we have to rewrite our code, what other benefits will we see? And he answered, "Yoooooooooooooouuuuuuuniiiiiiicoooooooode".



If you install the Google Cloud Platform SDK (“gcloud”), you will receive the following notification:



Dear Recipient,



We would like to remind you that Python 2 support is deprecated, so fuck you


… etc. The circle of life.



But the point is, every developer has a choice. And if you make them rewrite their code often enough, they may start thinking about other options. They are not your hostages, no matter how much you want them to be. They are your guests. Python is still a very popular programming language, but heck, Python 3(000) created such a mess for itself, for its communities, and for the users of its communities that the consequences still have not been cleaned up fifteen years later.



How many Python programs have been rewritten in Go (or Ruby, or some other alternative) because of this backward incompatibility? How much new software has been written in something other than Python, even though it might have been written in Python, if Guido hadn't burned down the whole village? It's hard to tell, but Python has clearly suffered. It is a huge mess, and everyone loses.



So let's say Apple follows the example of Guido and breaks compatibility. What do you think will happen next? Well, maybe 80-90% of developers will rewrite their software if they can. In other words, 10-20% of the user base automatically goes to some competing language like Flutter.



Do this a few times and you will lose half of your user base. In the programming world, as in sports, current form is everything. Anyone who loses half of their users in five years is considered a Big Fat Loser. In the platform world you have to be trending upward. And this is where dropping support for older versions will kill you over time. Because every time you shed some of your developers, you (a) lose them forever, because they are angry with you for breaking the contract, and (b) hand them to your competitors.
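The "do this a few times and you lose half" arithmetic is easy to check. The sketch below assumes, purely for illustration, that every compatibility break costs the same fixed fraction of the remaining developers (the 10-20% range quoted above) and that nobody comes back; the function names are hypothetical.

```python
def remaining_share(loss_per_break, breaks):
    """Fraction of the original user base left after a number of breaks."""
    return (1.0 - loss_per_break) ** breaks


def breaks_until_half(loss_per_break):
    """Smallest number of breaks after which at most half the users remain."""
    breaks = 0
    share = 1.0
    while share > 0.5:
        share *= 1.0 - loss_per_break
        breaks += 1
    return breaks
```

Under that assumption, losing 20% per break leaves 0.8 × 0.8 × 0.8 ≈ 51% after three breaks, so `breaks_until_half(0.2)` is 4, and at 10% per break `breaks_until_half(0.1)` is 7.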



Ironically, I, too, helped turn Google into this kind of prima donna about backward compatibility when I created Grok, a source code analysis and understanding system that facilitates code-based automation and tooling. It is similar to an IDE, but here a cloud service stores materialized representations of all the billions of lines of Google source code in a large data warehouse.



Grok provided Googlers with a powerful framework for automated refactoring across the entire codebase (literally all over Google). The system calculates not only your upstream dependencies (which you depend on), but also downstream dependencies (which depend on you), so when you change the API you know everyone you break! This way, when you make changes, you can check that every consumer of your API is updated to the new version, and in reality, often with the Rosie tool they wrote, you can completely automate the process.
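To make the upstream/downstream distinction concrete, here is a minimal sketch of the downstream half: given a map of what each module depends on, find everything that would be affected by changing one module. This is not how Grok works internally; the module names and the `downstream_of` helper are hypothetical, and a plain breadth-first search over an inverted graph stands in for whatever the real system does.

```python
from collections import defaultdict, deque


def downstream_of(target, depends_on):
    """Return every module that directly or transitively depends on target."""
    # Invert the "A depends on B" edges to get "B is used by A".
    used_by = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            used_by[dep].add(module)

    # Breadth-first walk over the inverted edges, starting at the changed module.
    affected = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for consumer in used_by[current]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical dependency map: module -> modules it depends on (its upstream).
DEPENDS_ON = {
    "strings": [],
    "logging": ["strings"],
    "storage": ["logging"],
    "frontend": ["storage", "strings"],
    "billing": [],
}
```

With this map, `downstream_of("strings", DEPENDS_ON)` yields `logging`, `storage` and `frontend`, which is the list of consumers an API owner would need to update, by hand or automatically, before renaming anything in `strings`.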



This allows Google's codebase to be almost supernaturally "clean" internally, since they have these robotic servants scurrying around the house and automatically cleaning everything up whenever someone renames SomeDespicablyLongFunctionName to SomeDespicablyLongMethodName, because someone decided it was an ugly grandchild and needed to be put to sleep.



And to be honest, it works pretty well for Google... internally. I mean, yes, the Go community at Google has a really good laugh at the Java community at Google because of their habit of continual refactoring. Redoing something N times means not only that you messed it up N-1 times, but also, after a while, that you have probably messed it up on the Nth try as well. But by and large, they stay above the fray and keep the code "clean".



The problems start when they try to impose this attitude on their cloud clients and users of other APIs.



I have told you a bit about Emacs, Android, and Java; let's take a look at the last successful long-lived platform: the Web itself. You can imagine how many iterations HTTP has gone through since 1995, when we used blinking <blink> tags and "under construction" icons on web pages.



But it still works! And those pages still work! Yes, folks, browsers are the world's backward compatibility champions. Chrome is another example of a rare Google platform that has its head screwed on straight, and, you guessed it, Chrome effectively acts as an isolated company separate from the rest of Google.



I also want to thank our friends among the operating system developers: Windows, Linux, NOT APPLE, FUCK YOU APPLE, FreeBSD and so on, for doing so much backward compatibility work on their successful platforms (Apple gets a C-minus at best, since they constantly break everything for no good reason, but somehow the community copes with this in every release, and so far OS X containers are not completely obsolete... yet).



But wait, you say. Aren't we comparing apples to oranges - standalone software systems on a single machine like Emacs / JDK / Android / Chrome, with multi-server systems and APIs like cloud services?



Well, I tweeted about it yesterday, but in the style of Larry Wall's "sucks/rules" comparison, I searched for the word "deprecated" on the Google and Amazon developer sites. Although AWS has hundreds of times more service offerings than GCP, Google's developer documentation mentions deprecation about seven times more often.



If anyone from Google reads this, they are probably ready to pull out Donald Trump-style charts showing that they are in fact doing everything right, and that I should not make unfair comparisons such as "the number of mentions of the word deprecated relative to the number of services".



But after so many years, Google Cloud is still #3 (I never wrote an article about the failed attempt to become #2), and if you believe the insiders, there are some fears that they may soon sink to #4.



I have no compelling arguments to "prove" my thesis. All I have are the colorful examples I have accumulated over 30 years as a developer. I have already mentioned the deeply philosophical nature of this problem; in a sense, it is politicized in the developer communities. Some people think that platform creators should take care of compatibility, while others believe that this is the concern of the users (the developers themselves). It is one or the other. Indeed, isn't it a political issue when we decide who should bear the costs of common problems?



So this is politics. And there will certainly be angry responses to my speech.



As a user of the Google cloud platform, as well as an AWS user for two years (at Grab), I can say that there is a huge difference between the philosophies of Amazon and Google when it comes to priorities. I'm not actively developing on AWS, so I don't know very well how often they remove old APIs. But there is a suspicion that this does not happen as often as in Google. And I truly believe that this source of constant controversy and frustration in GCP is one of the biggest constraints on the platform's development.



I know I have not named specific examples of GCP systems that are no longer supported. I can say that practically everything I have used, from networking (from the oldest networks to VPC) to storage (Cloud SQL v1 to v2), Firebase (now Firestore, with a completely different API), App Engine (let's not even get started), Cloud Endpoints and on to... I don't know, absolutely all of it, has forced me to rewrite code within 2-3 years at most, and they never automated the migration for you, and often there was no documented migration path at all. As if that were how it should be.



And every time I look at AWS, I ask myself why the hell I am still sitting on GCP. They clearly don't need clients. They want buyers. Do you understand the difference? Let me explain.



Google Cloud has a Marketplace where people offer their software solutions, and to avoid the empty-restaurant effect, they had to fill it with some offerings, so they contracted with Bitnami to create a bunch of solutions that are deployed "with one click". Or should I put "solutions" in quotes myself, because these don't solve a damn thing. They exist only as check boxes, as marketing filler, and Google has never cared whether any of them actually work. I know the product managers who were in the driver's seat, and I can assure you that these people don't care.



Take, for example, the "one-click" deployment solution for Percona. I was fed up with Google Cloud SQL's antics, so I began to consider setting up my own Percona cluster as an alternative. And this time Google seemed to have done a good job: they were going to save me some time and effort with the click of a button!



Well, great, let's go. Let's follow the link and press this button. Select "Yes" to accept all the defaults and deploy the cluster in your Google Cloud project. Haha, it doesn't work. None of this shit works. The tool was never tested and it started to rot from the first minute, and it wouldn't surprise me if more than half of the one-click deployment "solutions" (now we understand why the quotes) don't work at all. It is absolutely hopeless darkness, and it is better not to enter.



But Google explicitly encourages you to use them. They want you to buy them. For them, it's a transaction. They don't want to support anything. It's not part of Google's DNA. Yes, engineers support each other, as evidenced by my story with Bigtable. But in products and services for ordinary people, they have always been ruthless in shutting down any service that falls short of the bar for profitability, even if it has millions of users.



And that presents a real challenge for GCP, because this DNA is behind all of its cloud offerings. They don't seek to support anything; it is well known that they refuse to host (as a managed service) any third-party software until AWS does the same and builds a successful business around it, and customers start demanding the same thing. Even then, it takes some effort to get Google to support something.



This lack of a culture of support, coupled with the "let's break it down to make it beautiful" principle, alienates developers from them.



And that's not good if you want to build a long-lived platform.



Google, wake up, damn it. It's 2020. You're still losing. It's time to take a hard look in the mirror and decide whether you really want to stay in the cloud business.



If you want to stay, then stop breaking everything. You guys are rich. We developers are not. So when it comes to who shoulders the burden of compatibility, you need to take it on yourselves. Not put it on us.



Because there are at least three more really good clouds. And they are beckoning.



And now I will go on fixing all my broken systems. Eh.



Until next time!



PS. Update after reading some of the discussions of this article (the discussions are great, by the way). Firebase has not been discontinued, and there are no plans to do so that I am aware of. However, it has a nasty streaming bug that causes the Java client to stall in App Engine. One of their engineers helped me with this problem when I was working at Google.






Update 2, 08/19/2020. Stripe API!



Update 3, 08/31/2020. I was contacted by a Google engineer from Cloud Marketplace who turned out to be an old friend of mine. He wanted to find out why C2D didn't work, and in the end we figured it out: I had created my network several years ago, and C2D does not work on legacy networks because of a missing subnet parameter in their templates. I think potential GCP users had better make sure they know enough engineers at Google...


