How Uber rewrote the iOS app with Swift

So, friends, sit down in a circle and listen to the story of the biggest engineering disaster I have ever taken part in. It's a story about politics, architecture, and the sunk cost fallacy (sorry, I'm just drinking Aberlour Cask Strength Single Malt Scotch right now).





It was 2016. Trump had not yet been elected president, so the #DeleteUber movement had not yet begun. Travis Kalanick was still CEO, we were in a phase of hypergrowth and expanding into new countries, public sentiment was generally positive, everyone was happy, and Uber was at its peak.



But hypergrowth was not without problems, and the application itself began to malfunction. Up to that point, the number of developers had doubled almost every year, and when you grow that fast, you get an enormous range of skill levels. Combined with the hacker mentality we called "Let builders build", this meant a complex and fragile application architecture. At that time, the Uber app had extremely heavy logic, so it often crashed. We were constantly shipping hotfixes, patches, unplanned releases, and so on. The architecture also did not scale well.



As a result of all these problems, a growing movement began at all levels of the organization that rallied around the idea of "rewrite the application from scratch." A team was formed to create a new mobile architecture for the new application. The idea was to create an architecture that "would support Uber's mobile development over the next five years." We developed for both platforms at once. The entire development cycle started over.



The iOS department took this opportunity to implement Swift (then in version 2.x). Uber had tried Swift in the past, but like so many others at that early stage of the technology's development, it experienced many problems and delayed implementation.



However, the general feeling was that most of Swift's problems at the time were due to poor interoperability with Objective-C, and that if we wrote a pure Swift application, we could avoid the main problems.



There was also an idea to use the same basic architectural patterns on both Android and iOS. Android developers were big fans of RxJava at the time. The corresponding RxSwift library took advantage of the functional programming paradigm in Swift. Everything seemed simple.



So a small development team (Design, Product, and Architecture) went headlong into new functional / reactive patterns, a new language, and a new application for several months. Everything was going well. The architecture relied heavily on the advanced language capabilities of Swift.



The UI could scale to a large number of Uber apps, the functional programming paradigm seemed powerful (albeit a little difficult to learn), and the architecture was based on a new real-time streaming network protocol (I wrote this part).



After a couple of months and several striking demos, the movement gained momentum. The project looked successful. With a small number of engineers, we had built excellent functionality in a short time. Most of the product was ready. Management was pleased.



Then the rollout to the whole company began. Various teams started adding their own features to the new application. At first, the excitement of novelty created a burst of motivation and productivity. The architecture isolated features from one another, which allowed rapid progress.



But as soon as more than ten engineers were working in Swift, the well-oiled machine began to fall apart. The Swift compiler is still significantly slower than the Objective-C compiler today, but back then it was practically unusable. Build times went through the roof. Debugging stopped working entirely.



Somewhere there is a video from one of the demos, where an Uber engineer types a one-line statement in Xcode, and then waits 45 seconds for the letters to slowly, one by one, appear in the editor.



Then we hit a wall with the dynamic linker. At the time, Swift libraries could only be linked dynamically. Unfortunately, the linker ran in polynomial time, so Apple's recommended maximum number of libraries in a single binary was 6. We had 92, and the number kept growing...



As a result, after tapping the application icon, it took 8-12 seconds before main was even called. Our shiny new app turned out to be slower than the clunky old one. Then there was the problem of the size of the binary.



Unfortunately, by the time the problems began to show themselves in earnest, we had already passed the point of no return. This is the sunk cost fallacy. At that point, the entire company was putting all its energy into the new application.



Thousands of people across different parts of the company, millions and millions of dollars (I can't give the real number, but much more than one). All of management unanimously supported the project. I had a private conversation with my boss about the need to stop.



He said that if this project failed, he would have to pack his bags. The same was true for his boss, and so on all the way up to the vice president. There was no way out.



So we rolled up our sleeves, put our best developers on each of the problems, and prioritized the critical issues (dynamic linking and binary size). I was assigned both, in that order.



We quickly discovered that the linking problem at application startup could be solved by placing all of the code in the main executable. But as we all know, Swift ties namespaces to frameworks, so this would have required huge code changes, including countless namespace checks.



It was then that the brilliant Richard Howell examined the build output of Xcode and discovered that after the build was complete, he could take all the intermediate object files and relink them back into the main binary using a custom script.



Because Swift mangles the namespace into symbol names at compile time, the relinked object files still worked. This allowed us to effectively link our libraries statically and cut the time before main from 10 seconds to almost zero.



The next problem was size. At that time, as a safety net, we had planned to ship the new application bundled together with the old one and roll the new one out carefully at runtime. To reduce the size, the first thing we did was simply remove the old app. We called this strategy "Yolo". Travis personally gave the go-ahead.



We also replaced all Swift structs with classes. Value types tend to add a lot of overhead because of alignment and the extra machine code required for copy behavior, auto-generated initializers, and so on. This saved space.



But the app continued to grow. Soon we hit the download limit (100 MB) for binaries on iOS 8 and earlier. This translated into a significant number of lost installs ($10+ million in lost revenue, because many iOS users had not yet updated).



At this point, we were several weeks from the public launch. We had to either go back to Objective-C or drop support for iOS 8. Since iOS 9 introduced the ability to split the binary by architecture, the iOS 9 version was about half the size (give or take). With only a week left, we decided to throw tens of millions of dollars away and drop support for iOS 8.



The general opinion was that with the size cut in half, we had plenty of room to maneuver, and the size problem could be solved sometime in the future, once we had cleaned up everything else. Unfortunately, we were very wrong.



After the release of the app, we had a huge party. The app was well received by users and the press. It was fast, with a bright new design.



A lot of people got promoted. We all breathed a sigh of relief. After 90 continuous weeks of work, the guys finally got a break.



But then public opinion began to change. The new app focused on calculating the exact price of a trip for a specific route (in the old days, you just saw the fare and the current multiplier). To calculate the price, you had to enter your current location.



For the convenience of users, we also enabled automatic location detection, collecting location data in the background so that the driver could see exactly where to pick up the passenger at that moment. People started to go crazy. Some of my former co-workers urged me on Twitter to quit the evil company that tracks people like this.



As a result of this unrest, people started to disable location permission in iOS. But the new application did not provide for this use case.



So we scrambled to bring that use case back. We discussed turning off background location tracking, but that would again have hurt usability during pickup.



Then Trump came to power (this happened about three months after the release of the new app), which set off a chain reaction that led to the #DeleteUber movement.



All this time, the Swift codebase was growing rapidly. Ongoing problems and a slow IDE spawned two warring factions among our iOS developers. I'll call them Swift fanatics and Objective-C nerds.



The combination of external and internal pressure pushed the tension to the maximum. The fanatics denied Swift's problems. The nerds complained about everything imaginable without offering any concrete solutions.



Around this time, we were hit by the binary size problem again. I was on call when the team had issues with a release. It turned out that our brilliant solution to the dynamic linking problem had created an executable that was too large for some architectures.



Having solved the problem on these architectures, my colleague @aqua_geek and I did a little digging and found that the size of the compiled code was growing at a rate of 1.3 MB per week. I raised the alarm. If nothing was done, at that rate we would run into the cellular download limit in three weeks.



But the internal tension had reached the stage where the fanatics denied everything. One of the tech leads from the Swift camp wrote a two-page article about how cellular download limits don't matter (Facebook, after all, had exceeded the limit long ago). We were all tired of putting out fires.



So one of our data scientists designed a test: we artificially pushed one of the architecture slices over the limit and measured the impact on business metrics. The next week we brought that slice back under the limit and pushed a different one over (to control for differences between architectures).



The effect was disastrous. The negative impact on the business turned out to be several orders of magnitude greater than the entire cost of the year-long Swift adoption. It turns out that a lot of people are out of WiFi range when they download the Uber app for the first time (who would have thought?).



So we formed another strike team. We started decompiling the object files and examining them line by line to determine why the Swift code had grown so large. We removed unused functions. Tyler had to rewrite the watchOS app back in Objective-C.



(The watch application was only 4400 lines long, but because of the different processor architecture and the lack of ABI compatibility, the entire Swift runtime had to be bundled with it.)



We were at our limit. So tired. But we pulled together. It was then that the truly brilliant engineers showed themselves. One of the developers in Amsterdam figured out how to reorder the compiler's optimization passes. For those who are not compiler experts, I will explain.



Modern compilers make a ton of passes. For example, one pass inlines functions. Another replaces constant expressions with their values. Depending on the order in which the passes run, the machine code may come out smaller or larger.



If a function is inlined where it is called with a constant value, the compiler can recognize this and replace the entire block with the result. For example:



func addFour(_ x: Int) -> Int {
    return x + 4
}

let x = 3
let y = addFour(x)  // called with a known constant





becomes just the constant 7 if the compiler runs the inlining pass first (which means a lot less code).



If the inlining pass runs second, the compiler may not recognize such cases, and you get more code. All of this, of course, depends entirely on what the specific code looks like, so it is difficult to optimize the order of passes in general.
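The effect of pass order can be shown with a toy model. The sketch below is written in Python and is not how the Swift compiler works: the tuple-based expression tree, the FUNCTIONS table, and the two passes inline_calls and fold_constants are all invented for illustration. It runs the same two passes in both orders on a call equivalent to the example above and compares the size of the result.

```python
# Toy expression tree (hypothetical, for illustration only):
#   ("const", n), ("var", name), ("add", a, b), ("call", function_name, argument)

# One known function: add_four(x) = x + 4
FUNCTIONS = {"add_four": ("x", ("add", ("var", "x"), ("const", 4)))}


def substitute(expr, name, value):
    """Replace every use of variable `name` in `expr` with `value`."""
    kind = expr[0]
    if kind == "var":
        return value if expr[1] == name else expr
    if kind == "add":
        return ("add", substitute(expr[1], name, value), substitute(expr[2], name, value))
    if kind == "call":
        return ("call", expr[1], substitute(expr[2], name, value))
    return expr  # constants are left unchanged


def inline_calls(expr):
    """Inlining pass: replace each call with the function body (one level deep)."""
    kind = expr[0]
    if kind == "add":
        return ("add", inline_calls(expr[1]), inline_calls(expr[2]))
    if kind == "call":
        param, body = FUNCTIONS[expr[1]]
        return substitute(body, param, inline_calls(expr[2]))
    return expr


def fold_constants(expr):
    """Constant-folding pass: replace const + const with a single constant."""
    kind = expr[0]
    if kind == "add":
        left, right = fold_constants(expr[1]), fold_constants(expr[2])
        if left[0] == "const" and right[0] == "const":
            return ("const", left[1] + right[1])
        return ("add", left, right)
    if kind == "call":
        return ("call", expr[1], fold_constants(expr[2]))
    return expr


def size(expr):
    """Number of nodes in the tree, standing in for machine code size."""
    kind = expr[0]
    if kind == "add":
        return 1 + size(expr[1]) + size(expr[2])
    if kind == "call":
        return 1 + size(expr[2])
    return 1


def run_passes(expr, passes):
    for optimization_pass in passes:
        expr = optimization_pass(expr)
    return expr


program = ("call", "add_four", ("const", 3))
inline_first = run_passes(program, [inline_calls, fold_constants])
fold_first = run_passes(program, [fold_constants, inline_calls])

print(inline_first, size(inline_first))  # ('const', 7) 1
print(fold_first, size(fold_first))      # ('add', ('const', 3), ('const', 4)) 3
```

With inlining first, the folding pass sees 3 + 4 and collapses it. With folding first, there is nothing to fold yet, and the addition survives into the output.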



So this brilliant engineer from Amsterdam built an algorithm into the release build that reordered the optimization passes to minimize size. This took a whopping 11 MB off the total machine code size and gave us a little time to keep developing.



But this approach terrified the Swift compiler specialists. They were afraid that unverified orderings of compiler passes would expose untested bugs (although each pass should be intrinsically safe, it is difficult to reason about possible combinations of passes). However, we did not experience any major problems.



We also applied a bunch of other solutions (such as linting for especially expensive code patterns). We measured each of them by the number of weeks of development it bought us. But the real problem was the growth curve. In the end, all the gains were always eaten up.
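Measuring a fix in weeks is simple division: megabytes saved divided by weekly growth. Here is a minimal sketch in Python using two figures from this story (1.3 MB of growth per week, 11 MB saved by reordering passes). The function name weeks_bought is made up for illustration, and the calculation assumes the growth rate stays constant, which it may not have.

```python
def weeks_bought(saved_mb, growth_mb_per_week):
    """Weeks of development a size reduction buys at a constant growth rate."""
    if growth_mb_per_week <= 0:
        raise ValueError("growth rate must be positive")
    return saved_mb / growth_mb_per_week


GROWTH_MB_PER_WEEK = 1.3  # measured growth rate quoted in the story
PASS_REORDER_SAVING_MB = 11.0  # saving from reordering optimization passes

print(round(weeks_bought(PASS_REORDER_SAVING_MB, GROWTH_MB_PER_WEEK), 1))  # 8.5
```

The same division applied to the remaining headroom under the limit gives the time left before hitting it.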



As a result, we bought enough time to wait for Apple to make its move and raise the cellular download limit to 150 MB. Apple also added a number of compiler features to help with size optimization (-Osize). By their own admission, Swift will never be as small after compilation as Objective-C.



But as of this year, we have optimized Swift down to 1.5x the size of the Objective-C machine code, and Apple eventually raised the limit again, to an optional 200 MB. That's enough to keep us going for a few more years.



But we almost failed. If Apple hadn't raised the limit, the Uber app would have had to be rewritten back in ObjC. In the end, we were able to solve the other problems as well. The brilliant @alanzeino and his team added Swift support to the Buck build tool, which significantly reduced build times.



We lost a bunch of burned-out people along the way. We spent a ton of money and learned hard lessons. Surprisingly, to this day, most people insist that the rewrite was worth it. The architectural consistency is popular with new engineers who join the company. They don't even know how much pain it took to achieve it.



The community has benefited from our knowledge. @ellsk1 put together an amazing presentation and went on a lecture tour to share his knowledge. I too have been able to leverage this experience to help new companies and development teams make better decisions.



So here's a tip. Everything in programming is a trade-off. There is no universally best language. Whatever you do, understand what the trade-off is and why you are making it. Avoid political warfare between stubborn factions within the company.



Plan for points of failure. Figure out how to identify trade-offs, and leave yourself a way out in case you reach a certain point and realize you made a mistake. Big efforts come at a cost, but the later you recognize a wrong trade-off, the higher the cost.



Don't be a bore who only grumbles and doesn't contribute. Don't be a fanatic who creates big problems for everyone. The best engineers don't fall into either of these traps.


