Dead Code: Find and Disarm





My name is Danil Mukhametzyanov and I have been working as a backend developer at Badoo for seven years. During this time, I have managed to create and change a great deal of code. So much, in fact, that one day a manager came up to me and said: “The quota is used up. To add something, you need to remove something.”



Okay, that's just a joke: he didn't say that. A pity! Over the entire existence of the company, Badoo has accumulated more than 5.5 million lines of business logic code, not counting blank lines and closing brackets.



The number itself is not that scary: code just sits there and doesn't ask to be fed. But two or three years ago, I started to notice that more and more often I was reading, and trying to figure out, code that never actually runs in the production environment. Code that is, in fact, dead.



I wasn't the only one to notice this trend. Badoo realized that our highly paid engineers were constantly wasting time on dead code.







I gave this talk at Badoo PHP Meetup #4



Where does the dead code come from?



We started looking for the sources of the problem and divided them into two categories:



  • process sources: those that arise from the development flow itself;
  • historical sources: legacy code.


First of all, we decided to tackle the process sources, so as to prevent new problems from appearing.



A/B testing



Badoo began actively using A/B testing four years ago. Now we constantly have about 200 tests running, and every product feature goes through this procedure.



As a result, about 2000 completed tests accumulated over four years, and this figure keeps growing. It scared us, because each completed test is a piece of dead code that is no longer executed and is no longer needed at all.



The solution came quickly: we started automatically creating a ticket for cutting out the test code once an A/B test finishes.





An example of a ticket



But the human factor kept kicking in. Again and again we found test code that was still running even though nobody was thinking about it or had ever finished the test.



So we introduced a hard rule: every test must have an end date. If the manager forgot to check the test results, the test would automatically stop and switch itself off. And, as I already mentioned, a ticket for cutting the test code out, while keeping the original variant of the feature logic, was created automatically.
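
To make this concrete, here is a minimal sketch of what such a scheduled job could look like. Everything in it (data shape, names, dates) is hypothetical, not Badoo's actual code:

```php
<?php
// Hypothetical sketch of a nightly job that enforces A/B test end dates.
$runningTests = [
    ['name' => 'new_onboarding_flow', 'ends' => '2019-05-01'],
    ['name' => 'bigger_cta_button',   'ends' => '2030-01-01'],
];

$today = new DateTimeImmutable('today');

foreach ($runningTests as $test) {
    if (new DateTimeImmutable($test['ends']) < $today) {
        // Stop the test, keeping the original variant of the feature logic.
        echo "Stopping test {$test['name']}, reverting to the control variant\n";

        // Automatically file a ticket to cut the now-dead test code out.
        echo "Creating cleanup ticket for {$test['name']}\n";
    }
}
```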



This simple mechanism alone rid us of a whole layer of work.



Diversity of clients



Our company supports several brands, but the server is the same for all of them. Each brand is represented on three platforms: web, iOS and Android. On iOS and Android we have a weekly development cycle: once a week, each platform ships an update with a new version of the application.



It is easy to see that with this approach, within a month we have about a dozen new versions to support. User traffic is distributed unevenly between them: users gradually migrate from one version to another. Some older versions still get traffic, but it is so small that supporting them is hard and pointless.

So we started thinking about how many versions we actually want to support. Two limits emerged for a client: a soft limit and a hard limit.



When the soft limit is reached (when three or four newer versions have been released and the application still hasn't been updated), the user sees a screen warning that their version is out of date. When the hard limit is reached (roughly 10-20 “missed” versions, depending on the application and brand), we simply remove the option to skip this screen. It becomes blocking: you cannot use the application with it.





Screen for the hard limit



At that point it is pointless to keep processing the requests coming from such a client: the user will not see anything but this screen anyway.
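
As a rough illustration, the version-gating logic can be sketched like this (numbers and names are made up; the real limits depend on the application and brand):

```php
<?php
// Minimal sketch of soft/hard version limits (numbers and names made up).
function updateScreenFor(int $clientVersion, int $latestVersion): ?string
{
    $softLimit = 4;  // a few versions behind: nagging screen, skippable
    $hardLimit = 15; // far behind: blocking screen, the app is unusable

    $behind = $latestVersion - $clientVersion;

    if ($behind >= $hardLimit) {
        // No point processing anything else from this client: the user
        // will see nothing but this screen anyway.
        return 'blocking_update_screen';
    }
    if ($behind >= $softLimit) {
        return 'skippable_update_screen';
    }

    return null; // fresh enough, no screen
}

var_dump(updateScreenFor(85, 100)); // string(22) "blocking_update_screen"
```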



But here, as with A/B tests, there was a nuance. Client developers are people too: they use new technologies and new operating system features, and after a while a given application version is no longer supported on the next version of the operating system. The server, however, continues to suffer, because it must keep processing these requests.



We came up with a separate solution for the case when Windows Phone support ended. We prepared a screen that told the user: “We love you very much! You are very cool! But could you move to another platform? New cool features will be available to you there, and here we cannot do anything for you.” As a rule, we offer the web as the alternative platform, since it is always available.



With such a simple mechanism, we limited the number of client versions the server has to support to approximately 100 versions across all brands and platforms.



Feature flags



However, having dropped support for older platforms, we still did not fully understand whether we could cut out the code they had been using, or whether the platforms remaining on older OS versions continued to use the same functionality.



The thing is, our API is built not around versioning but around feature flags. You can find out how we arrived at that from this talk.



We had two types of feature flags. I'll tell you about them with examples.



Minor features



The client says to the server: “Hello, it's me. I support photo messages.” The server looks at this and replies: “Great, I'll support that! Now I know about it and will send you photo messages.” The key point here is that the server cannot influence the client in any way: it simply accepts what the client reports and has to take it into account.



We call these flags minor features. At the moment we have more than 600 of them.

What is the disadvantage of these flags? Periodically, heavyweight functionality appears that cannot be controlled from the client side alone: you want to control it from the server side too. For that, we introduced a second type of flag.



Application features



The same client, the same server. The client says: “Server, I've learned to support video streaming. Shall we turn it on?” The server replies: “Thanks, I'll keep that in mind.” And adds either: “Great, let's show our beloved user this functionality, they will be glad,” or: “OK, but we won't enable it just yet.”



We call these features application features. They are heavier, so we have fewer of them, but still plenty: more than 300.
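
A toy sketch of the difference between the two flag types (all names here are invented for illustration, this is not Badoo's actual API):

```php
<?php
// Toy model of minor vs application features (names are invented).
$clientSupports = ['photo_messages', 'video_streaming']; // reported by the client
$userId = 42;

// Minor feature: the client's word is final; the server just takes note.
$sendPhotoMessages = in_array('photo_messages', $clientSupports, true);

// Application feature: client support is necessary but not sufficient;
// the server decides whether to actually switch it on for this user.
$videoStreamingOn = in_array('video_streaming', $clientSupports, true)
    && serverEnablesVideoStreaming($userId);

function serverEnablesVideoStreaming(int $userId): bool
{
    return ($userId % 100) < 10; // e.g. a 10% server-side rollout
}

var_dump($sendPhotoMessages, $videoStreamingOn);
```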



So, users move from one client version to another, and at some point a given flag becomes supported by all active application versions. Or, conversely, by none of them. It is not obvious how to keep track of this: 100 client versions, 900 flags! To deal with it, we built a dashboard.



A red square on it means that no version of this platform supports the feature; green means that every version of this platform supports the flag. If a flag can be switched on and off, its cell periodically blinks. We can see what is going on in any given version.





Dashboard screen



Right in this interface, we started creating tasks for cutting out functionality. Note that not every red or green cell in a row has to be filled in: there are flags that only ever ran on one platform, and flags that are only relevant for one brand.



Automating this is not very convenient and, in principle, not necessary: it is enough to create the task and check the dashboard periodically. In the first iteration we managed to cut out more than 200 flags. That is almost a quarter of all the flags we used!



This is where the process sources end. They appeared as a result of our development flow, and we successfully integrated handling them back into that same flow.



What to do with legacy code



We stopped new problems from appearing in the process sources. And then we faced a difficult question: what to do with the legacy code accumulated over the years? We approached the solution from an engineering standpoint, that is, we decided to automate everything. But it was not clear how to find code that is not being used: it hides in its cozy little world, it is never called and never announces itself.



We had to approach from the other side: take all the code we have, collect information about which pieces definitely are executed, and then invert the result.



Then we put this together and implemented it at the most basic level: files. Getting the list of files in the repository is easy: just run the appropriate UNIX command.



It remained to collect the list of files actually used in production. That is quite simple: on shutdown of each request, call the corresponding PHP function. The only optimization we made here was to query OPCache instead of collecting the list on every request; otherwise, the amount of data would have been enormous.
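
In PHP terms, this first-pass collector can be sketched as follows (a simplified illustration of the approach, not our production code; the log path is made up):

```php
<?php
// Simplified sketch of per-request file collection (log path is made up).
register_shutdown_function(static function (): void {
    // Every file compiled for this request:
    $files = get_included_files();

    // The cheaper variant once OPCache is warm: ask OPCache for the list
    // of cached scripts instead of logging on every single request.
    if (function_exists('opcache_get_status')) {
        $status = opcache_get_status(true);
        if (!empty($status['scripts'])) {
            $files = array_keys($status['scripts']);
        }
    }

    file_put_contents(
        '/var/log/used_files.' . getmypid() . '.log',
        implode(PHP_EOL, $files) . PHP_EOL,
        FILE_APPEND
    );
});
```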



As a result, we discovered many interesting artifacts. But on deeper analysis we realized that file-level data misses unused methods: their number differed by a factor of three to seven.



It turned out that a file could be loaded, compiled and executed for the sake of a single constant or a couple of methods, while everything else in it continued to lie uselessly in this bottomless sea.



Putting together a list of methods



However, putting together the complete list of methods turned out to be quick: we just took Nikita Popov's parser, fed it our repository and got everything we have in the code.
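
For instance, with nikic/php-parser the full inventory can be collected by a short visitor. This sketch assumes the library is installed via Composer and uses its v5 API; to get fully qualified class names you would additionally run its NameResolver visitor:

```php
<?php
// Sketch: list every declared function and method in a source tree.
require 'vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

$parser  = (new ParserFactory())->createForNewestSupportedVersion();
$visitor = new class extends NodeVisitorAbstract {
    public array $names = [];

    public function enterNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Function_
            || $node instanceof Node\Stmt\ClassMethod) {
            $this->names[] = $node->name->toString();
        }
        return null;
    }
};

$traverser = new NodeTraverser();
$traverser->addVisitor($visitor);

$phpFiles = new RegexIterator(
    new RecursiveIteratorIterator(new RecursiveDirectoryIterator('src')),
    '/\.php$/'
);
foreach ($phpFiles as $file) {
    $traverser->traverse($parser->parse(file_get_contents((string) $file)));
}

echo implode(PHP_EOL, array_unique($visitor->names)), PHP_EOL;
```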



The question remained: how to collect what is actually executed in production? Production is what interests us, because tests may cover things we don't need at all. Without thinking twice, we took XHProf. It was already running in production for a portion of the requests, so we had profile samples stored in databases. It was enough to go through these databases, parse the stored snapshots, and get the list of functions that were called.



Disadvantages of XHProf



We wanted to repeat this process on another cluster, where XHProf was not yet running but was badly needed: the cluster for background scripts and asynchronous processing, which executes a lot of logic and matters a great deal for a highload project.



And that is when we became convinced that XHProf was inconvenient for us.



  • It requires changing the PHP code. You need to insert the code that starts tracing, stops it, gets the collected data and writes it to a file. This is a profiler, after all, and we run it in production, where there are a lot of requests, so you have to think about sampling too. In our case this was aggravated by a large number of clusters with different entry points.
  • It does not get along with OPCache. In our configuration, running XHProf together with OPCache could end in core dumps, which you really do not want on production machines.
  • It collects far more data than we need. XHProf is a profiler, so it records everything: CPU time, memory consumption, the whole call tree. We even have an XHProf aggregator (XHProf Live Profiler, which is open source), and you might say: “Just take that.” But for this task all of that is excess: we only need the bare fact that a function was called, not how much CPU it burned, so Live Profiler did not fit either.
  • Its output is in XHProf's own format. We ship files off production machines with a streaming daemon, and gluing the two together meant maintaining a wrapper (sorry, youROCK, this is not required by lsd, but it was more convenient to maintain a single wrapper over it). Patching XHProf is not what we wanted to do, because it is a rather large profiler (what if we break something inadvertently?).


There was another idea: to exclude certain namespaces from collection, for example the vendor namespaces from Composer. They do run in production, but that information is useless to us: we are not going to refactor vendor packages and cut unnecessary code out of them.



Solution requirements



We got together again, looked at what solutions existed, and formulated the final list of requirements.



First: minimal overhead. For us, XHProf set the bar: no more overhead than it has.



Second, we didn't want to change the PHP code.



Third, we wanted the solution to work everywhere - both in FPM and in the CLI.



Fourth, we wanted it to handle forks. They are actively used in CLI scripts on our cloud servers, and we didn't want to write fork-specific logic inside the PHP code.



Fifth: sampling out of the box. In fact, this follows from the requirement not to change the PHP code. Below I will explain why we needed sampling.



Sixth and last: the ability to force collection from code. We love it when everything works automatically, but sometimes it is more convenient to start things manually, tune them, and watch what happens. We needed the ability to enable and disable everything directly from the code, and not by a random decision of the more general mechanism of the PHP module, which sets the probability of inclusion through its settings.



How funcmap works



As a result, we have a solution that we call funcmap.



Funcmap is essentially a PHP extension; in PHP terms, a PHP module. To understand how it works, let's take a look at how a PHP process and a PHP module work.



So, a process starts. When building a module, PHP lets you subscribe to hooks. First the GINIT (Global Init) hook runs, where you can initialize global parameters. Then the module itself is initialized (MINIT). There you can create constants and allocate structures, but only for the module as a whole, not per request, otherwise you will shoot yourself in the foot.



Then a user request comes in, and the RINIT (Request Init) hook is called. When the request completes, the request shutdown (RSHUTDOWN) runs, and at the very end come the module and global shutdowns: MSHUTDOWN and GSHUTDOWN. Everything is logical.



With FPM, each user request arrives at an already running worker, so RINIT and RSHUTDOWN simply run in circles until FPM decides the worker is worn out and it is time to kill it and spawn a new one. In the CLI, it is just a linear process: everything is called exactly once.





How funcmap works



Out of this set, we were interested in two hooks. The first is RINIT. In it we set the data collection flag: a random draw that implements sampling. If it fires, we process this request and collect statistics on the functions and methods called in it. If not, the request is skipped entirely.



The next thing is to create the hash table if it doesn't already exist. The hash table implementation comes from PHP itself: there is no need to invent anything here, just take it and use it.



Next, we initialize the timer. I will say more about it below; for now, just remember that it exists, and that it is important and needed.



The second hook is MSHUTDOWN. Note: MSHUTDOWN, not RSHUTDOWN. We didn't want to do extra work on every request; we were interested in the worker as a whole. At MSHUTDOWN we take our hash table, walk over it and write out a file (what could be more reliable, more convenient and more universal than the good old file?).



The hash table is filled quite simply, via another PHP hook, zend_execute_ex, which is called every time a user-defined function is executed. The call comes with parameters from which you can work out which function it is: its name and its class. We take them, build the full name, write it into the hash table, and then call the default implementation.



This hook does not fire for built-in functions. If you want to intercept built-in functions as well, there is separate machinery for that: zend_execute_internal.



Configuration



How do you configure all this without changing the PHP code? The settings are very simple:



  • enabled: whether it is enabled or not.
  • The file we write to. It supports a pid placeholder, to exclude a race condition when different PHP processes write to the same file at the same time.
  • probability: the sampling probability. If set to 0, no request is logged; if set to 100, every request is logged and included in the statistics.
  • flush_interval: how often we dump the collected data to the file. We want collection to work in the CLI as well, but some scripts run for a long time and would eat up memory if they touch a large amount of functionality.


Besides, on a cluster that is not so heavily loaded, FPM may decide that a worker is ready to process more requests and not kill the process, so it lives on and holds some memory. After the flush interval elapses, we write everything to disk, reset the hash table and start filling it anew. If the interval has not elapsed by the end of the worker's life, the MSHUTDOWN hook fires and writes out the final state.
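
To make probability and flush_interval concrete, here is a conceptual PHP model of what the extension does internally. The real logic is C inside the module; the constants and the output path below are purely illustrative:

```php
<?php
// Conceptual model of funcmap's sampling and flushing (illustrative only).
const PROBABILITY    = 10;   // log roughly 10% of requests
const FLUSH_INTERVAL = 600;  // dump collected names every 10 minutes

// RINIT analogue: one dice roll per request decides whether to collect.
$collect   = random_int(1, 100) <= PROBABILITY;
$calls     = [];     // stand-in for the extension's hash table
$lastFlush = time();

// zend_execute_ex analogue: record only the fact of the call.
function recordCall(string $name, bool $collect, array &$calls): void
{
    if ($collect) {
        $calls[$name] = true;
    }
}

recordCall('MyClass::myMethod', $collect, $calls);

// The timer: long-lived workers flush periodically instead of waiting
// for MSHUTDOWN, so CLI scripts don't hold unbounded state in memory.
if (time() - $lastFlush >= FLUSH_INTERVAL) {
    file_put_contents(
        '/tmp/funcmap.' . getmypid(),
        implode(PHP_EOL, array_keys($calls)) . PHP_EOL,
        FILE_APPEND
    );
    $calls     = [];
    $lastFlush = time();
}
```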



The last thing we wanted was the ability to invoke funcmap from PHP code. The extension provides a single function that enables or disables statistics collection regardless of how the probability roll came out.
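
For illustration, forcing collection from code might look roughly like this. The function name below is a placeholder, not necessarily what the extension actually exports; check the funcmap README for the real API:

```php
<?php
// Hypothetical usage sketch: funcmap_enable() is a placeholder name.
if (function_exists('funcmap_enable')) {
    funcmap_enable(true); // force collection for this request,
                          // regardless of the configured probability
}

// Every user-defined function executed from here on would be recorded.
```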



Overheads



We wondered how all this affects our servers. We plotted the number of requests arriving at a real production machine in one of the most heavily loaded PHP clusters.



There can be many such machines, which is why the graph shows the number of requests rather than CPU usage: the balancer notices that a machine has started consuming more resources than usual and tries to even out the requests so that the machines are loaded uniformly. That was enough to see how much the server degrades.



We switched our extension on for 25%, 50% and then 100% of requests and saw the following picture:







The dotted line is the number of requests we expected; the solid line is the number that actually arrived. We saw degradation of roughly 6%, 12% and 23%: with collection fully on, the server processed almost a quarter fewer incoming requests.



Above all, this graph proves that sampling is essential for us: we cannot spend 20% of a server's resources on collecting statistics.



False positives



Sampling has a side effect: some methods do not make it into the statistics even though they are actually used. We tried to combat this in several ways:

  • Simply collecting data for longer. The more samples accumulate, the higher the chance that even a rarely called method eventually shows up in the statistics.
  • Accepting that mistakes will happen: if a live method does get cut out by mistake, production will report it with an error, so we needed a way to handle such errors.


We tried two approaches to handling the errors. The first was to forcibly enable statistics collection from the moment an error is generated: collect the error log and analyze it. But there is a pitfall here: when some resource goes down, the number of errors grows instantly; you start processing all of them, more and more workers get busy, and the cluster slowly starts to die. So this approach is not quite right.



How to do it differently? We read the error log and, using Nikita Popov's parser, walked through the stack traces, noting which methods were called in them. This way we avoided loading the server and reduced the number of false positives.



Still, some methods were called so rarely that it remained unclear whether they were needed or not. We added a helper that establishes whether such a method is used: if sampling has already shown that a method is rarely called, you can switch its logging to 100% and stop guessing. Any execution of the method will then be logged, and you will know about it.



If you know for certain that the method is used, this may be overkill: perhaps it is necessary but rarely triggered functionality. Imagine you have a “Complain” option that is rarely used but important: you cannot cut it out. For cases like this, we learned to label such methods manually.



We created an interface that shows which methods are in use (they are on a white background) and which are potentially unused (on a red background). Here you can also mark the methods that must stay.



Interface screen



The interface is great, but let's go back to where we started, to the problem we were solving: our engineers read dead code. And where do they read it? In the IDE. Imagine making a devotee of their craft leave their IDE world for some web interface and do things there! We decided to meet our colleagues halfway.



We made a plugin for PhpStorm that loads the whole database of unused methods and shows whether the method in front of you is used. You can also mark a method as used right from the IDE; this is sent to the server and becomes available to everyone else contributing to the codebase.



This concludes the main part of our work with legacy code. We began to notice unexecuted code faster, to react to it faster, and we no longer waste time hunting for unused code manually.



The funcmap extension is available on GitHub. We will be glad if it proves useful to someone.



Alternatives



From the outside, it may seem that we at Badoo don't know what to do with ourselves. Why not just look at what is already on the market?



A fair question. We looked, and at that moment there was nothing on the market. Only when we were already actively rolling out our solution did we discover that at roughly the same time a man named Joe Watkins, living in foggy Great Britain, had implemented a similar idea and created the Tombs extension.



We didn't study it very closely, since we already had our own solution, but we nevertheless found several problems:



  • Lack of sampling. Above, I explained why we need it.
  • How data is stored and retrieved. Tombs keeps its data in shared memory, much as APCu does (the two share an author), and that did not fit the way we ship data off our machines.
  • CLI coverage. As I said above, a lot of our logic runs in CLI scripts, and we needed to cover them as well.
  • The collection model itself. Tombs inverts the task: it maintains the list of functions that have never been called, while funcmap (the name says it all: a “map” of functions) records the ones that are called. With Tombs we would also have had to collect data separately for FPM and the CLI, which, together with everything above, did not work for us.




Conclusions



First, think in advance about how you will remove functionality that is implemented for a short period of time, especially if development is very active. In our case, these were A/B tests. If you do not think about it in advance, you will have to clean up the rubble later.



Second: know your clients by sight. Whether they are internal or external does not matter: you must know them. At some point, you need to be able to tell them: “Dear, stop! No.”



Third: clean up your API. It makes the entire system simpler.



And fourth: you can automate everything, even the search for dead code. Which is what we did.


