Hello, Habr! At the Luxoft TechFest held on January 28, Mikhail Zankovich, Senior Team Lead at Luxoft, talked about applications with a heavy inheritance. Today he shares some additional thoughts related to that talk, which sparked quite a heated discussion during the meetup.
What emotions and associations does the phrase “We have a legacy project” evoke in you? Most often: lack of structure, a mess, tons of undocumented code, horror, architectural anarchy, disgust, a sea of crutches, run for your life! My reaction: “Oh! Finally, something interesting. Let's make it work!” I suspect this is a very unusual reaction.
In this article I will try to present a different ideology of working with legacy. Let's call it “software restoration”. I do not intend to change your attitude towards legacy, but if I manage to at least plant a seed of doubt that legacy might be interesting, I will be glad.
Typical legacy
What is legacy? In my experience, you can build the following checklist of legacy-product traits:
- There is no up-to-date documentation; knowledge about the system is fragmented and lives, at best, in people's heads.
- The original authors have long since left the project, taking their expertise with them.
- Parts of the system behave as a black box: they work, but nobody can explain exactly how or why.
- The code is overgrown with “crutches”: quick fixes layered on top of quick fixes.
All this means that with each release it becomes harder and harder to maintain such a project, let alone implement new functionality or new requirements. Any change turns into reverse engineering with mandatory regression testing. As a result, the project becomes expensive to maintain and is “frozen” in its current form, with any changes minimized.
But why do legacy products appear?
Rarely does a team consciously and purposefully create a low-quality product. Most often it is the result of constraints imposed by the project's circumstances. When there are no clear requirements, there is no way to design the application's functionality in a balanced way.
Constantly changing requirements, vague wording, tight deadlines and ever-growing technical debt are clear signs of agile processes in a team that has not managed to fully adapt to those “agile” approaches. Of all the “flexibility”, only the “rapidly changing requests from the business” part actually works.
This often leads to increased turnover within the team, which in turn does not help quality. Imagine a new specialist joining the team: for two or three months he only digests the process, then for a month and a half or two he implements some functionality and prepares to leave the project. He has no stake in a quality product, in fully documenting his part, in handing knowledge over to colleagues, and so on. Expertise gets diluted.
At some point a fatal decision is made: it is easier to replace or switch the system off than to keep maintaining it. The project enters the “low maintenance” phase: it is supported with leftover resources, changes are minimized, and new requests are implemented quickly but poorly with yet more “crutches”. Why aim for quality? The product is going to be replaced anyway. In this mode a product can survive for many years, growing ever more “crutches” and becoming ever more monstrous.
Summarizing all of the above, the following main reasons for the emergence of legacy products can be identified:
- tight deadlines for implementing functionality;
- lack of clear requirements / intensively changing requirements;
- high turnover within the team;
- poor life-cycle planning.
Add to this the professional maturity of the specialists at the time. Open one of your projects from five or ten years ago. I am sure you will easily find things you would implement differently now.
So we take it as an axiom: “code is not created bad on purpose”. Every product started from some idea, and if the code made it into production, then it worked and satisfied the needs of the business at that time.
The maintenance approach
The customer's goals are quite simple: keep the legacy system running without disrupting current business processes (some of which nobody even knows about), while developing functionality for new requirements within a budget that is most likely limited.
The typical approach of a team taking over such a project is to keep slowly but surely making things worse. Touch as little as possible; change only what is asked, and only when asked. If a module works but its logic needs to change, leave it alone; better to create another one just like it, but with the required logic. Chaos grows, and the application gets more complicated.
The restorer's approach
The software restorer's approach is to figure out what kind of mechanism is in front of them: what was the main idea of its creators? Try to cut away everything unnecessary and keep everything good. If the existing structure must be changed, it is changed with extreme care and attention to detail; not a single detail affecting the system should escape the restorer's view. Changes are first implemented following the maintenance logic, and only then analyzed for the possibility of a full-fledged solution.
This is difficult and time-consuming work. Not every developer is willing, and more importantly actually able, to do restoration. The bar for a restorer is an order of magnitude higher than for an ordinary developer. Without experience on real projects, without understanding how systems evolve, without having seen in practice not only the best approaches but also clearly unsuccessful implementations, there is no point in attempting restoration.
Instead of the typical first urge: “These are crutches! Everything here needs to be rewritten!”, a true restorer asks: “Why was it done this way? How exactly was it meant to be used?” Only after making sure there were no real prerequisites for such code can the restorer exclaim: “Yes, these are crutches! Everything here needs to be rewritten!”, and, with a sense of a job well done, strip the unnecessary growths from the ossified skeleton of the software, making the object of restoration better and of higher quality.
But this happens rarely, however much pleasure it gives the restorer. More often you have to untangle a knot of dependencies between modules. It is not uncommon for the threads to stretch far beyond the area of responsibility of the component being disassembled (and sometimes beyond the system itself). All these intricate relationships between modules must be taken into account during restoration.
Thus, the software restorer works at the intersection of development, architecture, business analysis, testing and medicine, and it is hard to say which skills from these areas matter most. There must be a certain balance between them, seasoned with an honest desire to do restoration. What does medicine have to do with it? The restorer's main principle is “primum non nocere”: first of all, do no harm.
Below, this approach is illustrated with specific examples, as we gradually disassemble and restore typical legacy inherited from the systems' previous technical owners, and see why all of the skills listed above matter.
Data warehouse restoration
What does the system store?
Landing on a new project, the restorer pays attention to the objects the system processes. Fully immersing yourself in the business flows and source code, especially in the absence of proper documentation, takes at least several months.
One of the restorer's first tasks is to assess the efficiency of the storage. Can anything be improved without relying on an understanding of the business processes? The first pain point of any data warehouse is the sheer volume of data: the larger the volume, the higher the cost of owning the system.

The second pain point is the growth of that volume, which primarily hurts system performance. Most likely some data-retention practices already exist in the system, but how effective are they?
All the practices considered here apply mostly to classical RDBMSs, but the approach is not very different for NoSQL solutions.
One of the restorer's main tactics here is to set up monitoring of the storage objects; for a classic DBMS, that means table monitoring.

You need a small framework that periodically collects, from the system metadata, two trivial parameters for each table: the amount of space it occupies and the number of rows it holds. The collection frequency has to be tuned by hand (more on this below) based on the system's characteristics; a typical starting period of 24 hours is enough for basic analysis.
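A minimal sketch of such a collector, using SQLite purely for illustration (all table and function names here are hypothetical). On a real RDBMS you would also record the byte size from the system metadata, e.g. `sp_spaceused` on Sybase ASE or `pg_total_relation_size()` on PostgreSQL:

```python
import sqlite3
import time

def collect_snapshot(conn: sqlite3.Connection) -> None:
    """Record a row-count snapshot for every user table.

    On a real RDBMS, also record the table's size in bytes from
    the system metadata (sp_spaceused, pg_total_relation_size, ...).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS size_monitor "
                "(ts INTEGER, table_name TEXT, row_count INTEGER)")
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name <> 'size_monitor'")]
    now = int(time.time())
    for t in tables:
        (n,) = cur.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()
        cur.execute("INSERT INTO size_monitor VALUES (?, ?, ?)", (now, t, n))
    conn.commit()
```

In practice this would be scheduled (e.g. via cron) at the chosen frequency, starting with once every 24 hours.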
Analyzing data
What to do with the data? What to look for? The first step is to identify the “heaviest” objects. In practice the standard 20/80 rule holds: no more than 20 percent of the objects will occupy more than 80 percent of the space. This significantly narrows the area of analysis at the first stage.
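This narrowing step can be sketched in a few lines (the table names and sizes below are invented for illustration):

```python
def heaviest_tables(sizes: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the smallest set of tables that together occupy at least
    `share` of the total space -- the 20/80 narrowing step."""
    total = sum(sizes.values())
    picked, acc = [], 0
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        acc += size
        if acc >= share * total:
            break
    return picked
```

For example, with sizes `{"orders": 800, "log": 100, "users": 60, "codes": 40}` (in MB), the single table `orders` already covers 80% of the space, so the analysis starts there.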
The longer and more detailed such statistics are, the more clearly they reflect the system's behavior. Experience suggests a period of at least two weeks. The main idea is to capture the non-working days and periods during which cleaning and archiving mechanisms are most often run.
So the framework is written, and the restorer just waits two weeks for results? Of course not; that would not fit the restorer's ideology. With the first chunk of data in hand, you can already do basic analysis: look at the ratio of occupied space to the number of stored objects (rows). The higher this value, the more likely the table stores BLOB fields, and precisely these tables and fields become the restorer's first objects of research and analysis.
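As a sketch, the ratio check might look like this (the threshold and the sample numbers are arbitrary illustrations, not recommendations):

```python
def blob_suspects(stats: dict[str, tuple[int, int]],
                  threshold: int = 4096) -> list[tuple[str, int]]:
    """Flag tables whose average row size exceeds `threshold` bytes --
    a strong hint that BLOB/text payloads dominate the table.

    `stats` maps table name -> (occupied bytes, row count).
    """
    ratios = [(name, size // max(rows, 1))
              for name, (size, rows) in stats.items()]
    return sorted([r for r in ratios if r[1] >= threshold],
                  key=lambda r: r[1], reverse=True)
```

A table averaging 10 KB per row is a far more promising candidate than one averaging 50 bytes per row.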
The key question: how often do business processes actually access these objects? The system's owner and the existing team can shed some light here. And suddenly (in practice, very often) it turns out that such fields store information of no importance to the business: dumps of objects/messages kept for the development team's analysis, user comments shown only when an order is created, and so on.
The next step: if the data is rarely used or has no clear business value, why not move it to an archive? A cardinal approach — splitting the monolithic table into parts and moving the BLOBs to cheaper/slower storage while preserving the table's original interface (the key point: there is no reliable information about all the processes accessing this data, so the changes must not harm them) — can be quite an interesting and complex technical problem.
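A toy SQLite sketch of preserving the interface (the table and column names are invented). On a production RDBMS the side table would live in a cheaper tablespace, and writes through the old name would need INSTEAD OF triggers, since a plain view is read-only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The original monolithic table: business columns plus a heavy payload.
cur.execute("CREATE TABLE orders "
            "(id INTEGER PRIMARY KEY, status TEXT, payload BLOB)")
cur.execute("INSERT INTO orders VALUES (1, 'NEW', x'DEADBEEF')")

# Step 1: split the heavy column off into a side table
# (on a real system: cheaper/slower storage).
cur.execute("CREATE TABLE orders_payload AS SELECT id, payload FROM orders")
cur.execute("CREATE TABLE orders_base AS SELECT id, status FROM orders")
cur.execute("DROP TABLE orders")

# Step 2: recreate the original name as a view, so every process
# we do NOT know about keeps working against the same interface.
cur.execute("""CREATE VIEW orders AS
               SELECT b.id, b.status, p.payload
               FROM orders_base b
               LEFT JOIN orders_payload p ON p.id = b.id""")
row = cur.execute("SELECT id, status, payload FROM orders").fetchone()
```

Unknown readers still see the same columns under the same name; only the physical layout has changed.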
A less interesting but equally useful option is to use the storage engine's built-in compression for the values of certain fields. For example, Sybase ASE has the ASE_Compression feature, MongoDB lets you set compression options per collection, and so on. Almost any data storage system can compress data “under the hood”, transparently for external systems and without drastic changes. In practice (especially on legacy systems) such compression options are rarely enabled.
Of course, before applying compression the restorer must first assess its impact on performance, which requires worked-out key performance indicators for the system or, at the very least, some elements of regression testing.
In general, there is something to do for a couple of weeks while full statistics on objects are collected.
Big statistics: what to look for
Having gathered statistics over a long period, the restorer tries to understand the dynamics of the used space. All values for a given table/object are normalized to the initial value; this makes it possible to estimate the relative growth of the data and to identify the most intensively changing objects.
The generated profile will most likely correspond to one of the following types:
Profile 1: a constant value. Most likely these are static reference tables, and they are not that interesting to work with. The archiving approach described above can be applied, depending on how intensively the reference data is used.
Profile 2: small fluctuations in volume. This can indicate either a reference table or an operational table with intensive reads and writes. From the restorer's point of view these are the most difficult objects, because their behavior has to be analyzed in as much detail as possible. It is exactly for these objects that it makes sense to increase the collection frequency: not once a day, but once an hour or once a minute. The goal is to trace the profile changes in more detail and understand what the behavior depends on.
Profiles 3 and 4 are more interesting. Profile 3 (the “saw”) clearly shows that the table is periodically cleaned. But the growing trend — after each cleanup the final volume is slightly larger than before — points to inefficient cleanup mechanisms: over a given period, more data appears in the system than is deleted at its end. This may be a perfectly normal business process, a classic increase in system load.
But for the restorer this is, first of all, a signal: are the conditions for deleting information correct? In practice, some entities, because of complex data-retention conditions, undeservedly remain in the storage forever. The restorer's goal is to identify such entities and include them in the periodic cleanup as well.
If profile 3 degenerates into constant growth, it is the first candidate for a bottleneck: firstly, there is no sign of any archiving process, and secondly, performance degradation can be expected as the data grows.
Profile 4 is a typical archive table with periodic filling: the table grows only on certain days, and correlations with profile-3 tables may well be visible. For archive tables it is also important to understand how they are used. Do users ever query them? Are they history kept for analysis? Are they data for reporting systems? Depending on the answers, the archive tables may be moved to a separate contour, a separate database, or a separate partition, thereby freeing up operational space.
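The four profiles can be told apart with a crude heuristic on the normalized series (the thresholds and labels below are arbitrary illustration, not a prescription):

```python
def classify_profile(series: list[float], eps: float = 0.05) -> str:
    """Crude classifier for the volume profiles described above.
    `series` is the table size over time, normalized to the first sample."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    if all(abs(d) <= eps for d in deltas):
        return "1: constant (static reference table)"
    drops = sum(1 for d in deltas if d < -eps)
    jumps = sum(1 for d in deltas if d > eps)
    if drops and series[-1] > series[0]:
        return "3: saw with a rising floor (inefficient cleanup)"
    if drops:
        return "3: saw (periodic cleanup)"
    if jumps == len(deltas):
        return "3-degenerate: unbounded growth (bottleneck candidate)"
    if jumps and jumps <= len(deltas) // 2:
        return "4: step growth (archive table)"
    return "2: fluctuating operational table (needs finer-grained sampling)"
```

For example, `[1.0, 1.2, 1.4, 1.05, 1.25, 1.45, 1.1]` is a saw whose floor keeps rising, while `[1.0, 1.0, 1.5, 1.5, 1.5, 2.0, 2.0]` grows only on certain days, like an archive table.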
How does it work in practice?
On one of my projects, this exact exercise was done within the first month and a half after joining. The targets were profile-3 objects, and they were found. Applying the practices described (improving cleanup conditions, deleting data that was not used anywhere in the system, and so on) reduced the occupied space by more than 25% and stopped the storage's intensive growth.
As a result, we were able to make the first technical changes to the project and present plans for improving the functionality. The customer was pleased with the team's result, and the team grew from 3 to 9 developers. Throughout the year we continued the investigations, and the improvements we found were used to support the system and its characteristics.

Then two analysts joined us, and the team moved on to development proper: not support, but implementing new business functionality. We are now building a new system.
What is this all for?
If you have read this far, you are most likely looking for the answer to the question: “why all this?” First of all, restoration is a distinct process: not quite development, not quite support, but a combination of the two.
It is a special kind of drive for a technical specialist: to dig into the logic of the person who created the product, understand its meaning, cleanse the product of the unnecessary and make it even better than it was. The application becomes a quest, full of riddles and unexpected plot twists.
No, you are not creating from scratch; you are restoring an existing product, perhaps one damaged by time. Among other things, the restorer has a unique opportunity to grow in any of the six directions (see the picture above), with a real product at hand as a test bed. Self-restraint is trained too: not sliding into technical perfectionism, but thinking through and making only the changes the system actually needs.
All this makes working with legacy systems exciting and unusual. But the final choice to restore or maintain is yours.
Mikhail Zankovich's report at Luxoft TechFest can be viewed here.
The author of the article is Mikhail Zankovich.