Linux development process: is the game worth the candle?

Linux has been around for nearly three decades now. In the early days, Linus Torvalds personally handled the code sent in by other contributors: there were no version control systems, and everything was done by hand. Today the same tasks are handled with git.



Still, one thing has remained unchanged all this time: code is sent to a mailing list (or several lists), where it is reviewed and discussed until it is deemed ready for inclusion in the Linux kernel. Yet even though this process has served the project well for many years, it has been criticized constantly.







A recent article by Microsoft's Sarah Novotny made a lot of noise on the Internet. It argued that the code-collaboration techniques used in Linux kernel development are outdated, and that if the Linux developer community wants to attract young professionals to its ranks, it would do well to replace them with something more modern. Defenders and opponents of these ideas have been clashing ever since.



I believe my background lets me contribute some ideas on Linux kernel development. For almost ten years I wrote code for Linux and for other projects organized in a similar way. At Red Hat I contributed to the x86 kernel infrastructure, to the KVM hypervisor and the QEMU emulator, and to the Xen hypervisor, and I participated in other projects as well. I then spent about seven years away from Linux, but only because I devoted my time to the C++ Seastar framework and the ScyllaDB database, both of which are developed with a methodology very similar to the kernel's. I now work as a lead engineer at Datadog, a company whose software development processes are almost the exact opposite of those used in Linux and much closer to how development is organized at other web companies.



So which side am I on? Let me say right away that I don't like the Linux development process. I'm fairly sure it is not only a barrier to new developers but also a barrier to being highly productive with your own code (and no, it's not about email). It is a source of frustration for developers, and I would not follow this model in any project where I alone decide how the work is organized.



At the same time, many critics of the Linux development process seem to believe that its defenders fight for it so fiercely only because the community is full of old-timers with a stranglehold on tradition, unwilling to change under any circumstances. That is not the case (although I'm sure such people do exist in the Linux community). The Linux kernel development process brings some unique and important benefits to its users, and any other project that applied the same principles would only benefit from them.



Any tool other than email imposes rather rigid workflows on its users, and that would deprive Linux of this advantage. The mailing lists are just the most visible mechanism, the one that attracts the debaters' attention. What we need are tools that lower the barrier to entry for Linux developers, that fix the flaws in the development process, and that let other organizations adopt the strengths of how Linux development is organized. Tools like these could genuinely change the entire software industry.



There are many such mechanisms, and they bring great benefits. To keep this short, I will focus on the one I consider most important. I will try to explain its essence, and why it provokes so many negative emotions in developers despite its strengths. I will also explain why, on the one hand, it can benefit other projects, and on the other, why it is incredibly important for Linux.



Commit messages and patches



In the Linux kernel world there is a rule: code intended for inclusion in the kernel must be split into separate patches. Each patch must solve one and only one problem, and each must carry a meaningful commit message. It often happens that these messages are longer than the code they describe.
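To make this concrete, here is a sketch of what such a commit looks like in practice. The subsystem, file, and bug are entirely made up for illustration; the point is the shape of the message, where the "why" takes more space than the "what":

```shell
# Illustrative only: a hypothetical repo and a hypothetical bug,
# showing a kernel-style commit message written with `git commit -F`.
rm -rf /tmp/gr-msg && mkdir /tmp/gr-msg && cd /tmp/gr-msg
git init -q .
git config user.email dev@example.com && git config user.name Dev
echo 'int x;' > foo.c && git add foo.c
git commit -q -F - <<'EOF'
foo: avoid double free on probe failure

If probe() fails after the buffer is allocated, the error path
frees it, and so does the caller. Keep ownership in one place:
free only inside probe() and return NULL to the caller.

The bug only manifests when probe() races with device removal,
which is why most systems never hit it.
EOF
git log -1 --format=%s
```

The one-line summary follows the kernel's `subsystem: description` convention; everything after the blank line records the reasoning a future reader will need.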



This is a prime example of what other projects generally lack. Most commit messages I've seen in modern GitHub projects read something like "changes as of August 25th", or, slightly (but only slightly) better, "implement function X". If anyone needs to look at this code in the future, they will have a hard time figuring out why the changes were made. Some of the bugs these commits fix may be subtle, and if you don't know how exactly they were fixed, they can easily creep back into the project. Reading a short, meaningless commit message tells me nothing about the circumstances under which the bug was discovered.



Here's a small example: take a look at the commit to the Linux kernel made by my good friend Johannes Weiner. I can easily imagine that in another project the message for a similar commit would have read "remove warnings". Reading the actual commit message, I learn why these warnings can be removed without harming the project, under which circumstances doing so is indeed safe, and which rules must be followed if this code is ever changed.



I am sure there are people in many organizations who work this way. But everyone working on the Linux kernel must do so. I can therefore be quite confident that by reading a commit message I will understand everything there is to understand about the corresponding change. If it fixes a bug, I will learn on which systems the bug manifested, under what conditions it occurred, why other systems were unaffected, and what to watch out for so the bug does not return.



This kind of discipline is highly desirable in any organization. It makes it easier for others (and for the author, returning to the code after a while) to understand why changes were made and why the code works the way it does. It makes it easier for new programmers to get to know the project. It prevents old bugs from coming back, and it reduces the risk that seemingly unrelated code will break something.



In other projects this is merely "highly desirable". But in Linux it is absolutely necessary, for two reasons:



  1. Hardware and configuration diversity. Linux runs on an enormous range of hardware and configurations, most of which the author of a change will never see, let alone test. A bug may manifest only on systems the developer has no access to, so the commit message is often the only record of where a problem showed up, under what conditions, and why other systems were unaffected. Without that record, nobody maintaining Linux downstream could reason about code they did not write.
  2. Backports (stable kernels). Mainline Linux is not the only kernel being maintained: as of 2020 there are several LTS branches receiving fixes in parallel, and a fix that lands in mainline often has to be carried back to those older LTS kernels. Distributions go even further: Red Hat, for instance, has maintained kernels whose lineage goes back to the 2000s.


Backports are usually not a problem for modern online companies that don't need to maintain multiple parallel product lines: they build something, ship it to users, and that's the end of it. But when backports come into play, things get more complicated. A developer (probably not the author of the code) may have to decide how to adapt a change to an older codebase that differs slightly from the current one. The solution that minimizes risk can often be (and often is) to backport only a certain part of a larger set of changes. Imagine a 2,000-line commit that contains a 5-line bug fix, and imagine that the bug appeared after an API refactoring. What would you rather backport: one huge set of changes, or a series of well-documented, finely split patches? As someone who has done countless backports, I already know my answer.
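The mechanics of such a backport can be sketched with `git cherry-pick`. The history below is invented for illustration: a "stable" branch forks off, mainline then gets a big refactoring followed by a small fix, and only the fix is carried back:

```shell
# Sketch of a backport: pick a narrow fix onto an older branch,
# leaving the large refactoring behind. All names are made up.
rm -rf /tmp/gr-backport && mkdir /tmp/gr-backport && cd /tmp/gr-backport
git init -q .
git config user.email dev@example.com && git config user.name Dev
echo 'v1' > api.txt && echo 'buggy' > bug.txt
git add . && git commit -qm 'initial code'
git branch stable                     # the old, LTS-like branch
echo 'v2' > api.txt && git commit -qam 'big API refactoring'
echo 'fixed' > bug.txt && git commit -qam 'fix: narrow 5-line bug fix'
FIX=$(git rev-parse HEAD)
git checkout -q stable
git cherry-pick -x "$FIX"   # -x records the mainline commit id
cat bug.txt                 # stable has the fix...
cat api.txt                 # ...but not the refactoring
```

Because the fix was a separate, single-purpose patch, it applies cleanly. Had the 5 lines been buried inside the 2,000-line refactoring commit, the backporter would have had to extract them by hand and resolve the conflicts with the old API.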



Of course, with or without backports, projects pay a high price for the benefits of an organization that places such emphasis on carefully documented changes. The programmer now has to take care not only of the code itself, but also of reorganizing it to comply with the project's rules.



Some of this restructuring is simple: say, running git add -p and choosing what goes into each batch of changes. Things get harder when the programmer runs into circular dependencies between code fragments. Imagine a function that returns an object of a type that will only be added to the project after the function itself. To cope with that, you have to write scaffolding code that will never make it into the finished project and only serves as a temporary bridge.
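The simple case can be shown end to end. `git add -p` itself is interactive (it asks about each hunk), so this sketch demonstrates the same idea at file granularity: two unrelated edits sitting in one working tree are committed as two single-purpose patches. File names are invented:

```shell
# Splitting mixed-up work into single-purpose commits. With changes
# inside one file you would use the interactive `git add -p` instead.
rm -rf /tmp/gr-split && mkdir /tmp/gr-split && cd /tmp/gr-split
git init -q .
git config user.email dev@example.com && git config user.name Dev
echo base > a.txt && echo base > b.txt
git add . && git commit -qm 'base'
# Two unrelated edits land in the working tree together...
echo 'feature' >> a.txt
echo 'bugfix'  >> b.txt
# ...but are committed as two separate, single-purpose patches.
git add a.txt && git commit -qm 'a: add feature'
git add b.txt && git commit -qm 'b: fix bug'
git log --format=%s
```

Each resulting commit now solves exactly one problem, which is precisely what the kernel rule demands.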



All of this gives programmers a headache, but the tasks are not unsolvable. Suppose you have, with surgical precision, split everything you did into fragments that are easy and convenient to review. The real problems begin once other programmers start looking at your code. Code review matters in any organization: experts read someone else's code and suggest (or demand) changes.



Suppose a reviewer asks the programmer to add a new parameter to a method introduced in the first patch of a series, and suppose that method is used in all subsequent patches.



The programmer then has to go back to the first patch and add the parameter, after which the later patches no longer apply. So they will not only have to puzzle out why, but also fix all the conflicts by hand. And if the individual patches had already been tested, those results are now stale and the patches must be tested again.



Reorganizing work is a small problem. But reworking what has already been done is a much more serious problem.



Here's what I'd like to convey to the Linux developer community and those around it: all of this is, of course, quite doable. But if this is not an entry barrier for young professionals, then I don't know what is. Having to spend your time, effort, nerves and computer resources on reorganizing, rewriting and reworking what is already done is clearly not what programmers aspire to. In this context I keep running into an idea that appears in one form as "...but a good programmer will have no problem with this", and in another as "but it teaches programmers a certain style of thinking, exactly the kind a good programmer should have". This reasoning strikes me as insincere and useless. Indeed: I have just listed all the strengths of this method, and I still find all this patch restructuring cumbersome and tedious. Compare it to cleaning an apartment. Someone may say it is very good to keep the house clean (I agree), and that same person may be perfectly capable of vacuuming the floors (I am), yet quite often doesn't, for a simple reason: there are more important things to do. That, by the way, is why I am incredibly happy to own a Roomba robot vacuum: it lets me enjoy a clean home without tidying up myself. Which brings me to my next thought, aimed at people outside of the Linux world.



Here's what I'd like to say to those outside the Linux community: there are very real strengths in the process used to develop Linux, and no existing tool can fully support it. GitHub, for example, works great for projects where new code is only ever added on top of existing code. You can, of course, run git push --force to overwrite a branch in the repository, but then the comments attached to the old commits are left hanging in the air, and the discussion around them becomes meaningless.
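Why do the comments end up hanging in the air? Because reworking a patch produces a brand-new commit id, and the old commit the comments pointed at is no longer part of the branch's history. A local sketch of the effect (the commit contents are invented):

```shell
# Amending a commit creates a new object; the old one becomes
# unreachable from the branch, like a force-pushed rewrite does.
rm -rf /tmp/gr-force && mkdir /tmp/gr-force && cd /tmp/gr-force
git init -q .
git config user.email dev@example.com && git config user.name Dev
echo a > f.txt && git add f.txt && git commit -qm 'v1 of the patch'
git rev-parse HEAD > old-sha.txt        # the commit reviewers commented on
echo b > f.txt && git commit -qa --amend -m 'v2 of the patch'
# The old commit is no longer an ancestor of the branch tip:
git merge-base --is-ancestor "$(cat old-sha.txt)" HEAD \
  && echo 'still reachable' || echo 'orphaned'
```

Any review comment anchored to the old id now points at a commit outside the branch's history, which is exactly the situation a force-push creates on a shared repository.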



Modern development tools simplify a great deal. They can trigger actions when certain conditions are met, support continuous integration and deployment, notify programmers about code changes, and solve many other tasks. But they definitely complicate the job of splitting someone's work into small, convenient pieces. Plain-text email complicates a lot too, but it should be noted that this way of working does not get in the way of development processes that lead to a specific goal.



Even if we could objectively and accurately weigh what the Linux ecosystem would gain and lose by abandoning its existing development process, it would hardly change anything. The current situation is a perfect illustration of human nature: people strive to preserve what has already proven itself in practice.



Is there a way out of this situation?



I sincerely believe that if we had tools that could give other organizations the same benefits the Linux community derives from its development methodology, everyone would gain. And if such tools existed, then perhaps even the Linux community could replace plain-text email with them.



I have no answer to the question of what such tools might look like. But I will take a risk and speculate about them a little:



  1. Git tracks content, but not the intent behind how that content is split into patches. GitHub and other git-based forges inherit this: they treat the "unit of change" very differently from how Linux does, even though both sit on top of git. Think of the relationship between CSS and HTML: content is separated from presentation, and restyling a page does not mean rewriting its text. What if splitting and re-splitting a series of changes were as cheap as restyling an HTML page with CSS, so that the logical change and its presentation as patches could evolve independently?
  2. Tools could operate on changes at a higher semantic level. Today, reworking a series means hand-editing diffs: "move this hunk from the patch that adds create_foo() into the one that adds create_bar()". One would rather declare intent, something like "add a parameter y to create_bar() and propagate it through the series", and have the tool rebuild the following patches automatically. With language models of the GPT-3 generation, this no longer sounds like pure fantasy.
  3. Testing could follow the same path: if each patch in a series were automatically re-validated whenever the series is reworked, rebuilding a series after review would no longer invalidate all the previous testing, and much of the pain described above would disappear.


Dear readers! What do you think of the Linux development process?









