Introduction
When we work on Diablo IV, we write all the code on Windows and then compile it for different platforms. This also applies to our servers, which run Linux. (The code includes conditional compilation directives and, where necessary, fragments written specifically for a particular platform.) We organize our work this way for many reasons. For starters, our team's core expertise is in Windows. Even our server programmers are most familiar with Windows development. We value the fact that every programmer on our team can use the same tools and draw on the same knowledge base.
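As a rough illustration of what such platform-specific fragments can look like, here is a minimal C++ sketch (not code from our server) in which one function gets a different body on Windows and on Linux. The helper name currentProcessId() is hypothetical; the preprocessor picks the branch based on the _WIN32 macro, which compilers targeting Windows predefine.

```cpp
// Sketch: one portable function, two platform-specific bodies chosen at compile time.
// currentProcessId() is a hypothetical helper used only for illustration.
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>   // GetCurrentProcessId()
#else
#include <unistd.h>    // getpid(), pid_t
#endif

// Returns the id of the calling process via the native API of the target platform.
std::uint64_t currentProcessId() {
#if defined(_WIN32)
    return static_cast<std::uint64_t>(::GetCurrentProcessId());
#else
    const pid_t pid = ::getpid();
    assert(pid > 0);  // getpid() always succeeds on POSIX systems
    return static_cast<std::uint64_t>(pid);
#endif
}
```

Calling code uses currentProcessId() the same way on both platforms; only the preprocessor decides which native API is compiled in.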
The most important reason we develop on Windows, however, is that it lets us use the highly functional and reliable toolset that Visual Studio provides. Even if we were developing something on Linux, I can say that nothing in the Linux world comes close to Visual Studio.
This approach does create some difficulties, however, when a server crashes and we need to debug a memory dump. We can log in remotely to the virtual machine (or, more precisely, the container) that failed and run gdb to find out what happened. But this approach has many disadvantages. For example, we do not deploy source code along with the binaries, so the source code is not available in a gdb session on the virtual machine or container.
Another complication lies in gdb itself. Unless you use this tool constantly, it is hard to master it at the level we need. Simply put, our developers would much rather debug code with familiar tools. Since only two or three of our developers know gdb really well, they are the ones who end up hunting down the problem whenever something goes wrong. That is hardly an optimal distribution of the workload.
We have always wanted an intuitive way to debug Linux code. That is why we are so excited about the new Visual Studio feature that lets us solve exactly this problem in a familiar environment. It is no exaggeration to say that it has made our dream come true.
About our code debugging process
Debugging Linux code in Visual Studio is possible only if the Windows Subsystem for Linux (WSL) is installed, or if a connection to a Linux machine is configured in the Connection Manager. All of our backend developers have installed WSL with the same distribution we deploy our project on. They run a script I wrote that installs all the development tools and supporting libraries needed to build our server in WSL.
(A brief digression from the main topic: we have concluded that WSL is the best available environment for developers to test changes in Linux builds. The workflow is extremely convenient: switch to WSL, use the cd command to go to a shared code directory, and build the project directly from there. This is a much better solution than a virtual machine or even a container. If you build projects with CMake, you can also use Visual Studio's built-in WSL support.)
A few words about our builds. We develop code on Windows, and we have a Windows version of our server designed to run on that OS. This is useful when we work on ordinary project features. But we deploy our server-side code on Linux, which requires building on Linux. Linux builds are produced on a build farm, using a build system that runs on a Linux machine. It builds our server project and the corresponding container, which is later deployed. Linux binaries are deployed only in containers, and developers usually do not have access to these containers.
When one of the servers in our infrastructure fails, a special script notifies us, and the dump files are written to a shared network folder. To debug these files, whether on Linux or in Visual Studio, you need a working copy of the program. When debugging, it is useful to use exactly the same shared libraries that were used in the deployed container. We use another script to get these files. First we copy the dump to the local machine, then we run the script and pass it information about the dump. The script downloads the Docker container that was built for the version of the code in question and extracts our server executables from it, along with certain common runtime libraries. All of this is needed for gdb. (When working with gdb, this avoids the compatibility issues that might arise if the WSL version of the system is not exactly the same as the deployed Linux version.) While setting up the debugging session, the script writes settings to ~/.gdbinit that tell gdb to treat the extracted shared libraries as the system libraries.
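The settings this script writes can be sketched. The following C++ snippet is a hypothetical reconstruction, assuming the container's binaries and libraries were extracted into a single directory; makeGdbInit(), writeGdbInit(), and the lib subdirectory are illustrative names, not our actual script. The gdb commands set sysroot and set solib-search-path are real, and pointing sysroot at the extracted tree is one common way to make gdb treat those libraries as the system libraries.

```cpp
// Sketch of the .gdbinit text a setup script might write, assuming the container's
// binaries and runtime libraries were extracted to extractDir. makeGdbInit(),
// writeGdbInit() and the "/lib" layout are hypothetical; the gdb commands are real.
#include <cassert>
#include <fstream>
#include <string>

// Builds gdb settings that load shared libraries from the extracted copy
// instead of from the WSL distribution's own /lib and /usr/lib.
std::string makeGdbInit(const std::string& extractDir) {
    assert(!extractDir.empty());
    std::string text;
    // Treat the extracted tree as the target's root file system: a library the dump
    // references as /usr/lib/x.so is looked up as extractDir + "/usr/lib/x.so".
    text += "set sysroot " + extractDir + "\n";
    // Additional directory to search for libraries not found under the sysroot.
    text += "set solib-search-path " + extractDir + "/lib\n";
    return text;
}

// Writes the settings to the given file, replacing any previous contents.
bool writeGdbInit(const std::string& path, const std::string& extractDir) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << makeGdbInit(extractDir);
    return static_cast<bool>(out);
}
```

Because std::ofstream does not expand ~, a caller would pass the expanded home path, for example one built from the HOME environment variable.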
Then we go to Visual Studio, where the fun begins. We open the build solution for the Windows version of our servers. Then we open a new debugging dialog with the command Debug -> Other Debug Targets -> Debug Linux Core Dump with Native Only. We check the Debug on WSL box and enter the paths to the dump file and to the server binaries (the ones intended for WSL!). After that, we simply press the Debug button and watch what happens.
Starting Debugging in Visual Studio
Visual Studio automatically launches gdb in WSL. After the system works the disk for a while, the call stack of the crashed program appears, and the instruction pointer is set to the corresponding line of code. This is truly a brave new world!
Next, we identify the failure itself. We have a fault handler that intercepts the crash in order to run some service procedures. As a result, on a single-threaded server, the information about the failure itself is located deeper in the call stack. But some of our servers are multithreaded, and the crash can happen on any of their threads. The fault handler logs the source file name and line number of the faulting code, so examining this data gives us the first clue. We then look for the place in the call stack that corresponds to the execution of this code.
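Our fault handler is not shown here, so the following is only a minimal C++ sketch of the general pattern, with hypothetical names (SERVER_CHECK, formatFault, onFault). A macro captures __FILE__ and __LINE__ at the call site, the handler logs them and then aborts. The handler's own frames end up on top of the stack, which is why the actual failure appears deeper in the call stack.

```cpp
// Sketch of a fault check that records the source location, assuming a macro-based
// design. SERVER_CHECK, formatFault() and onFault() are hypothetical names.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats the "first clue": where the failed check lives in the source.
std::string formatFault(const char* file, int line, const char* expr) {
    assert(file != nullptr && expr != nullptr);
    return std::string("fault at ") + file + ":" + std::to_string(line) + " (" + expr + ")";
}

// Service procedure run before the process dies. Because it runs on top of the
// failing code, its frames sit above the real failure in the call stack.
[[noreturn]] void onFault(const char* file, int line, const char* expr) {
    std::fprintf(stderr, "%s\n", formatFault(file, line, expr).c_str());
    std::fflush(stderr);
    std::abort();  // raises SIGABRT; a core dump is written if core dumps are enabled
}

// __FILE__ and __LINE__ expand at the call site, so the log names the faulting line.
#define SERVER_CHECK(cond) \
    do { if (!(cond)) onFault(__FILE__, __LINE__, #cond); } while (0)
```

The logged file:line pair is the clue we then match against the frames shown for the dump.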
In the old days, that is, a few weeks ago, we would have used gdb to get a backtrace of all threads and then scanned the resulting list for the thread whose call stack most likely crashed. For example, a thread that was sleeping most likely did not crash. We are looking for a stack that has more than a few frames and does not indicate a sleeping thread. Next, we examine the code to understand what the problem is. If it is something simple, you can see it right in the code. If the problem is more complicated, we have to use gdb to investigate the state of the process.
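The manual triage described above boils down to a simple filter. Here is a hedged C++ sketch, assuming the backtraces have already been parsed into per-thread frame lists; ThreadStack, candidateCrashThreads(), the list of wait functions, and the minFrames threshold of 4 are all illustrative assumptions rather than part of our tooling.

```cpp
// Sketch of the triage heuristic: drop threads parked in a sleep/wait call and
// threads with only a few frames; what remains are the crash candidates.
// All names, the wait-function list, and the threshold are assumptions.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ThreadStack {
    int id;
    std::vector<std::string> frames;  // frames[0] is the innermost (most recent) frame
};

// True if the innermost frame looks like a blocking wait (or there are no frames).
bool looksAsleep(const ThreadStack& t) {
    static const std::vector<std::string> kWaitFns = {
        "nanosleep", "pthread_cond_wait", "pthread_cond_timedwait", "epoll_wait", "futex_wait"};
    if (t.frames.empty()) return true;
    const std::string& top = t.frames.front();
    return std::any_of(kWaitFns.begin(), kWaitFns.end(), [&](const std::string& fn) {
        return top.find(fn) != std::string::npos;
    });
}

// Returns ids of threads that are not sleeping and have at least minFrames frames.
std::vector<int> candidateCrashThreads(const std::vector<ThreadStack>& threads,
                                       std::size_t minFrames = 4) {
    std::vector<int> ids;
    for (const auto& t : threads) {
        if (t.frames.size() >= minFrames && !looksAsleep(t)) ids.push_back(t.id);
    }
    return ids;
}
```

The filter only narrows the list; reading the surviving stacks and the code behind them is still a manual step.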
But Visual Studio gives us much more powerful capabilities than we had before. In a multithreaded environment, you can open the Threads window during a debugging session and click on threads to view their stacks. However, this is very similar to the gdb approach, so if you need to examine, say, 50 threads, it can become quite a time-consuming and tedious task. Fortunately, Visual Studio has a tool that makes this much easier: the Parallel Stacks window.
I'll admit that most of us did not know about Parallel Stacks until Erica Sweet and her team told us about it. If you run the command Debug -> Windows -> Parallel Stacks during a debugging session, a new window opens that displays the call stack of each thread in the process under investigation. It gives a bird's-eye view of the entire process space. You can double-click any stack frame of any thread, and Visual Studio jumps to that frame in both the source code window and the Call Stack window. This saves us a lot of time.
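To see why a merged view helps with dozens of threads, here is a simplified C++ sketch of the underlying idea: threads with identical call stacks collapse into one group, so many idle workers become a single entry and the unusual stack stands out. This is not how Visual Studio implements Parallel Stacks, which merges stacks that share frames into a tree; groupByStack() and the Stack alias are hypothetical names, and the sketch only groups stacks that match exactly.

```cpp
// Simplified illustration of a "bird's-eye view": group thread ids by identical
// call stack. groupByStack() and Stack are hypothetical names.
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Stack = std::vector<std::string>;  // outermost frame first

// Maps each distinct call stack to the ids of all threads that share it.
std::map<Stack, std::vector<int>> groupByStack(
        const std::vector<std::pair<int, Stack>>& threads) {
    std::map<Stack, std::vector<int>> groups;
    for (const auto& t : threads) groups[t.second].push_back(t.first);
    return groups;
}
```

With 50 worker threads blocked in the same wait, the result has one large group plus a handful of small ones, and the crashing thread is usually among the small ones.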
Once we see the code around the crash site, we can examine variables by hovering over them with the mouse, with QuickWatch, or with any of Visual Studio's many other tools. Of course, in release builds many variables are optimized away, but many are not! Using the Visual Studio interface, we can pinpoint the problem much faster than we could with gdb.
Outcome
Our team is simply happy to be able to debug Linux dumps in Visual Studio, the environment we develop in. For us this is a major improvement: it allows many more developers than ever before to diagnose problems in the wild and lets all of us take advantage of Visual Studio's powerful debugging tools. Once the working environment is set up, you can be in a Visual Studio debugging session in a matter of minutes. This feature greatly speeds up finding problems and makes our work more efficient. Thanks to Erica and her team for their help.
What do you find most useful in Visual Studio?