The complexities of monitoring running processes in Linux

Everyone knows how to monitor running processes on a Linux system, but hardly anyone does it with high accuracy. In fact, every method of monitoring processes discussed in this article is missing something. Before we start experimenting, let's define the requirements for a process monitoring system:

  1. Information about all processes, even about short-lived ones, should be logged.
  2. We should have information about the full path to the executable file for all running processes.
  3. Within reason, we should not need to modify or recompile our code for different kernel versions.
  4. Bonus requirement: container awareness. In Kubernetes and Docker environments we want to know which container a process belongs to, for example by its cgroup ID, so that we can tell "host" processes apart from "container" processes and attach container metadata (name, ID, image) to the event, ideally without calling the Docker API and without depending on Docker at all. A quick illustration of the cgroup idea follows this list.
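
To make the container requirement a bit more concrete, here is a minimal sketch of the cgroup idea. The PID below is hypothetical, and the exact format of /proc/<pid>/cgroup depends on the container runtime and on whether cgroup v1 or v2 is in use, so treat it as an illustration rather than a reliable parser:

# the cgroup of a process often encodes the container it belongs to
PID=12345                        # hypothetical PID of a containerized process
cat /proc/$PID/cgroup            # e.g. a path containing .../docker/<container-id>

# mapping that ID back to a name normally means asking the runtime,
# i.e. exactly the kind of Docker API dependency we would like to avoid
docker ps --no-trunc --format '{{.ID}} {{.Names}}'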


Let's talk about common Linux APIs that can help with this task. To keep the story simple, we will focus on processes created with the execve system call. A more complete solution would also have to monitor processes created with fork/clone and their variants, as well as the results of execveat calls.



Simple solutions implemented in user mode



  1. Polling /proc. Because of the short-lived process problem, this method is not particularly suitable for us: a process can appear and exit between two scans (a naive scan is sketched after this list).
  2. Using netlink. The kernel can report process events over netlink (the process connector), but the notifications essentially contain only PIDs. To get the executable path you still have to read /proc, which brings back the short-lived process problem.
  3. The Linux audit API. The kernel's auditing subsystem can report every execve call together with its arguments, so short-lived processes are not a problem here. The first drawback is that only one userspace consumer can receive audit events at a time, and that consumer may already be a tool such as auditd or osquery. Multiplexers such as auditd or go-audit can, in theory, mitigate this problem. But in the case of enterprise-class solutions, you cannot know in advance whether customers are using such tools, and if they do, which ones. Nor is it possible to know ahead of time which security controls that work directly with the auditing API are being used by clients. The second drawback is that the auditing API knows nothing about containers, even though this issue has been discussed for many years. An example of setting up an audit rule for execve is shown after this list.
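
To make items 1 and 3 of this list more concrete, here is a minimal sketch. The audit key exec-monitor is an arbitrary name I chose; auditctl and ausearch come from the audit userspace package and must be run as root:

# item 1: a naive scan of /proc — anything that exits between two scans is missed
for pid in /proc/[0-9]*; do
    readlink "$pid/exe" 2>/dev/null    # full path of the executable, if the process is still alive
done

# item 3: ask the audit subsystem to log every execve (64-bit ABI shown)
auditctl -a always,exit -F arch=b64 -S execve -k exec-monitor
# ...later, query the collected events by the same key
ausearch -k exec-monitor --interpret
# remove the rule when done
auditctl -d always,exit -F arch=b64 -S execve -k exec-monitor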


Simple kernel-mode debugging tools



These mechanisms rely on using kernel "probes" of various types in their plain form, without attaching eBPF programs to them.



▍Tracepoints



Tracepoints are sensors that are statically compiled into specific locations in the kernel. Each of them can be enabled independently of the others, and once enabled it emits a notification whenever the kernel code it is embedded in is reached. The kernel contains several tracepoints that suit us, executed at various points of the execve system call: sched_process_exec, open_exec, sys_enter_execve, and sys_exit_execve. To get this list, I ran the command cat /sys/kernel/tracing/available_events | grep exec and filtered the result using information obtained from reading the kernel code. These tracepoints serve us better than the mechanisms described above, since they let us observe short-lived processes. But none of them reports the full path to the process's executable when the argument of exec is a relative path. In other words, if the user executes a command like cd /bin && ./ls, we get the path as ./ls, not as /bin/ls. Here's a simple example:



# enable the sched_process_exec tracepoint
sudo -s
cd /sys/kernel/debug/tracing
echo 1 > events/sched/sched_process_exec/enable

# run ls via a relative path
cd /bin && ./ls

# look at the sched_process_exec events in the trace buffer
# note that the recorded filename is ./ls, i.e. the relative path
cd -
cat trace | grep ls

# disable the tracepoint
echo 0 > events/sched/sched_process_exec/enable


▍Kprobe / kretprobe probes



kprobe probes allow you to extract debugging information from almost any point in the kernel. They are like special breakpoints in kernel code that report information without stopping execution. Unlike tracepoints, a kprobe can be attached to a wide variety of functions, and such a probe will fire during the execution of the execve system call. But I did not find any function in the call graph of execve whose parameters include both the process PID and the full path to its executable. As a result, we face the same "relative path problem" as with tracepoints. One could, relying on the quirks of a particular kernel, "tweak" something here: kprobes can read data from the kernel call stack. But such a solution will not work reliably across kernel versions, so I will not consider it. A rough example of registering a kprobe by hand is shown below.
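
As an illustration, here is how a kprobe can be registered through the tracing filesystem. The function name do_execve is only an example: the kprobe-able symbol differs between kernel versions (it may be __x64_sys_execve or sys_execve), so check /proc/kallsyms first.

sudo -s
cd /sys/kernel/debug/tracing

# make sure the symbol exists on this kernel before using it
grep -w do_execve /proc/kallsyms

# register a kprobe named my_execve on that function and enable it
echo 'p:my_execve do_execve' >> kprobe_events
echo 1 > events/kprobes/my_execve/enable

# watch events arrive (Ctrl+C to stop), then clean up
cat trace_pipe
echo 0 > events/kprobes/my_execve/enable
echo '-:my_execve' >> kprobe_events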



▍Using eBPF programs with tracepoints and kprobe/kretprobe probes



The idea here is that the same tracepoints or probes fire as before, but instead of ordinary event handlers they run the code of eBPF programs.



Using this approach opens up some new possibilities for us. Now we can run arbitrary code in the kernel whenever an execve system call is made. This, in theory, should give us the ability to extract any information we need from the kernel and send it to user space. There are two ways to get this kind of data, but neither of them meets the above requirements.



  1. Read the data from the kernel structures task_struct and linux_binprm. For example, an eBPF program attached to the sched_process_exec tracepoint could walk the chain of dentry structures starting from bprm->file->f_path.dentry and assemble the full path to the executable. The problem is that eBPF programs cannot use unbounded loops, which makes walking a chain of arbitrary depth awkward. More importantly, reading kernel data structures ties the eBPF program to the layout of those structures in a specific kernel version, which is exactly the dependence we are trying to avoid.
  2. Obtain the data through eBPF helper functions, which would free us from reading kernel structures directly. The first problem is the limited helper API: there is simply no helper that returns the full path of the executable. The second is that helpers cover only part of what we need (although, for example, an eBPF program can obtain the cgroup ID of the current process, which helps with container identification). A small bpftrace illustration of this whole approach follows this list.
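
As a small illustration of this approach — assuming bpftrace is installed, which generates and loads the eBPF programs for us — the classic one-liner below prints the arguments of every execve in the system. Just like the tracepoint example above, it reports relative paths exactly as they were passed:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { join(args->argv); }'

# in another terminal: bpftrace will print "./ls", not "/bin/ls"
cd /bin && ./ls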


"Hackish" solutions



  1. Using LD_PRELOAD to intercept the exec family of libc functions. This does not work for statically linked programs or for programs that invoke the system call directly, bypassing libc. Besides, a process can simply strip the variable from its environment, so the approach is easy to evade.
  2. Tracking execve together with fork/clone and chdir calls and maintaining, in user space, the current working directory of every process so that relative paths can be resolved. The problem is that this bookkeeping races with the execve events themselves: by the time we handle an execve, the recorded directory may already be stale. A variant of the same idea is to keep this state in eBPF maps shared between several eBPF programs, but it suffers from the same consistency issues.
  3. Attaching to processes with ptrace. This is an extremely intrusive, debugger-style approach. A variant is to combine ptrace with seccomp and the SECCOMP_RET_TRACE action: a seccomp filter installed for execve stops the process at the execve call and hands control to the tracer, which can then inspect the execve arguments. A ptrace-based illustration using strace is shown after this list.
  4. Using AppArmor. You can write an AppArmor profile that forbids processes from executing files. If you load this profile in learning mode (complain), AppArmor will not actually block execution, but will only log violations of the rules specified in the profile. If you attach such a profile to every running process, you get a working, but very unattractive and too "hackish" solution. It is probably not worth using this approach.
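
For item 3, the most familiar ptrace-based tool is strace. The sketch below only illustrates the mechanism and the relative-path issue; attaching a tracer to every process on a busy system would be far too heavy:

# trace exec-family system calls in a process tree that we start ourselves
strace -f -e trace=execve,execveat -o /tmp/exec.log sh -c 'cd /bin && ./ls'

# the logged call contains "./ls", i.e. the relative path again
grep execve /tmp/exec.log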


Other solutions



I will say right away that none of these solutions meets our requirements; nevertheless, I will list them:



  1. Using the ps utility. This tool simply reads /proc and, as a result, suffers from the same problems as direct access to /proc.
  2. The execsnoop utility that is not based on eBPF. It relies on kprobe/kretprobe probes via ftrace, so it does see short-lived processes, but it has the same relative-path problem described above. Besides, execsnoop was written as an interactive debugging tool, not as a building block for a monitoring system.
  3. The execsnoop utility based on eBPF. It attaches eBPF programs to kprobes, so it inherits the same limitations. Both variants are shown after this list.
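
For reference, this is roughly how the two execsnoop variants are run. Package and path names vary between distributions; on Debian/Ubuntu the BCC tools are shipped in the bpfcc-tools package with a -bpfcc suffix:

# ftrace-based version from Brendan Gregg's perf-tools collection
git clone https://github.com/brendangregg/perf-tools
sudo ./perf-tools/bin/execsnoop

# eBPF-based version from the BCC project
sudo execsnoop-bpfcc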




In the future, it may become possible to use the get_fd_path eBPF helper function, which is not yet available. Once it is added to the kernel, it will be useful for our problem: the full path to the process's executable could then be obtained without reading kernel data structures directly.



Outcome



None of the APIs we've reviewed is perfect. Below are some guidelines on which approaches to use to get information about processes, and when to use them:



  1. Use auditd or go-audit if you control the environment where your solution will run. Remember, though, that only one consumer can read audit events at a time; if the hosts already run other software that works directly with the auditing API, there will be a conflict. And, as noted above, the audit API knows nothing about containers.
  2. If you just need to take a quick, one-off look at which processes are being started, for example while debugging something on your own machine, use execsnoop. It solves exactly this narrow problem and requires no extra work.
  3. If you need maximum accuracy and completeness of data and are prepared to invest serious effort, build your own solution around eBPF: load eBPF programs, attach them to tracepoints or kprobes, and read their output through perf buffers... A story about all this is worthy of a separate article. The most important thing to remember when choosing this method is the following: if you use eBPF programs, make sure they can be compiled statically, so that you do not depend on kernel headers, because that dependence is exactly what this approach is meant to avoid. This also means that you cannot read kernel data structures and that you cannot use frameworks like BCC that compile eBPF programs at runtime.
  4. If you are not interested in short-lived processes and the previous recommendations do not suit you, use netlink capabilities together with /proc.


How do you organize monitoring of running processes in Linux?









