strace. Here's what happened when I ran strace in a Docker container on my laptop:
$ docker run -it ubuntu:18.04 /bin/bash
$ # ... install strace ...
root@e27f594da870:/# strace ls
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
strace works through the ptrace system call, so it won't work if ptrace isn't permitted! But it's easy to fix, and on my laptop I did it like this:
docker run --cap-add=SYS_PTRACE -it ubuntu:18.04 /bin/bash
But I was less interested in fixing the problem than in figuring out why it happens in the first place. So why doesn't strace work, and why does --cap-add=SYS_PTRACE fix it?
Hypothesis 1: Container processes are missing the CAP_SYS_PTRACE capability
Since --cap-add=SYS_PTRACE reliably fixes the problem, I had always assumed that Docker container processes simply don't have the CAP_SYS_PTRACE capability by default. But for two reasons, that doesn't add up.
Reason 1: As an experiment, logged in as a regular user, I can run strace on any of my own processes without trouble, yet when I checked the capabilities of my current process, CAP_SYS_PTRACE wasn't among them:
$ getpcaps $$
Capabilities for `11589': =
Reason 2: man capabilities describes CAP_SYS_PTRACE as follows:
CAP_SYS_PTRACE
* Trace arbitrary processes using ptrace(2);
The whole point of CAP_SYS_PTRACE is to let you ptrace arbitrary processes owned by any user, the way root usually can. You shouldn't need it just to ptrace a regular process owned by your own user.
I also ran one more check: I started a Docker container with
docker run --cap-add=SYS_PTRACE -it ubuntu:18.04 /bin/bash
then dropped the CAP_SYS_PTRACE capability, and strace kept working even without it. Why?!
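One way to double-check what getpcaps reports is to read the effective capability mask straight from /proc. Here's a minimal sketch, assuming the "CapEff:" line format of /proc/<pid>/status and that CAP_SYS_PTRACE is bit 19 (as in linux/capability.h). The helper names are made up for illustration.

```python
CAP_SYS_PTRACE = 19  # bit number of CAP_SYS_PTRACE in <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the effective capability mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. "CapEff:\t0000000000000000"
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap_bit):
    """True if the given capability bit is set in the mask."""
    return (mask >> cap_bit) & 1 == 1


def current_process_has_ptrace_cap():
    """Check the calling process (Linux only: reads /proc/self/status)."""
    with open("/proc/self/status") as f:
        return has_capability(parse_cap_eff(f.read()), CAP_SYS_PTRACE)
```

For a regular user's shell the mask is all zeros, which matches the empty getpcaps output above, so has_capability(0, CAP_SYS_PTRACE) is False even though strace works fine there.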
Hypothesis 2: Something about user namespaces?
My next (and much less well-founded) hypothesis was something like "hmm, maybe the process is in a different user namespace and strace doesn't work ... just because?" That isn't really a coherent set of statements, but I still looked at the problem from this angle.
So, is the process in a different user namespace? This is how it looks in the container:
root@e27f594da870:/# ls /proc/$$/ns/user -l
... /proc/1/ns/user -> 'user:[4026531837]'
And this is how it looks on the host:
bork@kiwi:~$ ls /proc/$$/ns/user -l
... /proc/12177/ns/user -> 'user:[4026531837]'
Root in the container is in exactly the same user namespace as root on the host, since both show the same user namespace ID (4026531837), so there's no reason there for strace to fail. As you can see, this hypothesis wasn't a great one, but at the time I hadn't realized that users in the container and on the host share a user namespace, so the approach seemed worth a look.
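The comparison I did by eye can also be done in code. This is a rough sketch, assuming the 'type:[id]' link format shown in the ls output above. The function names are hypothetical, and user_namespace_of only works on Linux.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link_target):
    """Split a namespace link such as 'user:[4026531837]' into (type, id)."""
    match = _NS_LINK.match(link_target.strip())
    if match is None:
        raise ValueError(f"not a namespace link: {link_target!r}")
    return match.group(1), int(match.group(2))


def same_namespace(link_a, link_b):
    """True if both links name the same namespace type and ID."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def user_namespace_of(pid="self"):
    """Read the user namespace link of a process (Linux only)."""
    return os.readlink(f"/proc/{pid}/ns/user")
```

Run user_namespace_of() once in the container and once on the host, then pass both strings to same_namespace to see whether the two processes share a user namespace.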
Hypothesis 3: The ptrace system call is blocked by a seccomp-bpf rule
I already knew that Docker uses a seccomp-bpf rule to block container processes from making a large number of system calls, and it turned out that ptrace is on the list of system calls blocked by default! (Strictly speaking, the list is an allowlist and ptrace simply isn't on it, but the result is the same.)
Now it's clear why strace doesn't work in a Docker container: if ptrace is blocked outright, then obviously you can't call it.
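To make the "allowlist" idea concrete, here's a toy model of how a default-deny seccomp profile decides what to do with a system call. It's only a sketch: the action names follow the ones used in seccomp profiles, but the function and the tiny sample allowlist are made up for illustration, and the real profile lists hundreds of calls.

```python
ALLOW = "SCMP_ACT_ALLOW"
ERRNO = "SCMP_ACT_ERRNO"  # the call fails with an error instead of running

# A tiny illustrative allowlist, NOT Docker's real one.
SAMPLE_ALLOWLIST = frozenset({"read", "write", "execve", "openat", "close"})


def seccomp_action(syscall, allowlist, default_action=ERRNO):
    """Allow the syscall only if it is on the allowlist; otherwise fall
    back to the profile's default action."""
    return ALLOW if syscall in allowlist else default_action
```

Nothing in the list mentions ptrace, so seccomp_action("ptrace", SAMPLE_ALLOWLIST) falls through to the default action, which is why "not on the allowlist" and "blocked" come to the same thing.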
Let's test this hypothesis and see whether we can use strace in a Docker container if we disable all seccomp rules:
$ docker run --security-opt seccomp=unconfined -it ubuntu:18.04 /bin/bash
$ strace ls
execve("/bin/ls", ["ls"], 0x7ffc69a65580 /* 8 vars */) = 0
... it works fine ...
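If you want to confirm that seccomp really is off inside the container, the "Seccomp:" field in /proc/<pid>/status reports the mode (0 = disabled, 1 = strict, 2 = filter) on kernels that expose it. Here's a small sketch that parses it. The function name is hypothetical.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text):
    """Return the seccomp mode named in the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so that "Seccomp_filters:" is skipped.
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES[int(line.split()[1])]
    raise ValueError("no Seccomp line found")
```

Feeding it the contents of /proc/self/status should give "disabled" in a container started with seccomp=unconfined and "filter" in one running under a seccomp profile.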
Great! Everything works, and the mystery is solved! Except ...
Why does --cap-add=SYS_PTRACE fix the problem?
We still haven't explained why --cap-add=SYS_PTRACE fixes the problem. The man page for docker run describes the --cap-add argument as follows:
--cap-add=[]
Add Linux capabilities
None of this has anything to do with seccomp rules! What's going on?
Let's take a look at the Docker source code.
When the documentation doesn't help, the only thing left is to dive into the source.
One nice thing about Go is that, because dependencies are often vendored into the repository, you can grep the whole repository to find the code you're interested in. So I cloned github.com/moby/moby and searched it with something like rg CAP_SYS_PTRACE.
Here's what I think is going on: in containerd's seccomp implementation, in contrib/seccomp/seccomp_default.go, there's a bunch of code that makes sure that if a process has a capability, it is also allowed (through a seccomp rule) to use the system calls that go with that capability.
case "CAP_SYS_PTRACE":
	s.Syscalls = append(s.Syscalls, specs.LinuxSyscall{
		Names: []string{
			"kcmp",
			"process_vm_readv",
			"process_vm_writev",
			"ptrace",
		},
		Action: specs.ActAllow,
		Args:   []specs.LinuxSeccompArg{},
	})
There's also code in moby, in profiles/seccomp/seccomp.go and in the default seccomp profile, that does something very similar, so we've probably found our answer!
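Here's how I picture that logic, as a toy Python model rather than Docker's actual code. Only the CAP_SYS_PTRACE entry is taken from the Go snippet above. The table structure, the function name, and the idea of a "base allowlist" argument are assumptions made for illustration.

```python
# Syscalls unlocked per capability. Only the CAP_SYS_PTRACE entry comes
# from the Go snippet; a real profile has entries for other capabilities.
SYSCALLS_FOR_CAPABILITY = {
    "CAP_SYS_PTRACE": ["kcmp", "process_vm_readv", "process_vm_writev", "ptrace"],
}


def build_allowlist(base_allowlist, capabilities):
    """Return the base allowlist plus every syscall unlocked by the
    capabilities the container was granted."""
    allowed = set(base_allowlist)
    for cap in capabilities:
        allowed.update(SYSCALLS_FOR_CAPABILITY.get(cap, []))
    return allowed
```

With no extra capabilities, ptrace stays off the list. Once CAP_SYS_PTRACE is passed in, ptrace and its three companions are added, which is the seccomp side effect that the --cap-add description never mentions.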
Docker's --cap-add does more than it says
In the end, it seems that --cap-add doesn't do exactly what the man page says; it's more like --cap-add-and-also-whitelist-some-extra-system-calls-if-required. And that makes sense: if you have a capability like CAP_SYS_PTRACE, which is supposed to let you use the process_vm_readv system call, but that system call is blocked by a seccomp profile, the capability doesn't help you much. So allowing the process_vm_readv and ptrace system calls when you grant CAP_SYS_PTRACE seems reasonable.
It turns out strace works in newer versions of Docker
On kernel versions 4.8 and higher, thanks to this commit, Docker 19.03 finally allows ptrace system calls by default. On my laptop, though, Docker is still at version 18.09.7, so it doesn't have this commit.
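To check the kernel half of that condition, you can compare the running kernel's release string against 4.8. This is a minimal sketch with hypothetical helper names. It only looks at the kernel version and says nothing about which Docker version is installed.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string like '4.15.0-58-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(4, 8)):
    """True if the kernel release is at or above the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum
```

On Linux, the release string is available as os.uname().release, so kernel_at_least(os.uname().release) tells you whether the kernel is new enough.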
That's all!
This was a fun problem to dig into, and I think it's a good example of how the many moving parts of containers interact in non-trivial ways.
If you liked this post, you might like my zine "How Containers Work", which explains in 24 pages the Linux kernel features that make containers work. It covers capabilities and seccomp-bpf.