This is another article in the series "BPF for the little ones" ( 0 , 1 , 2 ) and the first in a series of practical articles on Linux tracing with modern tools.
From it you will learn about the program and the language bpftrace
- the easiest way to dive into the world of BPF from a practical point of view, even if you know absolutely nothing about BPF. The utility bpftrace
allows you to create handler programs directly from the command line using a simple language and connect them to a huge number of kernel and user space events. Take a look at the KPDV below ... congratulations, you already know how to trace system calls with bpftrace
!
Unlike the previous articles in the series, this one is relatively short and its main part is written in the format of a tutorial, so after five minutes of reading you can create handlers and collect statistics about any event in the Linux kernel right from the command line. The end of the article discusses alternatives - ply
and BCC. Minutes in the phrase "after five minutes of reading" are meant to be Mercurial. Unique skills are not guaranteed to emerge after five minutes of reading.
Content
- Installing bpftrace
- What events can we trace?
- Bpftrace: tutorial
- BCC: utilities and framework
- Ply: bpftrace for the poor
Installing bpftrace
. sudo apt install bpftrace
( ). bpftrace
, .
. bpftrace
, , , , BTF. , bpftrace
.
master- bpftrace docker image bpftrace
. , :
$ docker pull quay.io/iovisor/bpftrace:latest
$ cd /tmp
$ docker run -v $(pwd):/o quay.io/iovisor/bpftrace:latest /bin/bash -c "cp /usr/bin/bpftrace /o"
$ sudo ./bpftrace -V
bpftrace v0.11.4
bpftrace glibc
, docker image glibc
.
, . , execve(2)
, :
$ sudo ./bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(uptr(args->filename))); }'
Attaching 1 probe...
bash -> /bin/echo
bash -> /usr/bin/ls
gnome-shell -> /bin/sh
sh -> /home/aspsk/bin/geeqie
sh -> /usr/local/sbin/geeqie
...
, , , :
$ sudo mv /tmp/bpftrace /usr/local/bin
, , , BTF, bpftrace
, docker . bpftrace
. .
. bpftrace
/ BTF, kernel headers. , bpftrace
.
BTF, , BTF kernel headers, , , . BTF , .
?
. , , , . , . : « ?!» , ,
$ sudo apt install bpftrace The following NEW packages will be installed: bpftrace Preparing to unpack .../bpftrace_0.11.0-1_amd64.deb ... Unpacking bpftrace (0.11.0-1) ... Setting up bpftrace (0.11.0-1) ... Processing triggers for man-db (2.9.3-2) ... $
, bpftrace
. ? , bpftrace
. - :
- :
BEGIN
,END
- , kprobes:
kprobe
,kretprobe
,uprobe
,uretprobe
- , BPF trampolines:
kfunc
,kretfunc
- , tracepoints:
tracepoint
- :
usdt
- , perf:
software
,hardware
,profile
,interval
,watchpoint
Bpftrace: hello world
bpftrace
awk , BEGIN
END
, bpftrace
, . :
# bpftrace -e 'BEGIN { printf("Hello world!\n"); }' Attaching 1 probe... Hello world! ^C
"Hello world!"
. , Ctrl-C
, bpftrace
— . bpftrace
exit
. , END
:
# bpftrace -e ' BEGIN { printf("Hello world!\n"); exit(); } END { printf("So long\n"); } ' Attaching 2 probes... Hello world! So long
Kprobes —
— , , , , kprobes (Kernel Probe — ) , , . , , kprobes
, , , , Linux BPF.
, schedule
:
$ sudo bpftrace -e '
k:schedule { @[comm] = count(); }
i:s:2 { exit();}
END { print(@, 10); clear(@); }
'
Attaching 3 probes...
@[Timer]: 147
@[kworker/u65:0]: 147
@[kworker/7:1]: 153
@[kworker/13:1]: 158
@[IPC I/O Child]: 170
@[IPC I/O Parent]: 172
@[kworker/12:1]: 185
@[Web Content]: 229
@[Xorg]: 269
@[SCTP timer]: 1566
@
.
-? :
$ sudo bpftrace -l 'k:*' | wc -l 61106
. , , kprobe, , , kprobes.
kretprobes
kprobe
kretprobe
. kprobe
, kretporobe
. retval
.
, vfs_write
( ):
$ sudo bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:2 { exit(); }' Attaching 2 probes... @: [1] 606 |@@@@@@@@@@@@@@@@@@@@@@@@@ | [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1223 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [16, 32) 0 | | [32, 64) 25 |@ |
uprobes uretprobes
, ( ), . kprobes
. bpftrace
uprobes
uretprobes
— .
, ( UID ):
$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: [%d]: \"%s\"\n", uid, str(uptr(retval))); }' Attaching 1 probe... readline: [1000]: "echo "hello habr"" readline: [0]: "echo "hello from root"" ^C
, 2
CONFIG_DEBUG_INFO_BTF=y
, kprobes, , bpf trampolines. BTF , . , / .
Tracepoints —
- , , , . tracepoints . :
$ sudo bpftrace -l 't:*'
, kprobes
:
$ sudo bpftrace -l 't:*' | wc -l 1782
tracepoints , API: , tracepoint, , . , bpftrace
, tracepoint, :
$ sudo bpftrace -lv tracepoint:thermal:thermal_temperature tracepoint:thermal:thermal_temperature __data_loc char[] thermal_zone; int id; int temp_prev; int temp;
kprobe
, BTF, , , . BTF kprobes
kfuncs
.
usdt —
#include <sys/sdt.h> #include <sys/time.h> #include <unistd.h> int main(int argc, char **argv) { struct timeval tv; for (;;) { gettimeofday(&tv, NULL); DTRACE_PROBE1(test, probe, tv.tv_sec); sleep(1); } }
test:probe
, tv.tv_sec
— . , systemtap-sdt-dev
( ). , :
$ cc /tmp/test.c -o /tmp/test $ sudo bpftrace -l 'usdt:/tmp/test' usdt:/tmp/test:test:probe
/tmp/test
,
$ sudo bpftrace -e 'usdt:/tmp/test:test:probe { printf(" %u\n", arg0); }' Attaching 1 probe... 1612903991 1612903992 1612903993 1612903994 1612903995 ...
arg0
— tv.tv_sec
, breakpoint.
Perf
bpftrace
, Perf. , bpftrace
:
software
: -hardware
: PMCsinterval
:profile
:
`software`
perf, :
# bpftrace -l 's:*' software:alignment-faults: software:bpf-output: software:context-switches: software:cpu-clock: software:cpu-migrations: software:dummy: software:emulation-faults: software:major-faults: software:minor-faults: software:page-faults: software:task-clock:
, , ,
# bpftrace -e 'software:cpu-migrations:10 { @[comm] = count(); }' Attaching 2 probes... ^C@[kworker/u65:1]: 1 @[bpftrace]: 1 @[SCTP timer]: 2 @[Xorg]: 2
CPU . perf_event_open(2)
, , cpu-migrations
, man- PERF_COUNT_SW_CPU_MIGRATIONS
.
`hardware`
hardware counters, bpftrace
BPF. , :
bpftrace -l 'h*' hardware:backend-stalls: hardware:branch-instructions: hardware:branch-misses: hardware:bus-cycles: hardware:cache-misses: hardware:cache-references: hardware:cpu-cycles: hardware:frontend-stalls: hardware:instructions: hardware:ref-cycles:
( , . PERF_COUNT_HW_BRANCH_MISSES
):
bpftrace -e 'hardware:branch-misses:100000 { @[tid] = count(); }' Attaching 3 probes... @[1055]: 4 @[3078]: 4 @[1947]: 5 @[1066]: 6 @[2551]: 6 @[0]: 29
`interval` `profile`
interval
profile
. CPU, — CPU.
, . , :
$ sudo bpftrace -e ' kr:vfs_write { @ = hist(retval); } interval:s:2 { print(@); clear(@); } '
vfs_write
, interval
, , @
.
profile
:
# bpftrace -e ' profile:hz:99 { @[kstack] = count(); } i:s:10 { exit(); } END { print(@,1); clear(@); } ' Attaching 3 probes... @[ cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 start_secondary+345 secondary_startup_64+182 ]: 14455
profile
99 , exit()
, END
@
— ( , , , ).
Bpftrace: tutorial
, bpftrace
:
# bpftrace
, . , bpftrace
. , bpftrace
. , — .
-l
. . ( - , , .) :
# bpftrace -l '*kill_all*' kprobe:rfkill_alloc kprobe:kill_all kprobe:btrfs_kill_all_delayed_nodes
tracepoints
:
# bpftrace -l 't:*kill*' tracepoint:cfg80211:rdev_rfkill_poll tracepoint:syscalls:sys_enter_kill tracepoint:syscalls:sys_exit_kill tracepoint:syscalls:sys_enter_tgkill tracepoint:syscalls:sys_exit_tgkill tracepoint:syscalls:sys_enter_tkill tracepoint:syscalls:sys_exit_tkill
tracepoint:syscalls
, , bpftrace
. X
:
tracepoint:syscalls:sys_enter_X tracepoint:syscalls:sys_exit_X
- , , execve(2)
. , , - tracepoint
-v
, :
# bpftrace -lv 't:s*:sys_*_execve' tracepoint:syscalls:sys_enter_execve int __syscall_nr; const char * filename; const char *const * argv; const char *const * envp; tracepoint:syscalls:sys_exit_execve int __syscall_nr; long ret;
( *
, syscalls
sys_enter_execve
sys_exit_execve
). , -lv
args
:
# bpftrace -e ' t:s*:sys_enter_execve { printf("ENTER: %s\n", str(uptr(args->filename))); } t:s*:sys_exit_execve { printf("EXIT: %s: %d\n", comm, args->ret); } ' Attaching 2 probes... ENTER: /bin/ls EXIT: ls: 0 ENTER: /bin/lss EXIT: bash: -2
.
args->filename
. , str
, : , uptr
. bpftrace
, . , , bpftrace -lv
, , , execve ( __user
).
comm
, . args->ret
. , " " , . , - , : /bin/ls
exec
0 ( ls
/bin/ls
), /bin/lss
exec
-2
bash
.
. , . , . uptr
, .
`bpftrace`
bpftrace
awk (. 6 The AWK Programming Language
) .
<probe> <filter> { <action> }
,
# bpftrace -e 'BEGIN { printf("Hello\n") } END { printf("World\n") }'
<probe>
— BEGIN
END
, <action>
— printf
.
<filter>
,
,
# bpftrace -e 'p:s:1 /cpu == 0/ { printf(" CPU%d\n", cpu); }' Attaching 1 probe... CPU0 CPU0 ^C
, CPU 0.
. , /cpu == 0/
?
<filter>
. , write
. — sys_enter_write
sys_exit_write
:
# cat /tmp/write-times.bt t:syscalls:sys_enter_write { @[tid] = nsecs } t:syscalls:sys_exit_write /@[tid]/ { @write[comm] = sum((nsecs - @[tid]) / 1000); delete(@[tid]); } END { print(@write, 10); clear(@write); clear(@); }
, . :
# bpftrace /tmp/write-times.bt
, sys_enter_write
, write
@
, tid
.
, sys_exit_write
, /@[tid]/
, . , , - write
. ( ) @write
@[tid]
.
, ^C
, END
, @write
@
, bpftrace
.
. , /@[tid]/
?
:
bpftrace
C , , :?
, ++
, --
. if {} else {}
. while
unroll
( , BPF ). , .
: . $
, :
# bpftrace -e 'BEGIN { $x = 1; printf("%d\n", ++$x); exit(); }' # bpftrace -e 'BEGIN { if (1) { $x = 1; } printf("%d\n", ++$x); exit(); }'
— :
# bpftrace -e 'BEGIN { $x = 1; exit(); } END { printf("%d\n", $x); }'
, , @
. "" @
, write
(@[tid]
). ( bpftrace
— . , , : . , , .)
, : ? bpftrace
. kprobes
, tracepoints
— , tracepoint, Perf, , — . tracepoints
, kprobes
, Perf.
kprobes
arg0
, arg1
, .. , . :
#include <linux/skbuff.h> #include <linux/ip.h> k:icmp_echo { $skb = (struct sk_buff *) arg0; $iphdr = (struct iphdr *) ($skb->head + $skb->network_header); @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++; }
, . kprobe
icmp_echo
, ICMPv4 echo request. , arg0
, — sk_buff
, . IP @pingstats
. , , IP ! , kprobe, user space, .
. , kernel headers. sk_buff
iphdr
, . ( BTF, — , .) icmp_echo
sk_buff
$skb
. $skb
, , , , iphdr
. ntop
bpftrace
, IP .
. . uptr
kptr
. : vfs_write
, — struct file
, <linux/fs.h>
. vfs_write
. (Hint: struct file
kretprobe
?)
. bpftrace
, , , sudo
.
ping
— , CPU. count
. :
# bpftrace -e 'p:hz:5000 { @x++; @y = count(); } i:s:10 { exit(); }' Attaching 2 probes... @x: 760528 @y: 799002
10 5000 16 @x
@y
. ++
. count()
, CPU- : CPU , .
bpftrace
, - . — . .
# bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:10 { exit() }' Attaching 2 probes... @: [1] 14966407 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1670 | | [16, 32) 0 | | [32, 64) 123 | | [64, 128) 0 | | [128, 256) 0 | | [256, 512) 0 | | [512, 1K) 0 | | [1K, 2K) 0 | | [2K, 4K) 0 | | [4K, 8K) 0 | | [8K, 16K) 7531982 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
vfs_write
, , - ! ( 20 ):
# bpftrace -e ' kr:vfs_write /retval == 1/ { @[pid, comm] = hist(retval); } i:s:10 { exit() } END { print(@, 1); clear(@); }' Attaching 3 probes... @[133835, dd]: [1] 14254511 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
, , retval
. , — . , END
, . ? :
# tr '\000' ' ' < /proc/133835/cmdline dd if=/dev/zero of=/dev/null bs=1
. bpftrace
, .
: flame graphs
profile
. . , profile
:
profile:hz:rate profile:s:rate profile:ms:rate profile:us:rate
HZ, — , , .
, , , :
bpftrace -e 'profile:hz:99 { print(kstack); exit() }' Attaching 1 probe... cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 rest_init+174 arch_call_rest_init+14 start_kernel+1724 x86_64_start_reservations+36 x86_64_start_kernel+139 secondary_startup_64+182
-.
, , . flame graphs, .
, bpftrace
:
# bpftrace -e 'profile:hz:333 { @[kstack] = count(); } i:s:10 { exit() }' > /tmp/raw # wc -l /tmp/raw 3374 /tmp/raw
333 ( kstack
@
, kstack
— ).
FlameGraph :
$ git clone https://github.com/brendangregg/FlameGraph.git $ cd FlameGraph $ ./stackcollapse-bpftrace.pl /tmp/raw > /tmp/ready $ ./flamegraph.pl /tmp/ready > /tmp/kstack.svg
bpftrace
, ( , gist SVG):
, CPU dd
, /dev/zero
/dev/null
. , .
. bpftrace
. FlameGraph
.
, bpftrace
. , .
BCC:
BCC , BPF — 100 . — , , Netflix, Facebook . . , .
libbcc
libbpf
, -C , . , . .
BCC - . , bpfcc-tools
. . ,
# funccount-bpfcc 'vfs_*' -d 5 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_statfs' 1 b'vfs_unlink' 1 b'vfs_lock_file' 2 b'vfs_fallocate' 31 b'vfs_statx_fd' 32 b'vfs_getattr' 80 b'vfs_getattr_nosec' 88 b'vfs_open' 108 b'vfs_statx' 174 b'vfs_writev' 2789 b'vfs_write' 6554 b'vfs_read' 7363 Detaching...
vfs_
. -p
, PID . , mplayer
, :
# funccount-bpfcc 'vfs_*' -d 5 -p 29380 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_write' 208 b'vfs_read' 629 Detaching...
BCC
BCC. mutex_lock
mutex_unlock
. , .
#! /usr/bin/python3
from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv
b = BPF(text="""
BPF_PERCPU_ARRAY(mutex_stats, u64, 2);
static inline void inc(int key)
{
u64 *x = mutex_stats.lookup(&key);
if (x)
*x += 1;
}
void do_lock(struct pt_regs *ctx) { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
""")
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
. , — BPF
:
from bcc import BPF
BPF, . BPF
-C.
BPF_PERCPU_ARRAY(mutex_stats, u64, 2); static inline void inc(int key) { u64 *x = mutex_stats.lookup(&key); if (x) *x += 1; } void do_lock(struct pt_regs *ctx) { inc(0); } void do_unlock(struct pt_regs *ctx) { inc(1); }
C, , BPF
C.
, mutex_stats
u64
, . , PERCPU , , CPU . inc
, mutex_stats
. .
, do_lock
do_unlock
.
— BPF
libllvm
, , . kprobes
:
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
:
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
mutex_stats
. sum
mutex_stats
, CPU:
sum(index) { result = 0 CPU { result += cpu->mutex_stats[index] } return result }
. :
$ sudo ./bcc-tool-example TIME LOCKED UNLOCKED 18:06:33 11382 12993 18:06:34 11200 12783 18:06:35 18597 22553 18:06:36 20776 25516 18:06:37 59453 68201 18:06:38 49282 58064 18:06:39 25541 27428 18:06:40 22094 25280 18:06:41 5539 7250 18:06:42 5662 7351 18:06:43 5205 6905 18:06:44 6733 8438
- 18:06:35 youtube Firefox, youtube 18:06:40 . , , youtube .
, C, libbpf
CO-RE. libbpf
, libllvm
, , . , BCC libbpf
+ CO-RE BCC, . libbpf-tools. BCC to libbpf conversion guide ( ).
Ply: bpftrace
ply
, Tobias Waldekranz 2015 , bpftrace
. awk- BPF, ,
ply 'tracepoint:tcp/tcp_receive_reset { printf("saddr:%v port:%v->%v\n", data->saddr, data->sport, data->dport); }'
ply
: libc
(). , BPF . , , ply
ply script language -> BPF
.
, ply
, , — ply
, , . ply
, , , , C libbpf
— , ., , Building BPF applications with libbpf-bootstrap.
:
bpftrace
, BCC Linux:
The bpftrace One-Liner Tutorial.
bpftrace
, . , , .
bpftrace Reference Guide. ,
bpftrace
, . ,bpftrace
.
BCC Tutorial.
bpftrace
(libbpf
), BCC, BCC Reference Guide, , .
Brendan Gregg, «BPF Performance Tools». BPF Linux — BCC. BPF .
Brendan Gregg, «Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)». «Systems Performance». : BPF, Solaris, . «BPF Performance Tools» «?», «?», ( BPF ).