In the beginning there was a technology called BPF. We looked at it in the previous, Old Testament, article of this series. In 2013, thanks to the efforts of Alexei Starovoitov and Daniel Borkmann, an improved version of it, optimized for modern 64-bit machines, was developed and included in the Linux kernel. This new technology was briefly called Internal BPF, then it was renamed Extended BPF, and now, after a few years, everyone simply calls it BPF.
Roughly speaking, BPF allows arbitrary user-provided code to run in Linux kernel space, and the new architecture proved so successful that we would need a dozen more articles to describe all its uses. (The only thing the developers did not manage, as you can see in the title image below, was a decent logo.)
This article describes the structure of the BPF virtual machine, the kernel interfaces for working with BPF, the development tools, and gives a short, very short, overview of the existing capabilities, i.e. everything we will need later for a deeper study of the practical applications of BPF.
Summary of the article
Introduction to BPF Architecture. First, we'll take a bird's eye view of the BPF architecture and outline the main components.
BPF. , BPF.
BPF, bpffs. BPF β .
bpf. , , , β bpf(2)
.
BPF libbpf. , , . . libbpf
. BPF, .
Kernel Helpers. BPF - β , , , BPF .
maps BPF. , , . verifier.
. , .
. , , , . , .
Introduction to BPF Architecture
BPF ( ) BPF, RISC . , , Berkeley UNIX, , .
BPF 64- , SDN (Software-defined networking). BPF, BPF Linux , , , , , , .
BPF β -, «» . BPF , - . , , , - , .. BPF ( , , , ), β , , ..
. BPF, . , , , , C. llvm, - BPF.
BPF , , , . , , - BPF, JIT compiler (Just In Time). , , BPF β . β bpf(2)
, , , , , (attaches) .
: , ? ? BPF (- verifier ):
Verifier β , , . , , , β BPF, , , , , , . Verifier , BPF , , , , . verifier , , BPF.
, ? C, bpf(2)
, verifier . . . -, verifier β , , . -, , , «» , . ( , , libbpf
.)
, . , , BPF BPF. , β - (kernel helpers). BPF maps β API. , , , map -. , (per-CPU) - , , , BPF . , BPF .
maps bpf(2)
, BPF, β -. , helpers , . , BPF - , perf, ..
, BPF , .., verifier, . , .
, , BPF (, , , β BPF ). ( , BPF , ), ( , ), β , BPF , ( , BPF).
BPF , : BPF, BPF 24x7, , BPF. BPF : DDoS , SDN (, kubernetes), , - ..
BPF .
:
, , , llvm
/clang
bpf bpftool
. , . , .
The BPF Virtual Machine
BPF , C . , , . , , , , 4096 ( ).
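BPF has eleven 64-bit registers available to the user, r0-r10, and a program counter. Register r10 holds the frame pointer and is read-only. At run time a program has a 512-byte stack and an unlimited amount of shared memory in the form of maps.
BPF programs are allowed to call a fixed set of kernel helpers. Each called function can take up to five arguments, passed in registers r1-r5, and returns its value in r0. It is guaranteed that the contents of registers r6-r9 do not change across the call.
For efficient translation, registers r0-r11 are mapped one-to-one onto real registers, taking the ABI of the target architecture into account. For example, on x86_64 the registers r1-r5, used to pass function arguments, are mapped to rdi, rsi, rdx, rcx, r8, which are used to pass arguments to functions on x86_64. For example, the code on the left is translated into the code on the right: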
1: (b7) r1 = 1 mov $0x1,%rdi
2: (b7) r2 = 2 mov $0x2,%rsi
3: (b7) r3 = 3 mov $0x3,%rdx
4: (b7) r4 = 4 mov $0x4,%rcx
5: (b7) r5 = 5 mov $0x5,%r8
6: (85) call pc+1 callq 0x0000000000001ee8
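Register r0 is also used to return the result of the program, and in register r1 the program receives a pointer to its context. Depending on the program type this can be, for example, struct xdp_md (for XDP), struct __sk_buff (for various networking programs), struct pt_regs (for various tracing programs), and so on.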
, , kernel helpers, , maps. , , ...
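Instruction format. All (well, almost all) BPF instructions have a fixed size of 64 bits. On a 64-bit big-endian machine an instruction is laid out as Code (8 bits), Dst (4 bits), Src (4 bits), Off (16 bits), Imm (32 bits).
Here Code is the encoding of the instruction, Dst/Src are the encodings of the destination and source registers, Off is a 16-bit signed offset, and Imm is a 32-bit signed integer used in some instructions (similar to the constant K in cBPF). The Code field has one of two forms: for ALU and jump instructions it consists of a 4-bit operation code, a 1-bit source flag and a 3-bit class; for load and store instructions it consists of a 3-bit mode, a 2-bit size and a 3-bit class.
Instruction classes 0, 1, 2, 3 define the memory access instructions. They are called BPF_LD, BPF_LDX, BPF_ST, BPF_STX, respectively. Classes 4 and 7 (BPF_ALU, BPF_ALU64) make up the ALU instructions. Classes 5 and 6 (BPF_JMP, BPF_JMP32) contain the jump instructions.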
BPF : , , , BPF. Verifier, JIT , BPF, maps, ..
, bpf.h
bpf_common.h
, BPF. / , , , : Unofficial eBPF spec, BPF and XDP Reference Guide, Instruction Set, Documentation/networking/filter.txt , , Linux β verifier, JIT, BPF.
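Example: disassembling BPF by hand
Let's look at an example in which we compile the program readelf-example.c and examine the resulting binary. We will reveal the contents of readelf-example.c later, after we have reconstructed its logic from the binary code: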
$ clang -target bpf -c readelf-example.c -o readelf-example.o -O2
$ llvm-readelf -x .text readelf-example.o
Hex dump of section '.text':
0x00000000 b7000000 01000000 15010100 00000000 ................
0x00000010 b7000000 02000000 95000000 00000000 ................
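The first column of the readelf output is the offset, so our program consists of four instructions (on a little-endian machine the low nibble of the register byte is Dst and the high nibble is Src):
Code Dst Src Off  Imm
b7   0   0   0000 01000000
15   1   0   0100 00000000
b7   0   0   0000 02000000
95   0   0   0000 00000000
The instruction codes are b7, 15, b7 and 95. Recall that the three least significant bits are the instruction class. In our case the fourth bit of every instruction is zero, so the instruction classes are 7, 5, 7, 5, respectively. Class 7 is BPF_ALU64 and class 5 is BPF_JMP. Both classes use the same instruction format (see above), so we can rewrite our program like this (with the remaining columns written as numbers):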
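Op S  Class   Dst Src Off Imm
b  0  ALU64   0   0   0   1
1  0  JMP     1   0   1   0
b  0  ALU64   0   0   0   2
9  0  JMP     0   0   0   0
Operation b of class ALU64 is BPF_MOV. It assigns a value to the destination register. If the S (source) bit is set, the value is taken from the source register; if, as in our case, it is not set, the value is taken from the Imm field. So the first and third instructions perform r0 = Imm. Next, operation 1 of class JMP is BPF_JEQ (jump if equal). In our case, since the S bit is zero, it compares the value of the destination register with the Imm field. If the values are equal, execution jumps to PC + Off, where PC, as usual, contains the address of the next instruction. Finally, operation 9 of class JMP is BPF_EXIT. This instruction terminates the program and returns r0 to the kernel. Let's add a new column to our table:
Op    S  Class   Dst Src Off Imm    Disassm
MOV   0  ALU64   0   0   0   1      r0 = 1
JEQ   0  JMP     1   0   1   0      if (r1 == 0) goto pc+1
MOV   0  ALU64   0   0   0   2      r0 = 2
EXIT  0  JMP     0   0   0   0      exit
We can rewrite this in a more convenient form: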
r0 = 1
if (r1 == 0) goto END
r0 = 2
END:
exit
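If we recall that register r1 holds the pointer to the context passed by the kernel and that r0 holds the value returned to the kernel, we can see that the program returns 1 if the context pointer is null and 2 otherwise. Let's check that we are right by looking at the source: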
$ cat readelf-example.c
int foo(void *ctx)
{
return ctx ? 2 : 1;
}
, , .
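Exception: the 16-byte instruction
Earlier we mentioned that some instructions take more than 64 bits. One example is the lddw instruction (Code = 0x18 = BPF_LD | BPF_DW | BPF_IMM), which loads a double word from the Imm fields into a register. Since Imm is 32 bits wide and a double word is 64 bits, a 64-bit immediate cannot be loaded into a register by a single 64-bit instruction. Two adjacent instructions are used instead, and the second half of the 64-bit value is stored in the Imm field of the second one. Example: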
$ cat x64.c
long foo(void *ctx)
{
return 0x11223344aabbccdd;
}
$ clang -target bpf -c x64.c -o x64.o -O2
$ llvm-readelf -x .text x64.o
Hex dump of section '.text':
0x00000000 18000000 ddccbbaa 00000000 44332211 ............D3".
0x00000010 95000000 00000000 ........
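The binary program contains only two instructions:
Binary                                 Disassm
18000000 ddccbbaa 00000000 44332211    r0 = Imm[0]|Imm[1]
95000000 00000000                      exit
We will meet the lddw instruction again when we talk about relocations and working with maps.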
: BPF
, BPF , . , , , :
$ llvm-objdump -d x64.o
Disassembly of section .text:
0000000000000000 <foo>:
0: 18 00 00 00 dd cc bb aa 00 00 00 00 44 33 22 11 r0 = 1234605617868164317 ll
2: 95 00 00 00 00 00 00 00 exit
Lifecycle of BPF Objects, the bpffs File System
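(For more details on the material in this subsection, see Alexei Starovoitov's post on the BPF Blog.)
BPF objects, that is programs and maps, are created from user space with the BPF_PROG_LOAD and BPF_MAP_CREATE commands of the bpf(2) system call. This creates kernel data structures, and for each of them the refcount (reference counter) is set to one and a file descriptor pointing to the object is returned to the user. After the descriptor is closed, the refcount of the object is decremented by one, and when it reaches zero the object is destroyed.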
, refcount
, .. refcount
:
- . , - tracepoint
. -.
? (hook). , , , . , , , ( , Β«local to the processΒ»). , , β . - .
? userspace, , DDoS β BPF , . , , β , , .
, , tracepoint . . bpf. - , , , BPF , , refcount
. , .
bpffs, BPF «» (Β«pinΒ», : Β«process can pin a BPF program or mapΒ»). BPF , β DDoS, .
By default the BPF file system is mounted at /sys/fs/bpf, but it can also be mounted locally, for example like this:
$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint
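Names in the file system are created with the BPF_OBJ_PIN command of the bpf system call. To illustrate, let's take a program, compile it, load it and pin it to bpffs. Our program does nothing useful; we show the code only so that you can reproduce the example: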
$ cat test.c
__attribute__((section("xdp"), used))
int test(void *ctx)
{
return 0;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
Let's compile this program and create a local copy of the bpffs file system:
$ clang -target bpf -c test.c -o test.o
$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint
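Now let's load our program with the bpftool utility and look at the accompanying bpf(2) system calls (some irrelevant lines have been removed from the strace output):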
$ sudo strace -e bpf bpftool prog load ./test.o bpf-mountpoint/test
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="test", ...}, 120) = 3
bpf(BPF_OBJ_PIN, {pathname="bpf-mountpoint/test", bpf_fd=3}, 120) = 0
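Here we loaded the program with BPF_PROG_LOAD, received file descriptor 3 from the kernel and, using the BPF_OBJ_PIN command, pinned this descriptor as the file "bpf-mountpoint/test". After that the loader bpftool exited, but our program stayed in the kernel, even though we did not attach it to any network interface: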
$ sudo bpftool prog | tail -3
783: xdp name test tag 5c8ba0cf164cb46c gpl
loaded_at 2020-05-05T13:27:08+0000 uid 0
xlated 24B jited 41B memlock 4096B
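We can delete the file system object in the usual way with unlink(2), after which the corresponding program is removed: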
$ sudo rm ./bpf-mountpoint/test
$ sudo bpftool prog show id 783
Error: get by id (783): No such file or directory
, , , ( ), , , .
BPF , .. replace = detach old program, attach new program
. , , «» , .
The bpf System Call
BPF
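All BPF objects are created and managed from user space through a single system call, bpf, which has the following prototype: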
#include <linux/bpf.h>
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
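Here cmd is one of the values of enum bpf_cmd, attr is a pointer to the parameters of the particular command, and size is the size of the object behind that pointer, i.e. usually sizeof(*attr). In kernel 5.8 the bpf system call supports 34 different commands, and the definition of union bpf_attr takes 200 lines.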
BPF_PROG_LOAD
, BPF β BPF . verifier, JIT compiler , , . BPF.
, BPF, , β , verifier. , , : BPF_PROG_TYPE_XDP
, XDP_PASS
( ). BPF :
r0 = 2
exit
, , , :
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64) (unsigned long) ptr;
}
int main(void)
{
struct bpf_insn insns[] = {
{
.code = BPF_ALU64 | BPF_MOV | BPF_K,
.dst_reg = BPF_REG_0,
.imm = XDP_PASS
},
{
.code = BPF_JMP | BPF_EXIT
},
};
union bpf_attr attr = {
.prog_type = BPF_PROG_TYPE_XDP,
.insns = ptr_to_u64(insns),
.insn_cnt = sizeof(insns)/sizeof(insns[0]),
.license = ptr_to_u64("GPL"),
};
strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));
syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
for ( ;; )
pause();
}
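The interesting part of the program starts with the definition of the insns array, which is our BPF program in machine code. Each instruction of the BPF program is packed into a bpf_insn structure. The first element of insns corresponds to the instruction r0 = 2, the second to exit.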
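A digression. The kernel defines more convenient macros for writing machine code, and with the kernel header tools/include/linux/filter.h we could have written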
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
BPF_EXIT_INSN()
};
BPF BPF, .
BPF . attr
, , , "woo"
, , . , , bpf
.
, . , bpf
, .
, . strace
, , :
$ clang -g -O2 simple-prog.c -o simple-prog
$ sudo strace ./simple-prog
execve("./simple-prog", ["./simple-prog"], 0x7ffc7b553480 /* 13 vars */) = 0
...
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0x7ffe03c4ed50, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_V
ERSION(0, 0, 0), prog_flags=0, prog_name="woo", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = 3
pause(
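We see that bpf(2) returned descriptor 3 to us and that the program then went into an endless loop with pause(). Let's try to find our program in the system. To do this, we go to another terminal and use the bpftool utility: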
# bpftool prog | grep -A3 woo
390: xdp name woo tag 3b185187f1855c4c gpl
loaded_at 2020-08-31T24:66:44+0000 uid 0
xlated 16B jited 40B memlock 4096B
pids simple-prog(10381)
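We see that a program woo with the global ID 390 is loaded in the system and that the process simple-prog currently holds an open file descriptor pointing to it (and if simple-prog exits, woo disappears). As expected, the woo program takes 16 bytes, two instructions, in the BPF architecture, while in native form (x86_64) it takes 40 bytes. Let's look at our program in its original form: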
# bpftool prog dump xlated id 390
0: (b7) r0 = 2
1: (95) exit
. , JIT :
# bpftool prog dump jited id 390
bpf_prog_3b185187f1855c4c_woo:
0: nopl 0x0(%rax,%rax,1)
5: push %rbp
6: mov %rsp,%rbp
9: sub $0x0,%rsp
10: push %rbx
11: push %r13
13: push %r14
15: push %r15
17: pushq $0x0
19: mov $0x2,%eax
1e: pop %rbx
1f: pop %r15
21: pop %r14
23: pop %r13
25: pop %rbx
26: leaveq
27: retq
- exit(2)
, , , , JIT , , .
Maps
BPF , BPF, . maps bpf
.
, maps . , , , BPF , perf events .. , . , , . <linux/bpf.h>
, , - BPF_MAP_TYPE_HASH
.
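If you have ever programmed in, say, C++, you may have defined unordered_map<int,long> woo, which means "I need a table woo of unlimited size whose keys have type int and whose values have type long". To create a BPF hash table we need to do roughly the same, except that we have to specify the maximum size of the table, and instead of the key and value types we specify their sizes in bytes. Maps are created with the BPF_MAP_CREATE command of the bpf system call. Let's look at a more or less minimal program that creates a map. After the previous program, which loaded a BPF program, this one should look simple: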
$ cat simple-map.c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
int main(void)
{
union bpf_attr attr = {
.map_type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 4,
};
strncpy(attr.map_name, "woo", sizeof(attr.map_name));
syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
for ( ;; )
pause();
}
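Here we define the set of parameters attr, in which we say "I need a hash table with keys and values of size sizeof(int), into which I can put at most four elements". Other parameters can be specified when creating BPF maps; for example, just as in the program example, we set the name of the object to "woo".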
:
$ clang -g -O2 simple-map.c -o simple-map
$ sudo strace ./simple-map
execve("./simple-map", ["./simple-map"], 0x7ffd40a27070 /* 14 vars */) = 0
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=4, max_entries=4, map_name="woo", ...}, 72) = 3
pause(
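Here the bpf(2) system call returned descriptor number 3 to us, and then the program, as expected, waits for further instructions in pause(2).
Now let's send our program to the background or open another terminal and look at our object with the bpftool utility (we can tell our map from the others by its name):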
$ sudo bpftool map
...
114: hash name woo flags 0x0
key 4B value 4B max_entries 4 memlock 4096B
...
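The number 114 is the global ID of our object. Any program in the system can use this ID to open the existing map with the BPF_MAP_GET_FD_BY_ID command of the bpf system call.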
-. :
$ sudo bpftool map dump id 114
Found 0 elements
. hash[1] = 1
:
$ sudo bpftool map update id 114 key 1 0 0 0 value 1 0 0 0
:
$ sudo bpftool map dump id 114
key: 01 00 00 00 value: 01 00 00 00
Found 1 element
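Hooray! We managed to add one element. Note that to do this we have to work at the byte level, because bpftool does not know the type of the values in the hash table. (This knowledge can be passed to it using BTF, but more on that later.)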
How exactly does bpftool read and add elements? Let's look under the hood:
$ sudo strace -e bpf bpftool map dump id 114
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=NULL, next_key=0x55856ab65280}, 120) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x55856ab65280, value=0x55856ab652a0}, 120) = 0
key: 01 00 00 00 value: 01 00 00 00
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=0x55856ab65280, next_key=0x55856ab65280}, 120) = -1 ENOENT
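First we opened the map by its global ID with the BPF_MAP_GET_FD_BY_ID command, and bpf(2) returned descriptor 3 to us. Then, with the BPF_MAP_GET_NEXT_KEY command, we found the first key in the table by passing NULL as the pointer to the "previous" key. Having a key, we can call BPF_MAP_LOOKUP_ELEM, which writes the value to the value pointer. The next step is to look for the next element by passing a pointer to the current key, but our table contains only one element, so BPF_MAP_GET_NEXT_KEY returns ENOENT.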
OK, let's change the value for key 1; say, our business logic requires that hash[1] = 2:
$ sudo strace -e bpf bpftool map update id 114 key 1 0 0 0 value 2 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x55dcd72be260, value=0x55dcd72be280, flags=BPF_ANY}, 120) = 0
As expected, this is very simple: the BPF_MAP_GET_FD_BY_ID command opens our map by ID, and the BPF_MAP_UPDATE_ELEM command overwrites the element.
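- BPF_MAP_LOOKUP_ELEM: find a value by key
- BPF_MAP_UPDATE_ELEM: update or create a value
- BPF_MAP_DELETE_ELEM: delete a key
- BPF_MAP_GET_NEXT_KEY: find the next (or the first) key
- BPF_MAP_GET_NEXT_ID: iterate over all existing maps; this is how bpftool map works
- BPF_MAP_GET_FD_BY_ID: open an existing map by its global ID
- BPF_MAP_LOOKUP_AND_DELETE_ELEM: look up a value and delete the element in one operation
- BPF_MAP_FREEZE: make the map immutable from userspace (this operation cannot be undone)
- BPF_MAP_LOOKUP_BATCH, BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH: batch operations. For example, BPF_MAP_LOOKUP_AND_DELETE_BATCH reads and removes many elements of a map in a single call
Not all of these commands work for every map type, but in general working with other map types from user space looks the same as working with hash tables.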
, -. , , ? :
$ sudo bpftool map update id 114 key 2 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 3 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 4 0 0 0 value 1 0 0 0
:
$ sudo bpftool map dump id 114
key: 01 00 00 00 value: 01 00 00 00
key: 02 00 00 00 value: 01 00 00 00
key: 04 00 00 00 value: 01 00 00 00
key: 03 00 00 00 value: 01 00 00 00
Found 4 elements
:
$ sudo bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
Error: update failed: Argument list too long
, . :
$ sudo strace -e bpf bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=80, info=0x7ffe6c626da0}}, 120) = 0
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x56049ded5260, value=0x56049ded5280, flags=BPF_ANY}, 120) = -1 E2BIG (Argument list too long)
Error: update failed: Argument list too long
+++ exited with 255 +++
Everything is fine: as expected, the BPF_MAP_UPDATE_ELEM command tries to create a new, fifth, key, but fails with E2BIG.
, BPF, . , BPF. , - -, , BPF β libbpf
.
Writing BPF Programs with libbpf
BPF , . llvm
, BPF, libbpf
, BPF BPF, llvm
/clang
.
, , libbpf
( β iproute2
, libbcc
, libbpf-go
, ..) . killer- libbpf
BPF CO-RE (Compile Once, Run Everywhere) β , BPF, , API (, ). , CO-RE, BTF ( . , BTF , β :
$ ls -lh /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 2.6M Jul 29 15:30 /sys/kernel/btf/vmlinux
, , libbpf
. CO-RE , β CONFIG_DEBUG_INFO_BTF
.
libbpf
tools/lib/bpf
bpf@vger.kernel.org
. , , https://github.com/libbpf/libbpf - .
, , libbpf
, (- ) . , BPF maps, kernel helpers, BTF, ..
, libbpf
git submodule, :
$ mkdir /tmp/libbpf-example
$ cd /tmp/libbpf-example/
$ git init-db
Initialized empty Git repository in /tmp/libbpf-example/.git/
$ git submodule add https://github.com/libbpf/libbpf.git
Cloning into '/tmp/libbpf-example/libbpf'...
remote: Enumerating objects: 200, done.
remote: Counting objects: 100% (200/200), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 3354 (delta 101), reused 118 (delta 79), pack-reused 3154
Receiving objects: 100% (3354/3354), 2.05 MiB | 10.22 MiB/s, done.
Resolving deltas: 100% (2176/2176), done.
libbpf
:
$ cd libbpf/src
$ mkdir build
$ OBJDIR=build DESTDIR=root make -s install
$ find root
root
root/usr
root/usr/include
root/usr/include/bpf
root/usr/include/bpf/bpf_tracing.h
root/usr/include/bpf/xsk.h
root/usr/include/bpf/libbpf_common.h
root/usr/include/bpf/bpf_endian.h
root/usr/include/bpf/bpf_helpers.h
root/usr/include/bpf/btf.h
root/usr/include/bpf/bpf_helper_defs.h
root/usr/include/bpf/bpf.h
root/usr/include/bpf/libbpf_util.h
root/usr/include/bpf/libbpf.h
root/usr/include/bpf/bpf_core_read.h
root/usr/lib64
root/usr/lib64/libbpf.so.0.1.0
root/usr/lib64/libbpf.so.0
root/usr/lib64/libbpf.a
root/usr/lib64/libbpf.so
root/usr/lib64/pkgconfig
root/usr/lib64/pkgconfig/libbpf.pc
: BPF BPF_PROG_TYPE_XDP
, , , C, clang
, -, . BPF, -.
: libbpf
/sys/kernel/btf/vmlinux
, , :
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
, , , IPv4:
$ grep -A 12 'struct iphdr {' vmlinux.h
struct iphdr {
__u8 ihl: 4;
__u8 version: 4;
__u8 tos;
__be16 tot_len;
__be16 id;
__be16 frag_off;
__u8 ttl;
__u8 protocol;
__sum16 check;
__be32 saddr;
__be32 daddr;
};
BPF C:
$ cat xdp-simple.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
SEC("xdp/simple")
int simple(void *ctx)
{
return XDP_PASS;
}
char LICENSE[] SEC("license") = "GPL";
, . -, , vmlinux.h
, bpftool btf dump
β kernel-headers, , . libbpf
. , SEC
, ELF. xdp/simple
, BPF β , libbpf
, bpf(2)
. BPF C
β return XDP_PASS
. , "license"
.
llvm/clang, >= 10.0.0, β (. ):
$ clang --version
clang version 11.0.0 (https://github.com/llvm/llvm-project.git afc287e0abec710398465ee1f86237513f2b5091)
...
$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
: -target bpf
libbpf
, . , -O2
, . , , ?
$ llvm-objdump --section=xdp/simple --no-show-raw-insn -D xdp-simple.bpf.o
xdp-simple.bpf.o: file format elf64-bpf
Disassembly of section xdp/simple:
0000000000000000 <simple>:
0: r0 = 2
1: exit
, ! , , , . libbpf
β API API. , , BPF .
, «» bpftool
β BPF ( , Daniel Borkman β BPF β ):
$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
xdp-simple.skel.h
β , , . overkill, , BPF ELF - , .
, - β :
#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"
int main(int argc, char **argv)
{
struct xdp_simple_bpf *obj;
obj = xdp_simple_bpf__open_and_load();
if (!obj)
err(1, "failed to open and/or load BPF object\n");
pause();
xdp_simple_bpf__destroy(obj);
}
struct xdp_simple_bpf
xdp-simple.skel.h
:
struct xdp_simple_bpf {
struct bpf_object_skeleton *skeleton;
struct bpf_object *obj;
struct {
struct bpf_program *simple;
} progs;
struct {
struct bpf_link *simple;
} links;
};
API: struct bpf_program *simple
struct bpf_link *simple
. , xdp/simple
, β , .
xdp_simple_bpf__open_and_load
, ELF, , ( ELF β data, readonly data, , ..), bpf
, , :
$ clang -O2 -I ./libbpf/src/root/usr/include/ xdp-simple.c -o xdp-simple ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_BTF_LOAD, 0x7ffdb8fd9670, 120) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0xdfd580, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 8, 0), prog_flags=0, prog_name="simple", prog_ifindex=0, expected_attach_type=0x25 /* BPF_??? */, ...}, 120) = 4
bpftool
. ID:
# bpftool p | grep -A4 simple
463: xdp name simple tag 3b185187f1855c4c gpl
loaded_at 2020-08-01T01:59:49+0000 uid 0
xlated 16B jited 40B memlock 4096B
btf_id 185
pids xdp-simple(16498)
( bpftool prog dump xlated
):
# bpftool p d x id 463
int simple(void *ctx):
; return XDP_PASS;
0: (b7) r0 = 2
1: (95) exit
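Something new! The output contains pieces of our C source file. This was done by libbpf, which found the debug section in the binary, compiled it into a BTF object, loaded it into the kernel with BPF_BTF_LOAD, and then passed the resulting file descriptor when loading the program with the BPF_PROG_LOAD command.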
Kernel Helpers
BPF «» β kernel helpers. - BPF , maps, Β« Β» β perf events, (, ) ..
: bpf_get_smp_processor_id
Β« Β», -, bpf_get_smp_processor_id()
, kernel/bpf/helpers.c
. , BPF. , , :
BPF_CALL_0(bpf_get_smp_processor_id)
{
return smp_processor_id();
}
- BPF Linux. , , , . (, , , , BPF_CALL_3
. .) , . struct bpf_func_proto
, -, verifier:
const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
.func = bpf_get_smp_processor_id,
.gpl_only = false,
.ret_type = RET_INTEGER,
};
, BPF , , , BPF_PROG_TYPE_XDP
xdp_func_proto
, ID - , XDP . :
static const struct bpf_func_proto *
xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
...
case BPF_FUNC_get_smp_processor_id:
return &bpf_get_smp_processor_id_proto;
...
}
}
BPF «» include/linux/bpf_types.h
BPF_PROG_TYPE
. , , C . , kernel/bpf/verifier.c
bpf_types.h
, bpf_verifier_ops[]
:
static const struct bpf_verifier_ops *const bpf_verifier_ops[] = {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
[_id] = & _name ## _verifier_ops,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
};
, BPF struct bpf_verifier_ops
, _name ## _verifier_ops
, .., xdp_verifier_ops
xdp
. xdp_verifier_ops
net/core/filter.c
:
const struct bpf_verifier_ops xdp_verifier_ops = {
.get_func_proto = xdp_func_proto,
.is_valid_access = xdp_is_valid_access,
.convert_ctx_access = xdp_convert_ctx_access,
.gen_prologue = bpf_noop_prologue,
};
xdp_func_proto
, verifier , - BPF, . verifier.c
.
, BPF bpf_get_smp_processor_id
. :
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
SEC("xdp/simple")
int simple(void *ctx)
{
if (bpf_get_smp_processor_id() != 0)
return XDP_DROP;
return XDP_PASS;
}
char LICENSE[] SEC("license") = "GPL";
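The function bpf_get_smp_processor_id is defined in <bpf/bpf_helper_defs.h> of the libbpf library as
static u32 (*bpf_get_smp_processor_id)(void) = (void *) 8;
that is, bpf_get_smp_processor_id is a function pointer whose value is 8, where 8 is the value of BPF_FUNC_get_smp_processor_id of type enum bpf_func_id, which is defined for us in vmlinux.h (the file bpf_helper_defs.h is generated by a script in the kernel, so the "magic" numbers are ok). This function takes no arguments and returns a value of type __u32. When we call it in our program, clang generates a BPF_CALL instruction "of the right kind". Let's compile the program and look at the xdp/simple section: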
$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ llvm-objdump -D --section=xdp/simple xdp-simple.bpf.o
xdp-simple.bpf.o: file format elf64-bpf
Disassembly of section xdp/simple:
0000000000000000 <simple>:
0: 85 00 00 00 08 00 00 00 call 8
1: bf 01 00 00 00 00 00 00 r1 = r0
2: 67 01 00 00 20 00 00 00 r1 <<= 32
3: 77 01 00 00 20 00 00 00 r1 >>= 32
4: b7 00 00 00 02 00 00 00 r0 = 2
5: 15 01 01 00 00 00 00 00 if r1 == 0 goto +1 <LBB0_2>
6: b7 00 00 00 01 00 00 00 r0 = 1
0000000000000038 <LBB0_2>:
7: 95 00 00 00 00 00 00 00 exit
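In the first line we see a call instruction whose IMM parameter is 8 and whose SRC_REG is zero. By the ABI convention used by the verifier, this is a call to helper function number eight. Once it has run, the logic is simple. The return value is copied from register r0 to r1, and in lines 2,3 it is converted to type u32, i.e. the upper 32 bits are cleared. In lines 4,5,6,7 we return 2 (XDP_PASS) or 1 (XDP_DROP), depending on whether the helper from line 0 returned a zero or a non-zero value.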
: bpftool prog dump xlated
:
$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple &
[2] 10914
$ sudo bpftool p | grep simple
523: xdp name simple tag 44c38a10c657e1b0 gpl
pids xdp-simple(10915)
$ sudo bpftool p d x id 523
int simple(void *ctx):
; if (bpf_get_smp_processor_id() != 0)
0: (85) call bpf_get_smp_processor_id#114128
1: (bf) r1 = r0
2: (67) r1 <<= 32
3: (77) r1 >>= 32
4: (b7) r0 = 2
; }
5: (15) if r1 == 0x0 goto pc+1
6: (b7) r0 = 1
7: (95) exit
, verifier kernel-helper.
: , , !
-
u64 fn(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
- r1
βr5
, r0
. , β .
kernel helper BPF . xdp-simple.bpf.c
( ):
SEC("xdp/simple")
int simple(void *ctx)
{
bpf_printk("running on CPU%u\n", bpf_get_smp_processor_id());
return XDP_PASS;
}
CPU, . :
$ llvm-objdump -D --section=xdp/simple --no-show-raw-insn xdp-simple.bpf.o
0000000000000000 <simple>:
0: r1 = 10
1: *(u16 *)(r10 - 8) = r1
2: r1 = 8441246879787806319 ll
4: *(u64 *)(r10 - 16) = r1
5: r1 = 2334956330918245746 ll
7: *(u64 *)(r10 - 24) = r1
8: call 8
9: r1 = r10
10: r1 += -24
11: r2 = 18
12: r3 = r0
13: call 6
14: r0 = 2
15: exit
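In lines 0-7 we write the string running on CPU%u\n to the stack, and then in line 8 we call the familiar bpf_get_smp_processor_id. In lines 9-12 we prepare the arguments of the bpf_printk helper: registers r1, r2, r3. Why are there three of them and not two? Because bpf_printk is a macro wrapper around the real helper bpf_trace_printk, which needs to be passed the size of the format string.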
xdp-simple.c
, lo
- !
$ cat xdp-simple.c
#include <linux/if_link.h>
#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"
int main(int argc, char **argv)
{
__u32 flags = XDP_FLAGS_SKB_MODE;
struct xdp_simple_bpf *obj;
obj = xdp_simple_bpf__open_and_load();
if (!obj)
err(1, "failed to open and/or load BPF object\n");
bpf_set_link_xdp_fd(1, -1, flags);
bpf_set_link_xdp_fd(1, bpf_program__fd(obj->progs.simple), flags);
cleanup:
xdp_simple_bpf__destroy(obj);
}
bpf_set_link_xdp_fd
, BPF XDP . lo
, 1. , , . , pause
: - , BPF , . , , lo
.
lo
:
$ sudo ./xdp-simple
$ sudo bpftool p | grep simple
669: xdp name simple tag 4fca62e77ccb43d6 gpl
$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
prog/xdp id 669
, ID 669 ID lo
. 127.0.0.1
(request + reply):
$ ping -c1 localhost
/sys/kernel/debug/tracing/trace_pipe
, bpf_printk
:
# cat /sys/kernel/debug/tracing/trace_pipe
ping-13937 [000] d.s1 442015.377014: bpf_trace_printk: running on CPU0
ping-13937 [000] d.s1 442015.377027: bpf_trace_printk: running on CPU0
lo
CPU0 β BPF !
, bpf_printk
: production, - .
Accessing maps from BPF programs
: BPF
, . , , . xdp-simple.bpf.c
:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 8);
__type(key, u32);
__type(value, u64);
} woo SEC(".maps");
SEC("xdp/simple")
int simple(void *ctx)
{
u32 key = bpf_get_smp_processor_id();
u32 *val;
val = bpf_map_lookup_elem(&woo, &key);
if (!val)
return XDP_ABORTED;
*val += 1;
return XDP_PASS;
}
char LICENSE[] SEC("license") = "GPL";
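At the beginning of the program we added the definition of the map woo: an array of 8 elements that stores values of type u64 (in C we would define such an array as u64 woo[8]). In the "xdp/simple" program we read the number of the current processor into the variable key and then, using the helper function bpf_map_lookup_elem, get a pointer to the corresponding array element, which we increment by one. In other words, we count on which CPU the incoming packets were processed. Let's try to run the program: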
$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple
, lo
:
$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
prog/xdp id 108
$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done
:
$ sudo bpftool map dump name woo
[
{ "key": 0, "value": 0 },
{ "key": 1, "value": 400 },
{ "key": 2, "value": 0 },
{ "key": 3, "value": 0 },
{ "key": 4, "value": 0 },
{ "key": 5, "value": 0 },
{ "key": 6, "value": 0 },
{ "key": 7, "value": 46400 }
]
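Almost all of the activity happened on CPU7. This is not important to us; the main thing is that the program works and that we now understand how to access maps from BPF programs: with the bpf_map_* helpers.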
, BPF
val = bpf_map_lookup_elem(&woo, &key);
-
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
&woo
struct { ... }
...
, , &woo
( 4):
llvm-objdump -D --section xdp/simple xdp-simple.bpf.o
xdp-simple.bpf.o: file format elf64-bpf
Disassembly of section xdp/simple:
0000000000000000 <simple>:
0: 85 00 00 00 08 00 00 00 call 8
1: 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
2: bf a2 00 00 00 00 00 00 r2 = r10
3: 07 02 00 00 fc ff ff ff r2 += -4
4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
6: 85 00 00 00 01 00 00 00 call 1
...
:
$ llvm-readelf -r xdp-simple.bpf.o | head -4
Relocation section '.relxdp/simple' at offset 0xe18 contains 1 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000000020 0000002700000001 R_BPF_64_64 0000000000000000 woo
, map ( 4):
$ sudo bpftool prog dump x name simple
int simple(void *ctx):
0: (85) call bpf_get_smp_processor_id#114128
1: (63) *(u32 *)(r10 -4) = r0
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = map[id:64]
...
, , - &woo
- libbpf
. strace
:
$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=8, max_entries=8, map_name="woo", ...}, 120) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="simple", ...}, 120) = 5
, libbpf
woo
simple
. , :
- xdp_simple_bpf__open_and_load from the file xdp-simple.skel.h calls
- xdp_simple_bpf__load from the file xdp-simple.skel.h, which calls
- bpf_object__load_skeleton from the file libbpf/src/libbpf.c, which calls
- bpf_object__load_xattr from libbpf/src/libbpf.c
, , bpf_object__create_maps
, maps, . ( , BPF_MAP_CREATE
strace
.) bpf_object__relocate
, , woo
. , , - bpf_program__relocate
, :
case RELO_LD64:
insn[0].src_reg = BPF_PSEUDO_MAP_FD;
insn[0].imm = obj->maps[relo->map_idx].fd;
break;
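So, in the instruction
18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
the source register is replaced with BPF_PSEUDO_MAP_FD and the first IMM with the file descriptor of our map. If the descriptor were, for example, 0xdeadbeef, we would get the instruction
18 11 00 00 ef be ad de 00 00 00 00 00 00 00 00
This is how information about the map is passed to a specific loaded BPF program. The map can either be created with BPF_MAP_CREATE or opened by ID with BPF_MAP_GET_FD_BY_ID.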
, libbpf
:
libbpf
ELF,-
LD64
, , . , β BPF_PSEUDO_MAP_FD
- , β kernel/bpf/verifier.c
, struct bpf_map
:
static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env) {
...
f = fdget(insn[0].imm);
map = __bpf_map_get(f);
if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
addr = (unsigned long)map;
}
insn[0].imm = (u32)addr;
insn[1].imm = addr >> 32;
- verifier
struct bpf_map
ELF libbpf
, .
libbpf
, , , , , libbpf
. , , , , ply
, BPF .
, , xdp-simple
. , , gist.
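The program does the following:
- creates a map of type BPF_MAP_TYPE_ARRAY with the BPF_MAP_CREATE command,
- creates a program that uses this map,
- attaches the program to the lo interface,
which translates into human-readable form as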
int main(void)
{
int map_fd, prog_fd;
map_fd = map_create();
if (map_fd < 0)
err(1, "bpf: BPF_MAP_CREATE");
prog_fd = prog_load(map_fd);
if (prog_fd < 0)
err(1, "bpf: BPF_PROG_LOAD");
xdp_attach(1, prog_fd);
}
map_create
, bpf
β Β«, , 8 __u64
Β»:
static int map_create(void)
{
union bpf_attr attr;
/* zero the whole union so that unused fields are not garbage */
memset(&attr, 0, sizeof(attr));
attr.map_type = BPF_MAP_TYPE_ARRAY;
attr.key_size = sizeof(__u32);
attr.value_size = sizeof(__u64);
attr.max_entries = 8;
strncpy(attr.map_name, "woo", sizeof(attr.map_name));
return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
:
static int prog_load(int map_fd)
{
union bpf_attr attr;
struct bpf_insn insns[] = {
...
};
memset(&attr, 0, sizeof(attr));
attr.prog_type = BPF_PROG_TYPE_XDP;
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = sizeof(insns)/sizeof(insns[0]);
attr.license = ptr_to_u64("GPL");
strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
prog_load
β BPF struct bpf_insn insns[]
. , C, :
$ llvm-objdump -D --section xdp/simple xdp-simple.bpf.o
0000000000000000 <simple>:
0: 85 00 00 00 08 00 00 00 call 8
1: 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
2: bf a2 00 00 00 00 00 00 r2 = r10
3: 07 02 00 00 fc ff ff ff r2 += -4
4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
6: 85 00 00 00 01 00 00 00 call 1
7: b7 01 00 00 00 00 00 00 r1 = 0
8: 15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2>
9: 61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0)
10: 07 01 00 00 01 00 00 00 r1 += 1
11: 63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1
12: b7 01 00 00 02 00 00 00 r1 = 2
0000000000000068 <LBB0_2>:
13: bf 10 00 00 00 00 00 00 r0 = r1
14: 95 00 00 00 00 00 00 00 exit
, 14 struct bpf_insn
(: , , linux/bpf.h
linux/bpf_common.h
struct bpf_insn insns[]
):
struct bpf_insn insns[] = {
/* 85 00 00 00 08 00 00 00 call 8 */
{
.code = BPF_JMP | BPF_CALL,
.imm = 8,
},
/* 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0 */
{
.code = BPF_MEM | BPF_STX,
.off = -4,
.src_reg = BPF_REG_0,
.dst_reg = BPF_REG_10,
},
/* bf a2 00 00 00 00 00 00 r2 = r10 */
{
.code = BPF_ALU64 | BPF_MOV | BPF_X,
.src_reg = BPF_REG_10,
.dst_reg = BPF_REG_2,
},
/* 07 02 00 00 fc ff ff ff r2 += -4 */
{
.code = BPF_ALU64 | BPF_ADD | BPF_K,
.dst_reg = BPF_REG_2,
.imm = -4,
},
/* 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll */
{
.code = BPF_LD | BPF_DW | BPF_IMM,
.src_reg = BPF_PSEUDO_MAP_FD,
.dst_reg = BPF_REG_1,
.imm = map_fd,
},
{ }, /* placeholder */
/* 85 00 00 00 01 00 00 00 call 1 */
{
.code = BPF_JMP | BPF_CALL,
.imm = 1,
},
/* b7 01 00 00 00 00 00 00 r1 = 0 */
{
.code = BPF_ALU64 | BPF_MOV | BPF_K,
.dst_reg = BPF_REG_1,
.imm = 0,
},
/* 15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2> */
{
.code = BPF_JMP | BPF_JEQ | BPF_K,
.off = 4,
.dst_reg = BPF_REG_0,
.imm = 0,
},
/* 61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0) */
{
.code = BPF_MEM | BPF_LDX,
.off = 0,
.src_reg = BPF_REG_0,
.dst_reg = BPF_REG_1,
},
/* 07 01 00 00 01 00 00 00 r1 += 1 */
{
.code = BPF_ALU64 | BPF_ADD | BPF_K,
.dst_reg = BPF_REG_1,
.imm = 1,
},
/* 63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1 */
{
.code = BPF_MEM | BPF_STX,
.src_reg = BPF_REG_1,
.dst_reg = BPF_REG_0,
},
/* b7 01 00 00 02 00 00 00 r1 = 2 */
{
.code = BPF_ALU64 | BPF_MOV | BPF_K,
.dst_reg = BPF_REG_1,
.imm = 2,
},
/* <LBB0_2>: bf 10 00 00 00 00 00 00 r0 = r1 */
{
.code = BPF_ALU64 | BPF_MOV | BPF_X,
.src_reg = BPF_REG_1,
.dst_reg = BPF_REG_0,
},
/* 95 00 00 00 00 00 00 00 exit */
{
.code = BPF_JMP | BPF_EXIT
},
};
, β map_fd
.
β xdp_attach
. , XDP bpf
. , BPF XDP Linux, , ( ) : netlink sockets, . RFC3549. xdp_attach
β libbpf
, , netlink.c
, , :
netlink NETLINK_ROUTE
:
int netlink_open(__u32 *nl_pid)
{
struct sockaddr_nl sa;
socklen_t addrlen;
int one = 1, ret;
int sock;
memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
if (sock < 0)
err(1, "socket");
if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one)) < 0)
warnx("netlink error reporting not supported");
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0)
err(1, "bind");
addrlen = sizeof(sa);
if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0)
err(1, "getsockname");
*nl_pid = sa.nl_pid;
return sock;
}
:
static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
{
bool multipart = true;
struct nlmsgerr *errm;
struct nlmsghdr *nh;
char buf[4096];
int len, ret;
while (multipart) {
multipart = false;
len = recv(sock, buf, sizeof(buf), 0);
if (len < 0)
err(1, "recv");
if (len == 0)
break;
for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
nh = NLMSG_NEXT(nh, len)) {
if (nh->nlmsg_pid != nl_pid)
errx(1, "wrong pid");
if (nh->nlmsg_seq != seq)
errx(1, "INVSEQ");
if (nh->nlmsg_flags & NLM_F_MULTI)
multipart = true;
switch (nh->nlmsg_type) {
case NLMSG_ERROR:
errm = (struct nlmsgerr *)NLMSG_DATA(nh);
if (!errm->error)
continue;
ret = errm->error;
// libbpf_nla_dump_errormsg(nh); too many code to copy...
goto done;
case NLMSG_DONE:
return 0;
default:
break;
}
}
}
ret = 0;
done:
return ret;
}
, , , :
static int xdp_attach(int ifindex, int prog_fd)
{
    int sock, seq = 0, ret;
    struct nlattr *nla, *nla_xdp;
    struct {
        struct nlmsghdr nh;
        struct ifinfomsg ifinfo;
        char attrbuf[64];
    } req;
    __u32 flags = XDP_FLAGS_SKB_MODE;
    __u32 nl_pid = 0;

    sock = netlink_open(&nl_pid);
    if (sock < 0)
        return sock;

    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req.nh.nlmsg_type = RTM_SETLINK;
    req.nh.nlmsg_pid = 0;
    req.nh.nlmsg_seq = ++seq;
    req.ifinfo.ifi_family = AF_UNSPEC;
    req.ifinfo.ifi_index = ifindex;

    /* start the nested attribute for XDP */
    nla = (struct nlattr *)(((char *)&req)
                            + NLMSG_ALIGN(req.nh.nlmsg_len));
    nla->nla_type = NLA_F_NESTED | IFLA_XDP;
    nla->nla_len = NLA_HDRLEN;

    /* add the XDP program fd */
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FD;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &prog_fd, sizeof(prog_fd));
    nla->nla_len += nla_xdp->nla_len;

    /* add the XDP flags: generic (SKB) mode, hardcoded in this example */
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FLAGS;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
    nla->nla_len += nla_xdp->nla_len;

    req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);

    if (send(sock, &req, req.nh.nlmsg_len, 0) < 0)
        err(1, "send");
    ret = bpf_netlink_recv(sock, nl_pid, seq);

    close(sock);
    return ret;
}
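The pointer arithmetic in xdp_attach is easier to follow once the sizes are written out. The sketch below only redoes the length bookkeeping with the same macros from <linux/netlink.h> and <linux/rtnetlink.h>; the function name xdp_request_len is invented for this illustration and nothing is sent to the kernel.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: total length of an RTM_SETLINK request that carries
 * one IFLA_XDP attribute nesting IFLA_XDP_FD and IFLA_XDP_FLAGS. */
int xdp_request_len(void)
{
    int msg = NLMSG_LENGTH(sizeof(struct ifinfomsg)); /* nlmsghdr + ifinfomsg */
    int fd_attr = NLA_HDRLEN + sizeof(int);            /* IFLA_XDP_FD */
    int flags_attr = NLA_HDRLEN + sizeof(__u32);       /* IFLA_XDP_FLAGS */
    int nested = NLA_HDRLEN + fd_attr + flags_attr;    /* IFLA_XDP container */

    return msg + NLA_ALIGN(nested);
}
```

With the usual 16-byte nlmsghdr and 16-byte ifinfomsg this gives 32 + 20 = 52 bytes, so the 20 bytes of attributes fit comfortably in the 64-byte attrbuf of the request structure.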
Now everything is ready, so let's test it:
$ cc nolibbpf.c -o nolibbpf
$ sudo strace -e bpf ./nolibbpf
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, map_name="woo", ...}, 72) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=15, prog_name="woo", ...}, 72) = 4
+++ exited with 0 +++
Let's check whether our program is attached to lo:
$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
prog/xdp id 160
Let's send some pings and look at the map:
$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done
$ sudo bpftool m dump name woo
key: 00 00 00 00 value: 90 01 00 00 00 00 00 00
key: 01 00 00 00 value: 00 00 00 00 00 00 00 00
key: 02 00 00 00 value: 00 00 00 00 00 00 00 00
key: 03 00 00 00 value: 00 00 00 00 00 00 00 00
key: 04 00 00 00 value: 00 00 00 00 00 00 00 00
key: 05 00 00 00 value: 00 00 00 00 00 00 00 00
key: 06 00 00 00 value: 40 b5 00 00 00 00 00 00
key: 07 00 00 00 value: 00 00 00 00 00 00 00 00
Found 8 elements
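Hooray, everything works. Note, by the way, that our map is again displayed as raw bytes. This happens because, unlike libbpf, we did not load any type information (BTF). We will come back to BTF later.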
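Since bpftool prints the values as raw bytes here, it can help to decode them by hand. The following is a small sketch that assumes the 8-byte values are little-endian counters (an assumption based on the x86 host; the helper name le64_from_bytes is made up for this example):

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 64-bit value from 8 bytes given in
 * little-endian order, i.e. least significant byte first. */
uint64_t le64_from_bytes(const unsigned char b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];

    return v;
}
```

Under that assumption, the bytes 90 01 00 00 00 00 00 00 for key 0 decode to 400, and 40 b5 00 00 00 00 00 00 for key 6 decode to 46400.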
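In this section we will put together a minimal BPF developer toolkit.
Generally speaking, developing BPF programs requires nothing special: BPF works on any reasonable distribution kernel, and programs are built with clang, which can be installed from a package. However, because BPF is under active development, the kernel and the tools change constantly, so if you do not want to write BPF programs using the methods of 2019, you will have to build
llvm/clang
pahole
your own kernel
bpftool
(For reference: this section and all the examples in the article were run on Debian 10.)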
llvm/clang
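BPF is on friendly terms with LLVM, and although programs for BPF can recently also be compiled with gcc, all current development is done for LLVM. So the first thing we do is build the current version of clang from git: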
$ sudo apt install ninja-build
$ git clone --depth 1 https://github.com/llvm/llvm-project.git
$ mkdir -p llvm-project/llvm/build/install
$ cd llvm-project/llvm/build
$ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
-DLLVM_ENABLE_PROJECTS="clang" \
-DBUILD_SHARED_LIBS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_BUILD_RUNTIME=OFF
$ time ninja
...
$
Now we can check that everything was built correctly:
$ ./bin/llc --version
LLVM (http://llvm.org/):
LLVM version 11.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver1
Registered Targets:
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
(The build instructions for clang are taken from bpf_devel_QA.)
We will not install the freshly built programs; instead we simply add them to PATH, for example:
export PATH="`pwd`/bin:$PATH"
(This can be added to .bashrc or to a separate file. Personally, I put such things into ~/bin/activate-llvm.sh and run . activate-llvm.sh when needed.)
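Pahole and BTF
The pahole utility is needed when building the kernel to produce debugging information in the BTF format. We will not go into the details of BTF in this article, beyond the fact that it is convenient and we want to use it. So, if you are going to build your own kernel, build pahole first (without pahole you will not be able to build a kernel with the CONFIG_DEBUG_INFO_BTF option):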
$ git clone https://git.kernel.org/pub/scm/devel/pahole/pahole.git
$ cd pahole/
$ sudo apt install cmake
$ mkdir build
$ cd build/
$ cmake -D__LIB=lib ..
$ make
$ sudo make install
$ which pahole
/usr/local/bin/pahole
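Kernels for BPF experiments
When exploring what BPF can do, it is tempting to build your own kernel. Generally speaking, this is not required, since you can compile and load BPF programs on a distribution kernel. However, having your own kernel lets you use the newest BPF features, which will reach your distribution in months at best or, as with some debugging tools, may not be packaged at all in the foreseeable future. Your own kernel also lets you experiment with the code.
To build a kernel you need, first, the kernel source itself and, second, a kernel configuration file. For BPF experiments we can use the ordinary vanilla kernel or one of the development kernels. Historically, BPF development happens within the Linux networking community, so all changes sooner or later pass through David Miller, the maintainer of the Linux networking subsystem. Depending on their nature (fixes or new features), networking changes land in one of two trees: net or net-next. Changes for BPF are distributed in the same way between bpf and bpf-next, which are then pulled into net and net-next, respectively. See bpf_devel_QA and netdev-FAQ for details. So choose a kernel according to your taste and the stability requirements of the system you test on (the *-next kernels are the least stable of those listed).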
Download one of the kernels mentioned above:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
$ cd bpf-next
Build a minimal working kernel config:
$ cp /boot/config-`uname -r` .config
$ make localmodconfig
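Enable the BPF options in the .config file (most likely CONFIG_BPF is already enabled, since systemd uses it). Here is the list of options from the kernel used for this article: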
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_LSM=y
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_IPV6_SEG6_BPF=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=y
CONFIG_NET_ACT_BPF=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_DEBUG_INFO_BTF=y
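When comparing a .config against a list like this, a naive substring search is misleading, because CONFIG_BPF is a prefix of CONFIG_BPF_JIT and disabled options appear as comments. As a rough sketch (the function config_enabled is hypothetical and only recognizes options built in with =y), an exact check could look like this:

```c
#include <string.h>

/* Hypothetical helper: return 1 if text (the contents of a .config file)
 * has a line that is exactly "<opt>=y", 0 otherwise. Lines such as
 * "# CONFIG_X is not set" and longer option names sharing a prefix
 * do not match. */
int config_enabled(const char *text, const char *opt)
{
    size_t n = strlen(opt);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, opt, n) == 0 &&
            strncmp(line + n, "=y", 2) == 0 &&
            (line[n + 2] == '\n' || line[n + 2] == '\0'))
            return 1;
        line = strchr(line, '\n');  /* find the end of this line */
        if (line)
            line++;                 /* step over the newline */
    }
    return 0;
}
```

For instance, with the options listed above, config_enabled(cfg, "CONFIG_BPF") would return 1, while config_enabled(cfg, "CONFIG_BPFILTER") would return 0.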
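Now we can build and install the modules and the kernel (by the way, you can build the kernel with the freshly built clang by adding CC=clang):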
$ make -s -j $(getconf _NPROCESSORS_ONLN)
$ sudo make modules_install
$ sudo make install
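Then reboot into the new kernel (I use kexec from the kexec-tools package for this):
v=5.8.0-rc6+ # if you rebuilt the currently running kernel, you can use v=`uname -r`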
sudo kexec -l -t bzImage /boot/vmlinuz-$v --initrd=/boot/initrd.img-$v --reuse-cmdline &&
sudo kexec -e
bpftool
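The most frequently used utility in this article is bpftool, which ships as part of the Linux kernel. It is written and maintained by BPF developers for BPF developers, and it can manage all types of BPF objects: load programs, create and modify maps, inspect the state of the BPF ecosystem, and so on. Documentation in the form of man page sources can be found in the kernel tree or, already compiled, on the web.
At the time of writing, bpftool is available prepackaged only for RHEL, Fedora and Ubuntu (see, for example, the thread that tells the unfinished story of packaging bpftool for Debian). But if you have already built your own kernel, building bpftool is trivial: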
$ cd ${linux}/tools/bpf/bpftool
# ... add the latest clang to your PATH, as described above
$ make -s
Auto-detecting system features:
... libbfd: [ on ]
... disassembler-four-args: [ on ]
... zlib: [ on ]
... libcap: [ on ]
... clang-bpf-co-re: [ on ]
Auto-detecting system features:
... libelf: [ on ]
... zlib: [ on ]
... bpf: [ on ]
$
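(Here ${linux} is your kernel source directory.) After running these commands, bpftool is built in the directory ${linux}/tools/bpf/bpftool; you can add it to the path (first of all for the root user) or simply copy it to /usr/local/sbin.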
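It is best to build bpftool with the latest clang, built as described above, and you can check that it was built correctly with, for example, the command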
$ sudo bpftool feature probe kernel
Scanning system configuration...
bpf() syscall for unprivileged users is enabled
JIT compiler is enabled
JIT compiler hardening is disabled
JIT compiler kallsyms exports are enabled for root
...
which shows which BPF features are enabled in your kernel.
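By the way, the previous command can also be run as
# bpftool f p k
This follows the example of the utilities from the iproute2 package, where we can, for instance, say ip a s eth0 instead of ip addr show dev eth0.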
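The abbreviation trick boils down to accepting any non-empty prefix of a command word. The sketch below illustrates the idea only; it is not the code used by bpftool or iproute2, and the name matches_prefix is invented for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if given is a non-empty prefix of the
 * full command word. */
bool matches_prefix(const char *given, const char *full)
{
    size_t n = strlen(given);

    return n > 0 && n <= strlen(full) && strncmp(given, full, n) == 0;
}
```

With such a check, f matches feature, p matches probe and k matches kernel. A real tool additionally has to decide what to do when several commands share the same prefix, which this sketch leaves out.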
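Conclusion
BPF lets you efficiently measure and change kernel functionality on the fly. The system turned out very well, in the best traditions of UNIX: a simple mechanism that allows the kernel to be (re)programmed has let a huge number of people and organizations experiment. And although the experiments, like the development of the BPF infrastructure itself, are far from finished, the system already has a stable ABI that allows reliable and, most importantly, efficient business logic to be built.
I would like to note that, in my opinion, the technology became so popular because, on the one hand, you can play with it (the machine architecture can be understood in more or less one evening) and, on the other hand, it solves problems that could not be solved (elegantly) before it appeared. Together these two components push people to experiment and dream, which leads to more and more innovative solutions.
This article, although not particularly short, is only an introduction to the world of BPF and does not describe the «advanced» features and important parts of the architecture. The plan going forward is roughly this: the next article will be an overview of BPF program types (kernel 5.8 supports 30 program types), then we will finally look at how to write real BPF applications, using kernel tracing programs as an example, then it will be time for a more in-depth course on the BPF architecture, and then for examples of networking and security applications of BPF.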
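Links
BPF and XDP Reference Guide: BPF documentation from cilium, or more precisely from Daniel Borkman, one of the creators and maintainers of BPF. It is one of the first serious descriptions, and it differs from the rest in that Daniel knows exactly what he is writing about, so there are no blunders in it. In particular, this document explains how to work with BPF programs of the XDP and TC types using the well-known ip utility from the iproute2 package.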
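Documentation/networking/filter.txt: the original documentation file for classic and, later, extended BPF. Useful reading if you want to dig into the assembly language and the technical details of the architecture.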
Facebook's BPF blog. It is updated rarely but to the point, since it is written by Alexei Starovoitov (the author of eBPF) and Andrii Nakryiko (the maintainer of libbpf).
Bpftool secrets. An entertaining Twitter thread from Quentin Monnet with examples and secrets of using bpftool.
Dive into BPF: a list of reading material. A giant (and still maintained) list of links to BPF documentation from Quentin Monnet.