Information on this topic is scarce, on the web or elsewhere. The most important of the available resources is the official x64 ABI, which you can download here (hereinafter referred to as "the ABI"). Some of the information can also be found in the gcc man pages. The goal of this article is to provide an accessible reference on the topic, discuss related issues, and demonstrate the concepts with working code examples.
Important note: this article is not a tutorial for beginners. Before reading it, you should have a strong command of C and assembly language, as well as basic familiarity with the x64 architecture.
See also our previous post on a similar topic: How x86_x64 addresses memory
Code Models. Motivational part
In the x64 architecture, both code and data are referenced through instruction-relative (or, in x64 jargon, RIP-relative) addressing modes. In these instructions the offset from RIP is limited to 32 bits. However, there are cases where a 32-bit offset is not enough for an instruction to reach the code or data it needs, for example in programs larger than two gigabytes.
One way to solve this problem is to abandon RIP-relative addressing entirely in favor of full 64-bit absolute addresses for all data and code references. However, this step is very costly: to cover the (rather rare) case of extremely large programs and libraries, even the simplest operations throughout the code would require more instructions than usual.
Code models are therefore a compromise. [1] A code model is a formal agreement between the programmer and the compiler in which the programmer states his intentions about the size of the program (or programs) that the object module currently being compiled will end up in. [2] Code models exist so that the programmer can tell the compiler: "Don't worry, this object module will only go into small programs, so you can use the fast RIP-relative addressing modes." Or, conversely: "We are going to link this module into large programs, so please use the slow but safe absolute addressing modes with full 64-bit addresses."
What this article will talk about
This article covers the two scenarios described above, the small code model and the large code model: the first tells the compiler that a 32-bit relative offset is enough for all references to code and data in the object module; the second insists that the compiler use absolute 64-bit addressing modes. In addition, there is an intermediate variant, the so-called medium code model.
Each of these code models comes in independent PIC and non-PIC variants, and we will discuss all six.
Original example in C
To demonstrate the concepts discussed in this article, I will use the following C program and compile it with the various code models. As you can see, the function main accesses four different global arrays and one global function. The arrays differ in two parameters: size and visibility. Size matters for explaining the medium code model and plays no role in the small and large models. Visibility matters for the PIC code models and is either static (visible only in this source file) or global (visible to all objects linked into the program).
int global_arr[100] = {2, 3};
static int static_arr[100] = {9, 7};
int global_arr_big[50000] = {5, 6};
static int static_arr_big[50000] = {10, 20};

int global_func(int param)
{
    return param * 10;
}

int main(int argc, const char* argv[])
{
    int t = global_func(argc);
    t += global_arr[7];
    t += static_arr[7];
    t += global_arr_big[7];
    t += static_arr_big[7];
    return t;
}
gcc takes the code model as the value of the -mcmodel option. In addition, PIC compilation can be requested with the -fpic flag.
For example, compiling into an object module with the large code model and PIC:
> gcc -g -O0 -c codemodel1.c -fpic -mcmodel=large -o codemodel1_large_pic.o
Small code model
A quote from man gcc on the small code model:
-mcmodel=small
Generate code for the small model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
In other words, the compiler can safely assume that code and data are reachable through a 32-bit RIP-relative offset from any instruction in the code. Let's look at the disassembly of the example C program compiled with the non-PIC small code model:
> objdump -dS codemodel1_small.o
[...]
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 48 83 ec 20 sub $0x20,%rsp
1d: 89 7d ec mov %edi,-0x14(%rbp)
20: 48 89 75 e0 mov %rsi,-0x20(%rbp)
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x1e>
33: 89 45 fc mov %eax,-0x4(%rbp)
t += global_arr[7];
36: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
3c: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr[7];
3f: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
45: 01 45 fc add %eax,-0x4(%rbp)
t += global_arr_big[7];
48: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
4e: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr_big[7];
51: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
57: 01 45 fc add %eax,-0x4(%rbp)
return t;
5a: 8b 45 fc mov -0x4(%rbp),%eax
}
5d: c9 leaveq
5e: c3 retq
As you can see, all arrays are accessed in the same way, through a RIP-relative offset. The offset in the code is 0, however, because the compiler does not know where the data section will be placed, so it creates a relocation for each such access:
> readelf -r codemodel1_small.o
Relocation section '.rela.text' at offset 0x62bd8 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000002f 001500000002 R_X86_64_PC32 0000000000000000 global_func - 4
000000000038 001100000002 R_X86_64_PC32 0000000000000000 global_arr + 18
000000000041 000300000002 R_X86_64_PC32 0000000000000000 .data + 1b8
00000000004a 001200000002 R_X86_64_PC32 0000000000000340 global_arr_big + 18
000000000053 000300000002 R_X86_64_PC32 0000000000000000 .data + 31098
Let's fully decode the access to global_arr. The disassembled fragment we are interested in is:
t += global_arr[7];
36: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
3c: 01 45 fc add %eax,-0x4(%rbp)
RIP-relative addressing is relative to the next instruction, so the offset patched into the mov instruction must be relative to 0x3c. The relocation we are interested in is the second one, R_X86_64_PC32. It points to the operand of mov at address 0x38 and means the following: take the symbol value, add the addend, and subtract the offset the relocation points to. If you do the math, you will see that the result is the relative offset between the next instruction and global_arr, plus 0x1c. Since 0x1c is 7 * 4 bytes, that is, the element at index 7 of the array (each int is 4 bytes on x64), this relative offset is exactly what we need. Thus, using RIP-relative addressing, the instruction correctly references global_arr[7].
It is also interesting to note that although the instructions accessing static_arr are similar, its relocation uses a different symbol: it points to the .data section rather than to a specific symbol. This is because the linker places the static array at a known location in the .data section, and the array cannot be shared with other shared libraries. As a result, the linker will fully resolve this relocation. On the other hand, since global_arr can be used (or overridden) by another shared library, the reference to global_arr will be left for the dynamic loader to resolve. [3]
Finally, let's take a look at the reference to global_func:
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x1e>
33: 89 45 fc mov %eax,-0x4(%rbp)
Since the operand of callq is also RIP-relative, the R_X86_64_PC32 relocation works here in the same way, placing the actual relative offset to global_func into the operand.
In conclusion, because of the small code model the compiler treats all the data and code of the future program as reachable through a 32-bit offset, and therefore generates simple and efficient code for accessing all kinds of objects.
Large code model
A quote from man gcc on the large code model:
-mcmodel=large
Generate code for the large model: this model makes no assumptions about addresses and sizes of sections.
The disassembled code of main compiled with the non-PIC large model:
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 48 83 ec 20 sub $0x20,%rsp
1d: 89 7d ec mov %edi,-0x14(%rbp)
20: 48 89 75 e0 mov %rsi,-0x20(%rbp)
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: 48 ba 00 00 00 00 00 movabs $0x0,%rdx
35: 00 00 00
38: ff d2 callq *%rdx
3a: 89 45 fc mov %eax,-0x4(%rbp)
t += global_arr[7];
3d: 48 b8 00 00 00 00 00 movabs $0x0,%rax
44: 00 00 00
47: 8b 40 1c mov 0x1c(%rax),%eax
4a: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr[7];
4d: 48 b8 00 00 00 00 00 movabs $0x0,%rax
54: 00 00 00
57: 8b 40 1c mov 0x1c(%rax),%eax
5a: 01 45 fc add %eax,-0x4(%rbp)
t += global_arr_big[7];
5d: 48 b8 00 00 00 00 00 movabs $0x0,%rax
64: 00 00 00
67: 8b 40 1c mov 0x1c(%rax),%eax
6a: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr_big[7];
6d: 48 b8 00 00 00 00 00 movabs $0x0,%rax
74: 00 00 00
77: 8b 40 1c mov 0x1c(%rax),%eax
7a: 01 45 fc add %eax,-0x4(%rbp)
return t;
7d: 8b 45 fc mov -0x4(%rbp),%eax
}
80: c9 leaveq
81: c3 retq
Again, it's useful to look at relocations:
Relocation section '.rela.text' at offset 0x62c18 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000030 001500000001 R_X86_64_64 0000000000000000 global_func + 0
00000000003f 001100000001 R_X86_64_64 0000000000000000 global_arr + 0
00000000004f 000300000001 R_X86_64_64 0000000000000000 .data + 1a0
00000000005f 001200000001 R_X86_64_64 0000000000000340 global_arr_big + 0
00000000006f 000300000001 R_X86_64_64 0000000000000000 .data + 31080
Since no assumptions are made about the size of the code and data sections, the large code model is quite uniform and accesses all data in the same way. Let's take another look at global_arr:
t += global_arr[7];
3d: 48 b8 00 00 00 00 00 movabs $0x0,%rax
44: 00 00 00
47: 8b 40 1c mov 0x1c(%rax),%eax
4a: 01 45 fc add %eax,-0x4(%rbp)
Two instructions are needed to get the desired value from the array. The first places an absolute 64-bit address in rax, which, as we will see shortly, turns out to be the address of global_arr, while the second loads the word at (rax) + 0x1c into eax.
So let's focus on the instruction at 0x3d: movabs, the absolute 64-bit version of mov on x64. It can load a full 64-bit constant directly into a register, and since the value of this constant is zero in our disassembly, we have to turn to the relocation table for the answer. There we find the absolute relocation R_X86_64_64 for the operand at address 0x3f, with the following meaning: place the symbol value plus the addend at the relocated offset. In other words, rax will contain the absolute address of global_arr.
What about the function call?
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: 48 ba 00 00 00 00 00 movabs $0x0,%rdx
35: 00 00 00
38: ff d2 callq *%rdx
3a: 89 45 fc mov %eax,-0x4(%rbp)
After the already familiar movabs comes a call instruction that calls the function whose address is in rdx. A look at the corresponding relocation shows how similar this is to data access.
As you can see, the large code model makes no assumptions about the size of the code and data sections, or about the final location of symbols. It simply refers to symbols through absolute 64-bit addresses, a kind of "safe path". Note, however, that compared to the small code model, the large model needs an extra instruction to access each symbol. This is the price of safety.
So, we have seen two completely opposite models: while the small code model assumes that everything fits into the lower two gigabytes of memory, the large model assumes that nothing can be taken for granted and any symbol can reside anywhere in the full 64-bit address space. The trade-off between the two is the medium code model.
Medium Code Model
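As before, let's start with a quote from man gcc:
-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2GB. Programs can be statically or dynamically linked.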
Similar to the small code model, the medium model assumes that all code is linked into the lower two gigabytes. The data, however, is divided into "small data", also assumed to be in the lower two gigabytes, and "large data", whose placement in memory is unrestricted. Data falls into the large category when it exceeds a threshold, which is 64 kilobytes by default.
It is also important to note that with the medium code model, special sections are created for large data, by analogy with the .data and .bss sections: .ldata and .lbss. This is not really important for the purposes of this article, so I am going to sidestep the topic. More details can be found in the ABI.
Now it becomes clear why the _big arrays appear in the example: the medium model needs them in order to treat them as "large data", which they are, at 200 kilobytes each. Below is the result of disassembly:
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 48 83 ec 20 sub $0x20,%rsp
1d: 89 7d ec mov %edi,-0x14(%rbp)
20: 48 89 75 e0 mov %rsi,-0x20(%rbp)
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x1e>
33: 89 45 fc mov %eax,-0x4(%rbp)
t += global_arr[7];
36: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
3c: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr[7];
3f: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
45: 01 45 fc add %eax,-0x4(%rbp)
t += global_arr_big[7];
48: 48 b8 00 00 00 00 00 movabs $0x0,%rax
4f: 00 00 00
52: 8b 40 1c mov 0x1c(%rax),%eax
55: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr_big[7];
58: 48 b8 00 00 00 00 00 movabs $0x0,%rax
5f: 00 00 00
62: 8b 40 1c mov 0x1c(%rax),%eax
65: 01 45 fc add %eax,-0x4(%rbp)
return t;
68: 8b 45 fc mov -0x4(%rbp),%eax
}
6b: c9 leaveq
6c: c3 retq
Note how the arrays are accessed: the _big arrays are accessed the large code model way, while the other arrays are accessed the small model way. The function is also called the small code model way, and the relocations are so similar to the previous examples that I won't even show them.
The medium code model is a clever trade-off between the large and small models. The program code is unlikely to be terribly large [4], so only large chunks of data statically linked into it can push it over the two-gigabyte limit, perhaps as part of some kind of voluminous lookup table. Since the medium code model singles out such large chunks of data and handles them specially, calls to functions and accesses to small symbols are as efficient as in the small code model. Only accesses to large symbols require the code to use the full 64-bit method of the large model.
Small PIC Code Model
Now let's look at the PIC variants of the code models, starting as before with the small model. [5] Below is the example code compiled with the small PIC model:
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 48 83 ec 20 sub $0x20,%rsp
1d: 89 7d ec mov %edi,-0x14(%rbp)
20: 48 89 75 e0 mov %rsi,-0x20(%rbp)
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x1e>
33: 89 45 fc mov %eax,-0x4(%rbp)
t += global_arr[7];
36: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax
3d: 8b 40 1c mov 0x1c(%rax),%eax
40: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr[7];
43: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
49: 01 45 fc add %eax,-0x4(%rbp)
t += global_arr_big[7];
4c: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax
53: 8b 40 1c mov 0x1c(%rax),%eax
56: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr_big[7];
59: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
5f: 01 45 fc add %eax,-0x4(%rbp)
return t;
62: 8b 45 fc mov -0x4(%rbp),%eax
}
65: c9 leaveq
66: c3 retq
Relocations:
Relocation section '.rela.text' at offset 0x62ce8 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000002f 001600000004 R_X86_64_PLT32 0000000000000000 global_func - 4
000000000039 001100000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
000000000045 000300000002 R_X86_64_PC32 0000000000000000 .data + 1b8
00000000004f 001200000009 R_X86_64_GOTPCREL 0000000000000340 global_arr_big - 4
00000000005b 000300000002 R_X86_64_PC32 0000000000000000 .data + 31098
Since the distinction between large and small data plays no role in the small code model, we will focus on the point that matters when generating PIC code: the difference between local (static) and global symbols.
As you can see, the code generated for the static arrays is no different from the non-PIC case. This is one of the advantages of the x64 architecture: thanks to RIP-relative data access, we get PIC as a bonus, at least until external access to symbols is required. All instructions and relocations stay the same, so there is no need to go over them again.
The global arrays are more interesting: recall that in PIC, global data must go through the GOT, because it may eventually be found in, or used by, shared libraries [6]. Below is the code for accessing global_arr:
t += global_arr[7];
36: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax
3d: 8b 40 1c mov 0x1c(%rax),%eax
40: 01 45 fc add %eax,-0x4(%rbp)
The relocation we are interested in is R_X86_64_GOTPCREL: the location of the symbol's entry in the GOT plus the addend, minus the offset at which the relocation is applied. In other words, the relative offset between RIP (of the next instruction) and the slot reserved for global_arr in the GOT is patched into the instruction. Thus, the instruction at 0x36 places the actual address of global_arr in rax. This step is followed by dereferencing the address of global_arr plus an offset to the element at index 7 into eax.
Now let's take a look at the function call:
int t = global_func(argc);
24: 8b 45 ec mov -0x14(%rbp),%eax
27: 89 c7 mov %eax,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x1e>
33: 89 45 fc mov %eax,-0x4(%rbp)
The operand of the callq at address 0x2e has the relocation R_X86_64_PLT32: the address of the symbol's PLT entry plus the addend, minus the offset at which the relocation is applied. In other words, the callq must correctly call the PLT trampoline for global_func.
Note the implicit assumptions the compiler makes: that the GOT and PLT can be reached through RIP-relative addressing. This will be important when comparing this model with the other PIC code models.
Large PIC Code Model
Disassembly:
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 53 push %rbx
1a: 48 83 ec 28 sub $0x28,%rsp
1e: 48 8d 1d f9 ff ff ff lea -0x7(%rip),%rbx
25: 49 bb 00 00 00 00 00 movabs $0x0,%r11
2c: 00 00 00
2f: 4c 01 db add %r11,%rbx
32: 89 7d dc mov %edi,-0x24(%rbp)
35: 48 89 75 d0 mov %rsi,-0x30(%rbp)
int t = global_func(argc);
39: 8b 45 dc mov -0x24(%rbp),%eax
3c: 89 c7 mov %eax,%edi
3e: b8 00 00 00 00 mov $0x0,%eax
43: 48 ba 00 00 00 00 00 movabs $0x0,%rdx
4a: 00 00 00
4d: 48 01 da add %rbx,%rdx
50: ff d2 callq *%rdx
52: 89 45 ec mov %eax,-0x14(%rbp)
t += global_arr[7];
55: 48 b8 00 00 00 00 00 movabs $0x0,%rax
5c: 00 00 00
5f: 48 8b 04 03 mov (%rbx,%rax,1),%rax
63: 8b 40 1c mov 0x1c(%rax),%eax
66: 01 45 ec add %eax,-0x14(%rbp)
t += static_arr[7];
69: 48 b8 00 00 00 00 00 movabs $0x0,%rax
70: 00 00 00
73: 8b 44 03 1c mov 0x1c(%rbx,%rax,1),%eax
77: 01 45 ec add %eax,-0x14(%rbp)
t += global_arr_big[7];
7a: 48 b8 00 00 00 00 00 movabs $0x0,%rax
81: 00 00 00
84: 48 8b 04 03 mov (%rbx,%rax,1),%rax
88: 8b 40 1c mov 0x1c(%rax),%eax
8b: 01 45 ec add %eax,-0x14(%rbp)
t += static_arr_big[7];
8e: 48 b8 00 00 00 00 00 movabs $0x0,%rax
95: 00 00 00
98: 8b 44 03 1c mov 0x1c(%rbx,%rax,1),%eax
9c: 01 45 ec add %eax,-0x14(%rbp)
return t;
9f: 8b 45 ec mov -0x14(%rbp),%eax
}
a2: 48 83 c4 28 add $0x28,%rsp
a6: 5b pop %rbx
a7: c9 leaveq
a8: c3 retq
Relocations:
Relocation section '.rela.text' at offset 0x62c70 contains 6 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000027 00150000001d R_X86_64_GOTPC64 0000000000000000 _GLOBAL_OFFSET_TABLE_ + 9
000000000045 00160000001f R_X86_64_PLTOFF64 0000000000000000 global_func + 0
000000000057 00110000001b R_X86_64_GOT64 0000000000000000 global_arr + 0
00000000006b 000800000019 R_X86_64_GOTOFF64 00000000000001a0 static_arr + 0
00000000007c 00120000001b R_X86_64_GOT64 0000000000000340 global_arr_big + 0
000000000090 000900000019 R_X86_64_GOTOFF64 0000000000031080 static_arr_big + 0
This time around, the distinction between large and small data still doesn't matter, so we will focus on static_arr and global_arr. But first, note the prologue in this code, which we have not encountered before:
1e: 48 8d 1d f9 ff ff ff lea -0x7(%rip),%rbx
25: 49 bb 00 00 00 00 00 movabs $0x0,%r11
2c: 00 00 00
2f: 4c 01 db add %r11,%rbx
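Below is the relevant quote from the ABI:
In the small code model all addresses (including GOT entries) are accessible via the IP-relative addressing provided by the AMD64 architecture. Hence there is no need for an explicit GOT pointer and therefore no function prologue for setting it up is necessary. In the medium and large code models a register has to be allocated to hold the address of the GOT in position-independent objects, because the AMD64 ISA does not support an immediate displacement larger than 32 bits.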
Let's see how the prologue shown above computes the GOT address. First, the instruction at address 0x1e loads its own address into rbx. Then, with the help of the R_X86_64_GOTPC64 relocation, an absolute 64-bit move is performed into r11. This relocation means the following: take the GOT address, subtract the offset being relocated, and add the addend. Finally, the instruction at address 0x2f adds the two results together. The result is the absolute address of the GOT in rbx. [7]
Why bother calculating the GOT address? Firstly, as noted in the quote, in a large code model, we cannot assume that a 32-bit RIP-relative shift will be enough for GOT addressing, which is why we need a full 64-bit address. Second, we still want to work with the PIC variation, so we cannot simply put the absolute address in a register. Rather, the address itself must be computed relative to the RIP. For this, we need a prologue: it performs a 64-bit RIP-relative calculation.
Anyway, now that we have the GOT address in rbx, let's see how static_arr is accessed:
t += static_arr[7];
69: 48 b8 00 00 00 00 00 movabs $0x0,%rax
70: 00 00 00
73: 8b 44 03 1c mov 0x1c(%rbx,%rax,1),%eax
77: 01 45 ec add %eax,-0x14(%rbp)
The relocation for the first instruction is R_X86_64_GOTOFF64: the symbol plus the addend, minus the GOT. In our case, this is the relative offset between the address of static_arr and the address of the GOT. The next instruction adds the result to rbx (the absolute GOT address) and dereferences it with an offset of 0x1c. To make this computation easier to visualize, here is a pseudo-C example:
// char* static_arr
// char* GOT
rax = static_arr + 0 - GOT; // rax now contains an offset
eax = *(rbx + rax + 0x1c); // rbx == GOT, so eax now contains
// *(GOT + static_arr - GOT + 0x1c) or
// *(static_arr + 0x1c)
Note an interesting point: the GOT address is used as an anchor for reaching static_arr. Normally the GOT would not contain the address of this symbol, and since static_arr is not an external symbol, there is no reason to keep it in the GOT. Here, however, the GOT serves as an anchor relative to which the symbol's address in the data section is expressed. This address, which among other things is position independent, can be found with a full 64-bit offset. The linker is able to resolve this relocation, so there is no need to modify the code section at load time.
But what about global_arr?
t += global_arr[7];
55: 48 b8 00 00 00 00 00 movabs $0x0,%rax
5c: 00 00 00
5f: 48 8b 04 03 mov (%rbx,%rax,1),%rax
63: 8b 40 1c mov 0x1c(%rax),%eax
66: 01 45 ec add %eax,-0x14(%rbp)
This code is slightly longer, and the relocation is different as well. Here the GOT is used in the more traditional way: the R_X86_64_GOT64 relocation for movabs simply tells it to place into rax the offset within the GOT of the slot holding the address of global_arr. The instruction at address 0x5f takes the address of global_arr from the GOT and puts it in rax. The next instruction dereferences global_arr[7] and places the value in eax.
Now let's take a look at the code reference to global_func. Recall that in the large code model we cannot make assumptions about the size of the code sections, so we must assume that even reaching the PLT requires an absolute 64-bit address:
int t = global_func(argc);
39: 8b 45 dc mov -0x24(%rbp),%eax
3c: 89 c7 mov %eax,%edi
3e: b8 00 00 00 00 mov $0x0,%eax
43: 48 ba 00 00 00 00 00 movabs $0x0,%rdx
4a: 00 00 00
4d: 48 01 da add %rbx,%rdx
50: ff d2 callq *%rdx
52: 89 45 ec mov %eax,-0x14(%rbp)
The relocation we are interested in is R_X86_64_PLTOFF64: the address of the PLT entry for global_func minus the GOT address. The result is placed in rdx, to which rbx (the absolute GOT address) is then added. As a result, we get the address of the PLT entry for global_func in rdx.
Note that once again the GOT is used as an anchor, this time to provide a position-independent reference to the offset of the PLT entry.
Medium PIC Code Model
Finally, let's analyze the code generated for the medium PIC model:
int main(int argc, const char* argv[])
{
15: 55 push %rbp
16: 48 89 e5 mov %rsp,%rbp
19: 53 push %rbx
1a: 48 83 ec 28 sub $0x28,%rsp
1e: 48 8d 1d 00 00 00 00 lea 0x0(%rip),%rbx
25: 89 7d dc mov %edi,-0x24(%rbp)
28: 48 89 75 d0 mov %rsi,-0x30(%rbp)
int t = global_func(argc);
2c: 8b 45 dc mov -0x24(%rbp),%eax
2f: 89 c7 mov %eax,%edi
31: b8 00 00 00 00 mov $0x0,%eax
36: e8 00 00 00 00 callq 3b <main+0x26>
3b: 89 45 ec mov %eax,-0x14(%rbp)
t += global_arr[7];
3e: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax
45: 8b 40 1c mov 0x1c(%rax),%eax
48: 01 45 ec add %eax,-0x14(%rbp)
t += static_arr[7];
4b: 8b 05 00 00 00 00 mov 0x0(%rip),%eax
51: 01 45 ec add %eax,-0x14(%rbp)
t += global_arr_big[7];
54: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax
5b: 8b 40 1c mov 0x1c(%rax),%eax
5e: 01 45 ec add %eax,-0x14(%rbp)
t += static_arr_big[7];
61: 48 b8 00 00 00 00 00 movabs $0x0,%rax
68: 00 00 00
6b: 8b 44 03 1c mov 0x1c(%rbx,%rax,1),%eax
6f: 01 45 ec add %eax,-0x14(%rbp)
return t;
72: 8b 45 ec mov -0x14(%rbp),%eax
}
75: 48 83 c4 28 add $0x28,%rsp
79: 5b pop %rbx
7a: c9 leaveq
7b: c3 retq
Relocations:
Relocation section '.rela.text' at offset 0x62d60 contains 6 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000021 00160000001a R_X86_64_GOTPC32 0000000000000000 _GLOBAL_OFFSET_TABLE_ - 4
000000000037 001700000004 R_X86_64_PLT32 0000000000000000 global_func - 4
000000000041 001200000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
00000000004d 000300000002 R_X86_64_PC32 0000000000000000 .data + 1b8
000000000057 001300000009 R_X86_64_GOTPCREL 0000000000000000 global_arr_big - 4
000000000063 000a00000019 R_X86_64_GOTOFF64 0000000000030d40 static_arr_big + 0
First, let's get the function call out of the way. As in the small model, in the medium model we assume that code references stay within the limits of a 32-bit RIP-relative offset, so the code for calling global_func is exactly the same as in the small PIC model, and the same goes for the small data arrays static_arr and global_arr. We will therefore focus on the large data arrays, but first let's discuss the prologue, which differs from the prologue of the large code model:
1e: 48 8d 1d 00 00 00 00 lea 0x0(%rip),%rbx
This is the whole prologue: with the R_X86_64_GOTPC32 relocation, a single instruction is enough to put the GOT address in rbx (compared to three in the large model). Why the difference? Since in the medium model the GOT is not part of the "large data sections", we assume that it is reachable within a 32-bit offset. In the large model we could not make such an assumption and had to use a full 64-bit offset.
It is interesting that the code for accessing global_arr_big is the same as in the small PIC model. This is for the same reason that the medium model prologue is shorter than the large model prologue: we assume that the GOT is reachable with 32-bit RIP-relative addressing. True, global_arr_big itself cannot be reached that way, but the GOT covers this case, since the address of global_arr_big is stored in it as a full 64-bit address.
The situation is different, however, for static_arr_big:
t += static_arr_big[7];
61: 48 b8 00 00 00 00 00 movabs $0x0,%rax
68: 00 00 00
6b: 8b 44 03 1c mov 0x1c(%rbx,%rax,1),%eax
6f: 01 45 ec add %eax,-0x14(%rbp)
This case is similar to the large PIC code model, because here we still need the absolute address of the symbol, which is not in the GOT itself. Since this is a large symbol, which cannot be assumed to be in the lower two gigabytes, we need a 64-bit PIC offset, as in the large model.
Notes:
[1] Do not confuse code models with 64-bit data models and Intel memory models; these are all different topics.
[2] It is important to remember that the compiler generates the instructions themselves, and the addressing modes are fixed at that step. The compiler cannot know which programs or shared libraries the object module will end up in; some may be small, while others may be large. The linker knows the size of the final program, but by then it is too late: the linker can only patch the offsets in instructions through relocations, not change the instructions themselves. Thus, the code model "contract" must be "signed" by the programmer at compile time.
[3] If something remains unclear, check out the next article .
[4] However, volumes are gradually increasing. When I last checked the Debug + Asserts build of Clang, it almost reached one gigabyte, thanks in large part to the autogenerated code.
[5] If you still do not know how PIC works (both in general and, in particular, for x64 architecture), it's time to familiarize yourself with the following articles on the topic: once and twice .
[6] Thus, the linker cannot resolve the references on its own, and has to leave the GOT processing to the dynamic loader.
[7] 0x25 - 0x7 + GOT - 0x27 + 0x9 = GOT