Understanding x64 architecture code models

"Which code model should I use?" is a question that comes up often but is rarely discussed when writing code for the x64 architecture. It is an interesting problem nonetheless, and understanding code models helps in making sense of the x64 machine code that compilers generate. In addition, for those who care about performance down to individual instructions, the choice of code model affects optimization as well.



Information on this topic is scarce, on the web or anywhere else. The most important available resource is the official x64 ABI (hereafter simply "the ABI"), which is available for download. Some information can also be found in the gcc man pages. The goal of this article is to provide an accessible reference on the topic, discuss related issues, and demonstrate the concepts with working code examples.



Important note: this article is not a tutorial for beginners. Before reading it, you should have a solid command of C and assembly language, as well as basic familiarity with the x64 architecture.






See also our previous post on a similar topic: How x86_x64 addresses memory






Code models: motivation



In the x64 architecture, both code and data are referenced through instruction-relative (or, in x64 jargon, RIP-relative) addressing modes. In these instructions the offset from RIP is limited to 32 bits. However, there are cases where a 32-bit offset is simply not enough for an instruction to reach some piece of code or data, for example in programs larger than two gigabytes.
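
The 32-bit limit can be made concrete with a small sketch. The helper name fits_rel32 and its parameters are hypothetical, introduced only for illustration; the function simply checks whether the distance between the next instruction and a target fits into a signed 32-bit displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `target` can be reached with a signed 32-bit
 * displacement from `next_rip`, the address of the instruction that
 * follows the one doing the access (RIP-relative addressing is always
 * relative to the next instruction). */
bool fits_rel32(uint64_t next_rip, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpreting the result as signed
     * yields the displacement on two's-complement machines such as x64. */
    int64_t disp = (int64_t)(target - next_rip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A target just under 2 GB ahead of the instruction passes the check, while one exactly 2 GB ahead does not, which is the situation the code models are meant to address.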



One way to solve this problem is to abandon RIP-relative addressing altogether in favor of full 64-bit absolute addresses for all data and code references. This is very costly, however: to cover the (rather rare) case of extremely large programs and libraries, even the simplest operations throughout the code would require more instructions than usual.



Code models are the compromise. [1] A code model is a formal agreement between the programmer and the compiler, in which the programmer states their intentions about the size of the eventual program (or programs) that the object module currently being compiled will end up in. [2] Code models exist so that the programmer can tell the compiler: "Don't worry, this object module will only go into small programs, so you can use the fast RIP-relative addressing modes." Conversely, the programmer can say: "We are going to link this module into huge programs, so please use the slow but safe absolute addressing modes with full 64-bit addresses."



What this article will talk about



We will cover the two scenarios described above, the small code model and the large code model: the first tells the compiler that a 32-bit relative offset is enough for all code and data references in the object module; the second insists that the compiler use absolute 64-bit addressing modes. There is also an intermediate option, the so-called medium code model.



Each of these code models comes in independent PIC and non-PIC variants, and we will discuss all six.



Original example in C



To demonstrate the concepts discussed in this article, I will use the following C program and compile it with the various code models. As you can see, the main function accesses four different global arrays and one global function. The arrays differ in two respects: size and visibility. Size matters for explaining the medium code model and plays no role in the small and large models. Visibility matters for the PIC code models and is either static (visible only within this source file) or global (visible to all objects linked into the program).



int global_arr[100] = {2, 3};
static int static_arr[100] = {9, 7};
int global_arr_big[50000] = {5, 6};
static int static_arr_big[50000] = {10, 20};

int global_func(int param)
{
    return param * 10;
}

int main(int argc, const char* argv[])
{
    int t = global_func(argc);
    t += global_arr[7];
    t += static_arr[7];
    t += global_arr_big[7];
    t += static_arr_big[7];
    return t;
}


gcc takes the code model as the value of the -mcmodel option. In addition, PIC compilation can be requested with the -fpic flag.



An example of compilation into an object module through a large code model using PIC:



> gcc -g -O0 -c codemodel1.c -fpic -mcmodel=large -o codemodel1_large_pic.o


Small code model



Quoting man gcc on the small code model:



-mcmodel=small

Generate code for the small model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.




In other words, the compiler is free to assume that all code and data can be reached with a 32-bit RIP-relative offset from any instruction in the code. Let's look at the disassembly of our C example compiled with the non-PIC small code model:



> objdump -dS codemodel1_small.o
[...]
int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  3c: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  3f: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  45: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  48: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  4e: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  51: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  57: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  5a: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  5d: c9                      leaveq
  5e: c3                      retq


As you can see, all the arrays are accessed in the same way, through a RIP-relative offset. The offset in the code is 0, however, because the compiler does not know where the data section will be placed. For each such access it therefore creates a relocation:



> readelf -r codemodel1_small.o

Relocation section '.rela.text' at offset 0x62bd8 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000002f  001500000002 R_X86_64_PC32     0000000000000000 global_func - 4
000000000038  001100000002 R_X86_64_PC32     0000000000000000 global_arr + 18
000000000041  000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
00000000004a  001200000002 R_X86_64_PC32     0000000000000340 global_arr_big + 18
000000000053  000300000002 R_X86_64_PC32     0000000000000000 .data + 31098


Let's fully decode the access to global_arr. The disassembled fragment we are interested in is:



  t += global_arr[7];
36:       8b 05 00 00 00 00       mov    0x0(%rip),%eax
3c:       01 45 fc                add    %eax,-0x4(%rbp)


RIP-relative addressing is relative to the next instruction, so the offset patched into the mov instruction must be relative to 0x3c. The relocation we are interested in is the second one, R_X86_64_PC32, which points at the operand of the mov at address 0x38 and means the following: take the symbol value, add the addend and subtract the offset the relocation points to. If you do the math, you will see that the result is the relative offset between the next instruction and global_arr, plus 0x1c. Since 0x1c means "the seventh int in the array" (each int is 4 bytes on x64), this is exactly the relative offset we need. Thus, using RIP-relative addressing, the instruction correctly references global_arr[7].
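
The arithmetic described above can be checked with a short sketch. The function names are hypothetical, and the symbol address used in the usage note is made up; only the formula (symbol + addend - patched offset) and the numbers 0x18, 0x38 and 0x3c come from the listing and the relocation table.

```c
#include <stdint.h>

/* R_X86_64_PC32: value = S + A - P, stored in a 32-bit field.
 *   S = symbol address, A = addend, P = address of the field being patched. */
int32_t reloc_pc32(uint64_t sym, int64_t addend, uint64_t place)
{
    return (int32_t)(sym + (uint64_t)addend - place);
}

/* Address the CPU actually accesses: RIP of the next instruction plus
 * the sign-extended 32-bit displacement patched into the instruction. */
uint64_t rip_relative_target(uint64_t next_rip, int32_t disp)
{
    return next_rip + (uint64_t)(int64_t)disp;
}
```

Assuming, purely for illustration, that the code is loaded at address 0 and global_arr ends up at 0x601040, reloc_pc32(0x601040, 0x18, 0x38) yields 0x601020, and rip_relative_target(0x3c, 0x601020) yields 0x60105c, which is global_arr + 0x1c, the address of global_arr[7].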



It is also interesting to note that although the instructions accessing static_arr are similar, its relocation uses a different symbol: it points to the .data section rather than to a specific symbol. This is because the linker places the static array at a known location within the .data section, and the array cannot be shared with other shared libraries. As a result, this relocation is fully resolved by the linker. The reference to global_arr, on the other hand, is left to the dynamic loader, since global_arr can be used (or overridden) by another shared library. [3]



Finally, let's take a look at the reference to global_func:



  int t = global_func(argc);
24:       8b 45 ec                mov    -0x14(%rbp),%eax
27:       89 c7                   mov    %eax,%edi
29:       b8 00 00 00 00          mov    $0x0,%eax
2e:       e8 00 00 00 00          callq  33 <main+0x1e>
33:       89 45 fc                mov    %eax,-0x4(%rbp)


Since the operand of callq is also RIP-relative, the R_X86_64_PC32 relocation works the same way here, placing the actual relative offset to global_func into the operand.



To conclude, thanks to the small code model the compiler treats all the data and code of the eventual program as reachable with a 32-bit offset, and so generates simple and efficient code for accessing all kinds of objects.



Large code model



Quoting man gcc on the large code model:



-mcmodel=large

Generate code for the large model. This model makes no assumptions about addresses and sizes of sections.


The disassembled code of main, compiled with the non-PIC large model:



int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  35: 00 00 00
  38: ff d2                   callq  *%rdx
  3a: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  3d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  44: 00 00 00
  47: 8b 40 1c                mov    0x1c(%rax),%eax
  4a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  4d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  54: 00 00 00
  57: 8b 40 1c                mov    0x1c(%rax),%eax
  5a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  5d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  64: 00 00 00
  67: 8b 40 1c                mov    0x1c(%rax),%eax
  6a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  6d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  74: 00 00 00
  77: 8b 40 1c                mov    0x1c(%rax),%eax
  7a: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  7d: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  80: c9                      leaveq
  81: c3                      retq


Again, it's useful to look at relocations:



Relocation section '.rela.text' at offset 0x62c18 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000030  001500000001 R_X86_64_64       0000000000000000 global_func + 0
00000000003f  001100000001 R_X86_64_64       0000000000000000 global_arr + 0
00000000004f  000300000001 R_X86_64_64       0000000000000000 .data + 1a0
00000000005f  001200000001 R_X86_64_64       0000000000000340 global_arr_big + 0
00000000006f  000300000001 R_X86_64_64       0000000000000000 .data + 31080


Since no assumptions can be made about the sizes of the code and data sections, the large code model is quite uniform and accesses all data in the same way. Let's take another look at global_arr:



  t += global_arr[7];
3d:       48 b8 00 00 00 00 00    movabs $0x0,%rax
44:       00 00 00
47:       8b 40 1c                mov    0x1c(%rax),%eax
4a:       01 45 fc                add    %eax,-0x4(%rbp)


Two instructions are needed to fetch the desired value from the array. The first places an absolute 64-bit address into rax, which, as we will see shortly, turns out to be the address of global_arr. The second loads the word at (rax) + 0x1c into eax.



So let's focus on the instruction at 0x3d, a movabs, which is the absolute 64-bit version of mov on x64. It can load a full 64-bit immediate directly into a register, and since the value of this immediate is zero in our disassembled code, we have to turn to the relocation table for the answer. There we find an absolute relocation, R_X86_64_64, for the operand at 0x3f, with the following meaning: place the symbol value plus the addend back into the offset. In other words, rax will contain the absolute address of global_arr.
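
As a rough illustration of what resolving this relocation amounts to, the sketch below patches the 8-byte immediate of a movabs with "symbol + addend". The function names and the symbol address in the usage note are hypothetical; only the 48 b8 encoding and the 2-byte distance between the instruction start (0x3d) and its operand (0x3f) come from the listing.

```c
#include <stdint.h>

/* R_X86_64_64: value = S + A, stored as a full 64-bit little-endian word
 * at `offset` within the section. */
void apply_reloc_64(uint8_t *section, uint64_t offset,
                    uint64_t sym, int64_t addend)
{
    uint64_t value = sym + (uint64_t)addend;
    for (int i = 0; i < 8; i++)
        section[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Reads back the imm64 operand of a 10-byte `movabs $imm64,%rax`
 * (opcode bytes 48 b8 followed by the immediate). */
uint64_t movabs_imm64(const uint8_t *insn)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)insn[2 + i] << (8 * i);
    return value;
}
```

Starting from the bytes 48 b8 00 00 00 00 00 00 00 00 and applying the relocation at offset 2 with a made-up symbol address of 0x100601040 turns the instruction into 48 b8 40 10 60 00 01 00 00 00, that is, a movabs that loads the absolute address itself.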



What about the function call?



  int t = global_func(argc);
24:       8b 45 ec                mov    -0x14(%rbp),%eax
27:       89 c7                   mov    %eax,%edi
29:       b8 00 00 00 00          mov    $0x0,%eax
2e:       48 ba 00 00 00 00 00    movabs $0x0,%rdx
35:       00 00 00
38:       ff d2                   callq  *%rdx
3a:       89 45 fc                mov    %eax,-0x4(%rbp)


The by now familiar movabs is followed by a call instruction that calls the function whose address is in rdx. A glance at the corresponding relocation shows how similar this is to the data access.



As you can see, the large code model makes no assumptions about the sizes of the code and data sections, or about where symbols will eventually end up. It simply refers to symbols through absolute 64-bit moves, a kind of "safe path". Note, however, that compared with the small code model, the large model has to use an extra instruction to access each symbol. That is the price of safety.



So we have now seen two diametrically opposed models: the small code model assumes that everything fits into the lower two gigabytes of memory, while the large model assumes that anything is possible and any symbol can reside anywhere in the full 64-bit address space. The compromise between the two is the medium code model.



Medium Code Model



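As before, let's start with the quote from man gcc:



-mcmodel=medium

Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2 GB. Programs can be statically or dynamically linked.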


Like the small code model, the medium model assumes that all code is linked into the lower two gigabytes. Data, however, is divided into "small data", which is assumed to be linked into the lower two gigabytes, and "large data", whose placement in memory is unrestricted. Data is classified as large when it exceeds a threshold, which by default is 64 kilobytes.
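
The classification can be sketched as a simple size comparison. This is a simplification under stated assumptions: the helper name is hypothetical, the strict "larger than" comparison follows the wording of the man page, and the 64 KB value is the figure given above. The exact default and boundary behavior of -mlarge-data-threshold should be checked against the documentation of the gcc version in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold of 64 KB, as stated in the text (assumed default). */
#define LARGE_DATA_THRESHOLD ((size_t)64 * 1024)

/* Returns true if an object of `object_size` bytes would be treated as
 * "large data" for the given threshold. */
bool is_large_data(size_t object_size, size_t threshold)
{
    return object_size > threshold;
}
```

With 4-byte ints, the 100-element arrays from the example occupy 400 bytes and stay "small", while the 50000-element arrays occupy 200000 bytes and are classified as "large".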



It is also worth noting that in the medium code model, special sections are created for large data, .ldata and .lbss, by analogy with the .data and .bss sections. This is not very important for the purposes of this article, however, so I will sidestep the topic. More details can be found in the ABI.



Now it becomes clear why the _big arrays appear in the example: the medium model needs them in order to treat them as "large data", which, at 200 kilobytes each, they are. Here is the disassembly:



int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  3c: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  3f: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  45: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  48: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  4f: 00 00 00
  52: 8b 40 1c                mov    0x1c(%rax),%eax
  55: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  58: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  5f: 00 00 00
  62: 8b 40 1c                mov    0x1c(%rax),%eax
  65: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  68: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  6b: c9                      leaveq
  6c: c3                      retq


Note how the arrays are accessed: the _big arrays are accessed the large code model way, while the other arrays are accessed the small model way. The function is also called the small code model way, and the relocations are so similar to the previous examples that I won't even show them.



The medium code model is a clever compromise between the large and small models. The program's code is unlikely to be terribly large [4], so the only thing that could push it past the two-gigabyte limit is large pieces of data statically linked into it, perhaps as part of some big lookup tables. Since the medium code model sifts out such large chunks of data and treats them specially, the code calls functions and accesses small symbols just as efficiently as in the small code model. Only accesses to large symbols require the code to take the full 64-bit route of the large model.



Small PIC Code Model



Now let's look at the PIC variants of the code models, starting as before with the small model. [5] Below is the example code compiled with the small PIC model:



int main(int argc, const char* argv[])
{
  15:   55                      push   %rbp
  16:   48 89 e5                mov    %rsp,%rbp
  19:   48 83 ec 20             sub    $0x20,%rsp
  1d:   89 7d ec                mov    %edi,-0x14(%rbp)
  20:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24:   8b 45 ec                mov    -0x14(%rbp),%eax
  27:   89 c7                   mov    %eax,%edi
  29:   b8 00 00 00 00          mov    $0x0,%eax
  2e:   e8 00 00 00 00          callq  33 <main+0x1e>
  33:   89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  3d:   8b 40 1c                mov    0x1c(%rax),%eax
  40:   01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  43:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
  49:   01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  4c:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  53:   8b 40 1c                mov    0x1c(%rax),%eax
  56:   01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  59:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
  5f:   01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  62:   8b 45 fc                mov    -0x4(%rbp),%eax
}
  65:   c9                      leaveq
  66:   c3                      retq


Relocations:



Relocation section '.rela.text' at offset 0x62ce8 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000002f  001600000004 R_X86_64_PLT32    0000000000000000 global_func - 4
000000000039  001100000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
000000000045  000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
00000000004f  001200000009 R_X86_64_GOTPCREL 0000000000000340 global_arr_big - 4
00000000005b  000300000002 R_X86_64_PC32     0000000000000000 .data + 31098


Since the distinction between large and small data plays no role in the small code model, we will focus on what matters when generating PIC code: the difference between local (static) and global symbols.



As you can see, the code generated for the static arrays is exactly the same as in the non-PIC case. This is one of the advantages of the x64 architecture: thanks to RIP-relative data access, we get PIC for free, at least as long as no access to external symbols is required. All the instructions and relocations stay the same, so we won't go over them again.



The global arrays are the interesting case. Recall that in PIC, global data has to go through the GOT, because it may eventually be found in, or used by, other shared libraries [6]. Here is the code for accessing global_arr:



  t += global_arr[7];
36:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
3d:   8b 40 1c                mov    0x1c(%rax),%eax
40:   01 45 fc                add    %eax,-0x4(%rbp)


The relocation we are interested in is R_X86_64_GOTPCREL: the location of the symbol's entry in the GOT plus the addend, minus the offset at which the relocation is applied. In other words, the relative offset between RIP (of the next instruction) and the slot reserved for global_arr in the GOT is patched into the instruction. So what the instruction at 0x36 places in rax is the actual address of global_arr. It is followed by a dereference of the address of global_arr plus the offset to its seventh element, which loads the value into eax.
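
The two-step access through the GOT can be mimicked in plain C. This is only a model of the idea: demo_global_arr, demo_got and load_via_got are hypothetical names, and a real GOT is laid out by the linker and filled in by the dynamic loader rather than declared in source code.

```c
/* Stand-in for a global array that may live in another module. */
int demo_global_arr[100] = {2, 3, 0, 0, 0, 0, 0, 42};

/* Stand-in for the GOT: a table of full 64-bit addresses, one slot
 * per global symbol. */
void *demo_got[1] = { demo_global_arr };

int load_via_got(void **got, int slot, int index)
{
    /* Step 1, like mov 0x0(%rip),%rax: fetch the symbol's address
     * from its GOT slot. */
    int *base = got[slot];
    /* Step 2, like mov 0x1c(%rax),%eax: dereference the address plus
     * index * sizeof(int). */
    return base[index];
}
```

Calling load_via_got(demo_got, 0, 7) reads element 7 of the array through the extra level of indirection, which is why the PIC access to a global symbol costs one more instruction than the access to a static one.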



Now let's take a look at the function call:



  int t = global_func(argc);
24:   8b 45 ec                mov    -0x14(%rbp),%eax
27:   89 c7                   mov    %eax,%edi
29:   b8 00 00 00 00          mov    $0x0,%eax
2e:   e8 00 00 00 00          callq  33 <main+0x1e>
33:   89 45 fc                mov    %eax,-0x4(%rbp)


The operand of the callq at 0x2e has a relocation, R_X86_64_PLT32: the address of the PLT entry for the symbol plus the addend, minus the offset at which the relocation is applied. In other words, the callq must correctly call the PLT trampoline for global_func.



Note the implicit assumption the compiler makes here: that the GOT and the PLT can be reached with RIP-relative addressing. This will matter when we compare this model with the other PIC code models.



Large PIC Code Model



Disassembly:



int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 53                      push   %rbx
  1a: 48 83 ec 28             sub    $0x28,%rsp
  1e: 48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
  25: 49 bb 00 00 00 00 00    movabs $0x0,%r11
  2c: 00 00 00
  2f: 4c 01 db                add    %r11,%rbx
  32: 89 7d dc                mov    %edi,-0x24(%rbp)
  35: 48 89 75 d0             mov    %rsi,-0x30(%rbp)
    int t = global_func(argc);
  39: 8b 45 dc                mov    -0x24(%rbp),%eax
  3c: 89 c7                   mov    %eax,%edi
  3e: b8 00 00 00 00          mov    $0x0,%eax
  43: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  4a: 00 00 00
  4d: 48 01 da                add    %rbx,%rdx
  50: ff d2                   callq  *%rdx
  52: 89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  55: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  5c: 00 00 00
  5f: 48 8b 04 03             mov    (%rbx,%rax,1),%rax
  63: 8b 40 1c                mov    0x1c(%rax),%eax
  66: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr[7];
  69: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  70: 00 00 00
  73: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  77: 01 45 ec                add    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  7a: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  81: 00 00 00
  84: 48 8b 04 03             mov    (%rbx,%rax,1),%rax
  88: 8b 40 1c                mov    0x1c(%rax),%eax
  8b: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  8e: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  95: 00 00 00
  98: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  9c: 01 45 ec                add    %eax,-0x14(%rbp)
    return t;
  9f: 8b 45 ec                mov    -0x14(%rbp),%eax
}
  a2: 48 83 c4 28             add    $0x28,%rsp
  a6: 5b                      pop    %rbx
  a7: c9                      leaveq
  a8: c3                      retq


Relocations:



Relocation section '.rela.text' at offset 0x62c70 contains 6 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000027  00150000001d R_X86_64_GOTPC64  0000000000000000 _GLOBAL_OFFSET_TABLE_ + 9
000000000045  00160000001f R_X86_64_PLTOFF64 0000000000000000 global_func + 0
000000000057  00110000001b R_X86_64_GOT64    0000000000000000 global_arr + 0
00000000006b  000800000019 R_X86_64_GOTOFF64 00000000000001a0 static_arr + 0
00000000007c  00120000001b R_X86_64_GOT64    0000000000000340 global_arr_big + 0
000000000090  000900000019 R_X86_64_GOTOFF64 0000000000031080 static_arr_big + 0


This time around, the distinction between big and small data still doesn't matter, so we'll focus on the difference between static_arr and global_arr. But first, note the prologue in this code, which we have not encountered before:



1e: 48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
25: 49 bb 00 00 00 00 00    movabs $0x0,%r11
2c: 00 00 00
2f: 4c 01 db                add    %r11,%rbx


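Here is the relevant quote from the ABI:



In the small code model all addresses (including GOT entries) are accessible via the IP-relative addressing provided by the AMD64 architecture. Hence there is no need for an explicit GOT pointer and therefore no function prologue for setting it up is necessary. In the medium and large code models a register has to be allocated to hold the address of the GOT in position-independent objects, because the AMD64 ISA does not support an immediate displacement larger than 32 bits.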


Let's see how the prologue shown above computes the GOT address. First, the instruction at 0x1e loads its own address into rbx. Then an absolute 64-bit move into r11 is performed, with an R_X86_64_GOTPC64 relocation. This relocation means the following: take the address of the GOT, subtract the offset being relocated and add the addend. Finally, the instruction at 0x2f adds the two results together. The result is the absolute address of the GOT in rbx. [7]
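
The prologue arithmetic can be replayed in a few lines of C. The function names and the load_base parameter (the address at which the code is assumed to be loaded) are hypothetical; the constants 0x25, 0x7, 0x27 and 0x9 are taken from the listing and the relocation table.

```c
#include <stdint.h>

/* R_X86_64_GOTPC64: value = GOT + A - P, as a full 64-bit quantity. */
uint64_t reloc_gotpc64(uint64_t got, int64_t addend, uint64_t place)
{
    return got + (uint64_t)addend - place;
}

/* Mirrors the three-instruction prologue:
 *   lea -0x7(%rip),%rbx ; movabs $imm64,%r11 ; add %r11,%rbx */
uint64_t got_from_prologue(uint64_t load_base, uint64_t got)
{
    /* lea: RIP of the next instruction (0x25) minus 7 is the address
     * of the lea itself (0x1e). */
    uint64_t rbx = (load_base + 0x25) - 0x7;
    /* movabs: the imm64 patched at offset 0x27 with addend 9. */
    uint64_t r11 = reloc_gotpc64(got, 0x9, load_base + 0x27);
    /* add %r11,%rbx */
    return rbx + r11;
}
```

Whatever values are chosen for load_base and got, the load address cancels out and the function returns got, which is exactly what makes the computation position independent.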



Why go to all this trouble to compute the GOT address? First, as the quote says, in the large code model we cannot assume that a 32-bit RIP-relative offset is enough to address the GOT, so we need a full 64-bit address. Second, we still want PIC, so we cannot simply place an absolute address in a register. Instead, the address itself has to be computed relative to RIP. That is what the prologue is for: it performs a 64-bit RIP-relative computation.



Anyway, now that we have the GOT address in rbx, let's see how static_arr is accessed:



  t += static_arr[7];
69:       48 b8 00 00 00 00 00    movabs $0x0,%rax
70:       00 00 00
73:       8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
77:       01 45 ec                add    %eax,-0x14(%rbp)


The relocation for the first instruction is R_X86_64_GOTOFF64: the symbol plus the addend, minus the GOT. In our case this is the relative offset between the address of static_arr and the address of the GOT. The next instruction adds the result to rbx (the absolute GOT address) and dereferences it with an offset of 0x1c. To make this computation easier to visualize, here is a pseudo-C version:



// char* static_arr
// char* GOT
rax = static_arr + 0 - GOT;  // rax now contains an offset
eax = *(rbx + rax + 0x1c);   // rbx == GOT, so eax now contains
                             // *(GOT + static_arr - GOT + 0x1c) or
                             // *(static_arr + 0x1c)


Note an interesting point: the GOT address is used merely as an anchor for reaching static_arr. This differs from the usual use of the GOT, which is to hold the address of a symbol; since static_arr is not an external symbol, there is no reason to keep it in the GOT. Here the GOT instead serves as an anchor in the data section, relative to which the symbol's address can be found with a full 64-bit offset, in a position-independent manner. The linker is able to resolve this relocation, so there is no need to modify the code section at load time.



But what about global_arr?



  t += global_arr[7];
55:       48 b8 00 00 00 00 00    movabs $0x0,%rax
5c:       00 00 00
5f:       48 8b 04 03             mov    (%rbx,%rax,1),%rax
63:       8b 40 1c                mov    0x1c(%rax),%eax
66:       01 45 ec                add    %eax,-0x14(%rbp)


This code is a bit longer, and the relocation is different as well. Here the GOT is used in the more traditional way: the R_X86_64_GOT64 relocation for the movabs simply tells it to place into rax the offset within the GOT at which the address of global_arr resides. The instruction at 0x5f fetches the address of global_arr from the GOT and puts it in rax. The next instruction dereferences global_arr[7] and places the value in eax.



Now let's look at the code reference to global_func. Recall that in the large code model we cannot make assumptions about the size of the code sections, so we must assume that even reaching the PLT requires an absolute 64-bit address:



  int t = global_func(argc);
39: 8b 45 dc                mov    -0x24(%rbp),%eax
3c: 89 c7                   mov    %eax,%edi
3e: b8 00 00 00 00          mov    $0x0,%eax
43: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
4a: 00 00 00
4d: 48 01 da                add    %rbx,%rdx
50: ff d2                   callq  *%rdx
52: 89 45 ec                mov    %eax,-0x14(%rbp)


The relocation we are interested in is R_X86_64_PLTOFF64: the address of the PLT entry for global_func minus the GOT address. The result is placed in rdx, to which rbx (the absolute GOT address) is then added. We end up with the address of the PLT entry for global_func in rdx.



Note that once again the GOT is used as an anchor, this time to provide a position-independent reference to the offset of the PLT entry.



Medium PIC Code Model



Finally, let's examine the code generated for the medium PIC model:



int main(int argc, const char* argv[])
{
  15:   55                      push   %rbp
  16:   48 89 e5                mov    %rsp,%rbp
  19:   53                      push   %rbx
  1a:   48 83 ec 28             sub    $0x28,%rsp
  1e:   48 8d 1d 00 00 00 00    lea    0x0(%rip),%rbx
  25:   89 7d dc                mov    %edi,-0x24(%rbp)
  28:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
    int t = global_func(argc);
  2c:   8b 45 dc                mov    -0x24(%rbp),%eax
  2f:   89 c7                   mov    %eax,%edi
  31:   b8 00 00 00 00          mov    $0x0,%eax
  36:   e8 00 00 00 00          callq  3b <main+0x26>
  3b:   89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  3e:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  45:   8b 40 1c                mov    0x1c(%rax),%eax
  48:   01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr[7];
  4b:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
  51:   01 45 ec                add    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  54:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  5b:   8b 40 1c                mov    0x1c(%rax),%eax
  5e:   01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  61:   48 b8 00 00 00 00 00    movabs $0x0,%rax
  68:   00 00 00
  6b:   8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  6f:   01 45 ec                add    %eax,-0x14(%rbp)
    return t;
  72:   8b 45 ec                mov    -0x14(%rbp),%eax
}
  75:   48 83 c4 28             add    $0x28,%rsp
  79:   5b                      pop    %rbx
  7a:   c9                      leaveq
  7b:   c3                      retq


Relocations:



Relocation section '.rela.text' at offset 0x62d60 contains 6 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000021  00160000001a R_X86_64_GOTPC32  0000000000000000 _GLOBAL_OFFSET_TABLE_ - 4
000000000037  001700000004 R_X86_64_PLT32    0000000000000000 global_func - 4
000000000041  001200000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
00000000004d  000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
000000000057  001300000009 R_X86_64_GOTPCREL 0000000000000000 global_arr_big - 4
000000000063  000a00000019 R_X86_64_GOTOFF64 0000000000030d40 static_arr_big + 0


First, let's get the function call out of the way. As in the small model, the medium model assumes that code references fit within a 32-bit RIP-relative offset, so the code that calls global_func is exactly the same as in the small PIC model. The same goes for the small data arrays static_arr and global_arr. We will therefore focus on the big data arrays, but first let's discuss the prologue, which differs here from that of the large code model:



1e:   48 8d 1d 00 00 00 00    lea    0x0(%rip),%rbx


That is the whole prologue: a single instruction (compared to three in the large model) is enough to place the GOT address in rbx, with the help of the R_X86_64_GOTPC32 relocation. Why the difference? In the medium model the GOT is not part of the "big data" sections, so we assume it is reachable within a 32-bit offset. In the large model we could make no such assumption and had to use a full 64-bit offset.



Interestingly, the code for accessing global_arr_big is the same as in the small PIC model. The reason is the same one that makes the medium model prologue shorter than the large model prologue: we assume that the GOT is reachable with 32-bit RIP-relative addressing. True, global_arr_big itself cannot be reached this way, but the GOT takes care of that, since the address of global_arr_big resides in the GOT, and as a full 64-bit address at that.



The situation, however, is different for static_arr_big:



  t += static_arr_big[7];
61:   48 b8 00 00 00 00 00    movabs $0x0,%rax
68:   00 00 00
6b:   8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
6f:   01 45 ec                add    %eax,-0x14(%rbp)


This case is similar to the large PIC code model, because here we do obtain an absolute address for the symbol, which does not itself reside in the GOT. Since this is a large symbol that cannot be assumed to lie in the lower two gigabytes, we need a 64-bit PIC offset, just as in the large model.
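
The data accesses seen in the six listings can be condensed into one decision function. This is only a summary of what the listings in this article show for this particular example and compiler; the enum and function names are hypothetical, and other compiler versions or optimization levels may choose differently.

```c
#include <stdbool.h>

enum code_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_LARGE };

/* Kinds of data access, named after the relocation involved. */
enum data_access {
    ACCESS_PC32,      /* one RIP-relative mov                         */
    ACCESS_ABS64,     /* movabs of the absolute address, then mov     */
    ACCESS_GOTPCREL,  /* RIP-relative load from the GOT, then mov     */
    ACCESS_GOT64,     /* movabs of a GOT slot offset, load via rbx    */
    ACCESS_GOTOFF64   /* movabs of symbol - GOT, then mov via rbx     */
};

enum data_access pick_data_access(enum code_model model, bool pic,
                                  bool is_static, bool is_large_data)
{
    if (!pic) {
        if (model == MODEL_LARGE)
            return ACCESS_ABS64;
        if (model == MODEL_MEDIUM && is_large_data)
            return ACCESS_ABS64;
        return ACCESS_PC32;
    }
    if (model == MODEL_LARGE)
        return is_static ? ACCESS_GOTOFF64 : ACCESS_GOT64;
    if (!is_static)
        return ACCESS_GOTPCREL;      /* small and medium PIC, any size */
    if (model == MODEL_MEDIUM && is_large_data)
        return ACCESS_GOTOFF64;
    return ACCESS_PC32;
}
```

For instance, static_arr_big in the medium PIC model maps to ACCESS_GOTOFF64, while global_arr_big in the same model maps to ACCESS_GOTPCREL, matching the last relocation table above.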



Notes:



[1] Do not confuse code models with 64-bit data models or Intel memory models; these are all different topics.



[2] An important point: the compiler generates the instructions themselves, and the addressing modes are fixed at that step. The compiler cannot know which programs or shared libraries the object module will end up in; some may be small, others large. The linker knows the size of the final program, but by then it is too late: the linker can only patch the offsets in instructions by means of relocations, not change the instructions themselves. That is why the code model "contract" must be "signed" by the programmer at compile time.



[3] If something remains unclear, check out the next article.



[4] Sizes keep growing, though. The last time I checked, a Debug+Asserts build of Clang was almost one gigabyte, thanks in large part to autogenerated code.



[5] If you are not yet familiar with how PIC works (both in general and on the x64 architecture in particular), now is the time to read the following articles on the topic: one and two.



[6] Thus the linker cannot resolve these references on its own and has to leave the handling of the GOT to the dynamic loader.



[7] 0x25 - 0x7 + GOT - 0x27 + 0x9 = GOT








