Create EXE

Self-isolation is a great time to start something that takes a lot of time and effort. So I decided to do what I always wanted - write my own compiler.



It can already build Hello World, but in this article I want to talk not about parsing or the compiler's internal structure, but about an equally important part: assembling an exe file byte by byte.



Start



Want a spoiler? Our program will be 2048 bytes.



Usually, working with exe files means studying or modifying their structure. The executables themselves are produced by compilers, and to developers this process seems a bit magical.



But now we will try to fix it!



To build our program, we need any HEX editor (I personally used HxD).



Let's take pseudocode to start:



Source
func MessageBoxA(u32 handle, PChar text, PChar caption, u32 type) i32 ['user32.dll']
func ExitProcess(u32 code) ['kernel32.dll']

func main()
{
	MessageBoxA(0, 'Hello World!', 'MyApp', 64)
	ExitProcess(0)
}




The first two lines declare functions imported from WinAPI libraries. The MessageBoxA function displays a dialog box with our text, and ExitProcess informs the system that the program has ended.

There is no need to look at the main function separately, since it only calls the functions described above.



DOS Header



First, we need to generate a correct DOS Header. This is a header for DOS programs, and it should not affect launching the exe under Windows.



I have annotated the more or less important fields; the rest are filled with zeros.



IMAGE_DOS_HEADER structure
Struct IMAGE_DOS_HEADER
{
     u16 e_magic	// 0x5A4D	"MZ"
     u16 e_cblp		// 0x0080	128
     u16 e_cp		// 0x0001	1
     u16 e_crlc
     u16 e_cparhdr	// 0x0004	4
     u16 e_minalloc	// 0x0010	16
     u16 e_maxalloc	// 0xFFFF	65535
     u16 e_ss
     u16 e_sp		// 0x0140	320
     u16 e_csum		
     u16 e_ip
     u16 e_cs
     u16 e_lfarlc	// 0x0040	64
     u16 e_ovno
     u16[4] e_res
     u16 e_oemid
     u16 e_oeminfo
     u16[10] e_res2
     u32 e_lfanew	// 0x0080	128
}




Most importantly, this header contains the e_magic field, which marks the file as an executable, and e_lfanew, which holds the offset of the PE header from the beginning of the file (in our file, this offset is 0x80 = 128 bytes).



Great, now that we know the DOS Header structure, let's write it to our file.



(1) RAW DOS Header (Offset 0x00000000)
4D 5A 80 00 01 00 00 00  04 00 10 00 FF FF 00 00
40 01 00 00 00 00 00 00  40 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 80 00 00 00
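The same 64 bytes can also be produced programmatically instead of being typed into a hex editor. Below is a minimal sketch in Python using the standard struct module; the function name build_dos_header is only an illustrative choice, and the field values are the ones from the structure above.

```python
import struct

def build_dos_header(e_lfanew: int = 0x80) -> bytes:
    """Pack IMAGE_DOS_HEADER: 30 little-endian u16 fields followed by one u32."""
    words = [
        0x5A4D,  # e_magic, "MZ"
        0x0080,  # e_cblp
        0x0001,  # e_cp
        0x0000,  # e_crlc
        0x0004,  # e_cparhdr
        0x0010,  # e_minalloc
        0xFFFF,  # e_maxalloc
        0x0000,  # e_ss
        0x0140,  # e_sp
        0x0000,  # e_csum
        0x0000,  # e_ip
        0x0000,  # e_cs
        0x0040,  # e_lfarlc
        0x0000,  # e_ovno
    ]
    words += [0] * 4   # e_res
    words += [0, 0]    # e_oemid, e_oeminfo
    words += [0] * 10  # e_res2
    return struct.pack("<30HI", *words, e_lfanew)

dos_header = build_dos_header()
print(dos_header.hex(" "))
```

The "<" prefix in the format string selects little-endian byte order without alignment padding, which is why 0x5A4D ends up in the file as 4D 5A.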










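For convenience, the title of each block gives the offset (Offset) at which the block is written.



For example, the first block starts at 0x00000000 and takes 64 bytes (0x40 in hexadecimal), so the next block starts at 0x00000040, and so on.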

Done, the first 64 bytes are written. Now we need to add 64 more: the so-called DOS stub. When the program is launched under DOS, the stub must notify the user that the program is not designed to run in this mode.



In essence, it is a small DOS program that prints a line and exits.

Let's write our stub to the file and look at it in more detail.



(2) RAW DOS Stub (Offset 0x00000040)
0E 1F BA 0E 00 B4 09 CD  21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72  61 6D 20 63 61 6E 6E 6F
74 20 62 65 20 72 75 6E  20 69 6E 20 44 4F 53 20
6D 6F 64 65 2E 0D 0A 24  00 00 00 00 00 00 00 00






And now the same code, but in disassembled form



Asm DOS Stub
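0000	push cs			; push the Code Segment (CS) onto the stack
0001	pop ds			; pop it into the Data Segment: DS = CS
0002	mov dx, 0x0E	; DS:DX points to the string, which ends with '$'
0005	mov ah, 0x09	; function 0x09 (print string)
0007	int 0x21		; call interrupt 0x21
0009	mov ax, 0x4C01	; function 0x4C (terminate program)
						; with exit code 0x01 (error)
000c	int 0x21		; call interrupt 0x21
000e	"This program cannot be run in DOS mode.\x0D\x0A$" ; the string to print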




It works like this: first the stub prints a line stating that the program cannot be run in DOS mode, and then it exits with code 1, which differs from normal termination (code 0).
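As a quick sanity check, the stub can be assembled from its two parts: the 14 bytes of machine code and the message. This is a sketch in Python; the variable names are illustrative, and the bytes are copied from the dump above.

```python
import struct

# Machine code of the stub, copied from the hex dump above.
code = bytes.fromhex("0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21")
message = b"This program cannot be run in DOS mode.\r\n$"

# Pad with zeros to 64 bytes so that DOS header + stub end exactly at 0x80.
dos_stub = (code + message).ljust(64, b"\x00")

# The 16-bit operand of `mov dx, ...` (at offset 3) must be the message offset.
(message_offset,) = struct.unpack_from("<H", dos_stub, 3)
print(hex(message_offset), len(dos_stub))
```

The operand of mov dx equals the length of the code, because the message is placed immediately after the last instruction.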



The stub code may differ slightly from compiler to compiler (I compared gcc and Delphi), but the general idea is the same.



It's also funny that in some executables the stub line ends with \x0D\x0D\x0A$. Most likely the reason is that C++ opens files in text mode by default, so the character \x0A is replaced with the sequence \x0D\x0A. As a result, we get 3 bytes: 2 carriage returns (0x0D), one of which is meaningless, and 1 line feed (0x0A). In binary mode (std::ios::binary), this substitution does not occur.
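The effect is easy to reproduce. The sketch below uses Python rather than C++ and imitates Windows text mode with io.TextIOWrapper and newline="\r\n"; the helper name write_stub_text is made up for this illustration.

```python
import io

def write_stub_text(newline: str) -> bytes:
    """Write the end of the stub message through a text layer and return the raw bytes."""
    raw = io.BytesIO()
    # newline="\r\n" translates every "\n" on output, like Windows text mode;
    # newline="" disables translation, like binary mode.
    out = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    out.write("DOS mode.\r\n$")
    out.flush()
    return raw.getvalue()

text_mode = write_stub_text("\r\n")
binary_like = write_stub_text("")
print(text_mode)
print(binary_like)
```

In the first case the "\n" that is already preceded by "\r" gets expanded again, which yields the doubled carriage return.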



To check the correctness of writing the values, I will use Far with the ImpEx plugin:







NT Header



After 128 (0x80) bytes, we reach the NT header (IMAGE_NT_HEADERS64), which also contains the PE header (IMAGE_OPTIONAL_HEADER64). Despite its name, IMAGE_OPTIONAL_HEADER64 is required, but it differs between the x64 and x86 architectures.



IMAGE_NT_HEADERS64 structure
Struct IMAGE_NT_HEADERS64
{
	u32 Signature	// 0x4550 "PE"
	
	Struct IMAGE_FILE_HEADER 
	{
		u16 Machine	// 0x8664  x86-64
		u16 NumberOfSections	// 0x03 (number of sections)
		u32 TimeDateStamp		// file creation time
		u32 PointerToSymbolTable
		u32 NumberOfSymbols
		u16 SizeOfOptionalHeader // size of IMAGE_OPTIONAL_HEADER64 (below)
		u16 Characteristics	// 0x2F (flags)
	}
	
	Struct IMAGE_OPTIONAL_HEADER64
	{
		u16 Magic	// 0x020B means PE64
		u8 MajorLinkerVersion
		u8 MinorLinkerVersion
		u32 SizeOfCode
		u32 SizeOfInitializedData
		u32 SizeOfUninitializedData	
		u32 AddressOfEntryPoint	// 0x1000 
		u32 BaseOfCode	// 0x1000 
		u64 ImageBase	// 0x400000 
		u32 SectionAlignment	// 0x1000 (4096 bytes)
		u32 FileAlignment	// 0x200
		u16 MajorOperatingSystemVersion	// 0x05	Windows XP
		u16 MinorOperatingSystemVersion	// 0x02	Windows XP
		u16 MajorImageVersion
		u16 MinorImageVersion
		u16 MajorSubsystemVersion	// 0x05	Windows XP
		u16 MinorSubsystemVersion	// 0x02	Windows XP
		u32 Win32VersionValue
		u32 SizeOfImage	// 0x4000
		u32 SizeOfHeaders // 0x200 (512 bytes)
		u32 CheckSum
		u16 Subsystem	// 0x02 (GUI) or 0x03 (Console)
		u16 DllCharacteristics
		u64 SizeOfStackReserve	// 0x100000
		u64 SizeOfStackCommit	// 0x1000
		u64 SizeOfHeapReserve	// 0x100000
		u64 SizeOfHeapCommit	// 0x1000
		u32 LoaderFlags
		u32 NumberOfRvaAndSizes // 0x10 (16)
		
		Struct IMAGE_DATA_DIRECTORY [16] 
		{
			u32 VirtualAddress
			u32 Size
		}
	}
}




Let's see what is stored in this structure:



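Description of IMAGE_NT_HEADERS64
Signature - the "PE" signature that marks the start of the NT header



IMAGE_FILE_HEADER is the same for x86 and x64.



Machine - the target architecture, x64 in our case

NumberOfSections - the number of sections in the file (sections are described below)

TimeDateStamp - the file creation time

SizeOfOptionalHeader - the size of IMAGE_OPTIONAL_HEADER64, which differs from that of IMAGE_OPTIONAL_HEADER32.



Characteristics - flags. Here we state that the file is executable (EXECUTABLE_IMAGE), that it can work with more than 2 GB of RAM (LARGE_ADDRESS_AWARE), and that some information has been stripped from it (RELOCS_STRIPPED | LINE_NUMS_STRIPPED | LOCAL_SYMS_STRIPPED).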
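The Characteristics value 0x2F is simply the bitwise OR of those five flags. A small sketch in Python, using the standard IMAGE_FILE_* flag values; the class name FileCharacteristics is an illustrative choice.

```python
from enum import IntFlag

class FileCharacteristics(IntFlag):
    RELOCS_STRIPPED = 0x0001
    EXECUTABLE_IMAGE = 0x0002
    LINE_NUMS_STRIPPED = 0x0004
    LOCAL_SYMS_STRIPPED = 0x0008
    LARGE_ADDRESS_AWARE = 0x0020

characteristics = (
    FileCharacteristics.RELOCS_STRIPPED
    | FileCharacteristics.EXECUTABLE_IMAGE
    | FileCharacteristics.LINE_NUMS_STRIPPED
    | FileCharacteristics.LOCAL_SYMS_STRIPPED
    | FileCharacteristics.LARGE_ADDRESS_AWARE
)
print(hex(characteristics))
```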



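SizeOfCode - size of the executable code (the .text section)

SizeOfInitializedData - size of the initialized data (the .rodata section)

SizeOfUninitializedData - size of the uninitialized data (the .bss section)

BaseOfCode - address where the code section begins

SectionAlignment - alignment of sections in memory

FileAlignment - alignment of sections in the file

SizeOfImage - size of the whole image in memory

SizeOfHeaders - size of all headers (IMAGE_DOS_HEADER, DOS Stub, IMAGE_NT_HEADERS64, IMAGE_SECTION_HEADER[IMAGE_FILE_HEADER.NumberOfSections]), rounded up to FileAlignment

Subsystem - the type of application: GUI or Console

MajorOperatingSystemVersion, MinorOperatingSystemVersion, MajorSubsystemVersion, MinorSubsystemVersion - the minimum system version required to run the exe. 5.2 corresponds to Windows XP (x64).

SizeOfStackReserve - amount of memory reserved for the stack. Here it is 1 MB.

SizeOfStackCommit - amount of stack memory committed up front. Here it is 4 KB.

SizeOfHeapReserve - amount of memory reserved for the heap. Here it is 1 MB.

SizeOfHeapCommit - 4 KB here. The same as SizeOfStackCommit, but for the heap.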



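IMAGE_DATA_DIRECTORY - an array of directory entries. Our header contains all 16 of them.



Each entry has a fixed index and describes one table. The first few are:

Export(0) - the table of exported functions. It is used mainly by DLLs. We leave it empty.



Import(1) - the table of functions imported from DLLs. In our case VirtualAddress = 0x3000 and Size = 0xB8. This is the only entry we fill in.



Resource(2) - the resource table (icons, images, etc.)

The remaining entries are left zeroed here.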



Now that we have looked at what the NT header consists of, we will write it to the file at offset 0x80, just like the previous blocks.



(3) RAW NT-Header (Offset 0x00000080)
50 45 00 00 64 86 03 00  F4 70 E8 5E 00 00 00 00
00 00 00 00 F0 00 2F 00  0B 02 00 00 3D 00 00 00
13 00 00 00 00 00 00 00  00 10 00 00 00 10 00 00
00 00 40 00 00 00 00 00  00 10 00 00 00 02 00 00
05 00 02 00 00 00 00 00  05 00 02 00 00 00 00 00
00 40 00 00 00 02 00 00  00 00 00 00 02 00 00 00
00 00 10 00 00 00 00 00  00 10 00 00 00 00 00 00
00 00 10 00 00 00 00 00  00 10 00 00 00 00 00 00
00 00 00 00 10 00 00 00  00 00 00 00 00 00 00 00
00 30 00 00 B8 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
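For comparison, here is a sketch of how the same 264 bytes could be generated in Python with struct. The values are the ones from the structure and the dump above; the variable names and the way the header is split into four parts are just one possible arrangement.

```python
import struct

SECTION_ALIGNMENT = 0x1000
FILE_ALIGNMENT = 0x200

signature = b"PE\x00\x00"

file_header = struct.pack(
    "<HHIIIHH",
    0x8664,      # Machine: x86-64
    3,           # NumberOfSections
    0x5EE870F4,  # TimeDateStamp (the value from the dump)
    0, 0,        # PointerToSymbolTable, NumberOfSymbols
    0xF0,        # SizeOfOptionalHeader
    0x2F,        # Characteristics
)

optional_header = struct.pack(
    "<HBB5IQ2I6H4I2H4Q2I",
    0x020B, 0, 0,          # Magic, MajorLinkerVersion, MinorLinkerVersion
    0x3D, 0x13, 0,         # SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData
    0x1000, 0x1000,        # AddressOfEntryPoint, BaseOfCode
    0x400000,              # ImageBase
    SECTION_ALIGNMENT, FILE_ALIGNMENT,
    5, 2, 0, 0, 5, 2,      # OS version, image version, subsystem version
    0, 0x4000, 0x200, 0,   # Win32VersionValue, SizeOfImage, SizeOfHeaders, CheckSum
    2, 0,                  # Subsystem (GUI), DllCharacteristics
    0x100000, 0x1000,      # SizeOfStackReserve, SizeOfStackCommit
    0x100000, 0x1000,      # SizeOfHeapReserve, SizeOfHeapCommit
    0, 16,                 # LoaderFlags, NumberOfRvaAndSizes
)

# 16 data directories (VirtualAddress, Size); only Import (index 1) is filled in.
directories = [(0, 0)] * 16
directories[1] = (0x3000, 0xB8)
data_directory = b"".join(struct.pack("<II", va, size) for va, size in directories)

nt_headers = signature + file_header + optional_header + data_directory
print(len(nt_headers), hex(len(optional_header + data_directory)))
```

Note that SizeOfOptionalHeader (0xF0) covers both the fixed fields and the 16 data directories.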




As a result, we get this kind of IMAGE_FILE_HEADER, IMAGE_OPTIONAL_HEADER64 and IMAGE_DATA_DIRECTORY headers:















Next, we describe all the sections of our application using the IMAGE_SECTION_HEADER structure.



IMAGE_SECTION_HEADER structure
Struct IMAGE_SECTION_HEADER
{
	i8[8] Name
	u32 VirtualSize
	u32 VirtualAddress
	u32 SizeOfRawData
	u32 PointerToRawData
	u32 PointerToRelocations
	u32 PointerToLinenumbers
	u16 NumberOfRelocations
	u16 NumberOfLinenumbers
	u32 Characteristics
}




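Description of IMAGE_SECTION_HEADER
Name - the section name, up to 8 characters

VirtualSize - size of the section in memory

VirtualAddress - address of the section in memory, a multiple of SectionAlignment

SizeOfRawData - size of the section in the file, a multiple of FileAlignment

PointerToRawData - offset of the section in the file, a multiple of FileAlignment

Characteristics - flags describing the section (code, data, readable, writable, executable, etc.)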



In our case, we will have 3 sections.



I don't know why the Virtual Address (VA) starts at 0x1000 rather than at zero, but all the compilers I looked at do this. As a result, 0x1000 + 3 sections * 0x1000 (SectionAlignment) = 0x4000, which is what we wrote in SizeOfImage. This is the total size of our program in virtual memory; it is probably used to allocate space for the program in memory.



 Name	| RAW Addr	| RAW Size	| VA	| VA Size | Attr
--------+---------------+---------------+-------+---------+--------
.text	| 200		| 200		| 1000	| 3D	  |   CER
.rdata	| 400		| 200		| 2000	| 13	  | I   R
.idata	| 600		| 200		| 3000	| B8	  | I   R


Decoding of attributes:



I - Initialized data, the section contains initialized data

U - Uninitialized data, the section contains uninitialized data

C - Code, the section contains executable code

E - Execute, allows executing code from the section

R - Read, allows reading data from the section

W - Write, allows writing data to the section



.text (.code) - stores executable code (the program itself). CER attributes

.rdata (.rodata) - stores read-only data such as constants and strings. IR attributes

.data - stores data that can be read and written, such as static or global variables. IRW attributes

.bss - stores uninitialized data such as static or global variables. This section usually has zero RAW size and non-zero VA Size, so it does not take up space in the file. URW attributes

.idata - contains the functions imported from other libraries. IR attributes



An important point: the sections must follow one another in order, both in the file and in memory. At least, when I changed their order arbitrarily, the program stopped running.
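The section table above can be derived from just the section sizes and the two alignment values. The following Python sketch shows one way to do it; the helper names align_up and layout_sections are hypothetical, and it assumes that headers occupy the first 0x200 bytes of the file, as in our case.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def layout_sections(sizes, headers_size, file_alignment=0x200, section_alignment=0x1000):
    """Place sections one after another, both in the file and in memory."""
    raw_addr = align_up(headers_size, file_alignment)
    virt_addr = align_up(headers_size, section_alignment)
    table = []
    for name, size in sizes:
        raw_size = align_up(size, file_alignment)
        table.append((name, raw_addr, raw_size, virt_addr, size))
        raw_addr += raw_size
        virt_addr += align_up(size, section_alignment)
    # raw_addr is now the file size, virt_addr is SizeOfImage
    return table, raw_addr, virt_addr

sections, file_size, size_of_image = layout_sections(
    [(".text", 0x3D), (".rdata", 0x13), (".idata", 0xB8)],
    headers_size=0x200,
)
for name, raw_addr, raw_size, va, va_size in sections:
    print(f"{name:8}| {raw_addr:X} | {raw_size:X} | {va:X} | {va_size:X}")
print(hex(file_size), hex(size_of_image))
```

If .text grew past 0x1000 bytes, the same function would automatically push .rdata and .idata to higher virtual addresses.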



Now that we know what sections our program will contain, let's write them to our file. Here the offset ends in 8, so the write starts in the middle of a line.



(4) RAW Sections (Offset 0x00000188)
                         2E 74 65 78 74 00 00 00
3D 00 00 00 00 10 00 00  00 02 00 00 00 02 00 00
00 00 00 00 00 00 00 00  00 00 00 00 20 00 00 60
2E 72 64 61 74 61 00 00  13 00 00 00 00 20 00 00
00 02 00 00 00 04 00 00  00 00 00 00 00 00 00 00
00 00 00 00 40 00 00 40  2E 69 64 61 74 61 00 00
B8 00 00 00 00 30 00 00  00 02 00 00 00 06 00 00
00 00 00 00 00 00 00 00  00 00 00 00 40 00 00 40
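The three 40-byte headers above can be packed as follows. This is a Python sketch; the function name section_header is illustrative, and the flag constants are the standard IMAGE_SCN_* values that correspond to the C, I, E and R attributes.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020              # C
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040  # I
IMAGE_SCN_MEM_EXECUTE = 0x20000000           # E
IMAGE_SCN_MEM_READ = 0x40000000              # R

def section_header(name, virtual_size, virtual_address, raw_size, raw_pointer, flags):
    """Pack one IMAGE_SECTION_HEADER (40 bytes); unused fields are zero."""
    return struct.pack(
        "<8sIIIIIIHHI",
        name.encode("ascii"),  # padded with zero bytes up to 8
        virtual_size, virtual_address, raw_size, raw_pointer,
        0, 0,                  # PointerToRelocations, PointerToLinenumbers
        0, 0,                  # NumberOfRelocations, NumberOfLinenumbers
        flags,
    )

CER = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
IR = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ

section_table = (
    section_header(".text", 0x3D, 0x1000, 0x200, 0x200, CER)
    + section_header(".rdata", 0x13, 0x2000, 0x200, 0x400, IR)
    + section_header(".idata", 0xB8, 0x3000, 0x200, 0x600, IR)
)
print(section_table.hex(" "))
```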






The next write address will be 0x00000200, which matches the SizeOfHeaders field of the PE header. If we added one more section, which costs another 40 bytes, our headers would no longer fit into 512 (0x200) bytes: they would need 512 + 40 = 552 bytes, which rounds up to FileAlignment as 1024 (0x400) bytes. Everything from 0x228 (552) up to 0x400 would have to be filled with something, preferably zeros.
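This arithmetic is easy to express in code. A minimal Python sketch, assuming the header sizes used in this article (0x80 bytes for the DOS part, 0xF0 for the optional header, 40 bytes per section header); the function names are illustrative.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def size_of_headers(section_count: int, file_alignment: int = 0x200) -> int:
    dos_part = 0x80               # IMAGE_DOS_HEADER + DOS stub
    nt_part = 4 + 20 + 0xF0       # Signature + IMAGE_FILE_HEADER + IMAGE_OPTIONAL_HEADER64
    section_table = 40 * section_count
    return align_up(dos_part + nt_part + section_table, file_alignment)

print(hex(size_of_headers(3)))  # our file: exactly 512 bytes
print(hex(size_of_headers(4)))  # one more section: 552 bytes, rounded up
```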



Let's take a look at what a block of sections looks like in Far:







Next, we will write the sections themselves to our file, but there is one nuance.



As the SizeOfHeaders example shows, we can't just write a header and move on to the next one: to write a header, we need to know how much space all the headers will take together. So we either have to calculate in advance how much space is needed, or write empty (zero) values first and, after all the headers are written, go back and fill in their actual sizes.



This is why programs are compiled in several passes. For example, the .rdata section comes after the .text section, so we cannot know the virtual address of a variable in .rdata in advance: if the .text section grows beyond 0x1000 (SectionAlignment) bytes, it will spill into the 0x2000 address range. Accordingly, the .rdata section will no longer be located at 0x2000 but at 0x3000, and we will have to go back and recalculate the addresses of all variables referenced in the .text section, which comes before .rdata.



But in this case, I have already calculated everything, so we will immediately write down the blocks of code.



.text section



Asm segment .text
0000	push rbp
0001	mov rbp, rsp
0004	sub rsp, 0x20
0008	mov rcx, 0x0
000F	mov rdx, 0x402000
0016	mov r8, 0x40200D
001D	mov r9, 0x40
0024	call QWORD PTR [rip + 0x203E]
002A	mov rcx, 0x0
0031	call QWORD PTR [rip + 0x2061]
0037	add rsp, 0x20
003B	pop rbp
003C	ret




Strictly speaking, this program needs neither the first 3 lines nor the last 3.

The last 3 will not even be executed, since the program exits at the second call.



But if this were not the main function but an ordinary subfunction, this is how it should be done.



The first 3 lines, although not strictly necessary here, are desirable. For example, if we used printf instead of MessageBoxA, then without these lines we would get an error.



According to the x64 calling convention described on MSDN, the first 4 parameters are passed in the registers RCX, RDX, R8 and R9, provided they fit there and are not, for example, floating-point numbers. The rest are passed on the stack.



In theory, if we pass 2 arguments to a function, we must pass them in registers and reserve two slots on the stack for them, so that the function can spill the registers to the stack if necessary. We also should not expect these registers to come back in their original state.



The problem with printf is that even if we pass it only 1 argument, it will still overwrite all 4 slots on the stack, although it would seem it should overwrite only one, according to the number of arguments.



Therefore, if you do not want the program to behave strangely, always reserve at least 8 bytes * 4 arguments = 32 (0x20) bytes whenever you pass at least 1 argument to a function.
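A compiler can compute the operand of sub rsp from the number of arguments. The sketch below is in Python; the function name stack_reserve is hypothetical, and the rounding to 16 bytes is an added assumption based on the x64 convention's requirement that the stack be 16-byte aligned at a call (here rsp is already aligned after push rbp).

```python
def stack_reserve(arg_count: int) -> int:
    """Bytes to subtract from rsp before calling a function with arg_count arguments."""
    slots = max(arg_count, 4)        # always at least 4 slots of 8 bytes
    size = slots * 8
    return (size + 15) // 16 * 16    # assumption: keep rsp 16-byte aligned

for n in (1, 4, 5, 6):
    print(n, hex(stack_reserve(n)))
```

For MessageBoxA with its 4 arguments this gives 0x20, the value used in the listing above.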



Consider a block of code with function calls



MessageBoxA(0, 'Hello World!', 'MyApp', 64)
ExitProcess(0)


First we pass our arguments:



rcx = 0

rdx = the absolute address of the string in memory: ImageBase + Sections[".rdata"].VirtualAddress + the offset of the string from the beginning of the section; the string is read up to the terminating zero byte

r8 = same as the previous one

r9 = 64 (0x40), MB_ICONINFORMATION, the information icon
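The two string addresses follow directly from that formula. A short Python sketch; place_strings is an illustrative helper that lays the strings out back to back, each terminated by a zero byte.

```python
IMAGE_BASE = 0x400000
RDATA_VA = 0x2000  # Sections[".rdata"].VirtualAddress

def place_strings(strings):
    """Return the section contents and the absolute address of every string."""
    blob = b""
    addresses = {}
    for text in strings:
        addresses[text] = IMAGE_BASE + RDATA_VA + len(blob)
        blob += text.encode("ascii") + b"\x00"
    return blob, addresses

blob, addresses = place_strings(["Hello World!", "MyApp"])
for text, address in addresses.items():
    print(f"{address:#x}  {text!r}")
```

The length of blob is 0x13, which is the VA Size of the .rdata section.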



And then comes the call to MessageBoxA, which is not so simple. The point is that compilers try to use the shortest possible instructions. The smaller the instructions, the more of them fit into the processor's cache, so there are fewer cache misses and reloads, and the program runs faster. For more information on instructions and the inner workings of the processor, refer to the Intel 64 and IA-32 Architectures Software Developer's Manuals.



We could call the function by its full address, but that would take at least 9 bytes (1 opcode + 8 address), whereas with a relative address the call instruction takes only 6 bytes.



Let's take a closer look at this magic: call QWORD PTR [rip + 0x203E] is nothing more than a call to the function whose address is stored at the location given by our offset.



I looked a little ahead and found the addresses we need: 0x3068 for MessageBoxA and 0x3098 for ExitProcess.



It's time to turn magic into science. Every time the processor fetches an instruction, it determines its length and adds it to the current instruction address (RIP). Therefore, when RIP is used inside an instruction, it points to the end of the current instruction, that is, the beginning of the next one.

For the first call, RIP points to the end of the call instruction, which is 0x002A. Do not forget that in memory this address is shifted by Sections[".text"].VirtualAddress, i.e. 0x1000. Therefore, RIP for our call will be 0x102A. The address we need for MessageBoxA is at 0x3068. We compute 0x3068 - 0x102A = 0x203E. For the second call everything is the same: 0x1000 + 0x0037 = 0x1037, and 0x3098 - 0x1037 = 0x2061.



It is these offsets that we saw in the assembler commands.



0024	call QWORD PTR [rip + 0x203E]
002A	mov rcx, 0x0
0031	call QWORD PTR [rip + 0x2061]
0037	add rsp, 0x20
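The same calculation as a small Python function. The name rip_displacement is illustrative; the instruction length of 6 bytes is that of the FF 15 form of call used here.

```python
TEXT_VA = 0x1000  # Sections[".text"].VirtualAddress

def rip_displacement(target_rva: int, instr_offset: int, instr_length: int = 6) -> int:
    """Displacement for `call QWORD PTR [rip + disp]` located at instr_offset in .text."""
    next_rip = TEXT_VA + instr_offset + instr_length  # RIP points past the instruction
    return target_rva - next_rip

print(hex(rip_displacement(0x3068, 0x24)))  # MessageBoxA
print(hex(rip_displacement(0x3098, 0x31)))  # ExitProcess
```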


Let's write the .text section to our file, padding it with zeros up to address 0x400:



(5) RAW .text section (Offset 0x00000200-0x00000400)
55 48 89 E5 48 83 EC 20  48 C7 C1 00 00 00 00 48
C7 C2 00 20 40 00 49 C7  C0 0D 20 40 00 49 C7 C1
40 00 00 00 FF 15 3E 20  00 00 48 C7 C1 00 00 00
00 FF 15 61 20 00 00 48  83 C4 20 5D C3 00 00 00
........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00


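Only the first 4 lines contain code; the rest is zero padding up to FileAlignment. The last line starts at 0x000003F0, so the next block begins at 0x00000400. That is already 1024 bytes, half of the file!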
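To see where each byte of the section comes from, the code can be assembled piece by piece. This is a Python sketch with two small illustrative helpers (mov_imm32 and call_rip); the opcode bytes are taken from the dump above.

```python
import struct

def mov_imm32(opcode: bytes, value: int) -> bytes:
    """REX prefix + C7 + ModRM, followed by a 32-bit immediate (7 bytes in total)."""
    return opcode + struct.pack("<i", value)

def call_rip(displacement: int) -> bytes:
    """FF 15 disp32: call QWORD PTR [rip + displacement] (6 bytes)."""
    return b"\xFF\x15" + struct.pack("<i", displacement)

code = b"".join([
    b"\x55",                               # push rbp
    b"\x48\x89\xE5",                       # mov rbp, rsp
    b"\x48\x83\xEC\x20",                   # sub rsp, 0x20
    mov_imm32(b"\x48\xC7\xC1", 0),         # mov rcx, 0x0
    mov_imm32(b"\x48\xC7\xC2", 0x402000),  # mov rdx, 0x402000
    mov_imm32(b"\x49\xC7\xC0", 0x40200D),  # mov r8, 0x40200D
    mov_imm32(b"\x49\xC7\xC1", 0x40),      # mov r9, 0x40
    call_rip(0x203E),                      # call MessageBoxA
    mov_imm32(b"\x48\xC7\xC1", 0),         # mov rcx, 0x0
    call_rip(0x2061),                      # call ExitProcess
    b"\x48\x83\xC4\x20",                   # add rsp, 0x20
    b"\x5D",                               # pop rbp
    b"\xC3",                               # ret
])

# Pad with zeros up to FileAlignment (0x200).
text_section = code.ljust(0x200, b"\x00")
print(hex(len(code)), hex(len(text_section)))
```

The length of code is 0x3D, the value stored in SizeOfCode and in the VirtualSize of .text.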




.rdata section



This is perhaps the simplest section. We'll just put our two strings here, padding with zeros up to 512 bytes.



.rdata
0400	"Hello World!\0"
040D	"MyApp\0"




(6) RAW .rdata section (Offset 0x00000400-0x00000600)
48 65 6C 6C 6F 20 57 6F  72 6C 64 21 00 4D 79 41
70 70 00 00 00 00 00 00  00 00 00 00 00 00 00 00
........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00




.idata section



Well, here's the last section, which describes imported functions from libraries.



The first thing that awaits us is the new structure IMAGE_IMPORT_DESCRIPTOR



IMAGE_IMPORT_DESCRIPTOR structure
Struct IMAGE_IMPORT_DESCRIPTOR
{
	u32 OriginalFirstThunk (INT)
	u32 TimeDateStamp
	u32 ForwarderChain
	u32 Name
	u32 FirstThunk (IAT)
}




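Description of IMAGE_IMPORT_DESCRIPTOR
OriginalFirstThunk - address of the Import Name Table (INT), which lists the names of the imported functions

Name - address of the library name

FirstThunk - address of the Import Address Table (IAT), which will hold the addresses of the imported functions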



First, we need to add 2 imported libraries. Recall:



func MessageBoxA(u32 handle, PChar text, PChar caption, u32 type) i32 ['user32.dll']
func ExitProcess(u32 code) ['kernel32.dll']


(7) RAW IMAGE_IMPORT_DESCRIPTOR (Offset 0x00000600)
58 30 00 00 00 00 00 00  00 00 00 00 3C 30 00 00
68 30 00 00 88 30 00 00  00 00 00 00 00 00 00 00
48 30 00 00 98 30 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00




We use 2 libraries, and to signal that the list is finished, the last structure is filled with zeros.



 INT	| Time	 | Forward  | Name   | IAT
--------+--------+----------+--------+--------
0x3058	| 0x0    | 0x0      | 0x303C | 0x3068
0x3088	| 0x0    | 0x0      | 0x3048 | 0x3098
0x0000	| 0x0    | 0x0      | 0x0000 | 0x0000
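The descriptor table can be packed like this. A Python sketch; import_descriptor is an illustrative helper, and the addresses are the ones from the table above.

```python
import struct

def import_descriptor(int_rva: int, name_rva: int, iat_rva: int) -> bytes:
    """Pack one IMAGE_IMPORT_DESCRIPTOR (5 x u32); TimeDateStamp and ForwarderChain are zero."""
    return struct.pack("<5I", int_rva, 0, 0, name_rva, iat_rva)

descriptors = (
    import_descriptor(0x3058, 0x303C, 0x3068)    # user32.dll
    + import_descriptor(0x3088, 0x3048, 0x3098)  # kernel32.dll
    + bytes(20)                                  # all-zero terminator
)
print(descriptors.hex(" "))
```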


Now let's add the names of the libraries themselves:



Library names
063C	"user32.dll\0"
0648	"kernel32.dll\0"




(8) RAW library names (Offset 0x0000063C)
                                     75 73 65 72
33 32 2E 64 6C 6C 00 00  6B 65 72 6E 65 6C 33 32
2E 64 6C 6C 00 00 00 00




Next, let's describe the user32 library:



(9) RAW user32.dll (Offset 0x00000658)
                         78 30 00 00 00 00 00 00 
00 00 00 00 00 00 00 00  78 30 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 4D 65 73 73 61 67 
65 42 6F 78 41 00 00 00




The Name field of the first library points to 0x303C. If we look a little higher, we will see that the library name "user32.dll\0" is located at file address 0x063C.



Hint: remember that the .idata section corresponds to file offset 0x0600 and memory offset 0x3000. For the first library, INT is 0x3058, which means offset 0x0658 in the file. At this address we see the entry 0x3078 followed by a zero entry, which marks the end of the list. 0x3078 refers to file offset 0x0678, where we find the raw string:



"00 00 4D 65 73 73 61 67 65 42 6F 78 41 00 00 00"



The first 2 bytes are of no interest to us and are equal to zero. Then comes the function name, terminated by a zero byte. That is, we can represent it as "\0\0MessageBoxA\0".
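The conversion between memory offsets and file offsets used in the hint can be written as a tiny lookup over the section table. This is a Python sketch; rva_to_offset is an illustrative name, and the table holds the values of our three sections.

```python
# (name, VirtualAddress, PointerToRawData, SizeOfRawData)
SECTIONS = [
    (".text", 0x1000, 0x200, 0x200),
    (".rdata", 0x2000, 0x400, 0x200),
    (".idata", 0x3000, 0x600, 0x200),
]

def rva_to_offset(rva: int) -> int:
    """Translate an address relative to ImageBase into an offset in the file."""
    for _name, virtual_address, raw_pointer, raw_size in SECTIONS:
        if virtual_address <= rva < virtual_address + raw_size:
            return raw_pointer + (rva - virtual_address)
    raise ValueError(f"address {rva:#x} is outside all sections")

for rva in (0x303C, 0x3058, 0x3078):
    print(f"{rva:#06x} -> {rva_to_offset(rva):#06x}")
```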



The IAT field, in turn, refers to a structure similar to the INT table, except that the function addresses will be loaded into it when the program starts. For example, the entry at 0x3068 in memory will hold a different value from the one at 0x0668 in the file: it will contain the address of the MessageBoxA function loaded by the system, which we reach through the call instruction in the program code.



And the last piece of the puzzle is kernel32. Don't forget to pad with zeros up to FileAlignment.



(10) RAW kernel32.dll (Offset 0x00000688-0x00000800)
                         A8 30 00 00 00 00 00 00 
00 00 00 00 00 00 00 00  A8 30 00 00 00 00 00 00 
00 00 00 00 00 00 00 00  00 00 45 78 69 74 50 72 
6F 63 65 73 73 00 00 00  00 00 00 00 00 00 00 00 
........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
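Putting blocks (7) to (10) together, the whole .idata section can be generated from a list of libraries and function names. The Python sketch below reproduces the layout used in this article (descriptors, then library names, then INT, IAT and function names for each library, with 8-byte alignment); build_idata is a hypothetical helper, and other compilers may arrange these tables differently.

```python
import struct

IDATA_VA = 0x3000  # Sections[".idata"].VirtualAddress

def build_idata(imports):
    """imports: list of (library name, [function names])."""
    out = bytearray(20 * (len(imports) + 1))  # descriptors + zero terminator, patched below
    name_rvas = []
    for dll, _functions in imports:
        name_rvas.append(IDATA_VA + len(out))
        out += dll.encode("ascii") + b"\x00"
        out += bytes(-len(out) % 8)           # align the next item to 8 bytes
    for index, (_dll, functions) in enumerate(imports):
        thunks_size = 8 * (len(functions) + 1)
        int_rva = IDATA_VA + len(out)
        iat_rva = int_rva + thunks_size
        names_rva = iat_rva + thunks_size
        thunks, names = b"", b""
        for function in functions:
            thunks += struct.pack("<Q", names_rva + len(names))
            names += b"\x00\x00" + function.encode("ascii") + b"\x00"  # 2 zero bytes + name
            names += bytes(-len(names) % 8)
        thunks += bytes(8)                    # zero entry ends the list
        out += thunks + thunks + names        # INT and IAT are identical in the file
        struct.pack_into("<5I", out, 20 * index,
                         int_rva, 0, 0, name_rvas[index], iat_rva)
    return bytes(out)

idata = build_idata([("user32.dll", ["MessageBoxA"]),
                     ("kernel32.dll", ["ExitProcess"])])
idata_section = idata.ljust(0x200, b"\x00")
print(hex(len(idata)), hex(len(idata_section)))
```

The unpadded length is 0xB8, which is the Size written into the Import data directory and the VA Size of .idata.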






We check that Far was able to correctly identify which functions we imported:







Great! Everything checks out, so now our file is ready to run.

Drumroll…



The final







Congratulations, we did it!



The file occupies 2 KB = Headers 512 bytes + 3 sections of 512 bytes.



The number 512 (0x200) is nothing more than the FileAlignment, which we specified in the header of our program.



Additionally:

If you want to go a little deeper, you can replace the string "Hello World!" with something else; just do not forget to change the address of the string in the program code (the .text section). The address in memory is 0x00402000, but in the file it is stored in reverse (little-endian) byte order: 00 20 40 00.
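If you patch the address by hand, the byte order can be checked with a couple of lines of Python (a sketch; the variable names are arbitrary):

```python
import struct

address = 0x00402000
encoded = struct.pack("<I", address)  # "<" means little-endian
print(encoded.hex(" "))               # 00 20 40 00
```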



Or a slightly harder quest: add another MessageBox call to the code. To do this, you will have to copy the previous call and recalculate the relative address (0x3068 - RIP) in it.



Conclusion



The article turned out rather crumpled; it really should have been 3 separate parts: Headers, Program, Import Table.



If someone has compiled their exe, then my work was not in vain.



I'm thinking of building an ELF file in a similar way soon. Would such an article be interesting?


