How the server starts up: UEFI



We have already walked through the server startup sequence using Legacy BIOS as an example. Now it is time to get to know UEFI better.



The first version of what is now known as the Unified Extensible Firmware Interface (UEFI) was developed in the 1990s specifically for Intel® Itanium® systems. It was called the Intel Boot Initiative and was later renamed EFI.



The desire to modernize the boot process was predictable. PC-BIOS, now called Legacy, works in 16-bit real mode, addresses only 1 MB of RAM, and requires the bootloader together with the partition table to fit in the first 512 bytes of the drive. Moreover, PC-BIOS transfers control to the first bootloader it finds, with no way to return. Handling several operating systems is therefore left entirely to the bootloader.



The bootloader size limitation dictates the use of the Master Boot Record (MBR) partitioning scheme, which appeared in 1983. The MBR is not formally standardized, but most vendors follow the traditional layout. MBR has serious limitations: only 4 partitions are supported by default, and with 512-byte sectors the storage capacity is no more than 2.2 TB.
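To make the four-partition limit concrete, here is a minimal sketch of reading the partition table from a 512-byte MBR sector. It assumes the traditional layout (four 16-byte entries at offset 446, the 0x55AA signature in the last two bytes). The helper names `parse_mbr` and `build_demo_mbr` and the demo values are made up for illustration.

```python
import struct

SECTOR_SIZE = 512              # classic sector size, assumed in this sketch
PARTITION_TABLE_OFFSET = 446   # the table follows the bootloader code
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"
# status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped), start LBA, sector count
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector):
    """Return the non-empty partition entries of a traditional MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):  # there is room for exactly four entries
        start = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        entry = sector[start:start + PARTITION_ENTRY_SIZE]
        status, ptype, lba_start, sectors = struct.unpack(ENTRY_FORMAT, entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": index,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return partitions


def build_demo_mbr():
    """Build a hypothetical MBR with one bootable Linux (0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 2048, 204800)
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)
```

Calling `parse_mbr(build_demo_mbr())` yields a single entry. The start LBA and sector count are 32-bit fields, which is where the capacity ceiling comes from.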



In December 2000, the first widely distributed EFI specification was released as version 1.02. Five years later, Intel handed EFI over to the UEFI Forum, and the word Unified was added to the name to mark the change. The UEFI specification is publicly available and consists of several documents:



  • ACPI Specification;
  • UEFI Specification;
  • UEFI Shell Specification;
  • UEFI Platform Initialization Specification;
  • UEFI Platform Initialization Distribution Packaging Specification.


The fun begins in the UEFI Platform Initialization Specification, which describes all the phases of booting the platform.



UEFI is architecture-independent, but in this article we follow the standard with x86_64 processors in mind.



Wake up, Neo!



UEFI boot phase sequence (source: UEFI Platform Initialization Specification)

After power-up is initiated, the power supply waits for the transients to settle and then asserts the Power_Good signal. The first component to start operating is not the central processor but an autonomous subsystem: the Intel® Management Engine (ME) or its AMD counterpart, AMD Secure Technology (ST). This subsystem performs its own operations and then prepares and starts the first core of one processor, called the Bootstrap Processor (BSP).

In accordance with the accepted terminology, a core or hardware thread is hereinafter referred to as a bootstrap processor or an application processor.
As in Legacy, the processor fetches its first instruction from address 0xFFFFFFF0, 16 bytes below the top of the 32-bit address space. This instruction is a jump to the first phase of platform initialization - SEC.
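The reset address can be checked with simple arithmetic. The sketch below assumes the documented x86 reset state, in which the hidden code segment base is 0xFFFF0000 and the instruction pointer is 0xFFF0. The function names are illustrative.

```python
RESET_CS_BASE = 0xFFFF0000   # hidden CS base right after reset (assumed x86 reset state)
RESET_IP = 0xFFF0            # instruction pointer right after reset
FOUR_GIB = 1 << 32           # size of the 32-bit physical address space


def reset_vector():
    """Physical address of the first instruction fetched after reset."""
    return RESET_CS_BASE + RESET_IP


def bytes_until_top(address):
    """How many bytes remain between an address and the 4 GiB boundary."""
    return FOUR_GIB - address
```

`reset_vector()` evaluates to 0xFFFFFFF0 and `bytes_until_top(reset_vector())` to 16. With only 16 bytes available, there is room for little more than a jump instruction, which is why the first instruction is a jump into SEC.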



SEC (Security) phase



In this phase, the following tasks must be completed:



  • handling the power-on event;
  • initializing enough memory for the next phase;
  • establishing the root of trust in the system;
  • passing the necessary information and control to the next phase.


x86_64 processors start in 16-bit real mode, and during early initialization the BSP is switched to 32-bit protected mode. The microcode of all available processors is then updated.



Next comes the handling of the power-on event. This means collecting information about the state of the hardware so that modules in the next phase can draw conclusions about the "health" and general state of the platform.



RAM is not initialized during the SEC phase. Instead, the free processor cache is marked as non-evictable and used as temporary RAM. This mode is called no-eviction mode (NEM). A stack is created in this memory, which allows modules in the following phases to be written in stack-based languages such as C before main RAM is initialized.



Next, all Application Processors (APs) are initialized by sending them a special sequence of Inter-Processor Interrupts (IPIs). The sequence, an INIT IPI followed by a Start-up IPI, wakes up the application processor and starts the Built-In Self-Test (BIST) on it. The test results are recorded and passed on for analysis.



At the end of the Security phase, the code must locate the Boot Firmware Volume (BFV), which holds the executable code of the next phase, and, if possible, any other Firmware Volumes (FVs) that contain code.



To live up to its name and act as the root of trust, the Security phase can verify the code it is about to hand control to, checking it for unauthorized changes and malicious modifications.



At the end of the SEC execution, the following information is collected:



  • size and address of the Boot Firmware Volume (BFV);
  • sizes and addresses of other Firmware Volumes (FVs);
  • size and address of the temporary RAM;
  • size and address of the stack.
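The collected information can be pictured as a small record handed from SEC to PEI. The sketch below is an assumption-laden model, not the structure defined by the specification: the class names `Region` and `SecHandoff` and the check that the stack lies inside temporary RAM are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A memory range described by base address and size in bytes."""
    base: int
    size: int

    @property
    def end(self):
        return self.base + self.size

    def contains(self, other):
        return self.base <= other.base and other.end <= self.end


@dataclass(frozen=True)
class SecHandoff:
    """Hypothetical model of what SEC passes on to the PEI phase."""
    boot_firmware_volume: Region
    other_firmware_volumes: tuple
    temporary_ram: Region
    stack: Region

    def validate(self):
        # Assumption of this sketch: the stack is carved out of temporary RAM.
        if not self.temporary_ram.contains(self.stack):
            raise ValueError("stack must lie inside temporary RAM")
        if self.boot_firmware_volume.size == 0:
            raise ValueError("Boot Firmware Volume is empty")
```

For example, `SecHandoff(Region(0xFFE00000, 0x200000), (), Region(0xFEF00000, 0x40000), Region(0xFEF00000, 0x10000)).validate()` passes with these made-up addresses, while a stack outside the temporary RAM range raises `ValueError`.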


Then the next stage begins - Pre EFI Initialization.



PEI (Pre EFI Initialization) phase



The PEI Phase on a SuperMicro Motherboard

The purpose of the Pre EFI Initialization phase is to gather information about the connected devices and bring up the minimum amount of hardware needed to start full initialization.



By design, the PEI phase should be lightweight, since processor cache memory is limited. In addition, the PEI phase must be able to recover from a failure, so its code needs to be placed in more resilient storage.



This phase consists of a core called the PEI Foundation and pluggable PEI Modules (PEIMs). The central part of the core is the module manager, the PEI Dispatcher, which controls the order in which modules run and organizes communication between them through PEIM-to-PEIM Interfaces (PPIs).



Note that the SEC phase ran directly from flash memory on the motherboard. Only at the beginning of PEI is the executable code needed for this phase copied into temporary RAM.



Then the PEI Dispatcher takes over. It launches PEI modules in a specific order: first the modules without dependencies, then those that depend on them, and so on until no modules are left.
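The dispatch order described above can be sketched as repeated passes over the pending modules. This is a simplified model: real PEIMs declare dependencies on PPIs through dependency expressions, whereas here each module simply names other modules. The module names and the dependencies between them are hypothetical.

```python
def dispatch(modules):
    """Return module names in dispatch order.

    `modules` maps a module name to the set of module names it depends on.
    Modules whose dependencies can never be satisfied are not dispatched.
    """
    pending = {name: set(deps) for name, deps in modules.items()}
    finished = set()
    order = []
    while pending:
        # Everything whose dependencies are already satisfied is ready now.
        ready = sorted(name for name, deps in pending.items() if deps <= finished)
        if not ready:
            break  # the rest depends on something that never appeared
        for name in ready:
            order.append(name)
            finished.add(name)
            del pending[name]
    return order
```

With `{"MemoryInitPeim": {"PlatformPeim"}, "PlatformPeim": {"CpuPeim"}, "CpuPeim": set()}` the function returns the modules in the order CpuPeim, PlatformPeim, MemoryInitPeim. Sorting within a pass is only there to make the result deterministic.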



The architecture of the PEI phase lets you develop your own modules, which can pass their results on to the next phase. Information is passed in a special data structure, the Hand-off Block (HOB).
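HOBs are laid out one after another in memory, so a consumer walks them by reading a header and skipping ahead by the stored length. The sketch below assumes the generic header layout as I understand it from the Platform Initialization Specification (16-bit type, 16-bit length, 32-bit reserved field, end-of-list type 0xFFFF). The demo type values and payloads are invented and do not follow real HOB formats.

```python
import struct

HOB_HEADER = struct.Struct("<HHI")   # HobType, HobLength, Reserved
HOB_TYPE_END_OF_LIST = 0xFFFF
DEMO_TYPE_A = 0x0003                 # illustrative type values for this sketch
DEMO_TYPE_B = 0x0004


def make_hob(hob_type, payload=b""):
    """Build one HOB: header plus payload padded to a multiple of 8 bytes."""
    padded = payload + b"\x00" * (-len(payload) % 8)
    return HOB_HEADER.pack(hob_type, HOB_HEADER.size + len(padded), 0) + padded


def walk_hobs(buffer):
    """Yield (type, payload) for each HOB until the end-of-list marker."""
    offset = 0
    while offset + HOB_HEADER.size <= len(buffer):
        hob_type, hob_length, _reserved = HOB_HEADER.unpack_from(buffer, offset)
        if hob_type == HOB_TYPE_END_OF_LIST:
            return
        if hob_length < HOB_HEADER.size:
            raise ValueError("corrupt HOB length")
        yield hob_type, buffer[offset + HOB_HEADER.size:offset + hob_length]
        offset += hob_length  # the length field covers header and payload
```

A producer would concatenate `make_hob(...)` results and finish with `make_hob(HOB_TYPE_END_OF_LIST)`, and a consumer in a later phase iterates with `walk_hobs`.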



Among the PEI modules that are launched, the following are worth noting:



  • CPU PEIM - processor initialization;
  • Platform PEIM - initialization of the North Bridge (including the Memory Controller Hub) and the South Bridge (I/O Controller Hub);
  • Memory Initialization PEIM - initialization of the main RAM and transfer of data from temporary memory to RAM.


Earlier, the power-on event was received from the SEC phase. If that event is an S3 Resume, the S3 BootScript is executed next: it restores the saved state of the processors and all connected devices and then transfers control directly to the OS.

The S3 (Suspend to RAM) state is a sleep state in which the processor and part of the chipset are powered down and lose their context. On waking from this state, the processor starts executing as it would after a normal power-on. However, instead of full initialization and a complete set of tests, the system only restores the state of all devices.
When the platform starts from any other state, control is transferred to the Driver Execution Environment phase.
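The branch at the end of PEI can be summarized in a few lines. This is a toy model of the decision only. The enum and the step descriptions are hypothetical and simply restate the text above.

```python
from enum import Enum


class BootMode(Enum):
    FULL_BOOT = "full_boot"
    S3_RESUME = "s3_resume"


def steps_after_pei(boot_mode):
    """Return the remaining steps, depending on why the platform powered on."""
    if boot_mode is BootMode.S3_RESUME:
        # Resume path: skip DXE and BDS entirely.
        return ["run S3 BootScript", "restore device state", "return control to the OS"]
    # Any other start: continue with normal initialization.
    return ["pass HOBs to DXE", "enter DXE phase"]
```

The point of the sketch is that the resume path never reaches DXE, which is what makes waking from S3 much faster than a full boot.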



DXE (Driver eXecution Environment) phase



AHCI initialization in the DXE phase

The Driver Execution Environment (DXE) phase is focused on initializing the remaining devices. By the time the DXE phase starts, the processor and main memory are ready for work, and the DXE drivers are not subject to strict resource limits.



Like PEI, this phase has its own core - the DXE Foundation. The core creates the necessary interfaces and loads three kinds of DXE services:



  • UEFI Boot Services - boot time services;
  • UEFI Runtime Services - runtime services;
  • DXE Services - special services required by the DXE core.


After the services are initialized, the DXE Dispatcher starts working. It finds and loads DXE drivers, which in turn complete the hardware initialization.

In UEFI, there is no dedicated phase in which the hardware passes the POST (Power-On Self-Test). Instead, each PEI and DXE module runs its own set of tests and reports the results to the user via POST codes and to the following phases via HOBs.
Among the many drivers loaded on x86_64 processors, the System Management Mode Init (SMM Init) driver deserves attention. This driver prepares everything that System Management Mode (SMM) needs to work. SMM is a special privileged mode that suspends the currently executing code (including the operating system) and runs a program from the protected SMRAM area in its own context.

SMM is unofficially considered protection ring -2. The OS kernel runs in ring 0, and the less privileged protection rings are numbered from 1 to 3. Officially, ring 0 is the most privileged. However, a hypervisor with hardware virtualization is conventionally called ring -1, and Intel ME and AMD ST are called ring -3.
Also worth noting is the Compatibility Support Module (CSM), which provides compatibility with Legacy and allows booting an OS without UEFI support. We will look at this module in more detail later.



After all hardware is initialized, it is time to select a boot device.



BDS (Boot Device Select) phase



The Boot Device Select phase implements the boot policy for UEFI applications. Although this is a separate phase, all services created during the DXE phase, including the dispatcher, remain available.



The purpose of the BDS phase is to accomplish the following tasks:



  • initializing console devices;
  • searching for devices that can be booted from;
  • attempting to boot from the devices found, in order of priority.


PCIe BIOS of the LSI add-in card

The Boot Manager looks for bootable areas on devices. Some expansion cards, such as network cards and RAID controllers, may have their own "BIOS", called an Option ROM, or OpROM. The contents of a device's OpROM are run immediately after detection, and once they finish, control returns to the Boot Manager.



All partitions containing boot areas are stored in the Boot Manager's memory and sorted according to the boot order. If no application is found, the Boot Manager can call the DXE Dispatcher again, in case additional drivers were loaded during the search and new devices have become visible to the Boot Manager.
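The "try in order of priority" policy can be sketched as a loop over a boot order list. UEFI keeps this information in the BootOrder and Boot#### variables. The function below is only a simulation, with a hypothetical `try_boot` callback standing in for actually loading and starting an image.

```python
def select_boot_option(boot_order, boot_options, try_boot):
    """Try boot options in priority order and return the first that starts.

    boot_order   - list of option numbers, highest priority first
    boot_options - dict mapping option number to a description of the target
    try_boot     - callable returning True if the option booted (hypothetical)
    """
    for number in boot_order:
        option = boot_options.get(number)
        if option is None:
            continue  # the order list may reference an option that no longer exists
        if try_boot(option):
            return option
    return None  # nothing booted; the firmware would fall back to other policies
```

For instance, with the order `[2, 3, 1]`, options `{1: "ESP on disk 0", 3: "PXE"}` and a callback that only succeeds for the disk, the PXE entry is attempted first and the function returns "ESP on disk 0".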



As noted earlier, the Master Boot Record scheme restricts the size and number of partitions on a drive and makes maintaining several operating systems inconvenient. The solution to all these problems is part of the UEFI specification: the GUID Partition Table.



GPT (GUID Partition Table)



GUID Partition Table is a standardized partition table layout format that replaces the legacy MBR.



First, GPT uses Logical Block Addressing (LBA) instead of Cylinder, Head, Sector (CHS) addressing. With 64-bit block addresses and 512-byte sectors, GPT can work with drives up to 9.4 ZB (9.4 × 10^21 bytes), versus 2.2 TB for MBR.
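Both capacity limits follow directly from the width of the block address. A quick check, assuming 512-byte sectors, a 32-bit sector address for MBR and a 64-bit one for GPT:

```python
SECTOR_SIZE = 512  # bytes per sector, assumed in this calculation


def max_capacity_bytes(lba_bits, sector_size=SECTOR_SIZE):
    """Largest drive size addressable with the given LBA width."""
    return (1 << lba_bits) * sector_size


MBR_LIMIT = max_capacity_bytes(32)   # 2**41 bytes
GPT_LIMIT = max_capacity_bytes(64)   # 2**73 bytes

MBR_LIMIT_TB = MBR_LIMIT / 10**12    # about 2.2 (decimal terabytes)
GPT_LIMIT_ZB = GPT_LIMIT / 10**21    # about 9.4 (decimal zettabytes)
```

Drives with 4096-byte sectors raise both limits by a factor of eight, which is why the figures are usually quoted for 512-byte sectors.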



Secondly, the partition table has changed: you can now create up to 2^64 partitions on a single drive, although operating systems support no more than 128 in the case of Microsoft Windows and 256 in the case of Linux.



Thirdly, each partition has its own type identifier, which describes the purpose of the partition. For example, the identifier C12A7328-F81F-11D2-BA4B-00A0C93EC93B uniquely identifies an EFI System Partition (ESP), from which the Boot Manager can try to load an application.
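Checking a partition type comes down to comparing 16 bytes. The sketch below assumes the GPT partition entry layout in which the type GUID occupies the first 16 bytes and is stored in mixed-endian form, which Python's `uuid` module exposes as `bytes_le`. The function name `is_esp` is illustrative.

```python
import uuid

ESP_TYPE_GUID = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")
GPT_ENTRY_SIZE = 128  # common partition entry size, assumed here


def is_esp(partition_entry):
    """Return True if a raw GPT partition entry describes an EFI System Partition."""
    if len(partition_entry) < 16:
        raise ValueError("partition entry is too short")
    # The first three GUID fields are stored little-endian on disk.
    type_guid = uuid.UUID(bytes_le=bytes(partition_entry[:16]))
    return type_guid == ESP_TYPE_GUID
```

Note that the on-disk bytes are not the same as the printed form: the ESP GUID starts with `28 73 2A C1` on disk, not `C1 2A 73 28`.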



Compatibility with MBR was not neglected during the development of GPT. Disk utilities that are unaware of GPT might fail to recognize a GPT disk and overwrite it. To avoid this, GPT partitioning fills the first 512 bytes with a Protective MBR: a partition table with a single partition that spans the entire drive and has the system identifier 0xEE. This lets UEFI understand that it is not looking at a real MBR, while old software without GPT support sees a partition holding data of an unknown type.
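A minimal sketch of building and recognizing a Protective MBR follows. It assumes the traditional MBR layout (partition table at offset 446, type byte at offset 4 of each entry, 0x55AA signature) and leaves the CHS fields zeroed for brevity, which a real partitioning tool would not do. Both function names are made up.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446
ENTRY_SIZE = 16
GPT_PROTECTIVE_TYPE = 0xEE


def build_protective_mbr(total_sectors):
    """Build a sector with one 0xEE partition covering the drive after LBA 0."""
    sector = bytearray(SECTOR_SIZE)
    # The sector count is a 32-bit field, so it is capped for large drives.
    count = min(total_sectors - 1, 0xFFFFFFFF)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
        "<B3xB3xII", 0x00, GPT_PROTECTIVE_TYPE, 1, count)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


def is_protective_mbr(sector):
    """Return True if the only partition in the table has type 0xEE."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        return False
    types = [sector[TABLE_OFFSET + i * ENTRY_SIZE + 4] for i in range(4)]
    return types.count(GPT_PROTECTIVE_TYPE) == 1 and all(
        t in (0x00, GPT_PROTECTIVE_TYPE) for t in types)
```

A GPT-aware tool that sees `is_protective_mbr(...)` return True goes on to read the GPT header from the next sector, while legacy software just sees one occupied partition and leaves it alone.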



GPT has dropped the boot area in favor of ESP partitions, which are recognized as bootable. The Boot Manager collects information about all ESPs on the disk, so a drive can hold several bootloaders without conflicts, one per ESP.



Loading the operating system



After polling all devices and finding the boot areas, the Boot Manager starts booting in boot priority order. In the general case, control is transferred to a UEFI application, which starts executing its own logic. However, on systems with Legacy compatibility, the list of boot areas may include an MBR, in which case the CSM, the Compatibility Support Module, has to be used.



The CSM allows you to run operating systems that do not support UEFI. To load such an operating system, the CSM emulates the environment that a "classic" operating system expects:



  • loads the Legacy driver;
  • loads Legacy BIOS;
  • puts video output in Legacy compatible mode;
  • creates in memory the data structures that Legacy requires and that UEFI does not provide;
  • loads the CompatibilitySmm driver for SMM to work in Legacy.


Recall that in Legacy mode the OS starts in 16-bit mode, while UEFI code runs in 32-bit protected mode or, in x86_64 firmware, in 64-bit mode. The CSM starts the Legacy bootloader in 16-bit mode and provides communication with the UEFI drivers as needed.



RT phase (Run Time)



The start of loading the OS or the Legacy bootloader marks the beginning of the Run Time phase. In this phase, all DXE services (except UEFI Runtime Services) are no longer available.



The content of the RT phase can vary. It may be an OS loader familiar from Legacy, such as GRUB2 or Windows Boot Manager, which puts the processor in 64-bit mode and starts the OS. But it can also be a standalone application or the operating system kernel itself.



Starting with version 3.3, a Linux kernel built with the CONFIG_EFI_STUB option becomes a regular UEFI application and can be launched from UEFI directly, without a third-party bootloader.



As with Legacy, the bootloader or the kernel itself needs to put the processor into 64-bit mode, load all drivers, configure the scheduler, and run init. Init, in turn, starts the user-space processes, after which the OS login screen appears.



Conclusion



Booting with UEFI is a more complex, but standardized and largely universal process. It resembles Legacy only in general terms, and the devil, as you know, is in the details.



How soon do you think it will be possible to completely leave Legacy?

Write your opinion in the comments.


