How DPUs work: coprocessors for data processing



Dedicated ASICs for specific domains are one way to "restart" Moore's Law and overcome the limitations of general-purpose CPUs. This is currently a very promising area of microelectronics, and Google, Amazon and other companies have their own projects. For example, Google makes its Tensor Processing Units (TPUs), and Amazon data centers run AWS Graviton chips built on ARM cores.



The former are ASICs for neural networks, while the latter are general-purpose 64-bit ARM processors designed to optimize the price-performance ratio in compute-intensive workloads.



Another class of ASICs that has seen active experimentation recently is the specialized coprocessor for data processing (data processing unit, DPU), a kind of smart network card (SmartNIC). Examples include Nvidia BlueField 2, Fungible and Pensando DSC-25.



What are they like? What tasks are they suited for? Let's take a look.





What is a SmartNIC



Conventional network cards (NICs) are built on an application-specific integrated circuit (ASIC) designed to operate as an Ethernet controller. Often these chips are also designed to perform secondary functions. For example, Mellanox ConnectX controllers also support the high-speed InfiniBand protocol. These are great specialized chips, but their functionality cannot be changed.



Unlike simple network cards, a SmartNIC allows the user to load additional software onto the controller after purchasing the hardware. This expands or changes the functionality of the card. The procedure is somewhat similar to buying a smartphone and installing various applications on it.



To make this possible, SmartNICs need more processing power and more memory than conventional NICs. This means more powerful multi-core ARM processors, specialized network processors (flow processing cores, FPC) and field-programmable gate arrays (FPGAs).





Xilinx Alveo U25



Architecturally, SmartNICs often have a separate ARM core for the control plane, and some boards allow loading a modified Linux kernel. These dedicated ARM cores distribute the load across the rest of the compute modules, collect statistics and logs, and monitor the state of the SmartNIC. Network traffic itself does not pass through them.
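The split between control plane and data plane described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class names `ControlPlane` and `DataPlaneCore`, the `steer` function and the flow-table layout are all hypothetical and do not correspond to any vendor's API. The point it tries to show is that the control plane only configures and monitors, while packets are handled entirely by the data-plane cores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataPlaneCore:
    """Hypothetical packet-processing core with its own flow table and counter."""
    core_id: int
    flow_table: Dict[str, str] = field(default_factory=dict)  # dst address -> action
    packets_seen: int = 0

    def process(self, dst: str) -> str:
        # Fast path: the action is looked up locally, without the control plane.
        self.packets_seen += 1
        return self.flow_table.get(dst, "drop")


class ControlPlane:
    """Hypothetical control-plane core: it configures and monitors, never forwards."""

    def __init__(self, cores: List[DataPlaneCore]) -> None:
        self.cores = cores

    def install_rule(self, dst: str, action: str) -> None:
        # Push the same rule to every data-plane core.
        for core in self.cores:
            core.flow_table[dst] = action

    def collect_stats(self) -> Dict[int, int]:
        # Read per-core packet counters for logging and monitoring.
        return {core.core_id: core.packets_seen for core in self.cores}


def steer(dst: str, cores: List[DataPlaneCore]) -> DataPlaneCore:
    # Toy load distribution: sum of the address bytes modulo the core count.
    return cores[sum(dst.encode()) % len(cores)]


cores = [DataPlaneCore(i) for i in range(4)]
control = ControlPlane(cores)
control.install_rule("10.0.0.1", "forward:port0")

# Packets go straight to a data-plane core; the control plane is not on the path.
results = [steer(d, cores).process(d) for d in ["10.0.0.1", "10.0.0.1", "10.0.0.9"]]
stats = control.collect_stats()
```

In this model an unknown destination falls through to a default `"drop"` action; a real card would more likely punt such packets to the control plane or the host, which is omitted here for brevity.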



What tasks are DPUs suited for



Data coprocessors (DPUs) are typically an extension of SmartNICs that adds NVMe or NVMe over Fabrics (NVMe-oF) functionality. Such a board offloads the central processor by taking over all I/O tasks.



For example, consider a SmartNIC based on the Broadcom NetXtreme-S BCM58800 microcontroller. It works as a programmable network card and supports NVMe-oF.





Architecture of the Broadcom Stingray card based on the BCM58800 microcontroller



Broadcom Stingray has eight ARM v8 A72 cores at 3 GHz, arguably the highest clock speed of any ARM cores on a SmartNIC. The card comes with up to 16 GB of DDR4 memory. Encryption at up to 90 Gbps is supported in hardware, as are some data processing functions: deduplication, erasure coding, RAID 5 and RAID 6.



The diagram also shows the TruFlow accelerator, a proprietary Broadcom technology for hardware acceleration of network operations, including Open vSwitch (OvS).



Nvidia BlueField 2



Nvidia has traditionally specialized in graphics accelerators, but this year it completed a $7 billion acquisition of specialized chip maker Mellanox, so it is now seriously targeting a new field: the HPC market in the data center.



Mellanox is one of the pioneers of smart network cards, and its BlueField 2 board, which is marketed as a Data Processing Unit (DPU), is now considered the leading product.





Nvidia / Mellanox BlueField 2 Architecture



Key DPU applications:



  • Virtualized and bare-metal clouds.
  • NVMe storage in virtual machines.
  • Network Function Virtualization (NFV) applications.
  • Information security applications such as Deep Packet Inspection (DPI).
  • Microservers for edge computing.




Nvidia / Mellanox BlueField 2



BlueField 2 features an array of eight ARM v8 A72 cores, a DDR4 memory controller and a dual-port Ethernet or InfiniBand network adapter (two ports at 100 Gbps or one at 200 Gbps), plus specialized ASICs to accelerate various functions such as regular expressions and SHA-2 hashing.
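To get a feel for why such link speeds call for dedicated accelerators, here is a rough back-of-the-envelope calculation of the per-packet time budget at line rate. It is only a sketch: it assumes minimum-size 64-byte Ethernet frames plus the standard 20 bytes of on-wire overhead (preamble, start-of-frame delimiter and interframe gap), and the function names are made up for illustration.

```python
ETH_MIN_FRAME_BYTES = 64   # minimum Ethernet frame size
ETH_OVERHEAD_BYTES = 20    # 7 preamble + 1 start-of-frame delimiter + 12 interframe gap


def max_packets_per_second(link_gbps: float, frame_bytes: int = ETH_MIN_FRAME_BYTES) -> float:
    """Theoretical packet rate of a link for a given frame size."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def time_budget_ns(link_gbps: float, frame_bytes: int = ETH_MIN_FRAME_BYTES) -> float:
    """Time available to handle one packet at line rate, in nanoseconds."""
    return 1e9 / max_packets_per_second(link_gbps, frame_bytes)


for gbps in (100, 200):
    mpps = max_packets_per_second(gbps) / 1e6
    print(f"{gbps} Gbps: {mpps:.1f} Mpps, {time_budget_ns(gbps):.2f} ns per packet")
```

Under these assumptions a 200 Gbps port can carry close to 300 million minimum-size packets per second, which leaves only a few nanoseconds per packet. Real traffic uses larger frames on average, so this is a worst-case bound rather than a typical load.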



Pensando



One of the new startups in the SmartNIC field is Pensando, which offers what it calls Distributed Services Cards: the Pensando DSC-25 (for corporate servers) and the Pensando DSC-100 (for cloud providers).



Pensando DSC-25 and Pensando DSC-100



The main product is the Pensando DSC-25. It is a card with one P4-programmable (Capri) DPU for data processing, additional ARM cores and hardware accelerators for selected functions.





Pensando DSC-25 circuit



The main DPU and ARM cores are connected via a common interconnect bus to a PCIe controller and an array of RAM (up to 4 GB).

The individual hardware accelerators are referred to here as Service Processing Offloads. As with the Mellanox card, they handle encryption, storage operations, and other tasks.



Fungible





Fungible's high-level architecture



Another up-and-coming startup, Fungible, claims that it coined the term DPU in 2016. The company has announced a processor called the F1 DPU, but the actual architecture of these chips is unknown. So far Fungible can only show general diagrams, as in the illustration above. Some experts suspect that Fungible is simply using the hyped term DPU to attract venture capital. Incidentally, $500 million has already been invested in it over several rounds.



What's next?



There has been a lot of hype around the DPU concept lately. Not all companies that are trying to enter this market (Intel, Xilinx, and others) are mentioned in this review.



The fact is that the SmartNIC concept has been around for a long time, and large companies like Google and Amazon have developed and deployed their own internal solutions. Meanwhile, a market has formed and been filled by third-party players.



The second generation of FPGA-based SmartNICs is now emerging. Field-programmable gate array technology has matured to the point where it can become the foundational technology for SmartNICs. A decade ago, the market was flooded with graphics accelerators; this was the first significant wave of hardware acceleration. Now that FPGAs have passed the three-million logic block mark, these chips are tightly integrated with other building blocks for handling network traffic, memory, storage, and compute cores. SmartNIC and FPGA technologies complement each other well.



Against this background, a second wave of hardware accelerators can be expected, adding a third element, the DPU, to the CPU + GPU pair. The data coprocessor will free server processors from infrastructure tasks. Research shows that in highly virtualized environments, network processing such as OvS operations can consume over 30% of the CPU time on the host. Imagine disk operations, encryption, DPI, and complex routing handled in a separate module. This could remove a significant portion of the load from the CPU.
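To put the 30% figure in perspective, here is a small worked calculation. It is only a sketch: the 64-core host and the idea that only part of the infrastructure work gets offloaded are assumptions chosen for illustration, and `cores_freed` is a made-up helper name.

```python
def cores_freed(total_cores: int, infra_share: float, offloaded_fraction: float = 1.0) -> float:
    """Estimate how many host cores are returned to applications.

    total_cores        -- number of CPU cores on the host (assumed value)
    infra_share        -- share of CPU time spent on infrastructure tasks, 0..1
    offloaded_fraction -- portion of that work actually moved to the DPU, 0..1
    """
    if not 0.0 <= infra_share <= 1.0:
        raise ValueError("infra_share must be between 0 and 1")
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    return total_cores * infra_share * offloaded_fraction


# Hypothetical 64-core host where infrastructure tasks take 30% of CPU time.
full_offload = cores_freed(64, 0.30)          # everything moved to the DPU
partial_offload = cores_freed(64, 0.30, 0.5)  # only half of the work moved

print(f"full offload: about {full_offload:.1f} cores freed")
print(f"partial offload: about {partial_offload:.1f} cores freed")
```

On such a host, full offload would return roughly 19 cores to application workloads, and even a partial offload would return close to 10.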

Startups like Pensando and Fungible are challenging technology leaders like Xilinx, Intel, Broadcom, and Nvidia with their innovations. This kind of technological competition is always fun to watch.


