Hardware video transcoders for YouTube server farms



Two Argos ASICs under a giant aluminum heatsink on a full-size PCI-E card



Google engineers have shared details of the Argos project. Argos is a new type of device: a hardware video encoder, or video coding unit (VCU). It follows the current trend toward specialized chips such as GPUs and Tensor Processing Units (TPUs).



For more details on the Argos design, see the research paper "Warehouse-scale video acceleration: co-design and deployment in the wild", presented at the ASPLOS 2021 conference in April 2021 (doi: 10.1145/3445814.3446723).



Hardware VCUs quickly transcode videos into the more than ten formats YouTube supports, which makes playback smoother and saves bandwidth. The load on YouTube's servers is so heavy that developing its own chips quickly paid off. The devices will not go on sale; they will remain in use exclusively in Google data centers. The public learned only from the conference paper that Google has been running these unique devices for more than a year.



The authors of the paper write that the Argos chip improved computing efficiency "up to 20-33 times compared to our previous optimized system that ran software on traditional servers."



Like a GPU, the VCU comes as a full-size PCI-E card. Judging by the photo of the prototype at the top of the article, the card has a separate 8-pin power connector, because the up to 75 W a PCI-E slot can deliver from the motherboard is not enough to power the VCU.





The VCU chip under a microscope





Floorplan of the VCU chip



The chip has 10 “encoder cores”, and according to the documentation “all other elements are built from off-the-shelf IP blocks”. A single core encodes a 2160p stream in real time at up to 60 FPS.
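To put the per-core figure in perspective, here is a back-of-the-envelope sketch of raw pixel throughput. It assumes that "2160p" means 3840x2160 frames and that the ten cores scale linearly with no shared bottleneck; both are our assumptions, not statements from the paper.

```python
# Back-of-the-envelope pixel throughput of the Argos VCU.
# Assumptions (ours, not from the paper): "2160p" means 3840x2160 frames,
# and the 10 cores scale linearly with no shared bottleneck.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
CORES = 10


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels an encoder must process per second to keep up with a live stream."""
    return width * height * fps


per_core = pixels_per_second(WIDTH, HEIGHT, FPS)
per_chip = per_core * CORES
# How many 1080p30 streams one core could handle, judging by pixel rate alone.
streams_1080p30_per_core = per_core // pixels_per_second(1920, 1080, 30)

if __name__ == "__main__":
    print(f"per core: {per_core / 1e6:.1f} Mpixel/s")   # per core: 497.7 Mpixel/s
    print(f"per chip: {per_chip / 1e9:.2f} Gpixel/s")   # per chip: 4.98 Gpixel/s
    print(f"1080p30 streams per core: {streams_1080p30_per_core}")  # 8
```

By pixel count alone, one core matches about eight 1080p30 streams. Real encoding cost also depends on codec, preset and content, so this is only a rough upper bound.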



The cards are designed specifically for Google data centers. Each YouTube compute cluster has a dedicated section of servers fitted with the new cards, so there is no need to install them in every server. The VCUs are modeled on GPU accelerators so that they fit the existing slots and trays of the server chassis.



At the moment, "thousands of Argos VCU devices" are already running in Google data centers. It is thanks to them that 4K video on YouTube “is available in a few hours, and not in a few days, as it was before,” one of the system's developers told CNET.



The table compares the performance and total cost of ownership (TCO) of the VCU-based setup with running the same transcoding workload on Intel Skylake CPUs and Nvidia T4 GPUs.
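Comparisons like this are commonly expressed as throughput per unit of TCO, normalized to a baseline system. The sketch below shows only that arithmetic; the `Platform` class, its fields and the numbers in the usage example are hypothetical and do not come from the paper's table.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """One transcoding setup. Units are illustrative, not from the paper."""
    name: str
    throughput: float  # work done per second, e.g. encoded Mpixel/s
    tco: float         # total cost of ownership over the same period

    @property
    def perf_per_tco(self) -> float:
        # Throughput bought per unit of cost.
        return self.throughput / self.tco


def relative_efficiency(candidate: Platform, baseline: Platform) -> float:
    """How many times more work per unit of cost the candidate delivers."""
    return candidate.perf_per_tco / baseline.perf_per_tco


if __name__ == "__main__":
    # Made-up numbers, purely to show the arithmetic.
    cpu = Platform("cpu-server", throughput=100.0, tco=1.0)
    accel = Platform("accelerated-server", throughput=2000.0, tco=2.0)
    print(f"{relative_efficiency(accel, cpu):.1f}x")  # 10.0x
```

Normalizing by cost rather than comparing raw throughput is what lets a more expensive accelerated server still come out many times "more efficient".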





Performance comparison of Intel Skylake, Nvidia T4 and the Argos VCU



Today YouTube accounts for about a third of the world's Internet traffic. When the service launched in 2005, keeping it running was considered an impossible task. Google effectively rescued the money-losing startup by buying it in 2006 for $1.65 billion, and has been working to cut its operating costs ever since. To that end, Google had to reshape the structure of the Internet by installing cache servers at ISPs around the world.



Today YouTube's main infrastructure challenge is delivering each video in the highest quality the viewer's device and bandwidth allow. That means offering a choice of codecs and frame sizes, which in turn requires transcoding every video into many variants. For example, on one particular device an 8K video is offered in nine resolutions: 144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 2160p, and native 4320p (8K).
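The list of resolutions above is a "resolution ladder". A minimal sketch of how the set of transcode jobs for one upload might be derived from it follows. It assumes that sources are never upscaled and that every rendition is produced in every codec; `renditions_for` and `encode_jobs` are hypothetical helpers, not YouTube's actual pipeline.

```python
from typing import List, Tuple

# Output heights (in pixels) listed in the text for an 8K upload.
LADDER = [144, 240, 360, 480, 720, 1080, 1440, 2160, 4320]


def renditions_for(source_height: int) -> List[int]:
    """Ladder rungs at or below the source height.

    Assumption: sources are never upscaled, so a 1080p upload
    gets no 1440p, 2160p or 4320p version.
    """
    return [h for h in LADDER if h <= source_height]


def encode_jobs(source_height: int, codecs: List[str]) -> List[Tuple[str, int]]:
    """One (codec, height) transcode job per rendition and codec."""
    return [(c, h) for c in codecs for h in renditions_for(source_height)]


if __name__ == "__main__":
    print(len(renditions_for(4320)))                # 9
    print(renditions_for(1080))                     # [144, 240, 360, 480, 720, 1080]
    print(len(encode_jobs(4320, ["vp9", "h264"])))  # 18
```

Even under these simple assumptions, a single 8K upload with two codecs turns into 18 separate encodes, which is why transcoding capacity matters so much.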



Many of these renditions are also produced with several codecs. The company wants to serve video in the most advanced and efficient codec, but older mobile devices either do not support such codecs or decode them too slowly.





Legacy H.264 (left) compared with modern VP9 (right)





Quality of H.264 and VP9 streams compared by PSNR
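PSNR (peak signal-to-noise ratio) is an objective metric: it measures the pixel-level error between the original and the encoded frame, not perceived quality directly. It is defined as 10·log10(MAX²/MSE), where MAX is the largest sample value (255 for 8-bit video) and MSE is the mean squared error. A minimal sketch for one frame, assuming the frames are given as flat lists of 8-bit samples (real tools compute it per plane and average over many frames):

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flattened sequences of 8-bit samples (e.g. the luma plane).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [16, 64, 128, 235]
    out = [17, 63, 129, 234]  # every sample off by one
    print(f"{psnr(ref, out):.2f} dB")  # 48.13 dB
```

Higher is better: at the same bitrate, a codec that yields a higher PSNR has preserved the picture more faithfully.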



Modern devices usually receive the more efficient VP9 video, while H.264 is kept for older ones. Google has not disclosed which other codecs it uses, but it mentions support for "low-res clamshell phones." It can therefore be assumed that pre-H.264 codecs, such as the H.263 video found in 3GP files, are still served to very old devices.
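Choosing what to serve then comes down to picking the most efficient codec that both the device can decode and the video has already been encoded in. The sketch below illustrates that idea; the preference order, the lowercase codec names and the `pick_codec` helper are hypothetical, and "h263" merely stands in for the pre-H.264 codecs the article speculates about.

```python
from typing import Optional, Set

# Hypothetical preference order, most efficient first. "h263" stands in
# for the pre-H.264 codecs the article only speculates about.
PREFERENCE = ["av1", "vp9", "h264", "h263"]


def pick_codec(device_codecs: Set[str], available: Set[str]) -> Optional[str]:
    """Most efficient codec the device decodes and the video is encoded in."""
    for codec in PREFERENCE:
        if codec in device_codecs and codec in available:
            return codec
    return None  # nothing playable on this device


if __name__ == "__main__":
    encoded = {"vp9", "h264", "h263"}  # this video has no AV1 version yet
    print(pick_codec({"av1", "vp9", "h264"}, encoded))  # vp9
    print(pick_codec({"h264"}, encoded))                # h264
    print(pick_codec({"h263"}, encoded))                # h263
```

The first example shows why re-encoding the back catalogue pays off: a device that could play AV1 falls back to a less efficient codec until the AV1 version exists.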



Videos already in the YouTube library have to be re-encoded again and again as new, more efficient codecs become available. Google finds this worthwhile because it saves traffic.



It is hard to estimate the total volume of video on YouTube. The company publishes only vague growth figures (such as "500 hours of video are uploaded every minute"). The total is probably measured in exabytes.
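The published upload rate can at least be turned into a rough order-of-magnitude estimate. Only the 500 hours per minute figure comes from YouTube; the 5 Mbit/s average bitrate for stored source files is a purely hypothetical assumption, and the estimate ignores all transcoded renditions.

```python
# Order-of-magnitude estimate of YouTube upload volume.
UPLOAD_HOURS_PER_MINUTE = 500   # figure published by YouTube
ASSUMED_BITRATE_MBPS = 5.0      # hypothetical average bitrate of a stored upload

hours_per_day = UPLOAD_HOURS_PER_MINUTE * 60 * 24   # hours of video uploaded daily
years_of_video_per_day = hours_per_day / 24 / 365   # the same amount, in years
seconds_per_day = hours_per_day * 3600
# Mbit/s -> bit/s, then bits -> bytes.
bytes_per_day = seconds_per_day * ASSUMED_BITRATE_MBPS * 1e6 / 8
petabytes_per_year = bytes_per_day * 365 / 1e15

if __name__ == "__main__":
    print(f"{hours_per_day:,} hours/day (~{years_of_video_per_day:.0f} years of video)")
    print(f"~{bytes_per_day / 1e15:.2f} PB/day, ~{petabytes_per_year:.0f} PB/year")
```

Under these assumptions, YouTube receives about 82 years of footage every day, or roughly 1.6 PB of source files daily and close to 600 PB a year. Several years of uploads plus all their renditions would indeed reach the exabyte range.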





Within 12 months of their deployment, video transcoding had moved almost entirely to the hardware transcoders.



YouTube Live broadcasts add further load on the VCUs: they are transcoded into all formats live, with a delay of no more than 100 ms. Google Drive and Google Photos contribute additional workload.



Codecs are so important to YouTube's success that, since acquiring the service, Google has borne most of the burden of developing new ones. In 2009 it bought On2 Technologies (whose VP6 codec was used for Flash video in the first version of YouTube), and it has been releasing new codecs steadily ever since. After VP8 and VP9 comes AV1, on which great hopes are pinned.



A new version of the transcoder with hardware AV1 support has already been developed. According to CNET, these second-generation chips are already being phased into Google's server farms.


