Introducing High Performance Computing (HPC) Optimized VM Image




We are pleased to announce the public preview of a CentOS 7-based virtual machine (VM) image optimized for high performance computing (HPC). It is designed primarily for tightly coupled Message Passing Interface (MPI) workloads. This article describes the HPC VM image and its benefits. To go straight to creating instances from this image, read the documentation and quickstart.



In 2020, we described a set of features and settings for optimizing MPI on Google Cloud. They reduce messaging latency to a few microseconds and deliver small MPI messages in 10 microseconds or less. MPI optimization improves application scaling and increases the number of workloads that can run on Google Cloud. However, building a VM image that applies these techniques requires a deep understanding of Google Cloud systems and the platform. It therefore makes more sense to start from an image that was designed and prepared for high performance computing from the outset. The HPC VM image lets you easily deploy VM instances tuned for optimal CPU and network performance on Google Cloud. It is available in the Google Cloud Marketplace at no additional cost.



HPC VM Image Benefits Over Traditional VM Images



By choosing the HPC VM image, you get an out-of-the-box configuration, regular maintenance, and the following HPC benefits on Google Cloud:



  1. Easily create virtual machines tailored for tightly coupled workloads. Create an HPC-ready VM out of the box, and keep its configuration up to date with the latest tunings.

  2. Networking optimized for tightly coupled workloads. Reduce latency for small messages and speed up applications that depend on point-to-point or collective communication.

  3. More efficient computing. Improve performance on individual nodes by reducing system jitter.

  4. Stable and reproducible multi-node operation. Apply settings that have proven effective across a variety of HPC workloads.



The HPC VM image is a drop-in replacement for the standard CentOS 7 image.
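As a rough illustration of what swapping images looks like in practice, the sketch below prints (but does not run) two `gcloud` instance-creation commands that differ only in their image flags. The image family and project names (`hpc-centos-7` in `cloud-hpc-image-public`, `centos-7` in `centos-cloud`), the zone, and the instance name are assumptions that do not come from this article; check the documentation for the current values before using them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build gcloud commands as strings and print them.
# Image family/project names, zone, and instance name are assumptions.

# Print the image flags for a given image kind ("hpc" or "centos").
image_flags() {
  case "$1" in
    hpc)    echo "--image-family=hpc-centos-7 --image-project=cloud-hpc-image-public" ;;
    centos) echo "--image-family=centos-7 --image-project=centos-cloud" ;;
    *)      echo "unknown image kind: $1" >&2; return 1 ;;
  esac
}

# Print a full instance-creation command for the given name and image kind.
create_cmd() {
  local name="$1" kind="$2"
  echo "gcloud compute instances create ${name} --zone=us-central1-a --machine-type=c2-standard-60 $(image_flags "${kind}")"
}

# Everything except the image flags stays the same.
create_cmd demo-node centos
create_cmd demo-node hpc
```

Because the commands are only printed, the sketch can be tried without a Google Cloud project; remove the `echo` in `create_cmd` only after checking the flags against your own environment.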



Real-world example: Scaling the SDPB solver with CloudyCluster and the HPC VM image



Walter Landry of the Caltech Particle Theory Group develops research software for the international Bootstrap Collaboration. The project uses a semidefinite program solver (SDPB) to study quantum field theories, with applications to a wide range of problems in theoretical physics, such as the expansion of the early Universe, superconductors, the quantum Hall effect, and phase transitions.

To expand the project's computing capacity, Landry decided to scale SDPB on Google Cloud. Using Omnibond CloudyCluster and the HPC VM image, he achieved performance and scalability comparable to an on-premises cluster at Yale built on Intel Xeon Gold 6240 processors and InfiniBand FDR.



1.jpg



The Google Cloud C2-Standard-60 instance type uses 2nd Generation Intel Xeon Scalable processors. C2 instances support placement policies that reduce latency between nodes, which makes them well suited to tightly coupled MPI workloads. CloudyCluster natively includes the HPC VM image and placement policies for the C2 family, so researchers don't need to do anything extra. The tests showed that Google Cloud can scale low-latency workloads across multiple instances.

If you want to see this for yourself, visit the Google Cloud Marketplace, where an updated version of Omnibond's CloudyCluster is available with the HPC VM image. This release also includes the Open OnDemand application, which is distributed by the Ohio Supercomputer Center and funded by the NSF. It lets system administrators easily provide web access to HPC resources.



High performance computing VM image capabilities 



Settings and optimization. The current HPC VM image focuses on tuning for tightly coupled workloads and uses the following MPI performance enhancements:



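  • Disable Intel Hyper-Threading. Intel Hyper-Threading is disabled by default in the HPC VM image. Turning it off gives more predictable performance and can shorten run times for some HPC jobs.

  • MPI collective tunings. The choice of MPI collective algorithms can significantly affect MPI application performance. The HPC VM image includes recommended Intel MPI collective algorithms for the most common MPI job configurations.

  • Increase tcp_*mem settings. C2 machines support up to 32 Gbit/s of bandwidth and benefit from more TCP memory than the Linux defaults provide.

  • Enable busy polling. Busy polling can reduce latency in the network receive path by letting socket-layer code poll the receive queue of a network device, with network interrupts disabled.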

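  • Raise user limits. Default limits on system resources (such as open files and the number of processes per user) are typically unnecessary for HPC jobs, where compute nodes are not shared between users.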

  • Disable Linux firewalls and SELinux. SELinux and the firewall, which are enabled by default in CentOS Linux images on Google Cloud, are disabled in the HPC VM image. This improves MPI performance.

  • Disable CPUIdle. C2 virtual machines support CPU idle states and can enter a low-power mode. Disabling CPUIdle helps keep latency consistently low.
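To check which of the settings listed above are in effect on a given VM, a read-only inspection along the following lines may help. This is a minimal sketch, not part of the image tooling: the procfs and sysfs paths are standard Linux locations, but which of them exist depends on the kernel version, so the hypothetical helper `read_setting` prints "unavailable" for anything it cannot read.

```shell
#!/usr/bin/env bash
# Read-only sketch: print a few kernel settings related to the tunings above.
# Nothing is modified. Paths that are missing on a given kernel are reported
# as "unavailable".

# Print the contents of a file, or "unavailable" if it cannot be read.
read_setting() {
  cat "$1" 2>/dev/null || echo "unavailable"
}

# Print one labelled line per setting.
report() {
  printf '%-26s %s\n' "$1" "$(read_setting "$2")"
}

report "SMT active (0 = off)"   /sys/devices/system/cpu/smt/active
report "tcp_rmem (min def max)" /proc/sys/net/ipv4/tcp_rmem
report "tcp_wmem (min def max)" /proc/sys/net/ipv4/tcp_wmem
report "busy_poll (usec)"       /proc/sys/net/core/busy_poll
report "busy_read (usec)"       /proc/sys/net/core/busy_read
report "cpuidle driver"         /sys/devices/system/cpu/cpuidle/current_driver
report "SELinux enforcing"      /sys/fs/selinux/enforce
```

Running the same script on a standard CentOS 7 VM and on an HPC VM image instance gives a quick side-by-side view of what differs.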



The effectiveness of these settings depends on the specific application. We recommend that you benchmark them with your own workloads to find the best-performing and most cost-effective configuration.



Performance comparison of the images



We compared the performance of the HPC VM image and the standard CentOS 7 image using Intel MPI Benchmarks and real-world applications for finite element analysis (ANSYS LS-DYNA), fluid dynamics (ANSYS Fluent), and weather modeling (WRF).



The comparisons in this section use the following image versions:



  • HPC VM image: hpc-centos-7-v20210119 (--nomitigation and mpitune settings applied as recommended in the documentation)

  • CentOS image: centos-7-v20200811



Intel MPI Benchmark (IMB) Ping-Pong measures the latency of transferring a fixed-size message between two ranks on a pair of VMs. With the HPC VM image, latency is on average 50% lower than with the standard CentOS 7 image.



Test configuration:



  • 2 C2-standard-60 VMs with a compact placement policy

  • MPI library: Intel MPI Library 2018 Update 4

  • Launch command: mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_PROCESSOR_LIST=0 -hostfile <hostfile> -np 2 -ppn 1 IMB-MPI1 Pingpong -iter 50000



Results



2.jpg
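For readers who want to derive an average latency reduction from their own runs, the following sketch shows one possible way to do it. It assumes that each run's IMB output has been saved to a text file and that data rows follow the usual IMB column order (`#bytes #repetitions t[usec] Mbytes/sec`). The function name `mean_reduction` and the file names in the usage comment are hypothetical, and this is not necessarily how the figure quoted in this article was computed.

```shell
#!/usr/bin/env bash
# Sketch: mean percentage latency reduction between two saved IMB PingPong
# outputs. Usage (file names are hypothetical):
#   ./mean_reduction.sh centos7_pingpong.txt hpc_pingpong.txt

# mean_reduction BASELINE_FILE TUNED_FILE
# For every message size present in both files, compute
# (baseline - tuned) / baseline * 100 and print the mean over all sizes.
mean_reduction() {
  awk '
    # Data rows start with a numeric byte count and have a numeric t[usec];
    # header and comment lines (starting with "#") are skipped.
    $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
      if (FILENAME == ARGV[1]) { base[$1] = $3; next }
      if (($1 in base) && base[$1] > 0) {
        sum += (base[$1] - $3) / base[$1] * 100
        n++
      }
    }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  ' "$1" "$2"
}

# Run only when two file arguments are supplied.
if [ "$#" -eq 2 ]; then
  mean_reduction "$1" "$2"
fi
```

Averaging per-size percentages weights every message size equally; if small messages matter most for an application, restricting the input to those sizes may be more informative.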



The Intel MPI Benchmark (IMB) AllReduce test measures collective latency among multiple ranks across VMs. It reduces a vector of fixed length with the MPI_SUM operation. Results are shown for 1 PPN (process per node), with 1 MPI rank per node and 30 threads per rank, and for 30 PPN, with 30 MPI ranks per node and 1 thread per rank. Compared to the standard CentOS 7 image, the HPC VM image reduced AllReduce latency by up to 40% for 240 MPI ranks across 8 nodes (30 processes per node).



Test configuration:



  • 8 C2-standard-60 VMs with a compact placement policy

  • MPI library: Intel MPI Library 2018 Update 4

  • Launch command: mpirun -tune -genv I_MPI_PIN=1 -genv I_MPI_FABRICS 'shm:tcp' -hostfile <hostfile> -np <#vm*ppn> -ppn <ppn> IMB-MPI1 AllReduce -iter 50000 -npmin <#vm*ppn>
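The `<#vm*ppn>` placeholder in the launch command is the total rank count: the number of VMs multiplied by the processes per node. The sketch below works that out for the two configurations described above and prints the resulting commands without running them. The helper names and the `hosts.txt` file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fill in the rank-count placeholders of the AllReduce launch command.
# Commands are printed, not executed; hosts.txt is a hypothetical hostfile.

# total_ranks VMS PPN -> total number of MPI ranks (value for -np and -npmin)
total_ranks() {
  echo $(( $1 * $2 ))
}

# allreduce_cmd VMS PPN HOSTFILE -> the full mpirun command line as a string
allreduce_cmd() {
  local vms="$1" ppn="$2" hostfile="$3" np
  np="$(total_ranks "$vms" "$ppn")"
  echo "mpirun -tune -genv I_MPI_PIN=1 -genv I_MPI_FABRICS 'shm:tcp' -hostfile ${hostfile} -np ${np} -ppn ${ppn} IMB-MPI1 AllReduce -iter 50000 -npmin ${np}"
}

allreduce_cmd 8 1 hosts.txt    # 1 PPN:  8 ranks in total
allreduce_cmd 8 30 hosts.txt   # 30 PPN: 240 ranks in total
```

Setting `-npmin` to the same value as `-np` makes IMB run only at the full rank count instead of also stepping through smaller process counts.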



Results



3.jpg



4.jpg



HPC application tests: LS-DYNA, Fluent, and WRF. At the application level, the HPC VM image delivered up to a 25% performance gain on the 3-car collision simulation in ANSYS LS-DYNA (240 MPI ranks across 8 Intel Xeon-based C2 instances). For ANSYS Fluent and WRF, the HPC VM image provided a 6% performance improvement over the standard CentOS image.



Test configuration:



  • ANSYS LS-DYNA (“3 cars” model): 8 C2-standard-60 VMs with a compact placement policy, using an LS-DYNA MPP binary compiled with AVX2

  • ANSYS Fluent (“aircraft_wing_14m”): 12 C2-standard-60 VMs

  • WRF V3 Parallel Benchmark (12 km CONUS): 16 C2-standard-60 VMs

  • MPI: Intel MPI Library 2018 Update 4







5.jpg



What's next? SchedMD Slurm and enterprise Linux



We will be expanding the list of partner solutions that use the HPC VM image by default. Starting next month, all Slurm customers will be able to run clusters with the HPC VM image as the default (a preview is available here).



Good news for anyone looking for an enterprise Linux distribution for high performance computing: SUSE is working with Google to develop a SUSE Enterprise HPC VM image optimized for Google Cloud. If you would like more information, or want to request other integrations and Linux distributions, please contact us.



Get started today!



The HPC VM image preview is now available to all users in the Google Cloud Marketplace. For information on how to create instances from the HPC VM image, see the documentation and quickstart. As a reminder, when you first sign up for Google Cloud you receive $300 in credits, and more than 20 free products are always available. You can try GCP at the dedicated link.




Special thanks to colleagues Jiu Xiao Liu, Tanner Love, Yang Jian, Hong Bo Lu and Pallawi Feng for their help in preparing the material.


