TOP500 has released the latest edition of its supercomputer rankings, which revealed that the Japanese Fugaku is the most powerful supercomputer in the world.
Riken and Fujitsu developed Fugaku, and the machine is based on Fujitsu's custom Arm-based A64FX processor.
It has a High-Performance Linpack (HPL) benchmark score of 442 petaFLOPS (Pflop/s), meaning it can perform 442 quadrillion floating-point operations per second.
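The unit conversion behind that figure is straightforward: one petaFLOP is 10^15 floating-point operations, and the US "quadrillion" is the same 10^15. A minimal sketch of the arithmetic:

```python
# Unit sanity check for the HPL figure quoted above.
PETA = 10**15          # 1 petaFLOP = 10^15 floating-point operations
QUADRILLION = 10**15   # US "quadrillion" is also 10^15

hpl_pflops = 442                 # Fugaku's HPL score in Pflop/s
flops = hpl_pflops * PETA        # total floating-point operations per second
print(flops)                     # 442000000000000000
print(flops / QUADRILLION)       # 442.0, i.e. 442 quadrillion per second
```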
There was one newcomer in the top 10 — the Perlmutter system at NERSC at the DOE Lawrence Berkeley National Laboratory.
The machine is based on the HPE Cray “Shasta” platform and is a heterogeneous system with both GPU-accelerated and CPU-only nodes.
Perlmutter achieved 64.6 Pflop/s, putting the supercomputer at number 5 in the new list.
Outside of the top 10, there were a few instances of Microsoft Azure and Amazon EC2 Cloud clusters performing well.
Pioneer-EUS, at number 24, and Pioneer-WUS2, at number 27, run on Azure, while the Amazon EC2 Instance Cluster at number 41 runs on Amazon EC2.
The latest supercomputer rankings revealed a marked increase in the use of AMD processors.
Perlmutter, for instance, uses AMD Epyc 7763 processors, and Selene, at number 6 on the list, uses the AMD Epyc 7742.
Another point of interest is that this list saw fewer systems in China than expected.
Chinese machines accounted for 186 supercomputers on the TOP500 list, down from 212 on the previous edition, a significant drop.
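In relative terms, the decline from 212 to 186 Chinese systems works out to roughly a 12 percent drop between editions:

```python
previous = 212   # Chinese systems on the prior TOP500 list
current = 186    # Chinese systems on this list

drop = previous - current
pct_decline = 100 * drop / previous
print(drop)                     # 26 fewer systems
print(round(pct_decline, 1))    # 12.3 (percent decline)
```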
There also wasn’t much change in the variety of system interconnects.
Ethernet is still used in around half of the systems, InfiniBand in around a third, OmniPath interconnects in less than one-tenth, and only one system relies on Myrinet.
Custom interconnects accounted for 37 systems, while proprietary networks were found on six systems.
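For the counts the list states explicitly, the shares of the 500-system list are easy to work out:

```python
TOTAL = 500        # systems on the TOP500 list

custom = 37        # systems with custom interconnects
proprietary = 6    # systems with proprietary networks
myrinet = 1        # the lone Myrinet system

print(100 * custom / TOTAL)        # 7.4  (percent of the list)
print(100 * proprietary / TOTAL)   # 1.2
print(100 * myrinet / TOTAL)       # 0.2
```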
Below is a summary of the top 10 supercomputers in the world.
Fugaku has 7,630,848 cores, which allowed it to achieve an HPL benchmark score of 442 Pflop/s. This puts it roughly 3x ahead of the No. 2 system on the list.
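The "roughly 3x" lead checks out against the No. 2 system's 148.8 Pflop/s score:

```python
fugaku = 442.0    # Fugaku's HPL score in Pflop/s
summit = 148.8    # Summit's HPL score in Pflop/s (No. 2 on the list)

ratio = fugaku / summit
print(round(ratio, 2))   # 2.97, i.e. roughly a 3x lead
```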
Summit is an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA. It is the fastest system in the U.S. with a performance of 148.8 Pflop/s.
Summit has 4,356 nodes, each housing two Power9 CPUs with 22 cores each and six Nvidia Tesla V100 GPUs, each with 80 streaming multiprocessors (SM). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
Sierra is a system at the Lawrence Livermore National Laboratory, CA, USA. It is built with 4,320 nodes, each with two Power9 CPUs and four Nvidia Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
Sunway TaihuLight is a system developed by China’s National Research Center of Parallel Computer Engineering and Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi. It achieved 93 Pflop/s.
Perlmutter is based on the HPE Cray “Shasta” platform. It is a heterogeneous system with AMD Epyc-based nodes and 1,536 Nvidia A100-accelerated nodes. Perlmutter achieved 64.6 Pflop/s.
Selene is an Nvidia DGX A100 SuperPOD installed in-house at Nvidia in the USA. The system is based on AMD Epyc processors with Nvidia A100 GPUs for acceleration and a Mellanox HDR InfiniBand network. It achieved 63.4 Pflop/s.
Tianhe-2A (Milky Way-2A), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, achieved 61.4 Pflop/s.
JUWELS Booster Module is a BullSequana system built by Atos and installed at the Forschungszentrum Juelich (FZJ) in Germany. The system uses AMD Epyc processors with Nvidia A100 GPUs for acceleration and a Mellanox HDR InfiniBand network, similar to the Selene system. It is the most powerful system in Europe, at 44.1 Pflop/s.
HPC5 is a PowerEdge system built by Dell and installed by the Italian company Eni S.p.A. It achieves 35.5 Pflop/s using Nvidia Tesla V100 accelerators and a Mellanox HDR InfiniBand network.
Frontera is a Dell C6420 system installed at the Texas Advanced Computing Center of the University of Texas. It achieved 23.5 Pflop/s using 448,448 of its Intel Xeon cores.