In case you missed it, yesterday the HPC community announced that the "world's fastest computer" (as ranked by Linpack) is built on the Fujitsu A64FX, the first processor to implement the ARM Scalable Vector Extension (SVE).
It has 7,299,072 cores at 48 cores per chip, or ~152,000 chips, drawing 28 megawatts and delivering 415 PetaFLOPS. (For 16-bit arithmetic, useful in some ML applications, it exceeds 1 ExaFLOPS.)
Moreover, at ~15 GFLOPS/Watt, it is number 9 on the Green500.
Most of the rest of the top 10 in the Top500 and in the Green500 use GPUs, which may not go down in history as the pinnacle of architectural excellence.
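
For anyone who wants to check the arithmetic, here is a quick sketch in Python using only the figures quoted above (the variable names are mine, not from the announcement). Dividing the core count by 48 gives roughly 152,000 chips, and dividing the Linpack result by the power draw gives just under 15 GFLOPS per watt.

```python
# Figures as quoted in this post; variable names are illustrative only.
cores = 7_299_072
cores_per_chip = 48
power_watts = 28e6        # 28 megawatts
linpack_flops = 415e15    # 415 PetaFLOPS (floating-point operations per second)

# Number of chips implied by the core count.
chips = cores // cores_per_chip

# Energy efficiency in GFLOPS per watt.
gflops_per_watt = linpack_flops / power_watts / 1e9

print(f"chips: {chips:,}")                            # chips: 152,064
print(f"efficiency: {gflops_per_watt:.1f} GFLOPS/W")  # efficiency: 14.8 GFLOPS/W
```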