Supercomputing News and Discussions


wjfox
Site Admin
Posts: 8942
Joined: Sat May 15, 2021 6:09 pm
Location: London, UK

Re: Supercomputing News and Discussions

Post by wjfox »

NVIDIA's Eos supercomputer just broke its own AI training benchmark record

Wed, Nov 8, 2023, 5:00 PM GMT

Depending on the hardware you're using, training a large language model of any significant size can take weeks, months, even years to complete. That's no way to do business — nobody has the electricity and time to be waiting that long. On Wednesday, NVIDIA unveiled the newest iteration of its Eos supercomputer, one powered by more than 10,000 H100 Tensor Core GPUs and capable of training a 175 billion-parameter GPT-3 model on 1 billion tokens in under four minutes. That's three times faster than the previous benchmark on the MLPerf AI industry standard, which NVIDIA set just six months ago.
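As a sanity check on that 3.9-minute figure, here's a rough back-of-the-envelope calculation using the common ~6 FLOPs-per-parameter-per-token estimate for transformer training (the actual MLPerf run differs in detail, so treat this as a sketch):

Code: Select all

# Implied sustained throughput of the Eos GPT-3 benchmark run,
# assuming the common ~6 FLOPs per parameter per token estimate.
params = 175e9        # GPT-3 parameter count
tokens = 1e9          # tokens processed in the benchmark
seconds = 3.9 * 60    # reported training time

train_flops = 6 * params * tokens     # ~1.05e21 FLOPs total
sustained = train_flops / seconds     # ~4.5e18 FLOP/s

print(f"sustained ~= {sustained / 1e18:.1f} EFLOP/s")
print(f"fraction of the 40 'AI exaflop' peak: {sustained / 40e18:.0%}")

At roughly 11% of the headline low-precision peak, that's in the plausible range for end-to-end training, which spends plenty of time on communication and non-matmul work.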

Eos represents an enormous amount of compute. It leverages 10,752 GPUs strung together using NVIDIA's Infiniband networking (moving a petabyte of data a second) and 860 terabytes of high bandwidth memory (36PB/sec aggregate bandwidth and 1.1PB/sec interconnected) to deliver 40 exaflops of AI processing power. The entire cloud architecture comprises 1,344 nodes, individual servers that companies can rent access to for around $37,000 a month to expand their AI capabilities without building out their own infrastructure.
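Dividing those totals back down to a single GPU is a quick way to check the headline numbers against the H100's published specs (assuming the figures are simple sums across the machine):

Code: Select all

# Per-GPU share of the quoted Eos totals.
gpus = 10_752
ai_flops_total = 40e18    # "AI exaflops" (FP8-class, with sparsity)
hbm_total = 860e12        # 860 TB of high bandwidth memory

print(f"{ai_flops_total / gpus / 1e15:.1f} PFLOPS per GPU")  # ~3.7
print(f"{hbm_total / gpus / 1e9:.0f} GB of HBM per GPU")     # 80

Both line up: ~3.7 PFLOPS is close to the H100's ~4 PFLOPS FP8-with-sparsity peak, and 80 GB is exactly the H100's memory capacity.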

In all, NVIDIA set six records in nine benchmark tests: 3.9 minutes for GPT-3, 2.5 minutes to train a Stable Diffusion model using 1,024 Hopper GPUs, an even minute to train DLRM, 55.2 seconds for RetinaNet, 46 seconds for 3D U-Net, and just 7.2 seconds for the BERT-Large model.

https://www.engadget.com/nvidias-eos-su ... 42546.html


Image
Credit: NVIDIA
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

Re: Supercomputing News and Discussions

Post by Tadasuke »

wjfox wrote: Thu Nov 09, 2023 9:50 pm Eos represents an enormous amount of compute. It leverages 10,752 GPUs strung together using NVIDIA's Infiniband networking (moving a petabyte of data a second) and 860 terabytes of high bandwidth memory (36PB/sec aggregate bandwidth and 1.1PB/sec interconnected) to deliver 40 exaflops of AI processing power.
It's just a number. I would like to see it actually bring results that would improve what matters.

Even if DLSS in video games could finally start working well, that would be something positive.

Or if our daily computers would finally be smart in a useful way, so they stop being frustrating.

Or if they managed to bring humanity some super awesome genetic engineering that works well.
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

China Stuns With New Homegrown Supercomputer Announcement
China isn't saying what kind of CPUs it's using, but it may have breached the exascale barrier with them.
By Josh Norem December 12, 2023
https://www.extremetech.com/computing/c ... nouncement
The modern world's supercomputers operate out in the open, as countries brag about their performance and enter them into standardized benchmarking competitions to prove their engineering chops. China doesn't play this game, however. Its entire supercomputer program is mostly kept secret because it's not supposed to have access to advanced technology. Despite its desire to keep its cards close to its vest, it recently announced a new supercomputer that could break the exascale barrier—all while using homegrown CPUs, which shouldn't be possible under the sanctions levied against it.

The new supercomputer is named Tianhe Xingyi, state news agency Xinhua reports (via Reuters). The release is unsurprisingly vague, since China doesn't release numbers or hard info. It states only that the machine was built with "domestic advanced computing architecture, high-performance multi-core processors, high-speed interconnection networks, and large-scale storage." The release says that, compared with Tianhe-2, China has doubled many aspects of its performance. That's unsurprising, as Tianhe-2 first debuted on the Top500 list in 2013 and was the world's fastest supercomputer for several years, until it was displaced by TaihuLight, another Chinese system, in 2016.
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

AMD to Build 2 New Supercomputers in Germany
One will use its new MI300, the other will be based on whatever AMD has ready in 2025.
By Josh Norem December 21, 2023
https://www.extremetech.com/computing/a ... in-germany
The proverbial paint is barely dry on AMD's new Instinct MI300 chips, and yet the company has already said they're being used for a new supercomputer in Germany. AMD has announced that "Exascale Supercomputing Is Coming to Stuttgart" and will build two computers: one that will upgrade an existing system to 39 PFLOPS and a future exascale machine similar to its current Frontier supercomputer. The two machines will be known as Hunter and Herder, with the former coming online in 2025 and the latter poised for a 2027 launch.

The two new supercomputers result from a new contract signed by the University of Stuttgart and Hewlett Packard Enterprise. It will see the existing Hawk supercomputer upgraded and a second system installed later at HLRS, a research institute and supercomputing center in Stuttgart. The big news here is that this is the first supercomputer contract for AMD's all-new MI300A chip, which combines a CPU, GPU, and high-bandwidth memory on the same package. These data center "APUs" will go into the upgrade of Hawk, the center's current flagship supercomputer at 26 PFLOPS, which is nothing to sneeze at. That computer debuted at #16 on the Top500 list in 2020, so it's neither old nor slow. That said, we certainly understand the itch to upgrade a PC, so there's no shade coming from this direction.
wjfox
Site Admin
Posts: 8942
Joined: Sat May 15, 2021 6:09 pm
Location: London, UK

Re: Supercomputing News and Discussions

Post by wjfox »

Europe plans to build the world’s fastest supercomputer in 2024

Europe will get its first exascale supercomputer next year, called JUPITER, and it should allow simulations that are currently possible only on a few machines worldwide

28 December 2023

The first exascale computer in Europe, called JUPITER, should be completed next year, and it may even become the most powerful computer in the world. It will allow experiments and simulations currently only possible on a tiny number of machines in the US and China.

Exascale machines can carry out a billion billion operations per second, an exaflop. Currently, there are – officially – only two supercomputers in the world capable of those sorts of calculations.

https://www.newscientist.com/article/23 ... r-in-2024/
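To put "a billion billion" in perspective, a quick conversion (assuming ~100 GFLOPS for a decent laptop CPU, which is a rough figure, not a benchmark):

Code: Select all

# What one exaflop means in everyday terms.
exaflop = 1e18   # floating-point operations per second
laptop = 100e9   # ~100 GFLOPS for a typical laptop CPU (rough assumption)

print(f"1 exaflop ~= {exaflop / laptop:,.0f} laptops")  # ~10,000,000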


Image
The exascale supercomputer JUPITER will be hosted at the Jülich Supercomputing Centre in Germany
Credit: Forschungszentrum Jülich/Sascha Kreklau
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

China Developing 1.57 Exaflop Supercomputer With China Made CPU-GPU Chip
February 14, 2024 by Brian Wang
There are reports that China has a new superchip, the MT-3000 processor, designed by the National University of Defense Technology (NUDT). The MT-3000 features a multi-zone structure that packs 16 general-purpose CPU cores with 96 control cores and 1,536 matrix accelerator cores.

The MT-3000 processor reportedly achieves 11.6 FP64 TFLOPS of peak performance and demonstrates a power efficiency of 45.4 GigaFLOPS/Watt at an operational frequency of 1.20 GHz.

Tianhe-3, a new supercomputer, is reported to reach 1.57 exaFLOPS on LINPACK benchmarks and would use the MT-3000 at its core. The top US supercomputer is Frontier, with 1.102 exaFLOPS of performance.
https://www.nextbigfuture.com/2024/02/c ... -chip.html
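Taking the reported chip numbers at face value, some quick arithmetic on what a 1.57 exaFLOPS machine built from MT-3000s would imply (treating the LINPACK score as if delivered at the chip's peak rate, which real systems never achieve, so the true chip count would be higher):

Code: Select all

# Implied scale of Tianhe-3 from the reported MT-3000 figures.
chip_flops = 11.6e12     # FP64 peak per MT-3000
efficiency = 45.4e9      # FLOPS per watt (reported)
system_flops = 1.57e18   # claimed LINPACK score

print(f"~{system_flops / chip_flops:,.0f} chips minimum")               # ~135,000
print(f"~{system_flops / efficiency / 1e6:.0f} MW at that efficiency")  # ~35 MW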
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Nvidia Unveils Its Eos Supercomputer for AI Training
It's already ranked as the ninth-fastest supercomputer in the world.
By Josh Norem February 16, 2024
https://www.extremetech.com/computing/n ... i-training
In November of last year, Nvidia raised a few eyebrows by suddenly appearing in the 9th spot on the Top500 list of the world's fastest supercomputers with a system named Eos. Named after the Greek goddess who opens the gates of dawn each day, Eos is Nvidia's enterprise-scale system for AI training, and the company has now released a video showing it off to the public for the first time.

Eos is essentially Nvidia's very own supercomputer that its employees get to use every day for things like AI training and playing Crysis on their lunch breaks. It comprises a cluster of 576 DGX H100 servers, and since each one features eight H100 GPUs, there's a total of 4,608 H100s linked together with Nvidia's Quantum-2 InfiniBand technology. It amounts to Nvidia showing off an extreme version of its DGX SuperPod design, its blueprint for AI training at enterprise scale, which it hopes to sell to companies with huge budgets and massive AI models to train.

Nvidia describes Eos as a system that can power an "AI factory," as it's a very large-scale SuperPod DGX H100 system. The company says it is what allows it to develop its own AI breakthroughs and shows the power of Nvidia's latest technology when scaled up to ludicrous size.

The DGX H100 servers use Intel Xeon Platinum 8480C CPUs, which feature 56 cores and 112 threads. Combined with the 4,608 H100 GPUs, the system offers 121 PetaFLOPS of Linpack performance, which was only good enough for 9th place on the Top500, but Linpack is a generic metric. When measured purely on AI training, it's easily one of the fastest systems in the world today.
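Those figures are internally consistent, as a quick cross-check shows (the H100 peak below is taken from Nvidia's public spec sheet):

Code: Select all

# Cross-checking the Eos numbers in this post.
servers = 576
gpus = servers * 8               # 4,608 H100s, as stated
linpack = 121e15                 # reported Linpack score

per_gpu = linpack / gpus         # ~26 TFLOPS FP64 per GPU
fp64_tensor_peak = 67e12         # H100 SXM FP64 tensor-core peak

print(f"{per_gpu / 1e12:.0f} TFLOPS per GPU, ~{per_gpu / fp64_tensor_peak:.0%} of peak")

Roughly 40% of FP64 peak on Linpack is believable for a cluster tuned for AI work rather than HPC benchmarks.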
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Nvidia and Amazon Upgrade Project Ceiba AI Supercomputer to Blackwell
The change will increase the AI performance of the upcoming machine by 6x, according to Nvidia.
https://www.extremetech.com/computing/n ... -blackwell

Nvidia is fresh off the unveiling of its new Blackwell AI superchip, and it's wasting no time making plans to roll that hardware out. Nvidia and Amazon partnered up last year to build what was to be one of the fastest supercomputers in the world, known as Project Ceiba. Now, the companies have said Project Ceiba will get a Blackwell upgrade to make it up to six times faster than originally envisioned.

The version of Project Ceiba discussed last year was already a beast, featuring more than 16,000 H100 Hopper AI accelerators; Nvidia predicted the machine would offer 65 exaflops of AI processing power when complete. The current leading supercomputer is the US Department of Energy's Frontier machine, which can hit 1.1 exaFLOPS with thousands of AMD Epyc CPUs and Radeon GPUs.
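One caveat when reading those two numbers side by side: 65 "AI exaflops" and Frontier's 1.1 exaFLOPS are measured at very different precisions, so the enormous gap they suggest is mostly a units artifact. A crude normalization (assuming the AI figure is FP8 with sparsity and using the H100's roughly 60:1 ratio between FP8-sparse and FP64 tensor peak, an assumption rather than a benchmark):

Code: Select all

# Rough FP64-equivalent of Project Ceiba's projected AI throughput.
ceiba_ai = 65e18          # projected "AI exaflops" (low precision)
fp8_to_fp64_ratio = 60    # ~ratio of H100 FP8-sparse to FP64 tensor peak

print(f"~{ceiba_ai / fp8_to_fp64_ratio / 1e18:.1f} EF FP64-equivalent")  # ~1.1

On that crude basis the two machines land in the same ballpark, which says more about the precision gap than about either machine.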
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Russia Is Working on a 128-Core Supercomputing Platform: Report
The country's notoriously ancient computer systems are due for an upgrade.
By Josh Norem April 22, 2024
Russia has always lagged behind the rest of the industrial world when it comes to information technology, and now sanctions from its war on Ukraine have held it back even further. Despite this situation, the country is reportedly in the early stages of deploying a new supercomputing and cloud platform that will feature up to 128 CPU cores per server cluster. It's unknown where these computer parts will be made, however, as Russia isn't known for running advanced silicon fabs.

The details about Russia's plans come from CNews, which appears to be a Russian news site. The site notes that a state-owned company named Roselectronics has been developing a new computing platform called Basis using "domestic technologies." The platform is described as scalable and as a fusion of software and hardware. Each Basis module includes three servers with up to 128 CPU cores, along with 2TB of memory, though the CPU architecture isn't disclosed, nor whether it uses a monolithic or chiplet design.
https://www.extremetech.com/computing/r ... orm-report
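For what it's worth, the module specs imply a comfortable memory-per-core ratio (assuming the 2TB figure applies per 128-core server, which the report leaves ambiguous):

Code: Select all

# Memory per core implied by the reported Basis server specs.
cores = 128
memory_gb = 2048   # 2 TB per server (assumed, not confirmed by the report)

print(f"{memory_gb / cores:.0f} GB of RAM per core")  # 16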