By Michael Irving
May 29, 2023
https://newatlas.com/computers/nvidia-a ... dgx-gh200/
While AI systems amaze and alarm the world in equal measure, they’re about to get even more powerful. Nvidia has announced a new class of supercomputer that will train the next generation of AI models, and put us all out of work far faster.
The new system is known as the Nvidia DGX GH200, and it will apparently be capable of a massive 1 exaflop of AI performance. Across the 256 GH200 “superchips” it’s built from, the system will pack an astonishing 144 TB of shared memory, which is 500 times more than Nvidia’s previous supercomputer, the DGX A100, unveiled just three years ago.
To wring out every last drop of power, each GH200 superchip combines the company’s Grace CPU and H100 Tensor Core GPU in one package, letting them communicate with each other seven times faster than over a PCIe connection while using just one-fifth of the electricity. All 256 superchips are then linked through the Nvidia NVLink Switch System, so that they function together as one big GPU.
The resulting supercomputer will be used to train the successors to ChatGPT and other generative AI and large language models. That most famous of AI systems was trained on a custom supercomputer that Microsoft built out of tens of thousands of Nvidia’s earlier A100 GPUs. Microsoft is once again among the first in line for the new gear, along with Meta and Google Cloud.
Nvidia isn’t just supplying other companies with equipment though – it has also announced plans to build its own DGX GH200-based supercomputer, named Helios. Expected to fire up by the end of 2023, Helios will be made up of four DGX GH200 systems, or 1,024 GH200 superchips, networked together. That would give it a total of 4 exaflops of performance, which sounds like an eye-watering amount of power.
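As a quick sanity check on the figures quoted above, here’s a minimal Python sketch, assuming performance and chip counts simply add linearly across the four networked systems (a simplification – real aggregate throughput depends on the interconnect):

```python
# Per-system figures quoted in the article for one DGX GH200.
superchips_per_system = 256   # GH200 superchips per system
exaflops_per_system = 1       # quoted AI performance per system
shared_memory_tb = 144        # shared memory per system, in TB

# Helios is described as four DGX GH200 systems networked together.
helios_systems = 4

helios_superchips = helios_systems * superchips_per_system
helios_exaflops = helios_systems * exaflops_per_system

print(helios_superchips)  # 1024 superchips, matching the article
print(helios_exaflops)    # 4 exaflops, matching the article

# Implied average memory per superchip, just for scale.
gb_per_superchip = shared_memory_tb * 1024 / superchips_per_system
print(gb_per_superchip)   # 576.0 GB per superchip
```

The per-superchip figure of 576 GB is derived here from the system totals, not stated in the article.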