AI & Humanoids

Nvidia unveils AI supercomputer that will put you out of a job much faster

Nvidia has unveiled the DGX GH200 AI supercomputer, which will be capable of up to 1 exaflop of performance
Nvidia

While AI systems amaze and alarm the world in equal measure, they’re about to get even more powerful. Nvidia has announced a new class of supercomputer that will train the next generation of AI models, and put us all out of work far faster.

The new system is known as the Nvidia DGX GH200, and it will apparently be capable of a massive 1 exaflop of performance. Across the 256 GH200 “superchips” it’s built from, the system packs an astonishing 144 TB of shared memory – 500 times more than that of Nvidia’s previous-generation system, the DGX A100, unveiled just three years ago.
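As a quick back-of-the-envelope sketch, those stated figures imply roughly 576 GB of memory per superchip. The announcement doesn't give the per-chip breakdown, so treat the arithmetic below as illustrative rather than official specification:

```python
# Rough check of the DGX GH200 memory figures quoted in the article.
# Only the 256-superchip count and 144 TB total are from the announcement;
# the binary (1024) TB-to-GB conversion is an assumption.
num_superchips = 256
total_shared_memory_tb = 144  # TB, as stated by Nvidia

per_chip_gb = total_shared_memory_tb * 1024 / num_superchips
print(f"Memory per GH200 superchip: ~{per_chip_gb:.0f} GB")  # ~576 GB

# The "500 times more" claim implies a previous-generation baseline of roughly:
print(f"Implied DGX A100 baseline: ~{total_shared_memory_tb * 1024 / 500:.0f} GB")  # ~295 GB
```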

To wring out every last drop of power, each GH200 superchip combines the company’s Grace CPU and H100 Tensor Core GPU in a single package, letting the two communicate seven times faster than over a PCIe connection while using just one-fifth of the interconnect power. All 256 superchips are linked through the Nvidia NVLink Switch System so they can function together as one giant GPU.
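That “seven times faster than PCIe” figure can be loosely sanity-checked against commonly cited bandwidth numbers for the two links. Note that both specific values below are assumptions on our part, not figures from the announcement:

```python
# Rough sanity check of the "seven times faster than PCIe" claim, using
# commonly cited figures (assumed, not taken from the article):
#   - NVLink-C2C between Grace and Hopper: ~900 GB/s total bandwidth
#   - PCIe Gen 5 x16 link:                 ~128 GB/s total bandwidth
nvlink_c2c_gbps = 900
pcie_gen5_x16_gbps = 128

print(f"NVLink-C2C vs PCIe Gen 5 x16: ~{nvlink_c2c_gbps / pcie_gen5_x16_gbps:.1f}x")  # ~7.0x
```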

The resulting supercomputer will be used to train the successors to ChatGPT and other generative AI and large language models. That most famous of AI systems was trained on a custom supercomputer that Microsoft built out of tens of thousands of Nvidia’s earlier A100 GPUs. Microsoft is once again among the first in line for the new gear, along with Meta and Google Cloud.

Nvidia isn’t just supplying other companies with equipment though – it’s also announced plans to build its own DGX GH200-based supercomputer named Helios. Expected to fire up by the end of 2023, Helios will be made up of four DGX GH200 systems, or 1,024 GH200 superchips, networked together. That would make it capable of a total of 4 exaflops of performance, which sounds like an eye-watering amount of power.

But of course, there’s a caveat to those numbers. Currently, the most powerful supercomputer in the world is the US DOE’s Frontier, at 1.194 exaflops, and at a glance it may sound like Nvidia’s Helios will be four times more powerful – but that’s comparing apples and oranges. Nvidia is quoting performance in a lower-precision 8-bit format called FP8, while supercomputers are generally ranked on double-precision FP64. Converted to FP64, Helios would manage about 36 petaflops, or 0.036 exaflops.
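A rough sketch of that conversion, assuming an H100 delivers on the order of 34 teraflops at FP64 (a commonly published spec figure, not something stated in the article), gets close to the 36-petaflop estimate:

```python
# Illustrative FP8-vs-FP64 comparison for Helios.
# The per-GPU FP64 rate is an assumed figure based on published H100 specs.
superchips_per_system = 256
systems_in_helios = 4
total_superchips = superchips_per_system * systems_in_helios  # 1,024

fp8_exaflops = 4.0        # Nvidia's headline figure for Helios
h100_fp64_tflops = 34     # assumed double-precision rate per GPU

fp64_petaflops = total_superchips * h100_fp64_tflops / 1000
print(f"Helios at FP8:  ~{fp8_exaflops} exaflops")
print(f"Helios at FP64: ~{fp64_petaflops:.0f} petaflops")  # ~35 PF, vs Frontier's ~1,194 PF
```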

That said, Helios and the DGX GH200 systems it’s built from are still incredibly powerful tools, and Nvidia says they’ll be able to churn out AI models within weeks instead of months. We mere humans had better polish up our résumés.

Source: Nvidia [1], [2]

5 comments
paleochocolate
Truly a Gizmag quality headline.
Daishi
It sounds like GPT-4 was trained on "tens of thousands of A100's" and in benchmarks H100 is about 4x faster at GPT-3 than A100 was. That probably provides a decent estimate of how many of these would be needed to train GPT-4. ~20k A100's would still be about 5k H100's or about 20 of these full systems. Considering the current market price for an H100 is sitting around $45k USD I guess we can see why Nvidia stock has skyrocketed. Altman said the training cost of GPT-4 was "much more" than $100 million so my back-of-napkin math is probably not far off. Training costs are falling at about 60% per year and models don't need to be the size of GPT-4 to be useful either though.
WillyDoodle
Great! Now, can we get about .01% of that into Siri so she knows the contextual difference between "were" and "we're", "they're/there/their" and realizes that when I type "yiu" chances are about 99.99% I mean "you"?
Username
Does it come in a laptop?!
ljaques
And 3, 2, 1, Skynet, is that you? "No, but I'm coming soon. Not to worry."
All it will take is a bad guy, a big and fast AI, and a caustic training system, which will create our downfall.