Nvidia's new Grace CPU is designed for AI-powered supercomputers

Nvidia's Grace CPU offers a significant AI performance boost over current solutions

Nvidia's newest Central Processing Unit (CPU) has been unveiled, and it's quite a chip: Nvidia says it offers a 10x performance leap over current solutions, and it's focused on artificial intelligence calculations and natural language processing (NLP).

The CPU is called Grace, after US computer programming pioneer Grace Hopper, and it's destined for high-end data centers and supercomputers rather than the average desktop. Nvidia says it will be running supercomputers at the Swiss National Supercomputing Centre (CSCS) and the Los Alamos National Laboratory in the US in the coming years.

Built to analyze huge datasets and process information at super-fast speeds, the Grace CPU is based on Arm architecture, following Nvidia's announcement last year of its plan to acquire chip designer Arm. The chip will work in tandem with Nvidia graphics processing units (GPUs) and will start appearing in machines in 2023.

The Grace CPU is going to be particularly adept at handling deep-learning models, where vast amounts of data need to be crunched and compared to train AIs to make the right decisions more of the time, whether that's recognizing your voice when you talk to your smart speaker or helping a self-driving car learn what a traffic light looks like.

It's another shot across the bows of Intel, which develops the Xeon chip for the same kind of data center use cases that Grace is now targeting. Apple recently switched its computers to its own custom chips rather than Intel silicon, and this could be another area where Intel starts to lose market share.

And it's a hugely profitable market, too – the need for cloud computing centers is growing and growing, as is the demand for AI-powered systems that businesses and scientists can tap into.

"NVIDIA's novel Grace CPU allows us to converge AI technologies and classic supercomputing for solving some of the hardest problems in computational science," said CSCS Director Professor Thomas Schulthess in a press statement. "We are excited to make the new Nvidia CPU available for our users in Switzerland and globally for processing and analyzing massive and complex scientific datasets."

According to Nvidia, the combination of the Grace CPU with Nvidia's latest GPUs is going to achieve a 30x higher aggregate bandwidth compared with the best servers today – so that's a lot more data that can be shifted at once. Energy efficiency should improve too.

While you won't be ordering the Grace CPU for your next self-built PC, you may well benefit from it in the coming years, as it finds a place in more data centers and more powerful supercomputers across the world.

Source: Nvidia

Apple's M1 Arm chips seem to be outperforming Intel in benchmarks. Something interesting that the M1 does, and that I suspect Nvidia is also doing, is using unified memory. The memory is on the same package as the GPU and CPU (though not on the same SoC), and the GPU and CPU can access data through the same memory addresses. Usually, a CPU with an integrated GPU that shares memory will virtually carve out a separate memory space for each, and data has to be moved between them (which taxes the CPU). Another, somewhat related architecture shift: the first SSDs just used the slow SATA interface, and the first PCIe and M.2 SSDs still used the SATA specification. NVMe M.2 SSDs only became popular in the last four or so years, but there were often many bottlenecks. The new PS5 and Xbox were mostly redesigned and re-architected to leverage NVMe SSD speed. Nvidia is using similar concepts through GPUDirect, which basically allows the GPU to directly access the SSD so it isn't starved by slow I/O. There are some interesting developments happening, and it's worth noting that Intel recently decided to enter the discrete GPU business, so a CPU from Nvidia could eventually be positioned to compete directly.

At a low level, the memory pool is not unified like the M1's. The CPU gets LPDDR5 and the GPU gets HBM, each on its own bus. There is a *very* fast CPU-GPU link, but it's not quite on-die or on-package fast yet.

But at a high level, it's probably fine. HBM is too expensive for huge pools, while super-wide LPDDR5 is impractical.

But yes, the whole industry is moving in the direction of unified memory spaces, even if the unification is abstracted away. Nvidia aside, just look at the momentum CXL is gaining.