IBM says it has found a way to make training and running generative AI models faster and as much as five times more energy efficient by swapping out copper wires for beams of light to connect data center components.
The paradox of our digital age is that as computers become smaller, the problems become bigger. Chips now cram in billions of transistors, enabling the massive data centers and the processing power behind modern generative AI models. However, these models demand ever more computing as they evolve, the data centers that host them have become major energy consumers, and the chips themselves are pushing not only at the limits of their technology but at the laws of physics.
Two of the technological and physical bottlenecks for these data centers are the humble copper wire and the speed at which signals can travel along it. This is one of the major reasons electronics are so compact. It isn't just for convenience; as computers become faster and more powerful, the time data takes to travel from one component to another becomes a major performance factor.
In fact, moving data as electrical signals has become such a bottleneck that CPUs spend much of their time sitting idle and consuming energy while waiting for the next data packet to arrive.
To speed things up, IBM has developed what it claims is the next generation of optical technology. Using optics to move data isn't new; fiber optic cables have been carrying information from place to place for decades. However, this has mainly been over long distances. Once the data arrives and enters the computer itself, it's back to copper wires.
To overcome this, IBM is turning to a new process for creating co-packaged optics (CPO) in the form of a polymer optical waveguide (PWG) that routes optical signals between photonic integrated circuits (PICs) and external connections such as single-mode fibers (SMFs). The company says tests of the PWG show that a data center using it would require five times less power than a conventional one, and that cable connections could stretch from one meter to hundreds of meters, allowing for more flexible architectures while carrying terabits of data per second.
IBM claims that the energy saved in training a single AI model would be enough to power 5,000 US homes for a year, and that using light would cut the time needed to train a large language model from three months to three weeks, thanks to bandwidth up to 80 times that of conventional electrical interconnects.
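As a rough illustration of how those numbers could fit together, the sketch below models training time as compute time plus time stalled on data movement, and speeds up only the data-movement portion. The specific figures (the fraction of time spent waiting, the bandwidth multiplier) are hypothetical placeholders for illustration, not IBM's published data; the point is simply that the gain applies only to the communication-bound share of the run, which is why interconnect bandwidth rather than raw compute sets the ceiling.

```python
# Back-of-envelope model of how a faster interconnect shortens training time.
# All numbers below are illustrative assumptions, not IBM's figures.

def training_weeks(total_weeks, comm_fraction, bandwidth_multiplier):
    """Estimate wall-clock training time when only communication speeds up.

    total_weeks          -- baseline end-to-end training time in weeks
    comm_fraction        -- share of that time spent waiting on data movement
    bandwidth_multiplier -- how much faster the new interconnect moves data
    """
    compute = total_weeks * (1 - comm_fraction)               # unchanged by optics
    communication = total_weeks * comm_fraction / bandwidth_multiplier
    return compute + communication

# Hypothetical example: a 13-week (~3-month) run that spends ~80% of its time
# stalled on data movement, re-run with an 80x faster optical interconnect.
print(training_weeks(13.0, comm_fraction=0.80, bandwidth_multiplier=80))
# -> roughly 2.7 weeks, in the ballpark of the "three months to three weeks" claim
```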
"As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can make these data centers future-proof," said Dario Gil, SVP and Director of Research at IBM. "With this breakthrough, tomorrow’s chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future."
Source: IBM