Researchers at the University of California, San Diego (UCSD) have devised a new, highly automated and reconfigurable approach to hardware acceleration. It puts the unused silicon in smartphone chips, the so-called "dark silicon," to work as special-purpose processors that are dynamically optimized to run the most common tasks efficiently.
The dark side of the chip
Keeping pace with Moore's Law is a major concern for chip manufacturers as they cram more and more transistors onto each chip. As researchers approach the physical limits of silicon, the search is on for alternative materials that could yield even smaller transistors.
But all of these efforts could prove of little use, because another problem has emerged that limits the processing power of today's microchips, particularly in portable, battery-operated devices. Each new generation of microprocessors packs in more transistors, but the "power budget" available to operate them grows far more slowly. As a result, chips lack the power to activate all of their transistors at once and cannot exploit all of their resources, and the situation is only going to get worse.
With each successive process generation, these power constraints cause the percentage of a chip that can actively switch to drop exponentially, leaving more and more transistors inactive: the so-called "dark silicon." The UCSD team calls this limit the "utilization wall," and it is already changing the way manufacturers build chips. One example is Intel's Nehalem architecture, whose Turbo Boost feature lets some cores run faster while others are temporarily switched off.
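The exponential drop can be made concrete with a back-of-the-envelope model. The sketch below assumes a simplified form of post-Dennard scaling that is not taken from the article: each process generation shrinks linear dimensions by a factor s of about 1.4, so the transistor count grows by s², clock frequency grows by s and per-transistor capacitance falls by 1/s, while the supply voltage no longer drops and the chip's power budget stays fixed. Under those assumptions, running the whole chip at full speed would need about s² (roughly 2) times more power every generation, so the fraction that can be active roughly halves each time.

```python
def utilization(generations: int, s: float = 1.4) -> list[float]:
    """Fraction of a chip that can switch at full speed under a fixed power budget.

    Simplified scaling model (an assumption for illustration only):
      transistor count   x s**2 per generation
      clock frequency    x s
      capacitance/gate   x 1/s
      supply voltage     unchanged (no longer scales)
    Full-chip power therefore grows by s**2 * s * (1/s) = s**2 per generation,
    while the available power budget is held constant.
    """
    fractions = []
    for g in range(generations + 1):
        full_chip_power = (s ** 2) ** g  # relative to generation 0
        fractions.append(min(1.0, 1.0 / full_chip_power))
    return fractions


if __name__ == "__main__":
    for gen, frac in enumerate(utilization(4)):
        print(f"generation {gen}: {frac:.0%} of the chip can be active")
```

Under this model the active fraction falls from 100% to about 7% over four generations; the exact figures depend entirely on the assumed scaling factor.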
Conservation cores: hardware accelerators with a twist
The team's main thesis is that, as transistors keep shrinking and the percentage of unused silicon keeps growing, chip area becomes less important than power consumption. That makes it increasingly sensible to run the most frequently executed code, which the team dubs "hot code," on specialized processors built to execute it in a highly optimized, power-saving fashion.
The idea itself is by no means new. Specialized hardware for heavily used functions has been part of electronics for decades: circuits for the most common arithmetic operations, for instance, are hardwired into every processor's ALU. What is new about the team's approach is its versatility. The chips aren't simply fixed circuits wired in a set pattern; given a program's source code, the approach can automatically work out which software to optimize and how.
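The article doesn't say how the UCSD tools decide which code counts as "hot." As a purely illustrative sketch, assuming the decision starts from an execution profile, one simple policy is to rank functions by their share of execution time and keep adding them until a coverage target is met. The function `select_hot_functions`, the function names in the profile, the profile numbers and the 90% target are all hypothetical.

```python
def select_hot_functions(profile: dict[str, float], coverage: float = 0.9) -> list[str]:
    """Greedily pick the functions with the largest share of execution time
    until their combined share reaches `coverage` (a fraction between 0 and 1).

    `profile` maps a function name to its execution weight in any unit
    (samples, cycles, percent); weights are normalized to shares here.
    """
    total = sum(profile.values())
    if total <= 0:
        return []
    chosen: list[str] = []
    covered = 0.0
    # Visit functions from hottest to coldest.
    for name, weight in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage:
            break
        chosen.append(name)
        covered += weight / total
    return chosen


if __name__ == "__main__":
    # Hypothetical profile: percent of execution time spent in each function.
    profile = {
        "decode_jpeg": 40,
        "render_text": 25,
        "sha1": 15,
        "parse_xml": 10,
        "ui_event_loop": 6,
        "misc": 4,
    }
    print(select_hot_functions(profile, coverage=0.9))
```

A real design flow would presumably also weigh the silicon area each specialized core needs; this sketch ignores area entirely and only shows the idea of covering most of the execution with a few functions.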
These specialized, reprogrammable low-power processors, called "conservation cores" or "c-cores," sit alongside the main microprocessor and handle the hot code while the main microprocessor executes the remaining instructions. The result of this specialization is a drastic power saving: c-cores can consume as little as 8 picojoules per instruction, compared with the 91 picojoules typical of a MIPS microprocessor, roughly an elevenfold reduction.
Chip makers can design similar specialized processors by hand, but the UCSD team developed a fully automated system that generates c-core blueprints directly from source code extracted from applications. This means that as software is updated, the functions assigned to the c-cores can change along with it, without manual intervention.
GreenDroid (or: Android, 11x more power-efficient)
The researchers don't plan to release a commercial chip based on this technology anytime soon, but they are working on GreenDroid, a prototype chip aimed at the Android smartphone platform. The chip will consist of 16 blocks, each measuring one square millimeter and containing a MIPS microprocessor and 6 to 10 Android c-cores that communicate with one another.
The chip is designed for smartphone applications, which face very strict power constraints. Using this architecture, the team improved energy efficiency by a factor of 11 for code running on the c-cores, and by a factor of 7.5 overall once the code still running on the main microprocessor is taken into account.
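The gap between the two figures follows the same arithmetic as Amdahl's law: code left on the main processor gets no benefit, so it caps the overall gain. The sketch below assumes a simplified model, not the team's own methodology, in which a fraction f of the baseline energy is spent in code moved to c-cores that are 11 times more efficient (close to the ratio of the per-instruction figures, 91 / 8 ≈ 11.4), and any overhead of handing work between the processor and the c-cores is ignored. The overall gain is then 1 / (f/11 + (1 - f)), and solving for the reported 7.5x suggests the c-cores would need to cover roughly 95% of the baseline energy. The function names are hypothetical.

```python
def overall_gain(hot_fraction: float, core_gain: float) -> float:
    """Whole-system energy-efficiency gain when `hot_fraction` of the baseline
    energy is spent in code moved to c-cores that are `core_gain` times more
    efficient; the remaining energy stays on the main processor unchanged.
    Hand-off overheads are ignored (a simplifying assumption)."""
    remaining_energy = hot_fraction / core_gain + (1.0 - hot_fraction)
    return 1.0 / remaining_energy


def required_hot_fraction(target_gain: float, core_gain: float) -> float:
    """Invert overall_gain: the fraction of baseline energy the c-cores must
    cover to reach `target_gain` overall, given `core_gain` on the hot code."""
    return (1.0 - 1.0 / target_gain) / (1.0 - 1.0 / core_gain)


if __name__ == "__main__":
    f = required_hot_fraction(target_gain=7.5, core_gain=11.0)
    print(f"c-cores must cover {f:.1%} of baseline energy")
    print(f"check: overall gain = {overall_gain(f, 11.0):.2f}x")
```

Because the remaining term (1 - f) dominates once f is large, every extra percent of coverage matters: at f = 0.9 the same model gives only about 5.5x.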