Computers

Software tweak doubles computer processing speed, halves energy use

Software tweak doubles computer processing speed, halves energy use
You may not need to upgrade hardware to achieve enormous processing and efficiency gains, according to new research
You may not need to upgrade hardware to achieve enormous processing and efficiency gains, according to new research
View 5 Images
You may not need to upgrade hardware to achieve enormous processing and efficiency gains, according to new research
1/5
You may not need to upgrade hardware to achieve enormous processing and efficiency gains, according to new research
Comparison between how (a) conventional heterogeneous computers, (b) conventional heterogeneous computers with software pipelining, and (c) SHMT execute functions
2/5
Comparison between how (a) conventional heterogeneous computers, (b) conventional heterogeneous computers with software pipelining, and (c) SHMT execute functions
SHMT speedup with various scheduling policies (relative to baseline GPU)
3/5
SHMT speedup with various scheduling policies (relative to baseline GPU)
The SHMT prototype platform
4/5
The SHMT prototype platform
Energy consumption and energy delay products
5/5
Energy consumption and energy delay products
View gallery - 5 images

Existing processors in PCs, smartphones and other devices can be supercharged for enormous power and efficiency gains using a new parallel processing software framework designed to eliminate bottlenecks and use multiple chips at once.

Most modern computers, from smartphones and PCs to data center servers, contain graphics processing units (GPUs) and hardware accelerators for AI and machine learning. Well-known commercial examples include Tensor Cores on NVIDIA GPUs, Tensor Processing Units (TPUs) on Google Cloud servers, Neural Engines on Apple iPhones, and Edge TPUs on Google Pixel phones.

Each of these components processes information separately, shuffling information from one processing unit to the next, and this often creates bottlenecks in the data flow. In a new study, researchers from the University of California Riverside (UCR) demonstrate a method where existing diverse components operate simultaneously to greatly improve processing speed and reduce energy consumption.

“You don’t have to add new processors because you already have them,” said Hung-Wei Tseng, a UCR associate professor of electrical and computer engineering and co-lead author on the study.

The researchers’ framework, called simultaneous and heterogeneous multithreading (SHMT), moves away from traditional programming models that can only delegate a region of code exclusively to one kind of processor, leaving other resources idling and not contributing to the current function.

Instead, SHMT exploits the diversity – or heterogeneity – of multiple components, breaking the computational function up to share it among them. In other words, it’s a type of parallel processing.

Comparison between how (a) conventional heterogeneous computers, (b) conventional heterogeneous computers with software pipelining, and (c) SHMT execute functions
Comparison between how (a) conventional heterogeneous computers, (b) conventional heterogeneous computers with software pipelining, and (c) SHMT execute functions

How it works

Feel free to skip this bit, but for the more computer-science-minded, here’s a (still very basic) overview of how SHMT works. A set of virtual operations (VOPs) allows a CPU program to ‘offload’ a function to a virtual hardware device. During program execution, a runtime system drives SHMT’s virtual hardware, gauging the hardware resource’s ability to make scheduling decisions.

SHMT uses a quality-aware work-stealing (QAWS) scheduling policy that doesn't hog resources, but helps maintain quality control and workload balance. The runtime system divides VOPs into one or more high-level operations (HLOPs) to simultaneously use multiple hardware resources.

Then, SHMT’s runtime system allocates these HLOPs to the task queues of the target hardware. Because HLOPs are hardware-independent, the runtime system can adjust the task assignment as required.

Prototype testing and results

To test the concept, the researchers built a system with the sorts of chips and processing power you'd find in any decent late-model smartphone, with a couple of tweaks made so they could also test what it might be capable of doing in a data center.

The SHMT prototype platform
The SHMT prototype platform

Specifically, they custom-built an embedded system platform using NVIDIA’s Jetson Nano module containing a quad-core ARM Cortex-A57 processor (CPU) and 128 Maxwell architecture GPU cores. A Google Edge TPU was connected to the system via its M.2 Key E slot.

The CPU, GPU and TPU exchanged data via the onboard PCIe interface – a standardized interface for motherboard components like graphics cards, memory and storage devices. The system’s main memory – 4 GB 64-bit LPDDR4, 1,600 MHz at 25.6 GB/s – hosted the shared data. The Edge TPU additionally contains an 8 MB device memory, and Ubuntu Linux 18.04 was used as the operating system.

They tested the SHMT concept using benchmark applications, and found that the framework with the best-performing QAWS policy knocked it out of the park, with a 1.95X boost to speed and a remarkable 51% cut in energy consumption compared to the baseline method.

SHMT speedup with various scheduling policies (relative to baseline GPU)
SHMT speedup with various scheduling policies (relative to baseline GPU)

What does it all mean?

The researchers say the implications for SHMT are huge. Yes, software apps on your existing phones, tablets, desktops and laptops could use this new software library to achieve some pretty wild performance gains. But it could also reduce the need for expensive, high-performance components, leading to cheaper and more efficient devices.

Since it reduces energy use, and by extension cooling requirements, the approach could optimise two key line items if you're running a data center, while also bringing carbon emissions and water use down.

Energy consumption and energy delay products
Energy consumption and energy delay products

As always, further research is needed regarding system implementation, hardware support, and what kinds of applications stand to benefit most, but with these kinds of results we imagine the team will have little trouble attracting resources to get it out there.

The study was presented at MICRO 2023, the 56th Annual IEEE/ACM International Symposium on Microarchitecture.

Source: UCR

View gallery - 5 images
3 comments
3 comments
Thony
(As always sorry for ma english i don't like use of ggl translate) What i want to know about this is if it can be added in my current pc. Withing an update in Os or Bios. I do not want it to turn half or full vapoware for any (political or economic or business or Loby) reason.... I learned the long hard way how to be extra warry of too good to be true news like this one. But as you said at the end of post, it will stil need some more researchs... Lets hope for the best.
BigBilby
This sort of advancement is excellent, but just being able to do this does not mean that it can be done securely. Once the hackers of the world have been banging on it for at least 12 months, and patches have been applied to mitigate the inevitable vulnerabilities, then it will be time to celebrate.
Vladimir Getov
This work is clearly interesting but the results are applicable ONLY for heterogeneous multithreading - e.g. it will not work for an ordinary laptop like mine or Thony's PC unless it includes different processors (CPU, GPU, TPU, etc). Therefore, the beginning of this article - "You may not need to upgrade hardware to achieve enormous processing and efficiency gains" - is valid ONLY for heterogeneous architectures.