New cache management approach boosts application speeds by 9.5 percent

A new technique called Dense Footprint Cache (DFC) can boost application speed and cut energy use

By improving the efficiency with which computer processors find and retrieve the data they need from memory, researchers from Samsung Electronics and North Carolina State University (NC State) have given computer applications a speed boost of over nine percent, while reducing energy use by over four percent.

Though computers store all the data to be manipulated off-chip in main memory (aka RAM), data the processor needs regularly is also temporarily stored in a die-stacked DRAM (dynamic random access memory) cache, which allows that data to be retrieved more quickly. This data is stored in large blocks, or macroblocks, which makes it easier for the processor to locate the data it needs, but it means additional, unwanted data contained in the macroblocks is also retrieved, wasting time and energy.
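
To get a feel for the waste, here is a minimal back-of-the-envelope sketch in Python (the macroblock and block sizes below are illustrative assumptions, not figures from the paper) of how fetching a whole macroblock moves bytes the processor never touches:

    # Illustrative sizes only: a 2 KB macroblock made of 64-byte cache blocks.
    MACROBLOCK_SIZE = 2048
    BLOCK_SIZE = 64

    def wasted_bytes(blocks_actually_used):
        """Bytes moved from main memory that the processor never touches."""
        return MACROBLOCK_SIZE - blocks_actually_used * BLOCK_SIZE

    # If the processor touches only 5 of the 32 blocks in a macroblock,
    # 2,048 - 320 = 1,728 bytes were fetched for nothing.
    print(wasted_bytes(5))  # -> 1728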

By getting the cache to learn over time which specific data the processor requires from each macroblock, the researchers from Samsung and NC State were able to improve the efficiency of data retrieval in two ways. Firstly, the cache can compress each macroblock so it contains only the relevant data, speeding up retrieval; secondly, the compressed macroblocks free up space in the cache for other data the processor is more likely to need.
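
As a rough illustration of the footprint idea (a toy sketch with invented names, not the paper's actual hardware mechanism), the predictor can be thought of as a table mapping each macroblock to a bitmask of the blocks the processor actually touched, so a later fetch can bring in only those blocks:

    from collections import defaultdict

    BLOCKS_PER_MACROBLOCK = 32  # illustrative assumption

    class FootprintPredictor:
        """Toy model of footprint learning; not the authors' implementation."""

        def __init__(self):
            # One bit per block, set when that block of the macroblock is used.
            self.footprints = defaultdict(int)

        def record_access(self, macroblock_id, block_index):
            self.footprints[macroblock_id] |= 1 << block_index

        def blocks_to_fetch(self, macroblock_id):
            # Fetch only previously used blocks; if nothing has been learned
            # yet, fall back to fetching the whole macroblock.
            mask = self.footprints.get(macroblock_id, 0)
            if mask == 0:
                return list(range(BLOCKS_PER_MACROBLOCK))
            return [i for i in range(BLOCKS_PER_MACROBLOCK) if mask & (1 << i)]

    predictor = FootprintPredictor()
    predictor.record_access(macroblock_id=7, block_index=0)
    predictor.record_access(macroblock_id=7, block_index=3)
    print(predictor.blocks_to_fetch(7))  # -> [0, 3]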

This technique is called Dense Footprint Cache (DFC) and was compared to current state-of-the-art die-stacked DRAM management methods using a processor and memory simulator. After running 3 billion instructions for each of the applications tested, the researchers found a boost in speed of 9.5 percent and a reduction in energy use of 4.3 percent.

The researchers also found the Dense Footprint Cache approach significantly reduced the incidence of last-level cache (LLC) misses. This is when the processor attempts to retrieve data from the cache that isn't there, meaning the data needs to be retrieved from off-chip main memory, which wastes time and energy. In testing, Dense Footprint Cache reduced LLC miss ratios by 43 percent.
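
To put that figure in perspective, here is a back-of-the-envelope calculation in Python (the baseline numbers are invented for the example, not taken from the paper):

    def miss_ratio(misses, accesses):
        """Fraction of cache accesses that fall through to off-chip RAM."""
        return misses / accesses

    baseline = miss_ratio(misses=200, accesses=1000)  # 0.20, invented baseline
    with_dfc = baseline * (1 - 0.43)                  # 43 percent fewer misses
    print(f"baseline: {baseline:.3f}, with DFC: {with_dfc:.3f}")
    # baseline: 0.200, with DFC: 0.114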

The team will present its paper (PDF) on DFC at the International Symposium on Memory Systems being held in Washington DC from Oct. 3-6.

Source: NC State

2 comments
Knut
Please learn what a cache is before posting. This is no novelty: a real cache is associative memory, there are no "other things" and there are no "blocks". It is made up so that you present a pattern, and if this pattern is present, the content is returned and a "used bit" is set, which allows unused cache content to be cleared. This is ancient technology, from the early days of computers, around 1980, and it has been patented and improved beyond this.

The main problem with cache is with multiprocessors - not just cores, but say disk DMA transfers and dedicated video hardware. This "other" hardware can set the contents of memory ("RAM") and bypass the CPU or cores. The usual solution is to separate this into video RAM and IO RAM, where speed is then a fraction of the usual speed, and all updates go right through the cache.

Another way to optimise is allowing the cache to understand some instructions and fetch the next instruction before the first is executed - that is a "prefetch", and the cache can then be loaded with the next instruction. A branch ("if... then... else...") will demand that the prefetch load the two next instructions. The success of a Norwegian maker of computers was based on this: up to the next 5 instructions could be loaded into cache, and 3 on branches, making the CPUs run at full speed all the time - typically 20 times faster than their US competitors. On special programs, FORTRAN code was analysed to see if code could be executed in parallel between the CPUs. The design catered for 256 "processors" and allowed all memory to be shared, although it had a special "window" for the operating system only. The memory-sharing technology was finalised for Sun Microsystems, patented, and used by all the main US server manufacturers. This allows resilience to be implemented: if one core fails, it can just be disabled, and all the other cores can and will just use the same memory and nobody notices a thing - even the prefetch makes the next instructions available to the others.

The problem with North Carolina State University is a lack of research into the past, and of willingness to search for others who may have solved things already, and way faster than any US company. I admit that the peaceful Norwegians have enabled both India and Pakistan to acquire nuclear technology, but denied Israel the use of this hardware. But you never know if one of those that licensed the technology has broken their promise. I was shown their chipset by the Chinese, who had used it to make their supercomputer. So: take a look abroad first - who cares what happens at a symposium about archaic memory design in NC?
DaveCummins
Seems like they deliberately contrived a test that would show maximum benefit. Further, a cache is only a cache; it does not learn. They might have included a coprocessor or changed the processor architecture, but they didn't get the cache itself to learn anything.