New 36-core chip takes design clues from internet routers

A new design for multicore processors advanced at MIT solves the problem of cache coherence (Image: MIT)

Researchers at MIT are experimenting with a radically new design for multicore microchips that takes hints from the way internet routers work to make data flow between cores faster and more reliably. The ideas are now being put to the test on an innovative 36-core chip that might soon see commercial applications.

The problem with buses

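A microchip's power consumption rises with its clock speed: at a fixed supply voltage it grows in proportion to frequency, and running faster generally also requires a higher voltage, which drives power and heat up even more steeply. For that reason, chip designers have in recent years stopped pushing CPU frequencies as high as possible and have instead chosen to improve performance by increasing the number of cores, or processing units, on a chip.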
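A common first-order model for the switching power of CMOS logic is P = αCV²f, where α is the fraction of the circuit that switches each cycle, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below plugs in made-up numbers to compare one fast core with two slower ones. All parameter values are hypothetical, as is the assumption that 4 GHz needs 1.2 V, and it assumes the workload splits perfectly between the two cores.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """First-order CMOS switching power in watts: P = alpha * C * V**2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical per-core parameters, for illustration only.
ALPHA = 0.1   # fraction of the capacitance that switches each cycle
CAP_F = 1e-9  # switched capacitance, in farads

one_core_2ghz = dynamic_power(ALPHA, CAP_F, 1.0, 2e9)
one_core_4ghz_same_v = dynamic_power(ALPHA, CAP_F, 1.0, 4e9)
# Assumption: reaching 4 GHz requires raising the supply voltage to 1.2 V.
one_core_4ghz_more_v = dynamic_power(ALPHA, CAP_F, 1.2, 4e9)
# Two 2 GHz cores deliver the same total cycles per second as one 4 GHz core.
two_cores_2ghz = 2 * one_core_2ghz

print(f"1 core  @ 4 GHz, 1.0 V: {one_core_4ghz_same_v:.3f} W")
print(f"1 core  @ 4 GHz, 1.2 V: {one_core_4ghz_more_v:.3f} W")
print(f"2 cores @ 2 GHz, 1.0 V: {two_cores_2ghz:.3f} W")
```

With these invented figures, doubling the clock at the same voltage doubles power (0.2 W to 0.4 W); if the faster clock also needs 1.2 V, power climbs to about 0.58 W, while two 2 GHz cores provide the same cycles per second for 0.4 W.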

Multicore chips are usually faster than single-core ones because they can split the computational load into many chunks and run them in parallel rather than in sequence. But not every computing task can be split neatly into equal parts that are carried out independently. To complete its chunk of the work, each core usually needs to share data with the other cores. Normally this happens through a single bundle of wires called a "bus."

The problem is that while two cores are talking to each other over the bus, no other core can use it. This means the architecture won't scale up to massively multicore chips.
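To make the bottleneck concrete, here is a toy model of a shared bus. It assumes one-cycle transfers and a round-robin arbiter that picks which core may transmit next; round-robin is just one common policy, and the article does not say how real chips arbitrate. However many cores there are, the bus still carries only one message per cycle.

```python
def round_robin_bus(pending: dict[int, int], num_cores: int) -> list[int]:
    """Toy shared bus: each cycle, exactly one core is granted the bus.

    pending maps a core id to the number of messages it wants to send.
    A round-robin arbiter (an assumption for this sketch) grants the bus to
    the next core after the previous winner that still has something to send.
    Returns the id of the core that used the bus in each cycle.
    """
    if any(core < 0 or core >= num_cores for core in pending):
        raise ValueError("core id out of range")
    pending = dict(pending)  # don't mutate the caller's dict
    grants = []
    last = num_cores - 1     # so that core 0 is considered first
    while any(pending.values()):
        for offset in range(1, num_cores + 1):
            core = (last + offset) % num_cores
            if pending.get(core, 0) > 0:
                pending[core] -= 1
                grants.append(core)
                last = core
                break
    return grants

print(round_robin_bus({0: 2, 1: 2, 2: 2, 3: 2}, num_cores=4))
# [0, 1, 2, 3, 0, 1, 2, 3]: eight messages take eight cycles

# With 36 cores each sending a single message, the last core waits 35 cycles.
schedule = round_robin_bus({core: 1 for core in range(36)}, num_cores=36)
print(len(schedule), schedule.index(35))  # 36 35
```

Total time grows with total traffic, so every core added to a bus-based chip makes the wait for everyone else longer.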

Most current chips have between two and eight cores, which already pushes the limits of the single-bus architecture. The ten-core chips found in high-end servers add a second bus, but simply adding more buses won't work if future chips are to include hundreds or even thousands of cores, because driving long wires past that many cores would consume too much power.

A research group headed by MIT's Li-Shiuan Peh is proposing a new, far more scalable way for cores to communicate. Taking its cue from the way routers send data across the internet, the team's approach can use multiple paths for data traveling between cores, making communication much faster and more fluid.

Rather than relying on a single bus, Peh and colleagues envision a system in which each core communicates directly only with the four cores immediately adjacent to it, relaying data onward as needed. This means driving much shorter wires, which can run at lower voltages and so use significantly less power for inter-core communication.
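The neighbour-only wiring can be sketched as a grid. This sketch assumes the 36 cores sit in a 6 x 6 mesh and uses dimension-order ("XY") routing, a common baseline for mesh networks-on-chip in which a packet first travels along its row and then along its column; neither the layout nor the routing algorithm is taken from the article, and the function names are hypothetical.

```python
MESH = 6  # assumption: 36 cores arranged as a 6 x 6 grid, numbered row by row

def neighbors(core: int, size: int = MESH) -> list[int]:
    """Cores directly wired to `core`: up to four (north, south, west, east)."""
    row, col = divmod(core, size)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < size and 0 <= c < size:
            result.append(r * size + c)
    return result

def xy_route(src: int, dst: int, size: int = MESH) -> list[int]:
    """Dimension-order (XY) routing: move along the row first, then the column.

    Every step goes to an adjacent core, so data only ever crosses short wires.
    """
    row, col = divmod(src, size)
    dst_row, dst_col = divmod(dst, size)
    path = [src]
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append(row * size + col)
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append(row * size + col)
    return path

print(neighbors(0))      # [6, 1]: a corner core has only two neighbours
print(xy_route(0, 35))   # [0, 1, 2, 3, 4, 5, 11, 17, 23, 29, 35]
```

Crossing the whole chip takes ten short hops here instead of one long bus transfer, but many such hops can be in flight at once in different parts of the mesh.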

As with an internet router, the paths that data takes are flexible, so data can easily be sent along an alternate route to its destination. This is useful when a link on the most direct path is already tied up carrying traffic between two other cores.
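A simplified illustration of routing around a busy link, again assuming a 6 x 6 mesh. For clarity it runs a breadth-first search over the whole grid, which is a stand-in: real on-chip routers make hop-by-hop decisions from local information, and the article does not describe the chip's actual routing policy. `route_around` and the busy-link representation are hypothetical.

```python
from collections import deque

SIZE = 6  # assumption: a 6 x 6 mesh of 36 cores, numbered row by row

def neighbors(core):
    """Yield the up-to-four cores adjacent to `core`."""
    row, col = divmod(core, SIZE)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < SIZE and 0 <= c < SIZE:
            yield r * SIZE + c

def route_around(src, dst, busy_links):
    """Shortest path from src to dst that avoids links currently in use.

    busy_links holds frozenset({a, b}) entries for occupied links.
    Returns a list of cores, or None if no free path exists right now.
    """
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        core = frontier.popleft()
        if core == dst:
            path = []
            while core is not None:   # walk back to the source
                path.append(core)
                core = prev[core]
            return path[::-1]
        for nxt in neighbors(core):
            if nxt not in prev and frozenset((core, nxt)) not in busy_links:
                prev[nxt] = core
                frontier.append(nxt)
    return None

print(route_around(0, 2, set()))                # [0, 1, 2]
print(route_around(0, 2, {frozenset((0, 1))}))  # [0, 6, 7, 1, 2]
```

When the direct link from core 0 to core 1 is busy, the packet takes a two-hop detour instead of waiting, which a single shared bus cannot offer.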

The researchers have designed and built their own 36-core chip featuring this architecture to test its performance. They will use this prototype to see whether their "network-on-a-chip" also solves one of the big problems that other teams have faced in similar attempts – that of maintaining cache coherence.

Remaining coherent

Sending data from a core to main memory and back takes a relatively long time. To speed up computation, each core has its own cache: a very small but very fast memory that holds copies of the data the core is currently working with.
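A minimal sketch of why a small cache pays off. It assumes a least-recently-used (LRU) replacement policy and invented latencies of 1 ns for a cache hit and 100 ns extra for a trip to main memory; the article does not specify either, and `TinyCache` is a hypothetical name.

```python
from collections import OrderedDict

HIT_NS = 1       # hypothetical cache access time
MEMORY_NS = 100  # hypothetical extra cost of going to main memory

class TinyCache:
    """A small LRU cache sitting in front of slow main memory."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory
        self.lines = OrderedDict()  # address -> value, oldest first
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        self.misses += 1
        value = self.memory[addr]         # slow trip to main memory
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return value

memory = list(range(1000))  # stand-in for main memory: address -> value
cache = TinyCache(capacity=4, memory=memory)
for _ in range(10):         # a loop that keeps reusing four addresses
    for addr in range(4):
        cache.read(addr)

accesses = cache.hits + cache.misses
avg_ns = (accesses * HIT_NS + cache.misses * MEMORY_NS) / accesses
print(cache.hits, cache.misses, avg_ns)  # 36 4 11.0
```

Only the first pass goes to main memory, so the average access costs about 11 ns instead of 100 ns under these assumptions.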

However, because several cores may hold copies of the same data and modify them at the same time, there needs to be a way to keep the caches of all the cores consistent with one another. This is known as maintaining cache coherence.

Conventional multicore chips do this using a "bus sniffing" (or "bus snooping") protocol, in which each core monitors the bus for messages from other cores that might invalidate the data in its local cache.

If a core updates data in its local cache, it immediately broadcasts a message "warning" the other cores that this particular piece of data has changed, so they know their own copies are no longer valid. If another core then wants the updated value, it broadcasts a request over the bus, and whichever core holds the up-to-date value sends it back over the bus. Because there is only one shared bus and only one inter-core communication can take place at a time, every core sees these messages in the same order, which makes keeping data synchronized relatively straightforward.
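The warn-then-request exchange can be sketched as a simplified write-invalidate snooping protocol, loosely in the style of MSI: a line is either Modified ("M"), Shared ("S"), or absent (Invalid). The article does not name a specific protocol, and the class and method names are hypothetical. Each Python call runs to completion before the next starts, which models the single bus handling one message at a time.

```python
class SnoopyBus:
    """One shared bus: every message is seen by every other cache, one at a time."""

    def __init__(self):
        self.caches = []

    def broadcast(self, sender, kind, addr):
        for cache in self.caches:
            if cache is not sender:
                reply = cache.snoop(kind, addr)
                if reply is not None:  # a cache holding the latest value answered
                    return reply
        return None


class Cache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.lines = {}  # addr -> [state, value]; state is "M" or "S"
        bus.caches.append(self)

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr][1]                  # hit: valid local copy
        value = self.bus.broadcast(self, "read", addr)  # ask the other cores
        if value is None:
            value = self.memory.get(addr, 0)            # memory is up to date
        self.lines[addr] = ["S", value]
        return value

    def write(self, addr, value):
        self.bus.broadcast(self, "invalidate", addr)    # warn the other cores
        self.lines[addr] = ["M", value]

    def snoop(self, kind, addr):
        line = self.lines.get(addr)
        if line is None:
            return None
        if kind == "invalidate":
            del self.lines[addr]          # our copy is now stale
            return None
        if kind == "read" and line[0] == "M":
            line[0] = "S"                 # keep a shared copy
            self.memory[addr] = line[1]   # write the new value back to memory
            return line[1]                # and send it to the requester
        return None


memory = {}
bus = SnoopyBus()
core_a, core_b = Cache(bus, memory), Cache(bus, memory)

core_a.write(0x10, 42)       # A now holds the only valid copy
print(core_b.read(0x10))     # B asks over the bus and A replies: 42
core_b.write(0x10, 7)        # B's write invalidates A's copy
print(0x10 in core_a.lines)  # False
print(core_a.read(0x10))     # A misses, asks over the bus and B replies: 7
```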

But if you take away the single bus and have data flying everywhere in unsynchronized packets, as is the case with Peh's network-on-a-chip, maintaining cache coherence becomes a lot harder. The researchers solved this problem by adding a second, "shadow" network alongside the main one, made of synchronized circuits that send notifications throughout the chip as soon as one core requests a piece of data from another.

Each router knows which requests were issued, and by which core. Because each of the chip's 36 cores is given a different priority, this priority order stands in for the chronological order in which the requests would have crossed a standard bus. This means the bus sniffing protocol still works, but it can now scale to chips with hundreds or even thousands of cores.
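A rough sketch of the ordering idea, under heavy simplifying assumptions: time is divided into windows, the shadow network delivers the same set of (window, core) notifications to every router, and a lower core id means a higher priority. The article says only that each core has a different priority; the windows, the priority rule and the function name are hypothetical. Because every router sorts the same notifications the same way, all routers agree on one order even when the notifications reach them in different sequences.

```python
import random

NUM_CORES = 36

def agreed_order(notifications):
    """Order coherence requests identically at every router.

    Each notification is (window, core): the hypothetical time window in which
    the request was announced on the shadow network, and the requesting core.
    Sorting by window, then by core priority (here, lower id = higher priority),
    yields one global sequence that plays the role of bus order.
    """
    return sorted(notifications)  # tuples sort by window first, then core id

# Hypothetical requests: cores 17, 3 and 29 ask for data in window 0,
# and core 8 in window 1.
requests = [(0, 17), (0, 3), (0, 29), (1, 8)]

# Every router may receive the notifications in a different order...
rng = random.Random(1)
views = []
for router in range(NUM_CORES):
    arrivals = requests[:]
    rng.shuffle(arrivals)
    views.append(agreed_order(arrivals))

# ...but all of them derive the same sequence.
print(views[0])  # [(0, 3), (0, 17), (0, 29), (1, 8)]
print(all(view == views[0] for view in views))  # True
```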

What's next?

The researchers plan to test their 36-core chip using a modified version of the Linux operating system, evaluating the chip's performance and checking whether it lives up to the team's speed claims.

After that, the team will release the blueprints for the chip as open-source code, which raises the possibility of seeing similar designs in commercial chips in the near future.

Source: MIT

1 comment
Michael Dexter
This architecture is very similar to an existing chip called the Parallella, which is produced by a company called Adapteva. The chips have between 16 and 64 cores, and each core can communicate with the four cores next to it, just like in this article. Their Epiphany IV chip has 64 cores and can achieve 102 GFLOPS while consuming only 2 watts of power!!