For nearly 10 years, computer processors have been getting faster by adding cores rather than by raising the speed of each core. This approach makes our PCs and smartphones more power-efficient, but it also makes it much trickier to write programs that take full advantage of the hardware. Swarm, a new chip design developed at MIT, could now come to the rescue, delivering speedups of up to 75-fold through parallel processing while requiring programmers to write a fraction of the code.
Developed by Prof. Daniel Sanchez and team, Swarm is a 64-core chip that includes specialized circuitry for both executing and prioritizing tasks in a simple and efficient manner, taking the onus off software developers.
Writing software for a multi-core chip is a lot like coordinating a complex team project: not all tasks can be delegated, and the ones that can must be carefully split among team members. With software, this sort of planning can be complicated and time-consuming, and it can add substantial overhead that ends up slowing the software's execution. For this reason, parallel programming usually pays off only for large tasks that run to thousands of instructions.
In addition, developers face the tricky problem of handling data conflicts. A task might request to access and edit a piece of data that is currently being worked on by another task. To avoid corrupting the data, programmers must manually handle priorities between tasks and make sure shared resources can only be accessed by one task at a time (forcing other tasks to wait idly for their turn).
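To make the manual bookkeeping concrete, here is a minimal sketch in Python of the conventional approach described above: several threads update one shared value, and a lock ensures only one of them touches it at a time. The names `balance`, `deposit` and `run_deposits` are invented for illustration and have nothing to do with Swarm itself.

```python
import threading

# Shared data that several tasks want to modify (hypothetical example).
balance = 0
# The lock the programmer must remember to take around every access.
balance_lock = threading.Lock()


def deposit(amount, times):
    """Add `amount` to the shared balance `times` times."""
    global balance
    for _ in range(times):
        # Only one thread may hold the lock; the others wait idly here.
        with balance_lock:
            balance += amount


def run_deposits(num_threads=4, times=10_000):
    """Start several threads that all update the same balance."""
    global balance
    balance = 0
    threads = [
        threading.Thread(target=deposit, args=(1, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance
```

Every piece of shared data needs this kind of protection, and forgetting the lock in a single place is enough to corrupt the result. That is the burden the article says falls on the programmer today.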
The Swarm architecture tackles these problems with specialized circuitry that delegates even the smallest of tasks very efficiently and enforces a strict priority among them. As a result, programmers can execute tasks in parallel with little overhead, making software run up to tens of times faster.
"Swarm has two advantages over conventional multicores," Sanchez tells us. "First, Swarm supports tiny tasks, as small as tens of instructions, efficiently. By contrast, current multicores need larger tasks (thousands of instructions or more) to operate efficiently. Supporting smaller tasks allows more parallelism, simply because there often is a lot of parallelism inside each large task.
"Second, Swarm enforces a global order among these tasks. By contrast, current multicores cannot support ordered execution efficiently, especially with small tasks."
This global order is very useful for handling data conflicts. After tasks are automatically prioritized (according to a metric set by the developer), Swarm starts working on the highest-priority subroutines in parallel, taking advantage of its 64 cores. If data conflicts arise, they can now be handled automatically: for instance, if a low-priority task modifies data that is later accessed by a high-priority task, the data value is temporarily reverted to allow the critical tasks to complete sooner.
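The idea of tasks ordered by a developer-chosen metric can be sketched in software. The Python example below is only an illustration of the ordering rule, not of Swarm's hardware or its actual programming interface: it runs tasks one at a time from a priority queue, whereas the article describes Swarm running them in parallel and undoing conflicting work automatically. The helper names (`run_ordered`, `spawn`, `shortest_paths`) are hypothetical, a lower number is assumed to mean higher priority, and the shortest-path search is a textbook example chosen for brevity rather than one of the benchmarks from the study.

```python
import heapq
import itertools


def run_ordered(initial_tasks):
    """Run (priority, func, args) tasks in priority order.

    Assumption for this sketch: a lower number means higher priority.
    Each task receives a `spawn` callback so it can create more tasks.
    """
    tie_breaker = itertools.count()  # keeps heap entries comparable
    queue = []

    def spawn(priority, func, *args):
        heapq.heappush(queue, (priority, next(tie_breaker), func, args))

    for priority, func, args in initial_tasks:
        spawn(priority, func, *args)

    while queue:
        priority, _, func, args = heapq.heappop(queue)
        func(spawn, priority, *args)


def shortest_paths(graph, source):
    """Shortest distances from `source`, written as many tiny tasks."""
    dist = {}

    def visit(spawn, distance, node):
        if node in dist:
            return  # a higher-priority task already settled this node
        dist[node] = distance
        for neighbor, weight in graph.get(node, []):
            # The priority metric here is the path length so far.
            spawn(distance + weight, visit, neighbor)

    run_ordered([(0, visit, (source,))])
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(shortest_paths(graph, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

Note that the code contains no locks: correctness comes entirely from the order in which tasks run. Each `visit` task is only a handful of operations long, which is the kind of tiny task Sanchez says conventional multicores cannot run efficiently.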
To test their new architecture, Sanchez and team compared Swarm versions of six common algorithms to their highly optimized parallel counterparts. Remarkably, the Swarm software executed the same tasks three to 18 times faster, despite requiring only about one tenth of the code. In one case, the system achieved an impressive 75-fold speedup on an algorithm that computer scientists had so far failed to parallelize.
The researchers suspect that the complexity of developing software for multi-core systems may have been one of the key reasons why chip manufacturers have been holding back on the number of cores. Swarm could now solve this problem and pave the way for general-purpose chips with a huge number of cores.
"In an upcoming paper, we demonstrate scalability to hundreds of cores," Sanchez tells us. "In principle, this style of architecture can scale to even larger systems (e.g., multiple chips and boards) as long as the application has enough parallelism."
The researchers are currently working on techniques to make Swarm even more efficient by reducing data movement and exploring new programming models.
A paper describing the advance appears in the latest edition of the journal IEEE Micro.