Chips as mini Internets

The data-routing techniques that undergird the Internet could increase the efficiency of multicore chips while lowering their power requirements.


Computer chips have stopped getting faster. In order to keep increasing chips’ computational power at the rate to which we’ve grown accustomed, chipmakers are instead giving them additional “cores,” or processing units.

Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus. With a bus, however, only one pair of cores can talk at a time, a serious limitation for the chips with hundreds or even thousands of cores that many electrical engineers envision as the future of computing.

Li-Shiuan Peh, an associate professor of electrical engineering and computer science at MIT, wants cores to communicate the same way computers hooked to the Internet do: by bundling the information they transmit into “packets.” Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole.
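The idea of bundling data into packets can be sketched in a few lines. The format below is purely illustrative (the field names and chunk size are my assumptions, not the researchers' design): a message is sliced into fixed-size packets, each carrying enough addressing information that any router along the way can forward it independently.

```python
# Hypothetical packet format: each packet carries routing metadata
# (source, destination, sequence number) plus a slice of the message.
from dataclasses import dataclass

@dataclass
class Packet:
    src: int        # id of the sending core
    dst: int        # id of the destination core
    seq: int        # position of this packet within the message
    payload: bytes  # a fixed-size slice of the message

def packetize(src, dst, message, chunk=4):
    """Split a message into packets of `chunk` bytes each."""
    return [Packet(src, dst, i, message[i * chunk:(i + 1) * chunk])
            for i in range((len(message) + chunk - 1) // chunk)]

packets = packetize(0, 5, b"hello, core five")
# The receiver reassembles by sequence number, regardless of arrival order.
reassembled = b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))
```

Because each packet is self-describing, different packets of the same message can take different paths through the network, depending on congestion, and still be reassembled correctly at the destination.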

At the Design Automation Conference in June, Peh and her colleagues will present a paper she describes as “summarizing 10 years of research” on such “networks on chip.” Not only do the researchers establish theoretical limits on the efficiency of packet-switched on-chip communication networks, but they also present measurements performed on a test chip in which they came very close to reaching several of those limits.

Last stop for buses

In principle, multicore chips are faster than single-core chips because they can split up computational tasks and run them on several cores at once. Cores working on the same task will occasionally need to share data, but until recently, the core count on commercial chips has been low enough that a single bus has been able to handle the extra communication load. That’s already changing, however: “Buses have hit a limit,” Peh says. “They typically scale to about eight cores.” The 10-core chips found in high-end servers frequently add a second bus, but that approach won’t work for chips with hundreds of cores.

For one thing, Peh says, “buses take up a lot of power, because they are trying to drive long wires to eight or 10 cores at the same time.” In the type of network Peh is proposing, on the other hand, each core communicates only with the four cores nearest it. “Here, you’re driving short segments of wires, so that allows you to go lower in voltage,” she explains.
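The "four nearest cores" wiring describes a two-dimensional mesh, a common on-chip topology (assumed here; the article does not name the exact layout). A short sketch of who is wired to whom:

```python
# In a 2D mesh, core (x, y) is wired only to its four nearest neighbors,
# so every link is a short segment rather than a chip-spanning bus.
def mesh_neighbors(x, y, width, height):
    """Return the coordinates of the cores directly wired to (x, y)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]

# An interior core has 4 neighbors; a corner core has only 2.
interior = mesh_neighbors(1, 1, 4, 4)
corner = mesh_neighbors(0, 0, 4, 4)
```

Since a wire's capacitance (and hence the energy to drive it) grows with its length, replacing one long shared bus with many short point-to-point links is what permits the lower operating voltage Peh describes.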

In an on-chip network, however, a packet of data traveling from one core to another has to stop at every router in between. Moreover, if two packets arrive at a router at the same time, one of them has to be stored in memory while the router handles the other. Many engineers, Peh says, worry that these added requirements will introduce enough delays and computational complexity to offset the advantages of packet switching. “The biggest problem, I think, is that in industry right now, people don’t know how to build these networks, because it has been buses for decades,” Peh says.
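The buffering behavior described above can be modeled with a toy router (an illustration of the general problem, not the paper's router design): when two packets contend for the same output in the same cycle, one is forwarded and the other waits in a buffer, which is exactly the source of the added delay and complexity engineers worry about.

```python
# Toy model: a router forwards at most one packet per cycle; any extra
# arrivals are buffered and introduce per-hop delay.
from collections import deque

class Router:
    def __init__(self):
        self.buffer = deque()   # packets waiting for the output port

    def cycle(self, arrivals):
        """Accept this cycle's arrivals; forward at most one packet."""
        self.buffer.extend(arrivals)
        return self.buffer.popleft() if self.buffer else None

r = Router()
first = r.cycle(["A", "B"])   # A and B collide; only one goes out
second = r.cycle([])          # B leaves a cycle later
```

Every such stall compounds across the many routers a packet crosses, which is why the techniques in the next section focus on removing per-hop work from the critical path.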

Forward thinking

Peh and her colleagues have developed two techniques to address these concerns. One is something they call "virtual bypassing." On the Internet, when a packet arrives at a router, the router inspects its addressing information before deciding which path to send it down. With virtual bypassing, however, each router sends an advance signal to the next, so that it can preset its switch, speeding the packet on with no additional computation. In her group's test chips, Peh says, virtual bypassing allowed a very close approach to the maximum data-transmission rates predicted by theoretical analysis.
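A rough latency model shows why the advance signal matters. The cycle counts below are placeholders, not figures from the paper: the point is that route computation, done one hop ahead, overlaps with the previous traversal instead of adding to every hop.

```python
# Toy latency model of virtual bypassing (cycle counts are assumptions).
def latency_without_bypass(hops, route_cycles=1, switch_cycles=1):
    # Every hop pays both route computation and switch traversal.
    return hops * (route_cycles + switch_cycles)

def latency_with_bypass(hops, route_cycles=1, switch_cycles=1):
    # Route computation is sent one hop in advance, overlapping with the
    # previous traversal, so each hop pays only the switch crossing.
    return route_cycles + hops * switch_cycles

slow = latency_without_bypass(8)   # 16 cycles over 8 hops
fast = latency_with_bypass(8)      # 9 cycles over the same path
```

The saving grows with path length, which is why the technique matters most for the large meshes, with many hops per packet, that hundred-core chips would require.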

The other technique is something called low-swing signaling. Digital data consists of ones and zeroes, which are transmitted over communications channels as high and low voltages. Sunghyun Park, a PhD student advised by both Peh and Anantha Chandrakasan, the Joseph F. and Nancy P. Keithley Professor of Electrical Engineering, developed a circuit that reduces the swing between the high and low voltages from one volt to 300 millivolts. With its combination of virtual bypassing and low-swing signaling, the researchers’ test chip consumed 38 percent less energy than previous packet-switched test chips. The researchers have more work to do, Peh says, before their test chip’s power consumption gets as close to the theoretical limit as its data transmission rate does. But, she adds, “if we compare it against a bus, we get orders-of-magnitude savings.”
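A back-of-envelope calculation suggests why shrinking the swing pays off so well. Using the standard first-order model that the dynamic energy of charging a wire scales with C·V² (my simplification, not a figure from the paper), dropping the swing from 1 volt to 300 millivolts cuts per-transition wire energy by roughly an order of magnitude:

```python
# First-order model: dynamic energy to charge a wire ~ C * V^2.
def wire_energy(capacitance_f, swing_v):
    return capacitance_f * swing_v ** 2

full_rail = wire_energy(1e-12, 1.0)    # 1 pF wire, 1 V swing
low_swing = wire_energy(1e-12, 0.3)    # same wire, 300 mV swing
ratio = low_swing / full_rail          # about 0.09, a ~91% saving per wire
```

The measured 38 percent chip-level saving is smaller than this per-wire figure because link energy is only one part of the chip's total budget; routers, buffers, and the cores themselves still consume the rest.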

Luca Carloni, an associate professor of computer science at Columbia University who also researches networks on chip, says “the jury is always still out” on the future of chip design, but that “the advantages of packet-switched networks on chip seem compelling.” He emphasizes that those advantages include not only the operational efficiency of the chips themselves, but also “a level of regularity and productivity at design time that is very important.” And within the field, he adds, “the contributions of Li-Shiuan are foundational.”


Comments

This will be a great improvement: increasing the efficiency of multicore chips while lowering their power requirements. Dr. A. Jagadeesh, Nellore (AP), India
In the late 1970s, at Xerox PARC and Stanford, we looked briefly into packet switching among chip components, but of course the chips were too small. Now they are big. The trade-offs between store-and-forward packet routing and broadcast packet switching can now be made among cores. If I were you, I'd consider the various possible forms of on-chip Ethernet (;->).
A multiprocessor operating system may swap interrupted task information to shared storage, accessible to all processors. This contrasts with networked processors, which do not typically share any physical memory devices directly. To dispatch an interrupted task on a multicore chip, I suspect that the OS merely selects the next task from a shared dispatch queue when a processor becomes available. The processor then directly loads the swapped task's programs and data from shared memory. I expect that direct inter-processor communication is limited to serialization of direct memory access and other highly specialized functions. I doubt that a generalized networking protocol would improve the efficiency of inter-processor communications within a multicore chip. Perhaps I'm overlooking something...
Dear Prof. Li-Shiuan Peh: In this case the packet will have to be very lean, with no error correction. However, for multiple communication the packet can carry a bit which indicates how many routers the message is intended for. But the problem in this case is how the routers will recognize whom the packet is intended for. Regards, Siddharth Madanlal Modi
Dear Prof. Li-Shiuan Peh: This is the second part of my comment on networks on chip. We can use a central node to route the packet to the routers. The packet will contain a bit for the count of routers the message is intended for, and there will be one bit for the address of the routers. Regards, Siddharth