Ever since Steven Spielberg’s 2002 sci-fi movie Minority Report, in which a black-clad Tom Cruise stands in front of a transparent screen manipulating a host of video images simply by waving his hands, the idea of gesture-based computer interfaces has captured the imagination of technophiles. Academic and industry labs have developed a host of prototype gesture interfaces, ranging from room-sized systems with multiple cameras to detectors built into laptops’ screens. But MIT researchers have developed a system that could make gestural interfaces much more practical. Aside from a standard webcam, like those found in many new computers, the system uses only a single piece of hardware: a multicolored Lycra glove that could be manufactured for about a dollar.
Other prototypes of low-cost gestural interfaces have used reflective or colored tape attached to the fingertips, but “that’s 2-D information,” says Robert Wang, a graduate student in the Computer Science and Artificial Intelligence Laboratory who developed the new system together with Jovan Popović, an associate professor of electrical engineering and computer science. “You’re only getting the fingertips; you don’t even know which fingertip [the tape] is corresponding to.” Wang and Popović’s system, by contrast, can translate gestures made with a gloved hand into the corresponding gestures of a 3-D model of the hand on screen, with almost no lag time. “This actually gets the 3-D configuration of your hand and your fingers,” Wang says. “We get how your fingers are flexing.”
The most obvious application of the technology, Wang says, would be in video games: Gamers navigating a virtual world could pick up and wield objects simply by using hand gestures. But Wang also imagines that engineers and designers could use the system to more easily and intuitively manipulate 3-D models of commercial products or large civic structures.
Robert Wang demonstrates the speed and precision with which the system can gauge hand position in three dimensions — including the flexing of individual fingers — as well as a possible application in mechanical engineering.
Video: Robert Y. Wang/Jovan Popović
The glove went through a series of designs, with dots and patches of different shapes and colors, but the current version is covered with 20 irregularly shaped patches that use 10 different colors. The number of colors had to be restricted so that the system could reliably distinguish the colors from each other, and from those of background objects, under a range of different lighting conditions. The arrangement and shapes of the patches were chosen so that the front and back of the hand would be distinct but also so that collisions of similar-colored patches would be rare. For instance, Wang explains, the colors on the tips of the fingers could be repeated on the back of the hand, but not on the front, since the fingers would frequently be flexing and closing in front of the palm.
Technically, the other key to the system is a new algorithm for rapidly looking up visual data in a database, which Wang says was inspired by the recent work of Antonio Torralba, the Esther and Harold E. Edgerton Associate Professor of Electrical Engineering and Computer Science in MIT’s Department of Electrical Engineering and Computer Science and a member of CSAIL. Once a webcam has captured an image of the glove, Wang’s software crops out the background, so that the glove alone is superimposed upon a white background. Then the software drastically reduces the resolution of the cropped image, to only 40 pixels by 40 pixels. Finally, it searches through a database containing myriad 40-by-40 digital models of a hand, clad in the distinctive glove, in a range of different positions. Once it’s found a match, it simply looks up the corresponding hand position. Since the system doesn’t have to calculate the relative positions of the fingers, palm, and back of the hand on the fly, it’s able to provide an answer in a fraction of a second.
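The three steps described above (isolate the glove on a white background, shrink the image to 40 by 40 pixels, and look up the closest stored image) can be illustrated with a short Python sketch. This is not Wang's software: the function names, the precomputed glove mask, the squared-difference distance and the brute-force search are all assumptions made for illustration, and the article does not say how the actual database is indexed.

```python
import numpy as np

TINY = 40  # side length of the reduced image, as stated in the article


def to_tiny(image, mask, size=TINY):
    """Put the glove on a white background, crop to it, and shrink it.

    image: H x W x 3 array of RGB values.
    mask:  H x W boolean array, True where a pixel belongs to the glove.
           (How the mask is obtained is outside this sketch.)
    """
    if not mask.any():
        raise ValueError("mask contains no glove pixels")
    img = image.astype(np.float32).copy()
    img[~mask] = 255.0  # everything that is not glove becomes white
    ys, xs = np.nonzero(mask)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Simple nearest-neighbor sampling onto a size x size grid (an assumption;
    # any reasonable downsampling filter could be substituted).
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]


def lookup_pose(tiny, db_images, db_poses):
    """Return the pose stored with the database image closest to `tiny`.

    db_images: N x 40 x 40 x 3 array of pre-rendered glove images.
    db_poses:  N x D array, one hand-pose vector per database image.
    Brute-force search stands in for the faster lookup the article mentions.
    """
    diffs = db_images.reshape(len(db_images), -1) - tiny.reshape(1, -1)
    dists = np.einsum("ij,ij->i", diffs, diffs)  # squared distance per entry
    best = int(np.argmin(dists))
    return db_poses[best], best
```

Because the pose is simply read from the matching database entry, no finger geometry has to be computed at query time, which is the property the article credits for the system's speed.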
Of course, a database of 40-by-40 color images takes up a large amount of memory: several hundred megabytes, Wang says. But today, a run-of-the-mill desktop computer has four gigabytes (roughly 4,000 megabytes) of high-speed RAM. And that number is only going to increase, Wang says.
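To get a rough sense of scale, the back-of-the-envelope calculation below estimates how many 40-by-40 images fit in a given memory budget. It assumes uncompressed images at one byte per color channel and uses 300 megabytes as a stand-in for "several hundred"; neither figure comes from Wang, and the real database may store its entries differently.

```python
# Assumption: uncompressed RGB, one byte per channel.
BYTES_PER_TINY_IMAGE = 40 * 40 * 3  # 4,800 bytes per image


def images_that_fit(budget_megabytes):
    """Number of whole tiny images that fit in a budget of decimal megabytes."""
    return (budget_megabytes * 1_000_000) // BYTES_PER_TINY_IMAGE


# Hypothetical 300 MB budget standing in for "several hundred megabytes".
print(images_that_fit(300))  # 62500
```

Under those assumptions, a database of this size would hold tens of thousands of hand images while using well under a tenth of a four-gigabyte machine's RAM.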
Changing the game
“People have tried to do hand tracking in the past,” says Paul Kry, an assistant professor at the McGill University School of Computer Science. “It’s a horribly complex problem. I can’t say that there’s any work in purely vision-based hand tracking that stands out as being successful, although many people have tried. It’s sort of changing the game a bit to say, ‘Hey, okay, I’ll just add a little bit of information’” — the color of the patches — “‘and I can go a lot farther than these purely vision-based techniques.’” Kry particularly likes the ease with which Wang and Popović’s system can be calibrated to new users. Since the glove is made from stretchy Lycra, it can change size significantly from one user to the next; but in order to gauge the glove’s distance from the camera, the system has to have a good sense of its size. To calibrate the system, the user simply places an 8.5-by-11-inch piece of paper on a flat surface in front of the webcam, presses his or her hand against it, and in about three seconds, the system is calibrated.
Wang initially presented the glove-tracking system at last year’s Siggraph, the premier conference on computer graphics. But at the time, he says, the system took nearly a half-hour to calibrate, and it didn’t work nearly as well in environments with a lot of light. Now that the glove tracking is working well, however, he’s expanding on the idea, with the design of similarly patterned shirts that can be used to capture information about whole-body motion. Such systems are already commonly used to evaluate athletes’ form or to convert actors’ live performances into digital animations, but a system based on Wang and Popović’s technique could prove dramatically cheaper and easier to use.