By Rob Knies
March 8, 2007 6:00 AM PT
For all the mind-boggling advances in information technology over the past couple of decades, much work remains—and a significant amount of it occurs behind the scenes, in areas near invisible to the casual computer user.
Some computer-science work gets lots of publicity; witness the explosion of interest in search technology in recent years. Social networking has been in the spotlight for quite a while now, and peer-to-peer networks continue to garner our attention. And then there are those world-conquering mobile phones.
Such technologies gain their high profiles because they are, in a real or virtual sense, tangible. You can hold a cellphone, and while popular Web sites and services might not be literally tangible, they certainly seem that way when they become woven into people’s day-to-day lives.
But not all computer scientists can, or choose to, labor under the media glare. There are plenty of challenging problems to go around: difficult, necessary work every bit as important to advancing the state of the art as its flashier counterparts.
Want proof? How about computer security? Bet that got your attention. Oh, and managing vast amounts of data—encountered that one lately? If you’re like most of us, of course you have.
Those happen to be just two of the many hard problems being tackled by Microsoft Research personnel. During TechFest, Microsoft Research’s annual showcase of leading-edge projects, held March 7-8 in Redmond, researchers are proposing rather ingenious solutions to each of these key challenges.
Business-intelligence applications often need to match table entries that represent the same real-world entity, such as a customer. Are Daniel Smith and Dan Smith the same person? Product databases pose similar scenarios; products may be labeled differently in various places, making reconciliation essential to maintaining reliable information.
Such record matching is vital to accurate data analysis but can be challenging to achieve. Matching logic might need to compare multiple portions of an entry—and their combinations—and might need to consider similarities. But Venky Ganti, a researcher in the Data Management, Exploration and Mining Group within Microsoft Research Redmond, is offering a potential solution.
“It is often much easier for a programmer to provide a set of example matching and non-matching record pairs,” Ganti says, “than it is to design an accurate query from scratch and work manually through the gamut of available choices. Assisting programmers by automatically creating accurate record-matching programs with the help of examples, which they can then review, edit, and execute, is of great help.”
Such example-based approaches have been tried before but have failed to generate programs that scale to large collections of relational data.
“This is where our technique shines,” Ganti says, “and thus helps take an important step in reducing the difficulty of data cleaning over large data warehouses.”
Ganti’s work offers two significant benefits:
This approach, which uses SQL Server™ Integration Services as a platform, makes it easier to create initial record-matching programs, composed over a basic set of primitive operators, by specifying a set of examples. The resulting program then can be modified to meet the requirements of the scenario being addressed.
“Our technology,” Ganti says, “helps significantly reduce the time required to develop scalable and accurate record-matching programs. In fact, we demonstrate that the programs generated using our technology are comparable in accuracy to handcrafted, domain-specific commercial technology.”
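To make the example-driven idea concrete, here is a minimal sketch in Python. It is not Ganti’s technique and does not use SQL Server Integration Services; it only illustrates the general pattern of deriving a matching rule from labeled pairs rather than hand-tuning it. The single "primitive operator" here is a string-similarity score from Python’s standard `difflib` module, and the names `similarity`, `learn_threshold`, and `make_matcher`, along with the sample records, are assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(examples):
    """Pick the similarity cutoff that labels the most examples correctly.

    examples: list of (record_a, record_b, is_match) triples.
    """
    scored = [(similarity(a, b), is_match) for a, b, is_match in examples]
    best_t, best_correct = 1.0, -1
    # Only the observed scores need to be tried as candidate cutoffs.
    for t in sorted({score for score, _ in scored}):
        correct = sum((score >= t) == is_match for score, is_match in scored)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def make_matcher(examples):
    """Return a function that decides whether two records match."""
    threshold = learn_threshold(examples)
    return lambda a, b: similarity(a, b) >= threshold


# Hypothetical labeled pairs a programmer might supply.
examples = [
    ("Daniel Smith", "Dan Smith", True),
    ("Daniel Smith", "Maria Lopez", False),
    ("Acme Corp", "ACME Corp.", True),
    ("Acme Corp", "Apex Ltd", False),
]

matcher = make_matcher(examples)
print(matcher("Daniel Smith", "Dan Smith"))
print(matcher("Daniel Smith", "Maria Lopez"))
```

A real record-matching program would combine several operators over several columns and would need to scale to warehouse-sized tables, which is exactly the part this toy version leaves out.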
For some time now, computer security analysts have investigated the use of fingerprints as authentication tools, physical “passwords” unique to an individual that don’t have to be memorized or written down. It seems logical, and fingerprint readers are being used to provide access credentials in various business and home scenarios.
One problem, though: In the wrong hands—and, unfortunately, there are a lot of wrong hands out there—fingerprints used for authentication also could be used for nefarious purposes. And there you have it: Another promising, useful idea scuttled by the Web’s illicit element.
Perhaps not. Ramarathnam Venkatesan and Mariusz Jakubowski, principal researcher and senior researcher, respectively, for Microsoft Research Redmond’s Cryptography and Anti-Piracy Group, have devised a technique called Biometric Authentication via Fingerprint Hashing that offers the same benefits of unique, handy fingerprint “passwords” without the vulnerability attached.
“If we store the fingerprint on a computer,” Venkatesan says, “or pass it on a network or show it to someone, they can use the image of the fingerprint to misuse it. So the question is: Can we verify the fingerprint by using some data in a way in which that data does not disclose the fingerprint itself?”
His idea is to use fingerprint hashes, summaries of information contained in human fingerprints. The method calculates and aggregates various metrics over fingerprint images, producing short summaries that cannot be used to reconstruct source fingerprints absent a key to the metrics.
“We propose a way to represent fingerprints such that it is not easy to figure out what the fingerprint is,” Venkatesan explains. “That representation allows us to compare fingerprints even when there are small changes due to the way in which people register their fingerprints in a scanner.”
Indeed, with the technique’s resistance to minor distortions, fingerprint hashes can help provide biometric authentication and thus can augment or replace traditional passwords. As a result, the security and the usability of Web services and other client-server systems are enhanced.
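The general shape of such a scheme can be sketched with a well-known stand-in: sign-of-random-projection hashing, in which a secret key seeds the random directions. This is not the researchers’ algorithm, whose metrics the article does not describe. The `features` lists below are hypothetical placeholders for whatever measurements are computed over a fingerprint image, the function names are invented for this example, and Python’s `random` module is not a cryptographic generator, so the sketch shows the idea only, not a secure design.

```python
import random


def keyed_hash(features, key, n_bits=64):
    """Summarize a feature vector as bits, using key-derived directions."""
    rng = random.Random(key)  # illustration only: not cryptographically secure
    bits = []
    for _ in range(n_bits):
        # Each bit records which side of a key-dependent hyperplane
        # the feature vector falls on.
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        dot = sum(d * f for d, f in zip(direction, features))
        bits.append(1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Count the positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def verify(stored_hash, features, key, max_distance=10):
    """Accept a scan whose keyed hash is close to the stored one."""
    candidate = keyed_hash(features, key, len(stored_hash))
    return hamming(stored_hash, candidate) <= max_distance


key = "hypothetical-secret-key"
enrolled = [0.8, -0.3, 0.5, 0.1, -0.7, 0.2, 0.9, -0.4]
# A later scan of the same finger, slightly distorted.
rescan = [0.81, -0.29, 0.5, 0.11, -0.7, 0.19, 0.9, -0.41]
# A very different feature vector standing in for another finger.
other = [-f for f in enrolled]

stored = keyed_hash(enrolled, key)  # only this summary is kept
print(verify(stored, rescan, key))
print(verify(stored, other, key))
```

Because small changes in the features flip few bits, a slightly distorted scan still lands near the stored summary, while only the summary, never the fingerprint image, has to be stored or transmitted.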
“There are fewer than 7 billion people on the planet,” Venkatesan says, “but a typical security system requires a much larger number of combinations. The challenge was to use randomness in a secret key to derive a representation that combines both the fingerprint and the random key.”
The key, it seems, is the key.
“Given the fingerprint representation, it is not easy to figure out the fingerprint,” Venkatesan concludes. “The use of randomness makes it possible to have a fingerprint verified without revealing what exactly it is.”