|
Bio
Paul manages a
team of over 120 engineers on 4 continents. We deliver the algorithms that
interpret users' queries to create the Search result page for Bing. This
involves matching the query to web results, as well as other types of
“structured” data. Our team also measures the overall quality of
Bing results across the entire page.
Paul came to Microsoft as a Researcher in
2002. Before moving to Search, Paul
and his team worked on numerous efforts to use machine
learning in the analysis of documents, emails, and web pages. Results of this work can be seen in
products like Windows, Live Search, and Microsoft Dynamics. In collaboration with the Live Toolbar team we built the
technology behind “smart
menus”. The Tablet PC team uses our technology
to extract the structure in handwritten
ink notes. East Asian Office is using his
technology to extract contact information from
incoming emails. Dynamics/
Paul has served on the program committees of
conferences such as Neural Information Processing Systems (NIPS), Computer Vision
and Pattern Recognition (CVPR), and the International Conference on Computer
Vision (ICCV). He has received the
Marr Prize for the best
paper in computer vision (at ICCV 2003). An earlier paper on medical image processing received
an honorable mention for the Marr Prize in 1995. He received an honorable mention for
best paper at AAAI 2004.
While at MIT he received the NSF CAREER Award as one of the top junior faculty
members in Computer Science.
Paul’s interest in intelligent systems
goes back quite a ways. As an
Paul’s thesis work on the registration
of images from various medical sensors has been widely used and reimplemented (his thesis
has been referenced more than 800 times).
It is now a standard technique that appears in many commercial
products, and is widely considered
the best and most reliable registration technique for assistance in surgical
planning.
In 1995, Dr. Viola returned to MIT as an
assistant professor and later an associate professor. His work focused on statistical learning
for image processing and computer vision. In the area of computer graphics
his work with Jeremy De Bonet is considered the first effective texture synthesis algorithm
for complex textural patterns.
Other work included techniques for image
database retrieval and 3D
reconstruction.
While visiting Compaq Cambridge research labs
in 2000, he
created the world’s first
real-time face detection system.
This system has been widely adopted and reimplemented:
For more or less complete lists of my
references please try: DBLP,
IEEE,
SCHOLAR,
or CITESEER.
Old Research Overview
My past work (with a wide range of
collaborators both inside of MSR and in the product groups) is at the
intersection of Machine Learning, Natural Language Processing, and Computer
Vision. We have constructed systems
that understand documents and can be used to route them to
the correct recipient, extract structured information, or repurpose them for
other tasks. For example, the Tablet PC makes it easy to jot down
notes or to derive equations. I am
working with the Tablet team to understand these ink documents so that they can
be reused and edited.
We have built a number of systems:
|
|
Fax Routing. (DAS 2004 paper) We
have created a system that can route incoming fax
images. Optical character
recognition finds the words, and they are then evaluated to determine which
are relevant. For
example, words are relevant if they are near the word “TO”. The relevant words are then compared
to a database of recipients using a fuzzy matching algorithm. |
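A minimal sketch of the routing steps described above, not the system from the paper. It assumes the OCR output is a flat list of word tokens in reading order, treats "near" as a fixed window of tokens after the cue word, and uses Python's standard `difflib` as a stand-in fuzzy matcher. The names `relevant_words`, `route_fax`, `window`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def relevant_words(tokens, cue="TO", window=3):
    """Collect the tokens that follow each occurrence of the cue word.

    Assumption: relevance is approximated by "within `window` tokens
    after the cue"; a real system would use positions on the page.
    """
    picked = []
    for i, tok in enumerate(tokens):
        if tok.strip(":").upper() == cue:
            picked.extend(tokens[i + 1 : i + 1 + window])
    return picked


def similarity(a, b):
    """Case-insensitive similarity in [0, 1], tolerant of OCR errors."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route_fax(tokens, recipients, threshold=0.6):
    """Return the best-matching recipient name, or None if nothing is close."""
    words = relevant_words(tokens)
    best_name, best_score = None, 0.0
    for name in recipients:
        parts = name.split()
        # Average, over the parts of the name, of the best match
        # between that part and any relevant word.
        score = sum(
            max((similarity(p, w) for w in words), default=0.0) for p in parts
        ) / len(parts)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    ocr_tokens = ["FAX", "TO:", "Pau1", "Vio1a", "FROM:", "J0hn", "Smith"]
    directory = ["Paul Viola", "John Platt", "Jane Smith"]
    print(route_fax(ocr_tokens, directory))
```

The threshold keeps a fax with no plausible recipient from being routed to whoever happens to score highest; the value used here is arbitrary and would need tuning on real data.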
|
|
Contact Parsing (AAAI 2004 paper, SIGIR 2005 paper) Given
an address block from the bottom of an email, web page, or scanned document,
automatically extract the key fields and fill them into a form. The system works along with a novel UI
which makes correcting errors easy. See
also the internal web
site. Send mail if you would
like to download a demo. |
|
|
Ink Outline and List Analysis (IWFHR 2004 paper)
Processes handwritten notes from the tablet PC to find list and outline
structure. Once found, the
structure allows you to provide more powerful editing (like opening and
closing sub-trees). It is also
easier to import the notes into Word and OneNote. |
|
|
Recognition and Grouping of Ink. (DAS 2004 paper) Given
a page of ink strokes, there are two related challenges. First, you must group the strokes on the
page into valid sets (e.g. group the 3 strokes in an H). Second, you must recognize the
groups. This is difficult to do
well unless you perform both tasks simultaneously. |
|
|
Document Structure Extraction (ICDAR 2005 paper, ICCV 2005 paper) From a
document scan, or a PDF file, the
words and lines can be extracted accurately. What is missing is higher level
information about the document.
Is it one or two columns?
Where is the title? Is
this block a part of a footnote,
or a section of the main text?
If you had this information, it would be easy to import the text
+ structure back into Word for editing. |
|
Older Work
Robust Real-time Object Detection
We have created
a new visual object detection framework that is capable of processing images
extremely rapidly while achieving high detection rates. There are three key
contributions. The first is the introduction of a new image
representation called the ``Integral Image'' which allows the features used
by our detector to be computed very quickly. The second is a learning
algorithm, based on AdaBoost, which selects a small
number of critical visual features and yields extremely efficient
classifiers. The third contribution is a method for combining
classifiers in a ``cascade'' which allows background regions of the
image to be quickly discarded while spending more computation on promising
object-like regions. A set of experiments in the domain of face
detection are presented. The system yields face detection performance
comparable to the best previous systems. Implemented on a
conventional desktop, face detection proceeds at 15 frames per second. The best overview of the approach is available in
these papers: IJCV or CVPR
2001 (shorter). We also proposed a new learning algorithm called AsymBoost, which improves the performance of the cascade: NIPS 14, Dec 2001.
This work grew out of earlier research on image
database retrieval, CVPR 2000
(see below).
Mutual Information Matching
In 1995 we
developed a new approach for solving computer vision problems based on
entropy. This approach can be used to derive algorithms for pose estimation,
object recognition, shape from shading, and lightness compensation. Each of
these algorithms is based on a simple non-parametric estimate for the entropy
of a signal. My thesis
contains a good overview of these ideas. Other papers include: IJCV-97 and Medical Image Analysis-96.
Complex Feature Recognition
In 1996 we
developed a new Bayesian framework for visual object recognition which is
based on the insight that images of objects can be modeled as a conjunction
of local features. This framework can be used to both derive an object
recognition algorithm and an algorithm for learning the features themselves.
The overall approach, called complex feature recognition or Instead of a single simple feature such as an
edge, A paper describing
Non-parametric Multi-scale Model for Images
In 1997 we created
a novel multi-scale statistical model for images.
One of the original motivations for this work was a flaw in the mutual
information approach described above. In that framework the entropy of the
image and model were estimated as if the pixels were independent. This
multi-scale approach provided a much more powerful model for the dependencies
in images. While there have been many proposed approaches to
the principled statistical modeling of images, each has been limited in
either the complexity of the models or the complexity of the images. Our
approach is much more general and can be used for recognition, image
de-noising, and in a ``generative mode'' to synthesize high quality textures.
Several papers describing this approach can be found here: NIPS-97, SIGGRAPH-97 and CVPR-98.
Image Database Retrieval (and Text too!)
Starting in 1997
we began to study the role of high dimensional representations in image
database retrieval. Contrary to most work in the field, we created a very
large set of features from each image. These features were designed to be
very selective--each only responds to a very small percentage of images. At first it might seem that the introduction of
tens of thousands of features could only make the query learning process
infeasible. How can a problem which is difficult given ten to twenty features
become tractable with 10,000? Two recent results in machine learning argue
that this is not necessarily a terrible mistake: ``support vector machines ( The best paper in this area appeared in IJCV in 2003. A paper describing an early version of this approach
was published in NIPS-97.
Satisfyingly, very similar ideas have proven
valuable in text retrieval: NIPS-98
(PDF).
Handwritten Mathematical Expression Recognition
We have built a
number of systems that can parse and interpret handwritten mathematical
expressions. What makes this hard is that the semantics of a mathematical
expression comes from the spatial arrangement of the symbols. In a sense this
is a computer vision problem. A paper describing an early version of this approach was published in AAAI-98. More
recently, Nick Matsakis has written a Master's thesis
describing these ideas. Nick has also put together a demo and
some other related information.
The Computer Vision Macroscope
At MIT my
students and I constructed a real-time 3D
reconstruction and event recording suite. Our first paper in this area describes a very
fast algorithm for 3D reconstruction which uses prior information to improve
the results of silhouette intersection. Silhouette intersection is one
approach for reconstructing the 3-dimensional shape of an object from
multiple views. Using this approach, the task is to produce a binary labeling
of a set of voxels that determines which voxels are filled and which are
empty. In this paper, we give an energy minimization formulation of the
silhouette intersection problem. The global minimum of this energy can be
rapidly computed with a single graph cut, using a result due to Greig, Porteous and Seheult. CVPR-00. |
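As an illustration of the ``Integral Image'' representation mentioned under Robust Real-time Object Detection, here is a minimal sketch in plain Python. It is not the detector's actual code. It assumes a grayscale image stored as a list of rows, and it assumes the detector's features are differences of rectangle sums (a two-rectangle feature is shown), which the text above does not spell out. The function names are hypothetical.

```python
def integral_image(img):
    """Build a summed-area table with one row and column of zero padding.

    ii[y][x] holds the sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (assumed feature type)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))
    print(two_rect_feature(table, 0, 0, 3, 2))
```

Once the table is built in a single pass over the image, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is why features of this kind can be evaluated quickly at many positions and scales.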
|
|