Bio
Paul
is taking a hiatus from research to dive into the incredibly hard problems that
arise from internet search. He
recently took the position of “Architect for Machine Learning” on
the Live Search team. There he is
building a team of Applied Researchers and Developers that will build tools for
document processing, query processing, and ranking.
Before
moving to Search, Paul and his team worked on numerous efforts to use machine
learning in the analysis of documents, emails, and web pages. Results of this work can be seen in
products like Windows, Live Search, and Microsoft Dynamics. In collaboration with the Live Toolbar team, his team built the
technology behind “smart
menus”. The Tablet PC
team uses their technology to extract the structure in handwritten
ink notes. East Asian
Office is using his technology to extract contact information from
incoming emails. Dynamics/
Paul
has served on the program committees of conferences such as Neural Information
Processing Systems (NIPS), Computer Vision and Pattern Recognition (CVPR), and the
International Conference on Computer Vision (ICCV). He has received the Marr Prize for the best
paper in computer vision (at ICCV 2003). An earlier paper
on medical image processing received an honorable mention for the Marr
prize in 1995. He received an
honorable mention for best paper
at AAAI 2004. While at MIT he received the NSF CAREER Award as one of the top
junior faculty members in Computer Science.
Paul’s
interest in intelligent systems goes back quite a ways. As an
Paul’s
thesis work on the registration of images from various medical sensors has been
widely used and reimplemented (his thesis
has been referenced more than 800 times).
It is now a standard technique that appears in many commercial
products, and is widely considered
the best and most reliable registration technique for assistance in surgical
planning.
In
1995, Dr. Viola returned to MIT as an assistant professor and later an
associate professor. His work
focused on statistical learning for image processing and computer vision. In the area of computer graphics
his work with Jeremy De Bonet is considered the first
effective texture synthesis algorithm for complex textural
patterns. Other work included
techniques for image
database retrieval and 3D
reconstruction.
While
visiting Compaq Cambridge research labs in 2000, he created the world’s first
real-time face detection system.
This system has been widely adopted and reimplemented:
For
more or less complete lists of my references please try: DBLP,
IEEE,
SCHOLAR,
or CITESEER.
Old Research Overview
My past work (with a wide range of
collaborators both inside of MSR and in the product groups) is at the
intersection of Machine Learning, Natural Language Processing, and Computer
Vision. We have constructed systems
that understand documents well enough to route them to
the correct recipient, extract structured information, or repurpose them for
other tasks. For example, the Tablet PC makes it easy to jot down
notes or to derive equations. I am
working with the Tablet team to understand these ink documents so that they can
be reused and edited.
We have built a number of systems:
Fax Routing. (DAS
2004 paper) We have created a system that can route incoming fax
images. Optical character
recognition finds the words, which are then evaluated to determine which
are relevant. For
example, words are relevant if they are near the word “TO”. The relevant words are then compared
to a database of recipients using a fuzzy matching algorithm.
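The two routing steps above (keep words near “TO”, then fuzzy-match them against recipients) can be sketched as follows. This is a toy illustration, not the system from the DAS 2004 paper: the `Word` layout with box centres, the `radius` used for “near”, and the choice of normalized Levenshtein distance as the fuzzy matcher are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    """An OCR word; x, y are the (assumed) centre of its bounding box."""
    text: str
    x: float
    y: float


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def relevant_words(words, anchor="TO", radius=150.0):
    """Keep words within `radius` of any occurrence of the anchor keyword."""
    anchors = [w for w in words if w.text.upper().strip(":") == anchor]
    return [w for w in words
            if w not in anchors
            and any(math.hypot(w.x - a.x, w.y - a.y) <= radius for a in anchors)]


def route(words, recipients, max_dist=0.34):
    """Return the recipient whose name best matches a relevant word, or None.

    Scores are edit distances normalized by name-part length; only scores
    below `max_dist` (a hypothetical threshold) count as matches.
    """
    best, best_score = None, max_dist
    for w in relevant_words(words):
        for name in recipients:
            for part in name.lower().split():
                score = edit_distance(w.text.lower(), part) / max(len(part), 1)
                if score < best_score:
                    best, best_score = name, score
    return best


if __name__ == "__main__":
    page = [Word("TO:", 100, 100), Word("Jhon", 160, 100), Word("Smlth", 220, 100),
            Word("FROM:", 100, 400), Word("Alice", 160, 400)]
    print(route(page, ["John Smith", "Alice Jones"]))
```

In the usage example the OCR misreads “Smith” as “Smlth”, yet the fuzzy match still routes the fax to John Smith, while “Alice” is ignored because it is not near “TO”.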
Contact Parsing (AAAI
2004 paper, SIGIR
2005 paper) Given an address block from the bottom of an email, web
page, or scanned document, the system automatically extracts the key fields and fills them
into a form. It works
together with a novel UI that makes correcting errors easy. See also
the internal web
site. Send mail if you would like to
download a demo.
Ink Outline and List Analysis (IWFHR
2004 paper) Processes handwritten notes from the Tablet PC to find
list and outline structure. Once
found, the structure enables more powerful editing (like
opening and closing sub-trees).
It also makes it easier to import the notes into Word and OneNote.
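Once the ink has been grouped into lines, turning indentation into an outline tree is what makes operations like opening and closing sub-trees possible. The sketch below shows only that last step. It assumes each line is already recognized and reduced to a `(text, left_x)` pair in reading order. The `tol` slack for uneven handwriting is a hypothetical parameter. The learned analysis in the IWFHR paper, which has to cope with real, messy ink, is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    indent: float
    children: list = field(default_factory=list)


def build_outline(lines, tol=10.0):
    """Build an outline tree from (text, left_x) pairs given in reading order.

    A line becomes a child of the nearest preceding line whose left edge is
    more than `tol` units to its left; `tol` absorbs jitter in handwriting.
    """
    root = Node("<root>", float("-inf"))
    stack = [root]
    for text, x in lines:
        node = Node(text, x)
        # Pop until the top of the stack is clearly indented less than this line.
        while len(stack) > 1 and stack[-1].indent > x - tol:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


def to_nested(nodes):
    """Convert the tree to nested (text, children) tuples for display or testing."""
    return [(n.text, to_nested(n.children)) for n in nodes]


if __name__ == "__main__":
    notes = [("Groceries", 0), ("milk", 40), ("eggs", 42), ("Errands", 2), ("bank", 41)]
    print(to_nested(build_outline(notes)))
```

The example shows why a tolerance is needed: “eggs” is written 2 units further right than “milk”, but both are treated as siblings under “Groceries”.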
Recognition and Grouping of Ink (DAS
2004 paper) Given a page of ink strokes there are two related challenges. First, you must group the strokes on the
page into valid sets (e.g., group the three strokes of an H). Second, you must recognize the groups. This is difficult to do well unless
you perform both tasks simultaneously.
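To illustrate why grouping and recognition should be solved together, here is a simplified sketch. It is not the method of the DAS paper. It assumes that only temporally consecutive strokes (at most `max_group` of them) can form a symbol, and that a hypothetical `recognize` function returns a label and a cost for any candidate group. A dynamic program then picks the segmentation with the lowest total recognition cost, so each grouping decision depends on how well the group is recognized.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical recognizer interface: strokes -> (label, cost); lower cost = better match.
Recognizer = Callable[[Sequence[str]], Tuple[str, float]]


def group_and_recognize(strokes: Sequence[str], recognize: Recognizer,
                        max_group: int = 3) -> List[Tuple[str, Tuple[int, int]]]:
    """Jointly segment strokes into groups and label each group.

    best[i] is the lowest total cost of explaining strokes[:i]. Every candidate
    group strokes[j:i] is scored by the recognizer, so grouping and recognition
    are decided together rather than one after the other.
    """
    n = len(strokes)
    best = [0.0] + [float("inf")] * n
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_group), i):
            label, cost = recognize(strokes[j:i])
            if best[j] + cost < best[i]:
                best[i] = best[j] + cost
                back[i] = (j, label)
    # Walk the back-pointers to recover the groups as (label, (start, end)).
    out, i = [], n
    while i > 0:
        j, label = back[i]
        out.append((label, (j, i)))
        i = j
    return out[::-1]


# Toy recognizer: strokes are abstract shape codes; multi-stroke matches are cheaper.
TEMPLATES = {("|", "-", "|"): "H", ("|",): "1", ("-",): "-", ("-", "|"): "+"}


def toy_recognizer(group: Sequence[str]) -> Tuple[str, float]:
    key = tuple(group)
    if key in TEMPLATES:
        return TEMPLATES[key], (0.2 if len(key) > 1 else 0.6)
    return "?", 10.0


if __name__ == "__main__":
    print(group_and_recognize(["|", "-", "|", "|"], toy_recognizer))
```

A grouping step run alone could not tell whether the first three strokes form “1 + ” or “H”. The joint search settles it by scoring each option with the recognizer: it returns an “H” from strokes 0 to 3 followed by a “1”.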
Document Structure Extraction (ICDAR
2005 paper, ICCV
2005 paper) From a
document scan or a PDF file, the
words and lines can be extracted accurately. What is missing is higher-level
information about the document.
Is it one or two columns?
Where is the title? Is
this block part of a footnote
or a section of the main text?
With this information, it is straightforward to import the text
and structure back into Word for editing.
Older Work

Robust Real-time Object Detection
We have created a new visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image”, which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers. The third contribution is a method for combining classifiers in a “cascade”, which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems.
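The sketch below makes the first and third contributions concrete in pure Python. It computes an integral image, evaluates any rectangle sum with four lookups, and runs a cascade that stops as soon as one stage rejects. The single two-rectangle feature type, the threshold-and-polarity weak classifier, the stage layout and all parameter values are illustrative assumptions. The AdaBoost training that would actually select features, thresholds and weights is not shown.

```python
from typing import List, Sequence, Tuple


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """ii[y][x] = sum of img[r][c] for all r < y and c < x (zero-padded top/left)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h) -> float:
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h) -> float:
    """Horizontal two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A weak classifier is ((x, y, w, h), threshold, polarity, weight);
# a stage is (list of weak classifiers, stage threshold).
Weak = Tuple[Tuple[int, int, int, int], float, int, float]
Stage = Tuple[List[Weak], float]


def cascade_accepts(ii, stages: Sequence[Stage], dx: int = 0, dy: int = 0) -> bool:
    """Evaluate stages in order on the window at offset (dx, dy)."""
    for weak_list, stage_threshold in stages:
        score = 0.0
        for (x, y, w, h), thresh, polarity, alpha in weak_list:
            f = two_rect_feature(ii, x + dx, y + dy, w, h)
            if polarity * f < polarity * thresh:  # weak classifier votes "object"
                score += alpha
        if score < stage_threshold:
            return False  # early rejection: later stages are never evaluated
    return True


if __name__ == "__main__":
    stages = [([((0, 0, 4, 4), 10.0, -1, 1.0)], 0.5)]  # one hypothetical stage
    bright_left = [[9, 9, 1, 1]] * 4
    print(cascade_accepts(integral_image(bright_left), stages))
```

Once the integral image is built, every rectangle sum costs the same four lookups regardless of rectangle size, which is what makes evaluating many features per window cheap. The early `return False` is the cascade idea: most background windows fail an early stage and never pay for the later ones.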
Implemented on a conventional desktop, face detection proceeds at 15 frames per second. The best overview of the approach is available in these papers: IJCV or CVPR 2001 (shorter). We also proposed a new learning algorithm called AsymBoost, which improves the performance of the cascade: NIPS 14, Dec 2001. This work grew out of earlier research on image database retrieval: CVPR 2000 (see below).

Mutual Information Matching
In 1995 we developed a new approach for solving computer vision problems based on entropy. This approach can be used to derive algorithms for pose estimation, object recognition, shape from shading, and lightness compensation. Each of these algorithms is based on a simple non-parametric estimate for the entropy of a signal. My thesis contains a good overview of these ideas. Other papers include: IJCV-97 and Medical Image Analysis-96.

Complex Feature Recognition
In 1996 we
developed a new Bayesian framework for visual object recognition, which is
based on the insight that images of objects can be modeled as a conjunction
of local features. This framework can be used to derive both an object
recognition algorithm and an algorithm for learning the features themselves.
The overall approach, called complex feature recognition or Instead of a single simple feature such as an
edge, A paper describing

Non-parametric Multi-scale Model for Images
In 1997 we
created a novel multi-scale statistical model for images. One of the
original motivations for this work was a flaw in the mutual information
approach described above. In that framework the entropies of the image and
model were estimated as if the pixels were independent. This multi-scale
approach provides a much more powerful model for the dependencies in images. While there have been many proposed approaches to
the principled statistical modeling of images, each has been limited in
either the complexity of the models or the complexity of the images. Our
approach is much more general and can be used for recognition, image
de-noising, and, in a “generative mode”, to synthesize high-quality textures.
Several papers describing this approach can be found here: NIPS-97,
SIGGRAPH-97
and CVPR-98.
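The mutual information approach described above depends on a non-parametric entropy estimate. It also treats pixels as independent samples, which is exactly the assumption this paragraph criticizes. The sketch below uses a Parzen-window (Gaussian kernel) estimate, one standard non-parametric choice. Half of the samples build the density and the other half evaluate it, and the result is combined as I(u; v) = H(u) + H(v) - H(u, v). The kernel width `sigma`, the half/half split and the test signals are assumptions for illustration, not the exact estimator from the thesis.

```python
import math
import random


def parzen_entropy(samples_a, samples_b, sigma):
    """Entropy estimate H = -(1/|B|) * sum_b log( (1/|A|) * sum_a G_sigma(b - a) ).

    samples_a and samples_b are lists of equal-length tuples (d-dimensional points);
    G_sigma is an isotropic Gaussian kernel. Every sample is treated as independent.
    """
    d = len(samples_a[0])
    norm = (2 * math.pi * sigma ** 2) ** (-d / 2)
    total = 0.0
    for b in samples_b:
        density = sum(
            norm * math.exp(-sum((bi - ai) ** 2 for bi, ai in zip(b, a)) / (2 * sigma ** 2))
            for a in samples_a
        ) / len(samples_a)
        total += math.log(density)
    return -total / len(samples_b)


def mutual_information(u, v, sigma=0.1):
    """I(u; v) = H(u) + H(v) - H(u, v), each term from a Parzen-window estimate.

    The first half of the samples builds each density; the second half evaluates it.
    """
    half = len(u) // 2
    ua, ub = [(x,) for x in u[:half]], [(x,) for x in u[half:]]
    va, vb = [(y,) for y in v[:half]], [(y,) for y in v[half:]]
    ja, jb = list(zip(u[:half], v[:half])), list(zip(u[half:], v[half:]))
    return (parzen_entropy(ua, ub, sigma) + parzen_entropy(va, vb, sigma)
            - parzen_entropy(ja, jb, sigma))


if __name__ == "__main__":
    random.seed(0)
    u = [random.random() for _ in range(200)]
    inverted = [1.0 - x for x in u]            # same signal, contrast reversed
    unrelated = [random.random() for _ in range(200)]
    print(mutual_information(u, inverted), mutual_information(u, unrelated))
```

The contrast-reversed signal has a large estimated mutual information with `u`, and the unrelated signal has one near zero. This is the property that lets MI-based matching align images whose intensities are related by an unknown mapping.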
Image Database Retrieval (and Text too!)
Starting in 1997
we began to study the role of high-dimensional representations in image
database retrieval. Contrary to most work in the field, we created a very
large set of features from each image. These features were designed to be
very selective: each responds to only a very small percentage of images. At first it might seem that introducing
tens of thousands of features could only make the query learning process
infeasible. How can a problem that is difficult given ten to twenty features
become tractable with 10,000? Two recent results in machine learning argue
that this is not necessarily a terrible mistake: “support vector machines ( The best paper in this area appeared in IJCV
in 2003. A paper describing an early version of this approach was published in
NIPS-97.
Satisfyingly, very similar ideas have proven
valuable in text retrieval: NIPS-98
(PDF).

Handwritten Mathematical Expression Recognition
We have built a
number of systems that can parse and interpret handwritten mathematical
expressions. What makes this hard is that the semantics of a mathematical
expression comes from the spatial arrangement of the symbols. In a sense this
is a computer vision problem. A paper describing an early version of this
approach was published in AAAI-98.
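The role of spatial arrangement can be shown with a small sketch. It classifies where each symbol sits relative to the current baseline symbol and turns a single-line expression into a LaTeX-like string. It assumes the symbols are already segmented and recognized, with bounding boxes in page coordinates (y grows downward). Only left-to-right layouts with superscripts and subscripts are handled, with no fractions or radicals. The 25%-of-height thresholds are hypothetical, and this is not the parser from the AAAI-98 paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A recognized symbol and its bounding box (y grows downward)."""
    symbol: str
    x0: float
    y0: float
    x1: float
    y1: float


def relation(base: Box, other: Box) -> str:
    """Classify `other` relative to `base` using its vertical centre."""
    h = base.y1 - base.y0
    cy = (other.y0 + other.y1) / 2
    if cy < base.y0 + 0.25 * h:   # centre in the top quarter or higher
        return "superscript"
    if cy > base.y1 - 0.25 * h:   # centre in the bottom quarter or lower
        return "subscript"
    return "right"


def linearize(boxes) -> str:
    """Read symbols left to right and emit a LaTeX-like string."""
    boxes = sorted(boxes, key=lambda b: b.x0)
    out, baseline = boxes[0].symbol, boxes[0]
    for b in boxes[1:]:
        rel = relation(baseline, b)
        if rel == "superscript":
            out += "^{" + b.symbol + "}"
        elif rel == "subscript":
            out += "_{" + b.symbol + "}"
        else:
            out += b.symbol
            baseline = b      # a symbol on the baseline becomes the new reference
    return out


if __name__ == "__main__":
    ink = [Box("x", 0, 10, 10, 20), Box("2", 11, 5, 16, 11),
           Box("+", 20, 12, 28, 18), Box("y", 30, 10, 40, 20)]
    print(linearize(ink))
```

The same symbols in a different vertical arrangement give a different expression. Lowering the “2” turns `x^{2}` into `x_{2}`, which is why the problem looks more like vision than like ordinary text recognition.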
More recently, Nick Matsakis has written a Master's
thesis describing these ideas. Nick has also put together a demo and
some other related information.

The Computer Vision Macroscope
At MIT my students and I constructed a real-time 3D reconstruction and event
recording suite. Our first paper in this area describes a very
fast algorithm for 3D reconstruction which uses prior information to improve
the results of silhouette intersection. Silhouette intersection is one
approach for reconstructing the three-dimensional shape of an object from
multiple views. Using this approach, the task is to produce a binary labeling
of a set of voxels that determines which voxels are filled and which are
empty. In this paper, we give an energy minimization formulation of the
silhouette intersection problem. The global minimum of this energy can be
rapidly computed with a single graph cut, using a result due to Greig, Porteous
and Seheult: CVPR-00.
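The energy-minimization step can be sketched as a binary voxel labeling on a small 2D grid. The energy is E(l) = sum of per-voxel costs plus `lam` for every pair of neighbouring voxels with different labels, and it is minimized exactly by one s-t minimum cut (the Greig, Porteous and Seheult construction), using a plain Edmonds-Karp max-flow. The per-voxel `(cost_empty, cost_filled)` pairs are hypothetical inputs. In the paper's setting they would come from the silhouettes, and a practical system would use a faster max-flow on a 3D grid.

```python
from collections import deque
from itertools import product


def min_cut_labels(unary, lam):
    """Minimize sum_v unary[v][l_v] + lam * #{neighbours u~v with l_u != l_v}.

    unary[r][c] = (cost if empty, cost if filled). Returns a 0/1 grid (1 = filled).
    Filled voxels end on the source side of the minimum cut.
    """
    rows, cols = len(unary), len(unary[0])
    S, T = "s", "t"
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)
        cap[u][v] += c

    for r, c in product(range(rows), range(cols)):
        cost_empty, cost_filled = unary[r][c]
        add_edge(S, (r, c), cost_empty)    # cut iff the voxel ends up empty
        add_edge((r, c), T, cost_filled)   # cut iff the voxel ends up filled
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < rows and nc < cols:    # Potts term, one edge each direction
                add_edge((r, c), (nr, nc), lam)
                add_edge((nr, nc), (r, c), lam)

    def bfs_parents():
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = bfs_parents()
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    source_side = bfs_parents()  # nodes still reachable from s
    return [[1 if (r, c) in source_side else 0 for c in range(cols)] for r in range(rows)]


def energy(labels, unary, lam):
    """Evaluate the same energy for a given labeling (useful for checking)."""
    rows, cols = len(unary), len(unary[0])
    e = sum(unary[r][c][labels[r][c]] for r in range(rows) for c in range(cols))
    for r, c in product(range(rows), range(cols)):
        if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
            e += lam
        if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
            e += lam
    return e


if __name__ == "__main__":
    # A row of voxels; the middle one weakly prefers "empty" (e.g. a noisy silhouette).
    row = [[(5, 0), (5, 0), (0, 1), (5, 0), (5, 0)]]
    print(min_cut_labels(row, lam=0), min_cut_labels(row, lam=1))
```

With no smoothness (`lam=0`) the noisy middle voxel is carved out. With `lam=1` the prior fills it in, which is the kind of correction a prior can make to raw silhouette intersection.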