Hand-drawing Analysis
Martin Szummer, Markus Svensén, Christopher Bishop, Tom Minka
Diagrams and graphics are most naturally drawn with a pen. It has now become easy to capture pen input directly on a computer, with the spread of pen computers such as TabletPCs, PDAs, mobile phones with styluses and drawing tablets. However, the tools for editing, searching and managing such electronic ink are still very limited. The computer is poor at "understanding" the structure or meaning of drawing and writing. Without such understanding it is difficult for the computer to offer intelligent editing of the ink: for example, it cannot recognise whether the user has been writing text or drawing a figure. The main challenge is that users want the freedom to scribble almost anything with the pen, without going through menus or other constraints on input. The resulting input consists of a seemingly infinite variety of types (maps, networks, graphs), in many different styles. Every user is different!
Much pen input consists of mixed drawings containing both graphics and text. Our research focuses on interpreting diagrams and drawings, and the general structure and layout of ink. For example, before we can attempt to recognize handwriting, we must first distinguish writing from drawing. To make sense of a wide variety of diagrams, we recognize commonly occurring graphical elements, such as containers and connectors.
In the past, people tried to craft ink recognizers by hand: for example, detecting boxes by looking for pen strokes forming corners or parallel lines. Unfortunately, due to the wide variability between users, it is very difficult and tedious to make such recognizers accurate. Machine learning is a more promising way to tackle the ink recognition problem. It is relatively easy to collect real writing and drawing from users. We gather many hand-drawn diagrams and other sketches from users, label them with the desired classes we want to recognize, and then apply new machine learning techniques to build recognizers. Based on the user examples, our techniques learn what aspects or features of the pen strokes are important for recognizing boxes.
Drawing Parsing
We base our model on a Conditional Random Field (CRF). This is a graphical model in which nodes correspond to ink strokes, and edges connect pairs of strokes whose dependencies (context) we want to model. Because drawings are sparse, we are able to perform exact inference in this model, using the junction-tree algorithm. All this is detailed in our CRF paper, which includes noise-robust potentials. Also read about new developments, including joint labeling and partitioning, Bayesian model averaging, and user feedback.
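To make the idea of a CRF over strokes concrete, here is a minimal sketch rather than the model from the CRF paper. It assumes hypothetical log-potentials, a two-label set ("container" and "connector"), and a single shared pairwise table. Exact inference is done by brute-force enumeration, which is only a stand-in for the junction-tree algorithm and is feasible only for a handful of strokes.

```python
import itertools
import math

# Hypothetical label set, borrowed from the "containers and connectors" example.
LABELS = ("container", "connector")


def exact_inference(unary, edges, pairwise):
    """Exact marginals and MAP labeling for a tiny pairwise CRF.

    unary:    one dict per stroke, mapping label -> log-potential
    edges:    list of (i, j) index pairs for strokes that interact
    pairwise: dict mapping (label_i, label_j) -> log-potential
    Enumerates every joint labeling (a stand-in for junction-tree inference).
    """
    n = len(unary)
    scores = {}
    for labeling in itertools.product(LABELS, repeat=n):
        s = sum(unary[i][labeling[i]] for i in range(n))
        s += sum(pairwise[(labeling[i], labeling[j])] for i, j in edges)
        scores[labeling] = s

    # Log partition function, computed stably with log-sum-exp.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))

    # Per-stroke marginals: sum joint probabilities over all other strokes.
    marginals = [{lab: 0.0 for lab in LABELS} for _ in range(n)]
    for labeling, s in scores.items():
        p = math.exp(s - log_z)
        for i, lab in enumerate(labeling):
            marginals[i][lab] += p

    best = max(scores, key=scores.get)
    return marginals, best


# Toy example: stroke 1 is ambiguous on its own but touches stroke 0.
example_unary = [
    {"container": 2.0, "connector": 0.0},
    {"container": 0.0, "connector": 0.0},
    {"container": 0.0, "connector": 1.0},
]
example_pairwise = {
    (a, b): (1.0 if a == b else -1.0) for a in LABELS for b in LABELS
}
example_marginals, example_best = exact_inference(
    example_unary, [(0, 1)], example_pairwise
)
print(example_best)
print(example_marginals[1])
```

In this toy setting the ambiguous middle stroke is pulled toward "container" by its confidently labeled neighbor, which is the kind of contextual effect the edges are meant to capture. Enumeration grows exponentially with the number of strokes, so a real system would need an algorithm that exploits the sparsity of the graph.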
Text vs. Graphics
Shape matching
We can match and recognize arbitrary shapes from a single example, as long as they are rigid and deform only according to affine transformations (translation, scaling, shearing and rotation). We determine the best match among multiple templates (lines, squares, triangles, circles) by employing Bayesian model comparison. Detailed paper about shape matching.
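The following sketch illustrates affine template matching under strong simplifying assumptions and is not the method from the shape matching paper. It assumes each shape is already resampled to the same number of points with known point-to-point correspondence, fits an affine map by ordinary least squares, and picks the template with the smallest residual. That residual comparison is a crude stand-in for Bayesian model comparison, and the template names and coordinates are hypothetical.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate template: points are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(template, observed):
    """Least-squares affine map from template points to observed points.

    Returns ((a, b, c, d, e, f), residual) where
    u = a*x + b*y + c and v = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in template]
    # Normal equations: (R^T R) p = R^T u, solved separately for u and v.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bu = [sum(r[i] * u for r, (u, _) in zip(rows, observed)) for i in range(3)]
    bv = [sum(r[i] * v for r, (_, v) in zip(rows, observed)) for i in range(3)]
    a, b, c = solve3(m, bu)
    d, e, f = solve3(m, bv)
    residual = sum(
        (a * x + b * y + c - u) ** 2 + (d * x + e * y + f - v) ** 2
        for (x, y), (u, v) in zip(template, observed)
    )
    return (a, b, c, d, e, f), residual


def best_template(templates, observed):
    """Return the template name with the lowest residual, plus all residuals."""
    residuals = {name: fit_affine(pts, observed)[1] for name, pts in templates.items()}
    return min(residuals, key=residuals.get), residuals


# Hypothetical four-point templates (corners, plus one edge midpoint for the triangle).
TEMPLATES = {
    "square": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "triangle": [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0), (0.25, 0.5)],
}

# A sheared, scaled and translated square: u = 2x + y + 3, v = 2y - 1.
drawn = [(3.0, -1.0), (5.0, -1.0), (6.0, 1.0), (4.0, 1.0)]
name, residuals = best_template(TEMPLATES, drawn)
print(name, residuals)
```

Unlike Bayesian model comparison, a raw residual does not account for how flexible each template is or for noise in the pen input, so this sketch is useful only for seeing how an affine fit separates candidate templates.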
All our publications relevant to drawing analysis.