The purpose of this thesis was to perform facade image parsing with shape grammars in order to tackle single-view image-based 3D building modeling. The scope of the thesis was lying at the border of Computer Graphics and Computer Vision, both in terms of methods and applications.
Two different and complementary approaches have been proposed: a bottom-up parsing algorithm that aimed at grouping similar regions of a facade image so as to retrieve the underlying layout, and a top-down parsing algorithm based on a very powerful framework: Reinforcement Learning. This novel parsing algorithm uses pixel-wise image supports based on supervised learning in a global optimization of a Markov Decision Process.
Both methods were evaluated quantitatively and qualitatively. The second one was proved to support various architectures, several shape grammars and image supports, and showed robustness to challenging viewing conditions; illumination and large occlusions. The second method outperformed the state-of-the-art both in terms of segmentation and speed performances. It also provides a much more flexible framework, in which many extensions may be envisioned.
The conclusion of this work was that the problem of single-view image-based 3D building modeling could be solved elegantly by using shape grammar as a Rosetta stone to decipher the language of Architecture through a well-suited Reinforcement Learning formulation. This solution was a potential answer to large-scale reconstruction of urban environments from images, but also suggested the possibility of introducing Reinforcement Learning in other vision tasks such as generic image parsing, where it have been barely explored so far.
Keywords: Computer Vision, Computer Graphics, Procedural Modeling, Image Parsing, Reinforcement Learning, Image-based Modeling.
|Institution||Ecole Centrale Paris|