Learning to Discover Structure for 3D Vision

Zihan Zhou

Sponsoring Agency
National Science Foundation


People spontaneously perceive structure in images, that is, orderly, regular, or coherent patterns and relationships. When walking along a city street, a person can instantly identify parallel lines, rectangles, rotational symmetries, repetitive patterns, and many other types of structure. This project develops a novel data-driven framework for structure discovery in computer vision, leveraging the availability of massive data and recent advances in machine learning. The techniques developed in this project can be applied to a wide spectrum of real-world applications, such as 3D reconstruction of man-made environments, virtual and augmented reality, and indoor rescue robots. Further, the ability to understand 3D perceptual organization as humans do can benefit other fields, including (i) cognitive science, as it produces new computational models that can be used to test and develop existing theories and to explore new details and aspects of the brain, (ii) human-robot interaction, as it enables robots to reason in terms of geometric shape, physics, and dynamics, and (iii) architectural engineering, as it facilitates interactions with existing standards for construction and management of buildings.

The project is built upon a formal definition of structure, consisting of (i) the constituent patterns, (ii) the domain of replication or continuation of the patterns, and (iii) any change of the pattern and domain over space and time (Witkin and Tenenbaum, 1983). The research has three aims. The first aim lays the computational foundation by developing new machine learning methods to detect the domain of the structure and its constituent patterns. The second aim establishes a unified framework for structure discovery that goes beyond the bottom-up scheme and sequential processing. The challenge here is that structure often spans a large spatial extent and is typically subject to pattern deformation and domain distortion. In the last aim, the researchers incorporate the discovered structure into complex vision systems to demonstrate its advantage in real-world applications. Ultimately, the project strives to significantly improve the effectiveness and efficiency of 3D vision systems and to enrich general computer vision principles.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Research Area
Artificial Intelligence and Big Data