Monocular 3D scene modeling and inference: Understanding multi-object traffic scenes

Wojek, Christian; Roth, Stefan; Schindler, Konrad; Schiele, Bernt

2010

Conference Paper

Abstract

Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In this paper, we present a novel probabilistic 3D scene model that encompasses multi-class object detection, object tracking, scene labeling, and 3D geometric relations. This integrated 3D model is able to represent complex interactions like inter-object occlusion, physical exclusion between objects, and geometric context. Inference allows recovering 3D scene context and performing 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. In particular, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated on two different types of challenging onboard sequences. We first show a substantial improvement over the state of the art in 3D multi-people tracking. Moreover, a similar performance gain is achieved for multi-class 3D tracking of cars and trucks on a new, challenging dataset.

Author(s)

Wojek, Christian

TU Darmstadt

Roth, Stefan

TU Darmstadt GRIS

Schindler, Konrad

TU Darmstadt

Schiele, Bernt

TU Darmstadt

Host publication

Computer Vision. ECCV 2010, 11th European Conference on Computer Vision. Proceedings. Pt. IV

Conference

European Conference on Computer Vision (ECCV) 2010
