3D vehicle trajectory reconstruction in monocular video data using environment structure constraints

Bullinger, Sebastian; Bodensteiner, Christoph; Arens, Michael; Stiefelhagen, Rainer

doi:10.1007/978-3-030-01249-6_3

2018

Conference Paper

Abstract

We present a framework to reconstruct three-dimensional vehicle trajectories using monocular video data. We track two-dimensional vehicle shapes at pixel level, exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion techniques to vehicle and background images to determine for each frame camera poses relative to vehicle instances and background structures. By combining vehicle and background camera pose information, we restrict the vehicle trajectory to a one-parameter family of possible solutions. We compute a ground representation by fusing background structures and corresponding semantic segmentations. We propose a novel method to determine vehicle trajectories consistent with image observations and reconstructed environment structures as well as a criterion to identify frames suitable for scale ratio estimation. We show qualitative results using drone imagery as well as driving sequences from the Cityscape dataset. Due to the lack of suitable benchmark datasets, we present a new dataset to evaluate the quality of reconstructed three-dimensional vehicle trajectories. The video sequences show vehicles in urban areas and are rendered using the path-tracing render engine Cycles. In contrast to previous work, we perform a quantitative evaluation of the presented approach. Our algorithm achieves an average reconstruction-to-groundtruth-trajectory distance of 0.31 meters using this dataset. The dataset, including evaluation scripts, will be publicly available on our website.

Mainwork

Computer Vision - ECCV 2018

Conference

European Conference on Computer Vision (ECCV) 2018
