
The MTA Dataset for Multi Target Multi Camera Pedestrian Tracking by Weighted Distance Aggregation

Authors: Köhl, Philipp; Specker, Andreas; Schumann, Arne; Beyerer, Jürgen

Postprint urn:nbn:de:0011-n-5970199 (1.4 MByte PDF)
MD5 Fingerprint: 1a503a59060bcef94c587714652a3b94
© IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Created on: 1.8.2020

Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society:
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2020. Proceedings : 14-19 June 2020, virtual
Los Alamitos, Calif.: IEEE Computer Society Conference Publishing Services (CPS), 2020
ISBN: 978-1-7281-9360-1
ISBN: 978-1-7281-9361-8
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) <2020, Online>
Conference Paper, Electronic Publication
Fraunhofer IOSB

Existing multi-target multi-camera tracking (MTMCT) datasets are small in terms of the number of identities and the video length. The creation of new real-world datasets is hard, since privacy has to be guaranteed and the labeling is tedious. Therefore, within the scope of this work, a mod for GTA V was developed and used to record a simulated MTMCT dataset called Multi Camera Track Auto (MTA). The MTA dataset contains over 2,800 person identities, 6 cameras, and a video length of over 100 minutes per camera. Additionally, an MTMCT system was implemented to provide a baseline for the created dataset. The system's pipeline consists of stages for person detection, person re-identification, single-camera multi-target tracking, track distance calculation, and track association. The track distance calculation comprises a weighted aggregation of the following distances: a single-camera time constraint, a multi-camera time constraint using overlapping camera areas, an appearance feature distance, a homography matching with pairwise camera homographies, and a linear prediction based on the velocity and the time difference of tracks. When using all partial distances, we were able to surpass the results of state-of-the-art single-camera trackers by +13% IDF1 score. The MTA dataset, code, and baselines are available at
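The weighted distance aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the partial-distance names, values, and weights below are hypothetical placeholders, and the paper's actual partial distances (time constraints, appearance, homography matching, linear prediction) would each be computed from the tracks themselves.

```python
def aggregate_track_distance(partial_distances, weights):
    """Combine partial track distances into one matching cost.

    partial_distances: dict mapping distance name -> float
    (assumed normalized to a comparable range); weights: dict
    with the same keys. Hypothetical sketch of the weighted
    aggregation stage, not the paper's code.
    """
    total = 0.0
    for name, dist in partial_distances.items():
        total += weights[name] * dist
    return total


# Illustrative partial distances for one pair of tracks
# (values invented for the example):
distances = {
    "single_cam_time": 0.0,     # temporal overlap check within one camera
    "multi_cam_time": 0.2,      # plausibility via overlapping camera areas
    "appearance": 0.35,         # re-identification feature distance
    "homography": 0.1,          # position distance after homography mapping
    "linear_prediction": 0.15,  # velocity-based position prediction error
}
weights = {
    "single_cam_time": 1.0,
    "multi_cam_time": 1.0,
    "appearance": 2.0,
    "homography": 1.0,
    "linear_prediction": 1.0,
}

cost = aggregate_track_distance(distances, weights)
# cost = 0.0 + 0.2 + 0.7 + 0.1 + 0.15 = 1.15
```

In a full pipeline, such a cost would be evaluated for every candidate track pair and fed into the track association stage, where low-cost pairs are merged into cross-camera identities.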