
Person re-identification across aerial and ground-based cameras by deep feature fusion

Schumann, A.; Metzler, Jürgen

Full text: urn:nbn:de:0011-n-4614077 (13 MB PDF)
MD5 Fingerprint: 1bc99f6c74a798b2e72f146c3a286ebe
Created on: 10.8.2017

Sadjadi, Firooz A. (Ed.) ; Society of Photo-Optical Instrumentation Engineers -SPIE-, Bellingham/Wash.:
Automatic Target Recognition XXVII : 10-11 April 2017, Anaheim, California, United States
Bellingham, WA: SPIE, 2017 (Proceedings of SPIE 10202)
ISBN: 978-1-5106-0905-1
ISBN: 978-1-5106-0906-8
Paper 102020A, 12 pp.
Conference "Automatic Target Recognition" <27, 2017, Anaheim/Calif.>
Conference paper, electronic publication
Fraunhofer IOSB
person re-identification; aerial; camera network; covariance descriptors; deep learning; fusion; retrieval

Person re-identification is the task of correctly matching visual appearances of the same person in image or video data while distinguishing appearances of different persons. The traditional setup for re-identification is a network of fixed cameras. However, in recent years mobile aerial cameras mounted on unmanned aerial vehicles (UAV) have become increasingly useful for security and surveillance tasks. Aerial data has many characteristics different from typical camera network data. Thus, re-identification approaches designed for a camera network scenario can be expected to suffer a drop in accuracy when applied to aerial data. In this work, we investigate the suitability of features, which were shown to give robust results for re-identification in camera networks, for the task of re-identifying persons between a camera network and a mobile aerial camera. Specifically, we apply hand-crafted region covariance features and features extracted by convolutional neural networks which were learned on separate data. We evaluate their suitability for this new and as yet unexplored scenario. We investigate common fusion methods to combine the hand-crafted and learned features and propose our own deep fusion approach which is already applied during training of the deep network. We evaluate features and fusion methods on our own dataset. The dataset consists of fourteen people moving through a scene recorded by four fixed ground-based cameras and one mobile camera mounted on a small UAV. We discuss strengths and weaknesses of the features in the new scenario and show that our fusion approach successfully leverages the strengths of each feature and outperforms all single features significantly.
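The abstract names hand-crafted region covariance features as one of the two feature types being fused. The paper's exact feature set and matching pipeline are not given in this record, but the classic region-covariance idea can be sketched as follows: build a per-pixel feature vector over an image region, take the covariance matrix of those vectors as the descriptor, and compare descriptors with a log-Euclidean distance on symmetric positive-definite matrices. The choice of per-pixel features (position, intensity, gradient magnitudes) is an illustrative assumption, not the authors' configuration.

```python
import numpy as np

def region_covariance(region):
    """Covariance descriptor of a grayscale image region (H x W array).

    Per-pixel feature vector: [x, y, I, |Ix|, |Iy|] -- an assumed,
    commonly used feature set; the paper's actual features may differ.
    Returns a 5 x 5 symmetric positive semi-definite matrix.
    """
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(region.astype(float))  # row- and column-wise gradients
    feats = np.stack(
        [xs.ravel(), ys.ravel(),
         region.ravel().astype(float),
         np.abs(ix).ravel(), np.abs(iy).ravel()],
        axis=1,
    )
    return np.cov(feats, rowvar=False)

def log_euclidean_distance(c1, c2, ridge=1e-8):
    """Log-Euclidean distance between two SPD covariance descriptors.

    A small ridge keeps near-singular covariances in the SPD cone.
    """
    def spd_log(c):
        vals, vecs = np.linalg.eigh(c + ridge * np.eye(c.shape[0]))
        return vecs @ np.diag(np.log(vals)) @ vecs.T
    return np.linalg.norm(spd_log(c1) - spd_log(c2))
```

A person image would typically be split into several regions (e.g. stripes), with one descriptor per region; distances from this descriptor and from a CNN embedding could then be combined by a weighted sum, which is one of the common fusion baselines the abstract alludes to.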