Here you will find scientific publications from the Fraunhofer Institutes.

Adapted deep feature fusion for person re-identification in aerial images

Schumann, Arne; Metzler, Jürgen


Dudzik, Michael C. (Ed.) ; Society of Photo-Optical Instrumentation Engineers -SPIE-, Bellingham/Wash.:
Autonomous Systems. Sensors, Vehicles, Security, and the Internet of Everything : 16-18 April 2018, Orlando, Florida, United States
Bellingham, WA: SPIE, 2018 (Proceedings of SPIE 10643)
ISBN: 978-1-5106-1797-1
ISBN: 978-1-5106-1798-8
Paper 106430L, 6 pp.
Conference "Autonomous Systems - Sensors, Vehicles, Security, and the Internet of Everything" <2018, Orlando/Fla.>
Conference Paper
Fraunhofer IOSB

Person re-identification is the task of matching visual appearances of the same person in image or video data while distinguishing appearances of different persons. With falling hardware costs, cameras mounted on unmanned aerial vehicles (UAVs) have become increasingly useful for security and surveillance tasks in recent years. Re-identification approaches have to adapt to the new challenges posed by this type of data, such as unusual and changing viewpoints or camera motion. Furthermore, the characteristics of the data change between the scenarios in which the UAV is used. This requires robust models that can handle a wide range of characteristics. In this work, we train convolutional neural networks for person re-identification. However, all datasets of sufficient size for training consist of data from fixed camera networks. We show that the resulting models, while performing strongly on camera network data, struggle to handle the different characteristics of aerial imagery, likely because they overfit to the data bias inherent in the training data. To address this issue, we combine the deep features with hand-crafted covariance features, which introduce a higher degree of invariance into our combined representation. The fusion of both types of features is achieved by incorporating the covariance information into the training process of the deep model. We evaluate the combined representation on a dataset consisting of twelve people moving through a scene recorded by four fixed cameras and one mobile aerial camera. We discuss strengths and weaknesses of the features and show that our combined approach outperforms baselines as well as previous work.
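
The abstract does not state which covariance features are used, so the following is only a minimal sketch of a generic region covariance descriptor, not the authors' implementation. The per-pixel feature set (x, y, intensity, absolute horizontal and vertical gradients) and the names `pixel_features` and `covariance_descriptor` are assumptions made for illustration. It shows one reason such features can add invariance: the per-feature mean is subtracted, so a uniform brightness offset leaves the descriptor unchanged.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def pixel_features(img: Image) -> List[List[float]]:
    """Per-pixel vectors [x, y, intensity, |dI/dx|, |dI/dy|] (assumed feature set)."""
    h, w = len(img), len(img[0])
    feats = []
    for y in range(h):
        for x in range(w):
            # Central differences, with neighbours clamped at the image border.
            dx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            dy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            feats.append([float(x), float(y), float(img[y][x]), abs(dx), abs(dy)])
    return feats


def covariance_descriptor(img: Image) -> List[List[float]]:
    """Sample covariance matrix (d x d) of the per-pixel feature vectors."""
    feats = pixel_features(img)
    n, d = len(feats), len(feats[0])
    mean = [sum(f[k] for f in feats) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in feats:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    # Unbiased estimate; requires at least two pixels in the region.
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical 3x3 patch and a uniformly brightened copy of it.
patch = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [15.0, 25.0, 90.0]]
brighter = [[v + 50.0 for v in row] for row in patch]
c_patch = covariance_descriptor(patch)
c_brighter = covariance_descriptor(brighter)
```

Here `c_patch` and `c_brighter` agree up to floating-point rounding. How such a descriptor is fed into the training of the deep model is specific to the paper and is not reproduced in this sketch.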