A multitask model for person re-identification and attribute recognition using semantic regions
In recent years, an increasing number of video surveillance cameras have been deployed in both military and civilian applications. This trend results in large amounts of available image and video footage. Effective manual search and evaluation of this data is difficult due to the large data volume and the limited human attention span. Automatic algorithms are therefore required to aid in data analysis. A key task in this context is the search for persons of interest, i.e., person re-identification. Given a query image, re-identification methods retrieve further occurrences of the depicted person in large data volumes. The prevailing success of convolutional neural networks (CNNs) in computer vision did not spare person re-identification and has recently led to significant improvements. Current state-of-the-art approaches mostly rely on features extracted from CNNs trained on person images with corresponding identity labels. However, person re-identification remains a challenging problem due to many task-specific influences such as occlusions, incomplete body parts, background clutter, varying camera perspectives, and pose variation. Unlike conventional CNN features, descriptive person attributes represent higher-level semantic information that is more robust to many of these influences. Person re-identification can therefore be improved by integrating attributes into the algorithms. In this work, we investigate approaches for attribute-based person re-identification using deep learning methods, with the goal of developing efficient models with the best possible re-identification accuracy. We show that best practices from person re-identification approaches can be transferred to the task of pedestrian attribute recognition to achieve strong baseline results for both tasks. Moreover, we show that leveraging information about semantic clothing and body regions during training further improves the results.
Finally, we combine the pedestrian attribute recognition and person re-identification models in a multi-task architecture to build our attribute-based person re-identification approach. We develop our attribute model on the large RAP dataset, which currently offers the largest available number of persons and attributes and thus allows for a differentiated analysis. The final combined attribute and re-identification model is trained on the Market-1501 dataset, which provides both person identities and attribute annotations. Our results surpass the baseline re-identification results, indicating that complementary information from the two tasks is successfully leveraged.