An Evaluation of Design Choices for Pedestrian Attribute Recognition in Video
Person attribute recognition in surveillance data is a challenging task. Attributes are often visible only in small, localized regions, so recognition suffers from poor image quality, changing lighting conditions, varying viewing angles, and occlusions. Previous research has focused predominantly on recognition in single images. In this work, we investigate the applicability of several recent strategies for incorporating temporal information into the recognition process. We identify the most promising building blocks and combine them into a strong baseline model, which achieves state-of-the-art attribute recognition accuracy in videos and provides a good basis for future research. Finally, we show that the resulting attributes can serve as a basis for description-based person retrieval.