Where are we with Human Pose Estimation in Real-World Surveillance?
The rapidly increasing number of surveillance cameras offers a variety of opportunities for intelligent video analytics to improve public safety. Among these, the automatic recognition of suspicious and violent behavior is a key task. To preserve personal privacy, prevent ethnic bias, and reduce complexity, most approaches first extract the pose or skeleton of each person and subsequently perform activity recognition. However, the current literature mainly focuses on research datasets and does not consider the real-world challenges and requirements of human pose estimation. We close this gap by analyzing these challenges, such as inadequate data and the need for real-time processing, and by proposing a framework for human pose estimation in uncontrolled, crowded surveillance scenarios. Our system integrates mitigation measures as well as a tracking component to incorporate temporal information. Finally, we provide a detailed quantitative and qualitative analysis on both a scientific and a real-world dataset to highlight improvements and remaining obstacles towards robust real-world human pose estimation in uncooperative scenarios.