A Simple Pyramid Vision Transformer for Human Pose Estimation in Crowds

Cormier, Mickael

doi:10.5445/IR/1000148320

2022

Conference Paper

Abstract

Multi-person pose estimation is essential for several computer vision tasks related to motion analysis and anomaly detection. The impressive and continual progress in this field has led to applications in uncooperative real-world scenarios, such as detecting anomalous and dangerous behavior of individuals or groups within dense crowds in public places. However, reliably detecting poses within crowds in surveillance footage remains a very challenging task due to diverse occlusions, illumination changes, and long processing times. In this work, we present a simple Pyramid Vision Transformer for human pose estimation that achieves competitive results on the COCO Keypoints 2017 dataset while requiring significantly fewer parameters and thus less computation time. A significant improvement over the baselines is reported on the more crowded OCHuman, PoseTrack 2018, and CrowdPose datasets.

Author(s)

Cormier, Mickael

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Mainwork

Proceedings of the 2021 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

Conference

Fraunhofer Institute of Optronics, System Technologies and Image Exploitation and Institute for Anthropomatics, Vision and Fusion Laboratory (Joint Workshop) 2021
