Generalized Hough transform based time invariant action recognition with 3D pose information

Authors: Münch, David; Hübner, Wolfgang; Arens, Michael
Rights: Under Copyright
Dates: 2022-03-12; 28.10.2014
Year: 2014
URL: https://publica.fraunhofer.de/handle/publica/385361
DOI: 10.1117/12.2065805; 10.24406/publica-r-385361
Language: en
Keywords: action recognition; Generalized Hough transform
Classification: 004; 670
Type: conference paper

Abstract: Human action recognition has emerged as an important field in the computer vision community due to its large number of applications, such as automatic video surveillance, content-based video search, and human-robot interaction. To cope with the challenges that this large variety of applications presents, recent research has focused on developing classifiers able to detect several actions in more natural and unconstrained video sequences. The invariance-discrimination tradeoff in action recognition has been addressed by utilizing a Generalized Hough Transform. As a basis for action representation, we transform 3D poses into a robust feature space, referred to as pose descriptors. For each action class, a one-dimensional temporal voting space is constructed. Votes are generated by associating pose descriptors with their position in time relative to the end of an action sequence. Training data consists of manually segmented action sequences. In the detection phase, valid human 3D poses are assumed as input, e.g. originating from 3D sensors or monocular pose reconstruction methods. The human 3D poses are normalized to gain view independence and transformed into (i) relative limb-angle space to ensure independence of non-adjacent joints or (ii) geometric features. In (i), an action descriptor consists of the relative angles between limbs and their temporal derivatives. In (ii), the action descriptor consists of different geometric features. To circumvent the problem of time warping, we propose to use a codebook of prototypical 3D poses, which is generated from sample sequences of 3D motion capture data. This idea is in accordance with the concept of equivalence classes in action space. Results of the codebook method are presented using the Kinect sensor and the CMU Motion Capture Database.
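
The temporal voting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`nearest_codeword`, `train_votes`, `vote`), the Euclidean nearest-neighbour assignment, the unit vote weight, and the toy one-dimensional descriptors are all assumptions made for illustration. The codebook is taken as given here, since the abstract does not say how the prototypical poses are selected.

```python
from collections import defaultdict

import numpy as np


def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean, assumed)."""
    return int(np.argmin(np.linalg.norm(codebook - descriptor, axis=1)))


def train_votes(sequences, codebook):
    """For each codeword, collect the offsets (frames until the sequence end)
    at which it occurred in the segmented training sequences."""
    offsets = defaultdict(list)
    for seq in sequences:  # seq: array of shape (T, D), one segmented action
        last = len(seq) - 1
        for t, descriptor in enumerate(seq):
            offsets[nearest_codeword(descriptor, codebook)].append(last - t)
    return offsets


def vote(stream, codebook, offsets):
    """Accumulate votes for the action end time in a 1D temporal voting space."""
    space = np.zeros(len(stream))
    for t, descriptor in enumerate(stream):
        for off in offsets.get(nearest_codeword(descriptor, codebook), []):
            end = t + off
            if end < len(space):  # ignore votes that fall beyond the stream
                space[end] += 1.0
    return space


# Hypothetical toy data: three prototype "poses" with 1D descriptors.
codebook = np.array([[0.0], [1.0], [2.0]])
training = [np.array([[0.1], [0.9], [2.2]])]  # one action: pose 0 -> 1 -> 2
offsets = train_votes(training, codebook)

stream = np.array([[1.9], [0.0], [1.1], [2.0], [0.2]])
space = vote(stream, codebook, offsets)
print(int(np.argmax(space)))  # → 3, the frame where the action ends
```

In this toy run, the three frames belonging to the action all vote for frame 3, so the voting space peaks there. One such voting space would be kept per action class, and a detection threshold on the peak height would be a further design choice not specified in the abstract.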