
Generalized Hough transform based time invariant action recognition with 3D pose information

Münch, David; Hübner, Wolfgang; Arens, Michael

Full text: urn:nbn:de:0011-n-3106332 (1.0 MByte PDF)
MD5 Fingerprint: b18a36f910a654f4c89b616e4e47a2ba
Copyright Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Created on: 28.10.2014

Society of Photo-Optical Instrumentation Engineers -SPIE-, Bellingham/Wash.:
Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X and Optical Materials and Biomaterials in Security and Defence Systems Technology XI : 22.09.2014, Amsterdam
Bellingham, WA: SPIE, 2014 (Proceedings of SPIE 9253)
ISBN: 978-1-62841-316-8
Paper 92530K, 11 pp.
Conference "Optics and Photonics for Counterterrorism, Crime Fighting, and Defence" <10, 2014, Amsterdam>
Conference "Optical Materials and Biomaterials in Security and Defence Systems Technology" <11, 2014, Amsterdam>
Conference paper, electronic publication
Fraunhofer IOSB
Keywords: action recognition; Generalized Hough transform

Human action recognition has emerged as an important field in the computer vision community due to its large number of applications, such as automatic video surveillance, content-based video search, and human-robot interaction. In order to cope with the challenges that this large variety of applications presents, recent research has focused on developing classifiers able to detect several actions in more natural and unconstrained video sequences. The invariance-discrimination trade-off in action recognition is addressed by utilizing a Generalized Hough Transform. As a basis for action representation, 3D poses are transformed into a robust feature space, referred to as pose descriptors. For each action class a one-dimensional temporal voting space is constructed. Votes are generated by associating pose descriptors with their position in time relative to the end of an action sequence. Training data consists of manually segmented action sequences. In the detection phase, valid human 3D poses are assumed as input, e.g. originating from 3D sensors or monocular pose reconstruction methods. The human 3D poses are normalized to gain view independence and transformed into (i) relative limb-angle space, to ensure independence of non-adjacent joints, or (ii) geometric features. In (i) an action descriptor consists of the relative angles between limbs and their temporal derivatives. In (ii) the action descriptor consists of different geometric features. In order to circumvent the problem of time warping, we propose to use a codebook of prototypical 3D poses, which is generated from sample sequences of 3D motion capture data. This idea is in accordance with the concept of equivalence classes in action space. Results of the codebook method are presented using the Kinect sensor and the CMU Motion Capture Database.
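The voting scheme described in the abstract can be illustrated with a minimal sketch. The following toy code is not the authors' implementation: it assumes pose descriptors are given as plain numeric tuples and that the codebook of prototypical poses has already been built (in the paper it is clustered from motion capture data), and it shows only the core idea — during training, each prototype accumulates temporal offsets to the end of its action sequence; during detection, each observed descriptor casts those offsets as votes into a one-dimensional temporal voting space, whose peak hypothesizes the action's end frame.

```python
import math
from collections import defaultdict

def nearest(codebook, desc):
    """Index of the codebook prototype closest to a pose descriptor."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], desc))

def train_votes(sequences, codebook):
    """Map each prototype to its observed offsets from the action's end.

    sequences: list of manually segmented training sequences, each a list
    of pose descriptors for one action class (as in the paper's training).
    """
    votes = defaultdict(list)
    for seq in sequences:
        for t, desc in enumerate(seq):
            # offset = time remaining until the end of this action sequence
            votes[nearest(codebook, desc)].append(len(seq) - 1 - t)
    return votes

def detect(stream, codebook, votes):
    """Cast votes into a 1D temporal voting space over an input stream.

    The peak of the returned space hypothesizes the frame at which the
    action ends.
    """
    space = [0.0] * len(stream)
    for t, desc in enumerate(stream):
        for off in votes[nearest(codebook, desc)]:
            if t + off < len(space):
                space[t + off] += 1.0
    return space

# Toy example with three hypothetical 2D pose descriptors A, B, C
# forming one action; the stream prepends an unrelated pose.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
codebook = [A, B, C]
votes = train_votes([[A, B, C]], codebook)
space = detect([(5.0, 5.0), A, B, C], codebook, votes)
print(space.index(max(space)))  # frame where the action-end votes peak
```

In the full method one such voting space is maintained per action class, and the descriptors are view-normalized relative limb angles (with temporal derivatives) or geometric features rather than raw coordinates.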