
Multi-modal three-stream network for action recognition

Authors: Khalid, Muhammad Usman; Yu, Jie

Postprint urn:nbn:de:0011-n-5038343 (1.3 MByte PDF)
MD5 Fingerprint: 5653f7857e6196a20c8bcc0bb96b8ea1
© IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Created on: 3.8.2018

Tan, T. ; Institute of Electrical and Electronics Engineers -IEEE-; International Association for Pattern Recognition -IAPR-; Chinese Academy of Sciences, Shenyang Institute of Automation -SIA-:
ICPR 2018, 24th International Conference on Pattern Recognition : Beijing, China, August 20th-24th, 2018
Piscataway, NJ: IEEE, 2018
ISBN: 978-1-5386-3787-6
ISBN: 978-1-5386-3788-3
ISBN: 978-1-5386-3789-0
International Conference on Pattern Recognition (ICPR) <24, 2018, Beijing>
Conference paper, electronic publication
Fraunhofer IPA
deep learning; recognition; video recording; behavioral science; computer vision; machine vision; human behavior; video technology

Human action recognition in video is an active yet challenging research topic due to the high variation and complexity of the data. In this paper, a novel video-based action recognition framework utilizing complementary cues is proposed to handle this complex problem. Inspired by the success of two-stream networks for action classification, additional pose features are studied and fused to enhance the understanding of human action in a more abstract and semantic way. For practical applicability, not only ground-truth poses but also noisy estimated poses are incorporated into the framework via our proposed pre-processing module. The whole framework and each individual cue are evaluated on varied benchmark datasets such as JHMDB, sub-JHMDB and Penn Action. Our results surpass state-of-the-art performance on these datasets and demonstrate the strength of complementary cues.
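The fusion of complementary cues described in the abstract can be illustrated, at its simplest, as weighted late fusion of per-stream class scores. The sketch below is a minimal NumPy illustration only, not the paper's actual method: the three streams (appearance, motion, pose), the stream weights, and the toy logits are all hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(rgb_logits, flow_logits, pose_logits, weights=(1.0, 1.5, 1.0)):
    """Weighted late fusion of three streams' class scores.

    Each argument is a (num_classes,) array of classification logits
    produced by one stream for a single video clip. The weights are
    hypothetical; in practice they would be tuned on validation data.
    """
    scores = (weights[0] * softmax(rgb_logits)
              + weights[1] * softmax(flow_logits)
              + weights[2] * softmax(pose_logits))
    # Return the index of the fused top-scoring action class.
    return int(np.argmax(scores))

# Hypothetical logits for a 3-class toy problem: the appearance and
# motion streams agree on class 0, while the pose stream prefers class 1.
rgb = np.array([2.0, 0.5, 0.1])
flow = np.array([1.8, 0.7, 0.2])
pose = np.array([0.3, 2.5, 0.1])
print(fuse_streams(rgb, flow, pose))
```

Giving the motion stream a slightly higher weight follows a common convention in the two-stream literature; whether the paper's framework fuses at the score level or earlier in the network is not specified in this abstract.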