
Analysis of Deep Fusion Strategies for Multi-modal Gesture Recognition

 
Authors: Roitberg, Alina; Pollert, Tim; Haurilet, Monica; Martin, Manuel; Stiefelhagen, Rainer

Fulltext (PDF)

Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society:
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019. Proceedings : 16-20 June 2019, Long Beach, California
Los Alamitos, Calif.: IEEE Computer Society Conference Publishing Services (CPS), 2019
ISBN: 978-1-7281-2507-7
ISBN: 978-1-7281-2506-0
pp. 198-206
32nd Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, Long Beach, Calif.
Funding: Bundesministerium für Bildung und Forschung BMBF (German Federal Ministry of Education and Research)
PaKoS: Personalized, adaptive, cooperative systems for automated vehicles
English
Conference Paper, Electronic Publication
Fraunhofer IOSB

Abstract
Video-based gesture recognition has a wide spectrum of applications, ranging from sign language understanding to driver monitoring in autonomous cars. As each sensor suffers from its individual limitations, combining multiple sources has strong potential to improve the results. A number of deep architectures have been proposed to recognize gestures from, e.g., both color and depth data. Conventionally, however, these models comprise a separate network for each modality, and the networks are combined only in the final layer (e.g., via simple score averaging). In this work, we take a closer look at different fusion strategies for gesture recognition, focusing especially on the information exchange in the intermediate layers. We compare three fusion strategies on the widely used C3D architecture: 1) late fusion, combining the streams in the final layer; 2) information exchange in an intermediate layer using an additional convolution layer; and 3) linking information at multiple layers simultaneously using cross-stitch units, originally designed for multi-task learning. Our proposed C3D-Stitch model achieves the best recognition rate, demonstrating the effectiveness of sharing information at earlier stages.
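The cross-stitch mechanism referenced in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the variable names, toy activations, and mixing weights below are illustrative assumptions. The core idea (from cross-stitch units for multi-task learning) is that each stream's output is a learned linear combination of both streams' activations, so information can flow between modalities at intermediate layers.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Mix two streams' activations with a 2x2 matrix of learned weights.

    alpha[i, j] weights the contribution of input stream j to output
    stream i; the combination is applied element-wise.
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b

# Toy activations standing in for intermediate feature maps of a
# color stream and a depth stream (shapes and values are made up).
x_rgb = np.ones((2, 3))
x_depth = 2 * np.ones((2, 3))

# Mostly-diagonal mixing: each stream keeps 90% of its own signal and
# receives 10% from the other modality. With alpha = identity, the two
# streams stay fully independent (plain two-stream late fusion).
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])

y_rgb, y_depth = cross_stitch(x_rgb, x_depth, alpha)
# y_rgb[0, 0]   = 0.9 * 1 + 0.1 * 2 ≈ 1.1
# y_depth[0, 0] = 0.1 * 1 + 0.9 * 2 ≈ 1.9
```

In the paper's C3D-Stitch variant, such mixing is applied at several layers simultaneously rather than at a single fusion point, which is what lets earlier stages share information.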

URL: http://publica.fraunhofer.de/documents/N-583060.html