Here you will find scientific publications from the Fraunhofer Institutes.

Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals

Workshop paper presented at CVPR Workshop
Schlosser, Patrick; Münch, David; Arens, Michael

Postprint urn:nbn:de:0011-n-5521371 (2.1 MByte PDF)
MD5 Fingerprint: 8426060e008b111501334504d9c076e4
Created on: 19.7.2019

2019, 9 pp.
Conference on Computer Vision and Pattern Recognition (CVPR) <2019, Long Beach/Calif.>
Conference Paper, Electronic Publication
Fraunhofer IOSB

In this paper, several variants of two-stream architectures for temporal action proposal generation in long, untrimmed videos are presented. Inspired by recent advances in human action recognition that utilize 3D convolutions in combination with two-stream networks, and based on the Single-Stream Temporal Action Proposals (SST) architecture [3], four different two-stream architectures are investigated, each utilizing sequences of images on one stream and sequences of optical flow images on the other. The four architectures fuse the two separate streams at different depths in the model; for each of them, a broad range of parameters is investigated systematically and an optimal parametrization is determined empirically. The experiments on the THUMOS'14 [11] dataset, which contains untrimmed videos of 20 different sporting activities for temporal action proposals, show that all four two-stream architectures are able to outperform the original single-stream SST and achieve state-of-the-art results. Additional experiments, in which the optical flow method of Brox [1] was exchanged for FlowNet2 [10], still yielded improvements, showing that the gains are not restricted to one method of calculating optical flow.
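
The idea of fusing two streams at different depths can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the streams are fused, so concatenation is an assumption here, and the names `fuse_at_depth`, `scale`, `shift` and `relu` are hypothetical. The toy layers merely stand in for the 3D-convolutional and SST components of the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(k: float) -> Layer:
    """Toy layer: multiply every feature by k."""
    return lambda x: [k * v for v in x]


def shift(b: float) -> Layer:
    """Toy layer: add b to every feature."""
    return lambda x: [v + b for v in x]


def relu(x: Vector) -> Vector:
    """Toy layer: clamp negative features to zero."""
    return [max(0.0, v) for v in x]


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers to x in order."""
    for layer in layers:
        x = layer(x)
    return x


def fuse_at_depth(
    image_input: Vector,
    flow_input: Vector,
    image_layers: Sequence[Layer],
    flow_layers: Sequence[Layer],
    shared_layers: Sequence[Layer],
    depth: int,
) -> Vector:
    """Run `depth` stream-specific layers on each input, concatenate the
    results (assumed fusion operation), then run the remaining shared layers."""
    image_features = run(image_layers[:depth], image_input)
    flow_features = run(flow_layers[:depth], flow_input)
    fused = image_features + flow_features  # list concatenation
    return run(shared_layers[depth:], fused)


# Hypothetical two-layer model; depth 0 fuses the raw inputs, depth 2 fuses
# only after both streams have been processed separately.
image_layers = [scale(2.0), shift(1.0)]
flow_layers = [scale(-1.0), shift(-1.0)]
shared_layers = [scale(0.5), relu]

image_input = [1.0, -2.0]
flow_input = [3.0, -4.0]

early = fuse_at_depth(image_input, flow_input, image_layers, flow_layers, shared_layers, 0)
middle = fuse_at_depth(image_input, flow_input, image_layers, flow_layers, shared_layers, 1)
late = fuse_at_depth(image_input, flow_input, image_layers, flow_layers, shared_layers, 2)
```

Moving `depth` from 0 to 2 shifts computation from the shared part of the model to the stream-specific part, which is the design axis the four architectures explore.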