Learning spatial interest regions from videos to inform action recognition in still images

Eweiwi, Abdalrahman; Cheema, Muhammad Shahzad; Bauckhage, Christian

2014

Conference Paper

Abstract

Common approaches to human action recognition from images rely on local descriptors for classification. Typically, these descriptors are computed in the vicinity of key points that result either from running a key point detector or from dense or random sampling of pixel coordinates. Such key points are not a priori related to human activities and therefore carry limited information with regard to action recognition. In this paper, we propose to identify action-specific key points in images using information available from videos. Our approach does not require manual segmentation or templates; instead, it applies non-negative matrix factorization to optical flow fields extracted from videos. The resulting basis flows are found to be indicative of action-specific image regions and therefore allow for an informed sampling of key points. We also present a generative model that characterizes joint distributions of regions of interest and human actions. In practical experiments, we determine correspondences between regions of interest that were automatically learned from videos and manually annotated locations of human body parts available from independent benchmark image data sets. We observe high correlations between the learned interest regions and the body parts most relevant for different actions.
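The sketch below is a rough illustration of the abstract's central idea. It factorizes stacked optical-flow data with non-negative matrix factorization (NMF) and then samples key points from one learned basis image. It rests on several assumptions that do not come from the paper: Lee and Seung multiplicative updates for the Frobenius loss, per-pixel flow magnitude (rather than the full flow vectors) as the non-negative input, and key point sampling with probability proportional to basis intensity. The helper names `nmf`, `flow_matrix` and `sample_keypoints` are hypothetical, and the flow fields are synthetic stand-ins for flow extracted from real videos.

```python
import numpy as np

def nmf(V, k, iters=500, eps=1e-9, seed=0):
    """Factorize non-negative V (d x n) as W (d x k) @ H (k x n).

    Assumption: Lee-Seung multiplicative updates for the Frobenius loss;
    not necessarily the variant used by the authors.
    """
    rng = np.random.default_rng(seed)
    d, n = V.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def flow_matrix(flows):
    """Stack per-frame flow magnitudes (each frame is h x w x 2) as columns."""
    return np.stack([np.hypot(f[..., 0], f[..., 1]).ravel() for f in flows], axis=1)

def sample_keypoints(basis, shape, n_points, seed=0):
    """Draw distinct pixel coordinates with probability proportional to a basis image."""
    rng = np.random.default_rng(seed)
    p = basis / basis.sum()
    idx = rng.choice(basis.size, size=n_points, replace=False, p=p)
    rows, cols = np.unravel_index(idx, shape)
    return np.stack([rows, cols], axis=1)

# Hypothetical synthetic "video": two image regions move with independent speeds.
h = w = 16
rng = np.random.default_rng(1)
flows = []
for t in range(30):
    f = np.zeros((h, w, 2))
    f[:8, :8, 0] = rng.random()   # horizontal motion in the top-left region
    f[8:, 8:, 1] = rng.random()   # vertical motion in the bottom-right region
    flows.append(f)

V = flow_matrix(flows)             # 256 pixels x 30 frames
W, H = nmf(V, k=2)                 # columns of W act as "basis flow" images
pts = sample_keypoints(W[:, 0], (h, w), n_points=10)
print(W[:, 0].reshape(h, w).round(2))
print(pts)
```

With this toy input, each column of `W` tends to concentrate on one of the two moving regions, so sampled points fall where the corresponding basis image has the most motion energy. Sampling from basis images rather than uniformly is what makes the key points "informed" in the sense of the abstract. The generative model that the paper describes for joint distributions of regions and actions is not reproduced here.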

Author(s)

Eweiwi, Abdalrahman

Cheema, Muhammad Shahzad

Bauckhage, Christian

Host publication

16th LWA Workshops: KDML, IR and FGWM 2014. Proceedings

Conference

Conference "Learning, Knowledge, Adaptation" (LWA) 2014

Workshop "Knowledge Discovery, Data Mining and Machine Learning" (KDML) 2014
