A stochastic late fusion approach to human action recognition in unconstrained images and videos
Recognizing human actions in unconstrained videos and still images has attracted considerable interest in recent research. An increasingly popular trend is to use ensembles of multiple features and classifiers in order to cope with different aspects such as motion, scene, pose and context. It has been observed that late fusion of predictions from individual classifiers offers more robustness than the early fusion of feature descriptors. In this paper, we present a novel framework for the late fusion of probabilistic predictions of different classifiers which is based on formulating and solving constrained quadratic optimization problems. In contrast to late fusion methods such as the sum-rule and the linear weighting, our approach binds constraints on mixture coefficients such that they represent the posterior of every participating classifier for each class. Further, unlike fusion by Bayesian inference, the proposed approach minimizes an error function that also considers correlations among different models. Experiments on three video and image action datasets show that our approach outperforms other late fusion techniques. In particular we report 6 %-8 % improvement compared to previously published results on two benchmark datasets.