Deep learning based beat event detection in action movie franchises
Automatic understanding and interpretation of movies can be used in a variety of ways to semantically manage the massive volumes of movies data. ""Action Movie Franchises"" dataset is a collection of twenty Hollywood action movies from five famous franchises with ground truth annotations at shot and beat level of each movie. In this dataset, the annotations are provided for eleven semantic beat categories. In this work, we propose a deep learning based method to classify shots and beat-events on this dataset. The training dataset for each of the eleven beat categories is developed and then a Convolution Neural Network is trained. After finding the shot boundaries, key frames are extracted for each shot and then three classification labels are assigned to each key frame. The classification labels for each of the key frames in a particular shot are then used to assign a unique label to each shot. A simple sliding window based method is then used to group adjacent shots having the same label in order to find a particular beat event. The results of beat event classification are presented based on criteria of precision, recall, and F-measure. The results are compared with the existing technique and significant improvements are recorded.