2019
Conference Paper
Title
Understanding Patch-Based Learning of Video Data by Explaining Predictions
Abstract
Deep neural networks have been shown to learn highly predictive models of video data. Due to the large number of frames in individual videos, a common training strategy is to repeatedly extract short clips at random offsets from the video. We apply the deep Taylor/Layer-wise Relevance Propagation (LRP) technique to understand the classification decisions of a deep network trained with this strategy, and identify a tendency of the classifier to look mainly at the frames close to the temporal boundaries of its input clip. This "border effect" reveals the model's relation to the step size used to extract consecutive video frames for its input, which we can then tune to improve the classifier's accuracy without retraining the model. To our knowledge, this is the first work to apply the deep Taylor/LRP technique to a neural network operating on video data.
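The clip-extraction strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `sample_clip`, the clip length, and the video shape are all assumptions made for the example.

```python
import numpy as np

def sample_clip(video, clip_len, rng):
    """Extract a short clip at a random temporal offset.

    Hypothetical helper illustrating the sampling strategy the abstract
    describes; the paper's exact procedure is not specified here.
    """
    num_frames = video.shape[0]
    # Choose a random start frame so the clip fits entirely in the video.
    offset = rng.integers(0, num_frames - clip_len + 1)
    return video[offset:offset + clip_len]

rng = np.random.default_rng(0)
video = np.zeros((120, 64, 64, 3))   # frames x height x width x channels
clip = sample_clip(video, clip_len=16, rng=rng)
print(clip.shape)  # (16, 64, 64, 3)
```

Repeating this sampling across training iterations exposes the network to many temporal offsets of the same video, which is the setting in which the observed "border effect" arises.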