Here you will find scientific publications from the Fraunhofer Institutes.

Preventing Catastrophic Forgetting in Deep Learning Classifiers

Author: González, Camila
Advisors: Kuijper, Arjan; Mukhopadhyay, Anirban

Darmstadt, 2020, 84 pp.
Darmstadt, TU, Master Thesis, 2020
Master Thesis
Fraunhofer IGD
deep learning; neural networks; Lead Topic: Digitized Work; Research Line: Computer vision (CV)

Deep neural networks suffer from catastrophic forgetting. When a model is trained sequentially on batches of data from different domains, it adapts too strongly to the properties of the last batch. This causes a catastrophic drop in performance on data similar to the initial training batches. Regularization-based methods are a popular way to reduce forgetting, as they have an array of desirable properties. However, they perform poorly when no information about the data origin is available at inference time. We propose improving the performance of such methods by introducing insulatory noise in unimportant parameters, so that the model becomes robust to changes in them. Additionally, we present a way to bypass the need for information about the data source. We propose using an oracle to decide which of the previously seen domains a new instance belongs to. The oracle’s prediction is then used to select the model state. In this work, we introduce three such oracles. Two of these select the model that is most confident for the instance. The first, the cross-entropy oracle, chooses the model with the lowest cross-entropy between its prediction and the one-hot form of that prediction. The second, the MC dropout oracle, chooses the model with the lowest standard deviation across predictions obtained from multiple forward passes with dropout applied. Finally, the domain identification oracle extracts information about the data distribution of each task from the training data. At inference time, it assesses which task the instance most likely belongs to and applies the corresponding model. On each of our three datasets, at least one oracle performs better than all regularization-based methods. Furthermore, we show that the oracles can be combined with a sparsification-based approach that significantly reduces memory requirements.
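
The two confidence-based oracles described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the use of plain probability lists in place of real model outputs, and the choice to average the per-class standard deviation into a single score are all assumptions made for illustration. The cross-entropy against a prediction's own one-hot form reduces to the negative log of its largest probability.

```python
import math
import statistics


def cross_entropy_to_own_one_hot(probs):
    """Cross-entropy between a prediction and its own one-hot form.

    With a one-hot target at the argmax, this is -log(max(probs)).
    """
    return -math.log(max(probs))


def cross_entropy_oracle(predictions):
    """Return the index of the model state with the lowest cross-entropy.

    predictions: one class-probability vector per stored model state
    (hypothetical input format).
    """
    scores = [cross_entropy_to_own_one_hot(p) for p in predictions]
    return min(range(len(scores)), key=scores.__getitem__)


def mc_dropout_spread(passes):
    """Spread of predictions over several stochastic forward passes.

    passes: list of probability vectors, one per forward pass with dropout.
    Assumption: the per-class standard deviations are averaged into one score.
    """
    per_class = zip(*passes)  # regroup values by class
    return statistics.fmean(statistics.pstdev(c) for c in per_class)


def mc_dropout_oracle(passes_per_model):
    """Return the index of the model state with the lowest prediction spread."""
    scores = [mc_dropout_spread(p) for p in passes_per_model]
    return min(range(len(scores)), key=scores.__getitem__)
```

In either case the returned index would be used to pick the stored model state that produces the final prediction for the instance.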