Reducing Deep Face Recognition Model Size by Knowledge Distillation
Current face recognition models benefit from recent advances in deep learning and achieve very high verification performance. However, most recent works pay little attention to the computational efficiency of these models, so deploying them on mobile devices with low computational power is challenging. At the same time, recent studies show an increasing demand for mobile user identity authentication with biometric modalities, e.g. face, fingerprint, or iris. As a consequence, large, well-performing face recognition models have to become smaller to be deployable on mobile devices. This thesis proposes a solution that enhances the verification performance of small face recognition models via knowledge distillation. Conventional knowledge distillation transfers knowledge from a large teacher network to a small student network by mimicking the teacher's classification layer. In addition, this thesis adapts knowledge distillation to the feature level, which the ArcFace loss of the teacher optimizes. The verification results show that knowledge distillation can enhance the performance of a small face recognition model compared to the same model trained without it. Applying conventional knowledge distillation to a ResNet-56 model increased the accuracy from 99.267% to 99.3% on LFW and from 93.767% to 93.867% on AgeDB. The accuracy of the ResNet-56 student is thus only 0.117% below that of its twelve times larger ResNet-18 teacher on LFW, and even 0.067% above it on AgeDB. Moreover, matching the objective function of ArcFace with knowledge distillation increased the performance of the ResNet-56 model further, to 99.367% on LFW, exceeding the accuracy of the same model trained without knowledge distillation by a margin of 0.1%.
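The two distillation levels mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the temperature value, and the use of mean squared error as the feature-level loss are illustrative assumptions. Conventional knowledge distillation matches temperature-softened class distributions; feature-level distillation matches the embeddings that the ArcFace loss operates on.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    m = max(l / T for l in logits)  # subtract max for numerical stability
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Conventional (classification-level) distillation:
    # KL divergence between softened teacher and student distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def feature_loss(student_emb, teacher_emb):
    # Feature-level distillation (illustrative choice): mean squared error
    # between the student's and the teacher's face embeddings.
    n = len(student_emb)
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / n
```

In training, a weighted sum of the distillation term and the regular supervised loss would be minimized; the weighting is a tunable hyperparameter.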
At the same time, it decreased the false match rate (FMR) on LFW compared to the model trained without knowledge distillation.
Darmstadt, TU, Bachelor Thesis, 2020