Reducing Deep Face Recognition Model Memory Footprint via Quantization
Deep neural networks have become one of the most popular solutions for face recognition. However, the memory footprint of such models is still too large for deployment on certain embedded devices or in applications with limited memory. In this thesis, post-training static quantization and quantization-aware training are used to reduce the memory footprint of face recognition models, and the effect of quantization on the performance of those models is analyzed. Both quantization methods are implemented with built-in functions from PyTorch to generate quantized models that use 8-bit integers to represent the model parameters. The results show that both quantization methods reduce the memory footprint of large face recognition models such as ResNet18, ResNet34, ResNet50 and ResNet100 [17, 8] by a factor of almost 4, while the average accuracy of the quantized models stays within 99% of that of the original unquantized models. For compact face recognition models such as VarGFaceNet, MixFaceNet and MobileFaceNet, the two quantization methods make the memory footprint 2 to 3 times smaller than the original size, but model accuracy is affected significantly. Specifically, with MegaFace as the evaluation benchmark, the post-training static quantization versions of these three models show a significant drop in accuracy of around 5%, 9% and 40% for VarGFaceNet, MobileFaceNet and MixFaceNet respectively, whereas the quantization-aware training versions retain much better accuracy, with drops of only 2%, 4% and 13% respectively on the same benchmark. ShuffleFaceNet is also a compact face recognition model, but the accuracy of its quantized models is very close to that of the original model, with a drop of less than 1% on all evaluation benchmarks, while the memory footprint is reduced from 10.64 MB to 3.18 MB.
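To illustrate where the factor of almost 4 comes from, the following is a minimal sketch of 8-bit affine quantization applied to a vector of 32-bit floating-point weights. It is not the PyTorch implementation used in the thesis: the function names `quantize_uint8` and `dequantize`, the per-tensor min/max calibration, and the randomly generated weights are all assumptions made for illustration.

```python
import array
import random


def quantize_uint8(values):
    """Map floats to unsigned 8-bit integers with a per-tensor scale and zero point.

    Hypothetical helper for illustration; calibration here is plain min/max.
    """
    lo = min(min(values), 0.0)  # the representable range must contain 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-lo / scale)   # integer that represents the real value 0.0
    q = array.array(
        "B",
        (min(255, max(0, round(v / scale) + zero_point)) for v in values),
    )
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(x - zero_point) * scale for x in q]


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # stand-in for layer weights

fp32 = array.array("f", weights)            # 4 bytes per parameter
q, scale, zero_point = quantize_uint8(weights)  # 1 byte per parameter

ratio = (len(fp32) * fp32.itemsize) / (len(q) * q.itemsize)
max_err = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale, zero_point)))

print(f"size ratio fp32/int8: {ratio:.1f}")
print(f"scale: {scale:.6f}, max round-trip error: {max_err:.6f}")
```

The parameters alone shrink by exactly 4 in this sketch. A real quantized model also has to store the scale and zero point, and may keep some parts in floating point, which would be consistent with the reported reduction being slightly below 4 for the large models and lower still for the compact ones.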
Darmstadt, TU, Bachelor Thesis, 2021