D-ID-Net: Two-Stage Domain and Identity Learning for Identity-Preserving Image Generation from Semantic Segmentation
Training functionality-demanding AR/VR systems require accurate and robust gaze estimation and tracking solutions. Achieving such a performance requires the availability of diverse eye image data that might only be acquired by the means of image generation. Works addressing the generation of such images did not target realistic and identity-specific images, nor did they address the practical relevant case of generation from semantic labels. Therefore, this work proposes a solution to generate realistic and identity-specific images that correspond to semantic labels, given samples of a specific identity. Our proposed solution consists of two stages. In the first stage, a network is trained to transform the semantic label into a corresponding eye image of a generic identity. The second stage is an identity-specific network that induces identity details on the generic eye image. The results of our D-ID-Net solutions shows a high degree of identity-preservation and similarity to the ground-truth images, with an RMSE of 7.235.