Hybrid human modeling: Making volumetric video animatable

Eisert, P.; Hilsmann, A.


Magnor, M.:
Real VR - Immersive Digital Reality : How to Import the Real World into Head-Mounted Immersive Displays
Cham: Springer Nature, 2020 (Lecture Notes in Computer Science 11900)
ISBN: 978-3-030-41815-1 (Print)
ISBN: 978-3-030-41816-8 (Online)
ISBN: 3-030-41815-4
Book Article
Fraunhofer HHI

Photo-realistic modeling and rendering of humans is extremely important for virtual reality (VR) environments: the human body and face are highly complex and exhibit large shape variability, and, especially, human observers are extremely sensitive to artifacts when looking at other humans. Furthermore, interactivity plays an important role in VR environments. While purely computer-graphics modeling can produce highly realistic human models, reaching true photo-realism with such models is computationally extremely expensive. In this chapter, a full end-to-end pipeline for the creation of hybrid representations of human bodies and faces (animatable volumetric video) is investigated, combining classical computer-graphics models with image-, video-, and example-based approaches: by enriching volumetric video with semantics and animation properties and by applying new hybrid geometry- and video-based animation methods, we bring volumetric video to life and combine interactivity with photo-realism. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by automatic rigging of each frame using a parametric, shape-adaptive full human body model. For pose editing, we exploit the captured data as much as possible and kinematically deform selected captured frames to fit a desired pose. Furthermore, we treat the face differently from the body in a hybrid geometry- and video-based animation approach: coarse movements and poses are modeled in the geometry, while very fine and subtle facial details, often lacking in purely geometric methods, are captured in video-based textures. These textures are processed so that they can be interactively combined into new facial expressions. On top of that, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically with an autoencoder-based approach.
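Kinematic deformation of a rigged mesh, as used for the pose editing described in the abstract, is commonly realized via linear blend skinning of per-bone transforms. The following is a minimal illustrative sketch (the function name, NumPy usage, and toy data are assumptions, not the chapter's actual implementation):

```python
import numpy as np

def linear_blend_skinning(vertices, weights, transforms):
    """Deform rest-pose vertices by blending per-bone rigid transforms.

    vertices:   (V, 3) rest-pose vertex positions
    weights:    (V, B) skinning weights; each row sums to 1
    transforms: (B, 4, 4) homogeneous bone transforms
    Returns (V, 3) deformed vertex positions.
    """
    num_verts = vertices.shape[0]
    # Homogeneous coordinates: (V, 4)
    homog = np.hstack([vertices, np.ones((num_verts, 1))])
    # Each bone transforms every vertex: per_bone[b, v] = T_b @ homog[v]
    per_bone = np.einsum('bij,vj->bvi', transforms, homog)
    # Blend the per-bone results with the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]

# Toy example: two vertices, two bones; bone 1 is translated by +1 in x.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
T_identity = np.eye(4)
T_shifted = np.eye(4)
T_shifted[0, 3] = 1.0
transforms = np.stack([T_identity, T_shifted])
weights = np.array([[1.0, 0.0],   # vertex 0 bound to the static bone
                    [0.0, 1.0]])  # vertex 1 bound to the shifted bone
deformed = linear_blend_skinning(verts, weights, transforms)
```

Here vertex 0 stays at the origin while vertex 1 follows its bone to (2, 0, 0); in the hybrid pipeline such a geometric deformation would handle the coarse body pose, with video-based textures supplying fine facial detail.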