
Interactive 3-D video representation and coding technologies

Smolic, A.; Kauff, P.


Proceedings of the IEEE 93 (2005), No. 1, pp. 98-110
ISSN: 0018-9219
Fraunhofer HHI

Interactivity, in the sense of being able to explore and navigate audio-visual scenes by freely choosing viewpoint and viewing direction, is a key feature of newly emerging audio-visual media. This paper gives an overview of suitable technology for such applications, with a focus on international standards, which are beneficial for consumers, service providers, and manufacturers. We first give a general classification and overview of interactive scene representation formats as commonly used in computer graphics literature. Then, we describe popular standard formats for interactive three-dimensional (3-D) scene representation and creation of virtual environments, the Virtual Reality Modeling Language (VRML) and the MPEG-4 BInary Format for Scenes (BIFS), with some examples. Recent extensions to MPEG-4 BIFS, the Animation Framework eXtension (AFX), which provide advanced computer graphics tools, are explained and illustrated. New technologies mainly targeted at reconstruction, modeling, and representation of dynamic real-world scenes are then studied. Here the user should be able to navigate photorealistic scenes within certain restrictions, which can be roughly defined as 3-D video. Omnidirectional video is an extension of the planar two-dimensional (2-D) image plane to a spherical or cylindrical image plane. Any 2-D view in any direction can be rendered from this overall recording to give the user the impression of looking around. In interactive stereo, two views, one for each eye, are synthesized to provide the user with an adequate depth cue of the observed scene. Head motion parallax viewing can be supported within a certain operating range if sufficient depth or disparity data are delivered with the video data. In free viewpoint video, a dynamic scene is captured by a number of cameras. The input data are transformed into a special data representation that enables interactive navigation through the dynamic scene environment.
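
As an illustration of how a 2-D view in an arbitrary direction might be rendered from an omnidirectional recording, the following is a minimal sketch and not the method described in the paper. It assumes the spherical image is stored as an equirectangular panorama (longitude along the columns, latitude along the rows), uses an ideal pinhole model for the virtual view, and samples with nearest-neighbor lookup. The function names `view_pixel_to_panorama` and `render_view` and all parameters are hypothetical.

```python
import math


def view_pixel_to_panorama(x, y, width, height, fov_deg,
                           yaw_deg, pitch_deg, pano_w, pano_h):
    """Map pixel (x, y) of a virtual pinhole view to continuous
    (u, v) coordinates in an equirectangular panorama (assumed layout)."""
    # Focal length in pixels, derived from the horizontal field of view.
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    # Viewing ray in camera coordinates: x right, y down, z forward.
    dx = x - (width - 1) / 2
    dy = y - (height - 1) / 2
    dz = f
    pitch = math.radians(pitch_deg)  # positive pitch looks up
    yaw = math.radians(yaw_deg)      # positive yaw turns right
    # Rotate the ray about the x axis (pitch) ...
    ry = dy * math.cos(pitch) - dz * math.sin(pitch)
    rz = dy * math.sin(pitch) + dz * math.cos(pitch)
    # ... and then about the vertical axis (yaw).
    wx = dx * math.cos(yaw) + rz * math.sin(yaw)
    wz = -dx * math.sin(yaw) + rz * math.cos(yaw)
    wy = ry
    # Convert the ray direction to longitude and latitude on the sphere.
    lon = math.atan2(wx, wz)
    lat = math.atan2(-wy, math.hypot(wx, wz))
    # Longitude 0 maps to the panorama centre column, latitude 0 to the centre row.
    u = ((lon / (2 * math.pi) + 0.5) * pano_w) % pano_w
    v = (0.5 - lat / math.pi) * pano_h
    return u, v


def render_view(pano, width, height, fov_deg, yaw_deg, pitch_deg):
    """Render a width x height view from `pano`, a list of pixel rows,
    using nearest-neighbor sampling."""
    pano_h, pano_w = len(pano), len(pano[0])
    view = []
    for y in range(height):
        row = []
        for x in range(width):
            u, v = view_pixel_to_panorama(x, y, width, height, fov_deg,
                                          yaw_deg, pitch_deg, pano_w, pano_h)
            # Clamp to valid indices at the panorama borders.
            col = min(int(u), pano_w - 1)
            r = min(int(v), pano_h - 1)
            row.append(pano[r][col])
        view.append(row)
    return view
```

Changing `yaw_deg` and `pitch_deg` between frames gives the impression of looking around. A practical renderer would typically interpolate between neighboring panorama pixels instead of taking the nearest one.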