2024
Journal Article
Title

Efficient VVC Encoding Using Hierarchical Parallelization: A Comprehensive Analysis

Abstract
This paper presents and analyzes different parallelization strategies in VVenC, an open and optimized software encoder implementation of the Versatile Video Coding (VVC) standard. VVC was developed to address the increasing demand for higher compression of digital video data, and it reduces the bitrate by around 50% for the same perceived quality compared to its predecessor, the High-Efficiency Video Coding (HEVC) standard. However, this gain in compression efficiency comes with an increase in computational complexity, particularly on the encoder side. VVenC integrates algorithmic optimizations for each coding tool in VVC and defines a set of five presets, from faster to slower, that provide Pareto-optimal tradeoffs between runtime and efficiency. Multithreading is employed to further reduce runtime while preserving most of the compression efficiency of each preset. With a hierarchical combination of pre-processing, picture-level, and in-picture parallelization, VVenC achieves a 4× speedup with four threads. Further speedup with a higher number of threads depends on the video resolution and the encoder preset used. For 16 threads, the speedup ranges from 6-9× for high-definition video to 10-12× for ultra-high-definition video. Compared to previous work, the use of temporal prediction for the adaptive loop filter reduces the associated coding efficiency loss from 0.4% to almost zero. Better scaling for higher numbers of threads can be achieved at the cost of a higher coding efficiency loss. Within the presented framework, the paper examines the additional speedup obtained from smaller Coding Tree Unit (CTU) sizes and from combining wavefront parallel processing with various tile-based picture partitioning configurations. Furthermore, results on a 20-core ARM-based Apple M1 computer indicate better multithreading scaling than on x86-based architectures. The analysis is complemented by profiling, which reveals the overhead caused by idle threads and identifies mode estimation as the main bottleneck of the presented framework.
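The speedup figures quoted in the abstract can be read as parallel efficiency, that is, speedup divided by thread count. The sketch below is only an illustration of that arithmetic: the function names and the `REPORTED` table are hypothetical and not part of VVenC or the paper, and the numbers are the range endpoints stated in the abstract, not new measurements.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Return the fraction of ideal linear scaling achieved (1.0 = perfect)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speedup / threads


# (label, threads, speedup) taken from the ranges quoted in the abstract.
# The labels are illustrative names chosen for this sketch.
REPORTED = [
    ("4 threads", 4, 4.0),
    ("16 threads, HD, lower bound", 16, 6.0),
    ("16 threads, HD, upper bound", 16, 9.0),
    ("16 threads, UHD, lower bound", 16, 10.0),
    ("16 threads, UHD, upper bound", 16, 12.0),
]


def efficiency_table(rows=REPORTED) -> dict:
    """Map each label to its parallel efficiency."""
    return {label: parallel_efficiency(s, t) for label, t, s in rows}


if __name__ == "__main__":
    for label, eff in efficiency_table().items():
        print(f"{label}: {eff:.1%}")
```

Under these figures, four threads correspond to ideal scaling, while at 16 threads the efficiency lies between 37.5% and 56.25% for high-definition video and between 62.5% and 75% for ultra-high-definition video, which is consistent with the abstract's remark that scaling depends on resolution.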
Author(s)
George, Valeri  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Brandenburg, Jens
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Hege, Gabriel  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Hinz, Tobias
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Wieckowski, Adam  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Bross, Benjamin  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Marpe, Detlev  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Journal
International Journal of Semantic Computing (IJSC)
DOI
10.1142/S1793351X2450003X
Language
English
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Keyword(s)
  • Parallelization

  • versatile video coding

  • video encoding
