Fuetterling, Valentin; Lojewski, Carsten; Pfreundt, Franz-Josef; Ebert, Achim
2022-03-05
2015
https://publica.fraunhofer.de/handle/publica/242293

The recent push for interactive global illumination (GI) has established the 4-ary bounding volume hierarchy (BVH4) as a highly efficient acceleration structure for incoherent ray queries with single rays. Ray stream techniques augment the fast single-ray traversal with increased utilization of CPU vector units and leverage memory bandwidth for batches of rays. Despite their success, the proposed implementations suffer from high bookkeeping cost and batch fragmentation, especially for small batch sizes. Furthermore, due to the focus on incoherent rays, optimization for highly coherent BVH4 ray queries, such as primary visibility, has received little attention. Our contribution is twofold: For coherent ray sets, we introduce a large packet traversal tailored to the BVH4 that is faster than the original BVH2 variant, and for incoherent ray batches we propose a novel implementation of ray streams which reduces the bookkeeping cost while strictly maintaining the preferred traversal order of individual rays. Both algorithms are designed around a fast traversal order look-up mechanism. We evaluate our work for primary visibility and diffuse GI and demonstrate significant performance gains over current state-of-the-art implementations.

en
003006519
Efficient ray tracing kernels for modern CPU architectures
journal article
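
The abstract in this record says both algorithms are built around a fast traversal order look-up mechanism, but it does not describe that mechanism. As a rough illustration only, and not the authors' implementation, the C++ sketch below shows one way such a look-up could work. It assumes a BVH4 node formed by collapsing two levels of a binary BVH, so three split axes describe how the four children are arranged, and it assumes that the lower-numbered child of each pair lies on the lower-coordinate side of its split axis. Every name here (`SplitAxes`, `ChildOrder`, `OrderTable`, `encodeAxes`, `signMaskOf`) is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical node description: children 0,1 come from the left binary
// child, children 2,3 from the right binary child.
struct SplitAxes {
    std::uint8_t top;    // axis (0=x, 1=y, 2=z) separating {0,1} from {2,3}
    std::uint8_t left;   // axis separating child 0 from child 1
    std::uint8_t right;  // axis separating child 2 from child 3
};

using ChildOrder = std::array<std::uint8_t, 4>;

// Bit i of the mask is set when the ray direction is negative along axis i.
inline unsigned signMaskOf(float dx, float dy, float dz) {
    return (dx < 0.0f ? 1u : 0u) | (dy < 0.0f ? 2u : 0u) | (dz < 0.0f ? 4u : 0u);
}

// Near-to-far child order for one axis combination and one ray octant.
// Assumption: a ray travelling in the negative direction along a split axis
// reaches the higher-numbered side of that split first.
inline ChildOrder computeOrder(const SplitAxes& axes, unsigned signMask) {
    const bool flipTop   = ((signMask >> axes.top) & 1u) != 0;
    const bool flipLeft  = ((signMask >> axes.left) & 1u) != 0;
    const bool flipRight = ((signMask >> axes.right) & 1u) != 0;
    const std::uint8_t l0 = flipLeft ? 1 : 0, l1 = flipLeft ? 0 : 1;
    const std::uint8_t r0 = flipRight ? 3 : 2, r1 = flipRight ? 2 : 3;
    return flipTop ? ChildOrder{{r0, r1, l0, l1}} : ChildOrder{{l0, l1, r0, r1}};
}

// Packs the three axes into a code in [0, 27) that a node could store.
inline unsigned encodeAxes(const SplitAxes& a) {
    return (a.top * 3u + a.left) * 3u + a.right;
}

// Precomputed table: 27 axis combinations x 8 ray octants.
struct OrderTable {
    ChildOrder entries[27][8];

    OrderTable() {
        for (unsigned t = 0; t < 3; ++t)
            for (unsigned l = 0; l < 3; ++l)
                for (unsigned r = 0; r < 3; ++r)
                    for (unsigned s = 0; s < 8; ++s) {
                        const SplitAxes axes = {static_cast<std::uint8_t>(t),
                                                static_cast<std::uint8_t>(l),
                                                static_cast<std::uint8_t>(r)};
                        entries[encodeAxes(axes)][s] = computeOrder(axes, s);
                    }
    }

    // During traversal the order is a single indexed load.
    const ChildOrder& lookup(unsigned axesCode, unsigned signMask) const {
        assert(axesCode < 27 && signMask < 8);
        return entries[axesCode][signMask];
    }
};
```

In this sketch a traversal loop would compute `signMaskOf` once per ray, read the stored axis code from each visited node, and call `lookup` to obtain the order in which to test or push the four children. The table is small (27 x 8 entries of four bytes), so it stays in cache; whether the published kernels use a comparable layout is not stated in the abstract.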