Decoupled Iterative Deep Sensor Fusion for 3D Semantic Segmentation
One of the key tasks for autonomous vehicles or robots is a robust perception of their 3D environment, which is why autonomous vehicles or robots are equipped with a wide range of different sensors. Building upon a robust sensor setup, understanding and interpreting their 3D environment is the next important step. Semantic segmentation of 3D sensor data, e.g. point clouds, provides valuable information for this task and is often seen as key enabler for 3D scene understanding. This work presents an iterative deep fusion architecture for semantic segmentation of 3D point clouds, which builds upon a range image representation of the point clouds and additionally exploits camera features to increase accuracy and robustness. In contrast to other approaches, which fuse lidar and camera features once, the proposed fusion strategy iteratively combines and refines lidar and camera features at different scales inside the network architecture. Additionally, the proposed approach can deal with camera failure as well as jointly predict lidar and camera segmentation. We demonstrate the benefits of the presented iterative deep fusion approach on two challenging datasets, outperforming all range image-based lidar and fusion approaches. An in-depth evaluation underlines the effectiveness of the proposed fusion strategy and the potential of camera features for 3D semantic segmentation.