2026
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title

RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation

Abstract
LiDAR point cloud segmentation is central to autonomous driving and 3D scene understanding. While voxel- and point-based methods dominate recent research due to their compatibility with deep architectures and their ability to capture fine-grained geometry, they often incur high computational cost, irregular memory access, and limited runtime efficiency due to scaling issues. In contrast, range-view methods, though relatively underexplored, can leverage mature 2D semantic segmentation techniques for fast and accurate predictions. Motivated by the rapid progress in Visual Foundation Models (VFMs) for captioning, zero-shot recognition, and multimodal tasks, we investigate whether SAM2, the current state-of-the-art VFM for segmentation tasks, can serve as a strong backbone for LiDAR point cloud segmentation in the range-view representation. We present RangeSAM, to our knowledge the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with projection and back-projection to operate on point clouds. To optimize SAM2 for range-view representations, we implement several architectural modifications to the encoder: (1) a novel Stem module that emphasizes the horizontal spatial dependencies inherent in LiDAR range images, (2) a customized configuration of Hiera blocks tailored to the geometric properties of spherical projections, and (3) an adapted window-attention mechanism in the encoder backbone designed to capture the unique spatial patterns and discontinuities present in range-view pseudo-images. Our approach achieves competitive performance on SemanticKITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines. This work highlights the viability of VFMs as general-purpose backbones for point cloud segmentation and opens a path toward unified, foundation-model-driven LiDAR segmentation; our results indicate that range-view segmentation with VFMs is a promising direction.
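For illustration, the spherical projection that turns a LiDAR point cloud into the range-view pseudo-image mentioned in the abstract can be sketched as below. This is a generic sketch of the standard range-view projection, not the paper's exact implementation; the function name, image size, and vertical field of view are assumptions chosen to match a typical 64-beam sensor such as the one used in SemanticKITTI.

```python
import numpy as np

def points_to_range_image(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project (N, 3) LiDAR points into an (H, W) range image.

    Azimuth (yaw) maps to the image column, elevation (pitch) to the row.
    fov_up/fov_down are the sensor's vertical field of view in degrees
    (assumed values here, roughly matching a 64-beam spinning LiDAR).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)                 # range per point
    yaw = np.arctan2(y, x)                                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov = np.radians(fov_up - fov_down)                       # total vertical FoV
    u = 0.5 * (1.0 - yaw / np.pi) * W                         # column from azimuth
    v = (1.0 - (pitch - np.radians(fov_down)) / fov) * H      # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    img = np.full((H, W), -1.0, dtype=np.float32)             # -1 marks empty pixels
    order = np.argsort(-r)                                    # write farthest first,
    img[v[order], u[order]] = r[order]                        # so nearest survives
    return img, (v, u)                                        # (v, u) allows back-projection

# A point 10 m straight ahead lands mid-width, near the horizon row:
img, (v, u) = points_to_range_image(np.array([[10.0, 0.0, 0.0]]))
```

Keeping the per-point pixel indices `(v, u)` is what makes the back-projection step cheap: 2D per-pixel predictions can be gathered back onto the original points with a single indexing operation.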
Author(s)
Kühn, Julius
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Nguyen, Duc Anh
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Kuijper, Arjan
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Sinha, Saptarshi Neil
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Conference
Winter Conference on Applications of Computer Vision 2026  
Open Access
File(s)
874.37 KB
Rights
Use according to copyright law
DOI
10.24406/publica-7983
Language
English
Keyword(s)
  • Branche: Manufacturing and Mobility
  • Branche: Infrastructure and Public Services
  • Research Line: Computer graphics (CG)
  • Research Line: Computer vision (CV)
  • Research Line: Machine learning (ML)
  • LTA: Interactive decision-making support and assistance systems
  • LTA: Scalable architectures for massive data sets
  • LTA: Generation, capture, processing, and output of images and 3D models
  • Autonomous driving
  • Range-view segmentation
  • Foundation models