License: CC BY-SA 4.0
Authors: Wang, Zhiyuan; Yang, Cong; Zhang, Yulu; Boukhers, Zeyd; Sui, Wei; Ji, Yi; Liu, Chunping
Dates: 2025-01-09; 2025-01-09; 2024-12-03
URL: https://publica.fraunhofer.de/handle/publica/481186
DOI: 10.1145/3696409.3700170

Abstract: Recent advancements in skeleton extraction have significantly improved the process by simplifying the skeleton regression task into graph component detection. Despite the advancements in skeleton topology, accuracy in detailing skeletal parts remains challenging, with specific issues such as jagged edges in high-resolution images. This paper identifies the limitations of current detection models that can adapt during the decomposition and reconstruction phases, which impacts the overall precision of the extraction. In response, we propose an approach that revises the primary focus of the detection tasks. Inspired by the success of pixel-wise binary classification methods, we propose a gradual transition in focus from a coordinate localization regression task to a classification task of predicting points during the training process. This transition can be achieved by adjusting the number of object queries in the Transformer model. Theoretical and experimental evaluations validate the effectiveness of our approach. Our method yields significant improvements in performance over the baseline across various shape and image datasets (e.g., 0.836 vs. 0.826 for BlumNet on the SK1491 dataset).

Language: en
Title: Transition in Focus of Prediction Tasks for Skeleton Graph Component Detection with Transformer
Type: conference paper