A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors
In this paper, we examined heterogeneous architectures for their suitability to run the scale-invariant feature transform (SIFT) algorithm in real time. SIFT is one of the most robust, and also one of the most computationally intensive, algorithms for extracting local features in machine-vision applications. Many studies have presented methods for reducing SIFT execution time. However, the described techniques focus only on a single homogeneous device. To address this gap for multi-device heterogeneous platforms, we prepared an OpenCL-SIFT implementation. We described techniques to efficiently parallelize an application that contains many different computing cores. Through careful optimization, we obtained a performance-portable implementation for efficient processing on various multi-device heterogeneous platforms. The experimental results showed that our implementation achieves appropriate accuracy and higher efficiency than recent open-source SIFT implementations. Using the proposed methods, we extracted SIFT features at more than 30 FPS from Full-HD images on different processor architectures. In addition, to further increase performance, we showed efficient multi-device scheduling methods for SIFT feature extraction, with an average speed-up of 2.69×. Finally, we described guidelines for optimizing GPGPU-OpenCL programs for x86 multi-core CPUs. The discussed methods are generic and may be used in the design of other algorithms.
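As a rough illustration of what multi-device scheduling can involve, the sketch below splits the rows of a frame across devices in proportion to their measured throughput. This is one common static partitioning scheme, not necessarily the scheduling method used in this work. The function names `partition_rows` and `ideal_speedup` and the throughput values are hypothetical, chosen only for the example.

```python
def partition_rows(total_rows, throughputs):
    """Split rows [0, total_rows) into contiguous chunks, one per device.

    Chunk sizes are proportional to each device's relative throughput
    (hypothetical values, e.g. frames per second measured per device).
    Returns a list of (start_row, end_row) pairs with end_row exclusive.
    """
    if total_rows < 0 or not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("need non-negative rows and positive throughputs")
    total = sum(throughputs)
    bounds = [0]
    acc = 0
    for t in throughputs:
        acc += t
        # Cumulative rounding keeps the chunks contiguous and non-overlapping.
        bounds.append(round(total_rows * acc / total))
    bounds[-1] = total_rows  # make sure the last chunk ends at the final row
    return [(bounds[i], bounds[i + 1]) for i in range(len(throughputs))]


def ideal_speedup(throughputs):
    """Upper bound on speed-up over the fastest single device,
    assuming perfect load balance and no transfer overhead."""
    return sum(throughputs) / max(throughputs)


if __name__ == "__main__":
    # Assumed example: a device three times faster than a second one,
    # sharing the 1080 rows of a Full-HD frame.
    print(partition_rows(1080, [3, 1]))
    print(ideal_speedup([3, 1]))
```

In practice, a static split like this ignores data-transfer costs and the uneven keypoint density across an image, so measured speed-ups usually fall below the ideal bound.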