Publication

ReS2tAC - UAV-borne real-time SGM stereo optimized for embedded ARM and CUDA devices

2021, Ruf, Boitumelo, Mohrs, Jonas, Weinmann, Martin, Hinz, Stefan, Beyerer, Jürgen

With the emergence of low-cost robotic systems, such as unmanned aerial vehicles (UAVs), the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption essential for embedded systems. However, the increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching (SGM) algorithm. In it, we optimize the algorithm for embedded CUDA GPUs by means of massively parallel computing, and use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs. We have evaluated our approach with different configurations on two public stereo benchmark datasets, demonstrating that it can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case-specific qualitative evaluation, we measured the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK UAV, demonstrating its suitability for real-time stereo processing onboard a UAV.
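The SGM cost aggregation the abstract refers to can be sketched for a single left-to-right path. This is a minimal NumPy illustration of the classic recurrence, with illustrative penalty values P1 and P2 that are not taken from the paper:

```python
import numpy as np

def sgm_aggregate_scanline(cost, P1=10, P2=120):
    """Aggregate a matching-cost volume along one scanline
    (left-to-right path), following the classic SGM recurrence.
    cost: (W, D) array of per-pixel matching costs over D disparities."""
    W, D = cost.shape
    L = np.empty((W, D), dtype=np.float32)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        min_prev = prev.min()
        # candidate transitions: same disparity, +/-1 disparity with
        # penalty P1, and an arbitrary disparity jump with penalty P2
        same = prev
        up = np.concatenate(([np.inf], prev[:-1])) + P1
        down = np.concatenate((prev[1:], [np.inf])) + P1
        jump = np.full(D, min_prev + P2)
        # subtract min_prev so the aggregated cost stays bounded
        L[x] = cost[x] + np.minimum.reduce([same, up, down, jump]) - min_prev
    return L
```

A full SGM implementation aggregates such paths from several directions (typically 4 or 8) and sums them before taking the disparity with the minimal total cost.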

Publication

Deep Cross-Domain Building Extraction for Selective Depth Estimation from Oblique Aerial Imagery

2018, Ruf, Boitumelo, Thiel, Laurenz, Weinmann, Martin

With the technological advancements of aerial imagery and accurate 3D reconstruction of urban environments, more and more attention has been paid to the automated analysis of urban areas. In our work, we examine two important aspects that enable online analysis of building structures in city models from oblique aerial image sequences: automatic building extraction with convolutional neural networks (CNNs) and selective real-time depth estimation from aerial imagery. We use transfer learning to train the Faster R-CNN method for real-time deep object detection, combining a large ground-based dataset for urban scene understanding with a smaller number of images from an aerial dataset. We achieve an average precision (AP) of about 80% for the task of building extraction on a selected evaluation dataset. Our evaluation covers both dataset-specific learning and transfer learning. Furthermore, we present an algorithm for real-time multi-view depth estimation from aerial image sequences. We adopt the semi-global matching (SGM) optimization strategy to preserve sharp edges at object boundaries. In combination with Faster R-CNN, it allows a selective reconstruction of buildings, identified by regions of interest (RoIs), from oblique aerial imagery.
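The selective reconstruction described above restricts depth estimation to detected building regions. A minimal sketch of such RoI masking, assuming a hypothetical (x0, y0, x1, y1) pixel-box format for the detector output:

```python
import numpy as np

def roi_mask(image_shape, rois):
    """Build a boolean mask that restricts depth estimation to
    detected building regions. rois: iterable of (x0, y0, x1, y1)
    boxes in pixel coordinates (hypothetical format)."""
    mask = np.zeros(image_shape, dtype=bool)
    for x0, y0, x1, y1 in rois:
        mask[y0:y1, x0:x1] = True  # mark pixels inside the detection box
    return mask
```

Downstream, the depth-estimation step would then only process pixels where the mask is true, saving computation on non-building regions.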

Publication

Real-time dense 3D reconstruction from monocular video data captured by low-cost UAVs

2021, Hermann, Max, Ruf, Boitumelo, Weinmann, Martin

Real-time 3D reconstruction enables fast, dense mapping of the environment, which benefits numerous applications, such as navigation or the live assessment of an emergency. In contrast to most real-time-capable approaches, our method does not need an explicit depth sensor; instead, we rely only on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of an unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both the camera trajectory and depth for selected images with enough novel content. To create a 3D model of the scene, we rely on a three-stage processing chain. First, we estimate a rough camera trajectory using a simultaneous localization and mapping (SLAM) algorithm. Once a suitable constellation is found, we estimate depth for local bundles of images using a multi-view stereo (MVS) approach and then fuse this depth into a global surfel-based model. For our evaluation, we use 55 video sequences with diverse settings, consisting of both synthetic and real scenes. We evaluate not only the generated reconstruction but also the intermediate products, and achieve competitive results both qualitatively and quantitatively. At the same time, our method can keep up with a 30 fps video at a resolution of 768 × 448 pixels.
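The selection of images "with enough novel content" can be illustrated by a greedy keyframe scheme. A hypothetical sketch, where the overlap() callback stands in for whatever image-overlap measure the method actually uses (e.g. the fraction of tracked features shared with the last keyframe):

```python
def select_keyframes(frames, overlap, min_overlap=0.6):
    """Greedy keyframe selection: keep a frame once its estimated
    overlap with the last keyframe drops below a threshold.
    overlap(a, b) -> fraction of shared content in [0, 1] (assumed helper)."""
    keyframes = [frames[0]]
    for f in frames[1:]:
        if overlap(keyframes[-1], f) < min_overlap:
            keyframes.append(f)  # enough novel content: start a new keyframe
    return keyframes
```

Only the selected keyframes would then be bundled and passed to the MVS depth-estimation stage.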

Publication

Improved UAV-borne 3D mapping by fusing optical and laserscanner data

2013, Jutzi, Boris, Weinmann, Martin, Meidow, Jochen

In this paper, a new method for fusing optical and laserscanner data is presented for improved UAV-borne 3D mapping. We propose to equip an unmanned aerial vehicle (UAV) with a small platform that includes two sensors: a standard low-cost digital camera and a lightweight Hokuyo UTM-30LX-EW laserscanning device (210 g without cable). Initially, a calibration is carried out for the utilized devices. This involves a geometric camera calibration and the estimation of the position and orientation offset between the two sensors by lever-arm and bore-sight calibration. Subsequently, feature tracking is performed through the image sequence, considering extracted interest points as well as the projected 3D laser points. These 2D results are fused with the measured laser distances and fed into a bundle adjustment to perform simultaneous localization and mapping (SLAM). It is demonstrated that fusing optical and laserscanner data yields an improvement in the precision of the pose estimation.
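The lever-arm and bore-sight calibration yields a rigid transform from the laser frame into the camera frame, after which the 3D laser points can be projected into the image. A minimal sketch, assuming a pinhole camera model with intrinsic matrix K (illustrative values, not the calibration from the paper):

```python
import numpy as np

def project_laser_points(points_laser, R, t, K):
    """Map 3D laser points into the camera image: apply the
    bore-sight rotation R and lever-arm offset t (laser -> camera),
    then project with the intrinsic matrix K from camera calibration.
    points_laser: (N, 3) array; returns (N, 2) pixel coordinates."""
    pts_cam = points_laser @ R.T + t          # rigid transform into camera frame
    pts_img = pts_cam @ K.T                   # pinhole projection
    return pts_img[:, :2] / pts_img[:, 2:3]   # perspective divide -> pixels
```

The projected points can then be tracked alongside image interest points, as described above.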

Publication

Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery

2020, Hermann, Max, Ruf, Boitumelo, Weinmann, Martin, Hinz, Stefan

Supervised learning-based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method of self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ<1.25 of up to 93.5%. In addition, we have paid particular attention to the generalization of a trained model to unknown data and to the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching, or to provide estimates in regions where image matching fails, e.g. occluded or texture-less regions.
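The reported accuracy refers to the standard δ<1.25 depth metric. A minimal sketch of how this metric is commonly computed (not the authors' evaluation code):

```python
import numpy as np

def delta_accuracy(pred, gt, threshold=1.25):
    """Standard depth-accuracy metric: fraction of pixels where
    max(pred/gt, gt/pred) < threshold (here, delta < 1.25).
    pred, gt: arrays of positive depth values over valid pixels."""
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric relative error
    return float((ratio < threshold).mean())
```

A prediction counts as correct at a pixel when it is within a factor of 1.25 of the ground truth in either direction; the metric is the fraction of such pixels.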