Publication

Real-time dense 3D reconstruction from monocular video data captured by low-cost UAVs

2021; Hermann, Max; Ruf, Boitumelo; Weinmann, Martin

Real-time 3D reconstruction enables fast dense mapping of the environment, which benefits numerous applications such as navigation or the live assessment of emergency situations. In contrast to most real-time capable approaches, our method does not need an explicit depth sensor. Instead, we rely only on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of an unmanned aerial vehicle (UAV) flying around buildings with an oblique view, we estimate both the camera trajectory and depth maps for selected images with enough novel content. To create a 3D model of the scene, we rely on a three-stage processing chain. First, we estimate the rough camera trajectory using a simultaneous localization and mapping (SLAM) algorithm. Once a suitable constellation of images is found, we estimate depth for local bundles of images using a Multi-View Stereo (MVS) approach and then fuse this depth into a global surfel-based model. For our evaluation, we use 55 video sequences with diverse settings, consisting of both synthetic and real scenes. We evaluate not only the generated reconstruction but also the intermediate products, and we achieve competitive results both qualitatively and quantitatively. At the same time, our method can keep up with a 30 fps video stream at a resolution of 768 × 448 pixels.
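
At its core, the fusion stage described above amounts to a weighted running average over incoming depth observations. A minimal sketch of such an update in Python, operating on per-pixel arrays rather than the paper's global surfel model; the function name, confidence handling and weight cap are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def fuse_depth(fused_depth, fused_weight, new_depth, new_weight, max_weight=50.0):
    """Weighted running average, as commonly used for surfel/TSDF-style fusion.

    fused_depth, fused_weight: current per-pixel model state (H x W arrays).
    new_depth, new_weight:     incoming MVS depth map and its confidence.
    Invalid measurements are marked with depth <= 0 and are skipped.
    """
    valid = new_depth > 0.0
    w_sum = fused_weight + new_weight
    updated = (fused_depth * fused_weight + new_depth * new_weight) / np.maximum(w_sum, 1e-9)
    fused_depth = np.where(valid, updated, fused_depth)
    # Cap the accumulated weight so the model can still adapt to new observations.
    fused_weight = np.where(valid, np.minimum(w_sum, max_weight), fused_weight)
    return fused_depth, fused_weight
```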

Publication

Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery

2020; Hermann, Max; Ruf, Boitumelo; Weinmann, Martin; Hinz, Stefan

Supervised learning-based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ1.25 of up to 93.5%. In addition, we have paid particular attention to the generalization of a trained model to unknown data and to the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching, or to provide estimates in regions where image matching fails, e.g. occluded or texture-less regions.
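
The reported δ1.25 value refers to the standard threshold accuracy metric for depth estimation: the fraction of pixels whose predicted depth lies within a factor of 1.25 of the ground truth. A minimal numpy sketch of this metric:

```python
import numpy as np

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels whose predicted depth is within a factor of
    `threshold` of the ground truth (the common delta_1.25 metric)."""
    valid = gt > 0  # only evaluate pixels with valid ground truth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))
```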

Publication

Incorporating interferometric coherence into LULC classification of airborne PolSAR-images using fully convolutional networks

2020; Schmitz, Sylvia; Weinmann, Martin; Thiele, Antje

Inspired by the application of state-of-the-art Fully Convolutional Networks (FCNs) for the semantic segmentation of high-resolution optical imagery, recent works have successfully transferred this methodology to pixel-wise land use and land cover (LULC) classification of PolSAR data. So far, mainly single PolSAR images have been included in FCN-based classification processes. To further increase classification accuracy, this paper presents an approach for integrating interferometric coherence derived from co-registered image pairs into an FCN-based classification framework. A network based on an encoder-decoder structure with two separate encoder branches is presented for this task. It extracts features from polarimetric backscattering intensities on the one hand and from interferometric coherence on the other. Based on a joint representation of these complementary features, pixel-wise classification is performed. To overcome the scarcity of labelled SAR data for training and testing, annotations are generated automatically by fusing available LULC products. Experimental evaluation is performed on high-resolution airborne SAR data captured over the German Wadden Sea. The results demonstrate that the proposed model produces smooth and accurate classification maps. A comparison with a single-branch FCN model indicates that the appropriate integration of interferometric coherence improves classification performance.
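
For reference, the interferometric coherence used as the second input branch is the normalized cross-correlation magnitude of two co-registered single-look complex (SLC) images, estimated over a local window. A hedged sketch of this standard estimator; the window size and function name are illustrative, not from the paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def coherence(slc1, slc2, win=7):
    """Sample coherence of two co-registered SLC images over a win x win window."""
    cross = slc1 * np.conj(slc2)
    # Average the complex interferogram (real and imaginary parts separately).
    num = uniform_filter(np.real(cross), win) + 1j * uniform_filter(np.imag(cross), win)
    den = np.sqrt(uniform_filter(np.abs(slc1) ** 2, win)
                  * uniform_filter(np.abs(slc2) ** 2, win))
    return np.abs(num) / np.maximum(den, 1e-12)  # values in [0, 1]
```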

Publication

Deep Cross-Domain Building Extraction for Selective Depth Estimation from Oblique Aerial Imagery

2018; Ruf, Boitumelo; Thiel, Laurenz; Weinmann, Martin

With the technological advancements in aerial imagery and accurate 3D reconstruction of urban environments, more and more attention has been paid to the automated analysis of urban areas. In our work, we examine two important aspects that allow online analysis of building structures in city models given oblique aerial image sequences, namely automatic building extraction with convolutional neural networks (CNNs) and selective real-time depth estimation from aerial imagery. We use transfer learning to train the Faster R-CNN method for real-time deep object detection, combining a large ground-based dataset for urban scene understanding with a smaller number of images from an aerial dataset. We achieve an average precision (AP) of about 80% for the task of building extraction on a selected evaluation dataset. Our evaluation focuses on both dataset-specific learning and transfer learning. Furthermore, we present an algorithm that allows for multi-view depth estimation from aerial image sequences in real-time. We adopt the semi-global matching (SGM) optimization strategy to preserve sharp edges at object boundaries. In combination with the Faster R-CNN, this allows a selective reconstruction of buildings, identified via regions of interest (RoIs), from oblique aerial imagery.
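
The transfer-learning recipe described above, i.e. starting from a detector pre-trained on a large dataset and fine-tuning its head on aerial imagery, can be illustrated with today's torchvision detection API. This is a sketch only; the paper's actual framework, pre-training dataset and training setup may differ:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a Faster R-CNN pre-trained on a large dataset (torchvision ships
# COCO weights) and replace the box predictor head for the classes of interest.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + building
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# Fine-tune on the smaller aerial dataset with a standard optimizer over
# model.parameters(), optionally freezing early backbone layers.
```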

Publication

ReS2tAC - UAV-borne real-time SGM stereo optimized for embedded ARM and CUDA devices

2021; Ruf, Boitumelo; Mohrs, Jonas; Weinmann, Martin; Hinz, Stefan; Beyerer, Jürgen

With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption essential for embedded systems. However, the recently increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching algorithm. In particular, we optimize the algorithm for embedded CUDA GPUs by means of massively parallel computing, and we use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs. We have evaluated our approach with different configurations on two public stereo benchmark datasets to demonstrate that it can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case-specific qualitative evaluation, we have measured the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
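
At the heart of SGM lies a per-path cost aggregation recurrence: L_r(p, d) = C(p, d) + min( L_r(p-r, d), L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1, min_k L_r(p-r, k) + P2 ) - min_k L_r(p-r, k). A minimal, unoptimized numpy sketch for a single horizontal path follows; the penalty values are placeholders, and the optimized CUDA/NEON versions vectorize this across disparities and paths:

```python
import numpy as np

def aggregate_scanline(cost, P1=10.0, P2=120.0):
    """SGM cost aggregation along one horizontal path (Hirschmueller's recurrence).

    cost: (W, D) matching cost for one image row, W pixels by D disparities.
    Returns the aggregated path cost L of the same shape.
    """
    W, D = cost.shape
    L = np.empty((W, D), dtype=np.float32)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        shift_up = np.concatenate(([np.inf], prev[:-1]))  # disparity d-1
        shift_dn = np.concatenate((prev[1:], [np.inf]))   # disparity d+1
        cand = np.minimum.reduce([prev,
                                  shift_up + P1,
                                  shift_dn + P1,
                                  np.full(D, m + P2)])
        L[x] = cost[x] + cand - m  # subtracting m keeps L from growing unboundedly
    return L
```

Full SGM repeats this along 4 or 8 path directions and sums the path costs before the winner-takes-all disparity selection.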

Publication

Superpoints in RANSAC planes: A new approach for ground surface extraction exemplified on point classification and context-aware reconstruction

2020; Bulatov, Dimitri; Stütz, Dominik; Lucks, Lukas; Weinmann, Martin

In point clouds obtained from airborne data, ground points have traditionally been identified as local minima of the altitude. Subsequently, 2.5D digital terrain models have been computed by approximating a smooth surface from the ground points. But how can we handle purely 3D surfaces, such as cultural heritage monuments covered by vegetation or Alpine overhangs, where trees do not necessarily grow in a bottom-to-top direction? We suggest a new approach based on a combination of superpoints and RANSAC, implemented as a filtering procedure, which allows efficient handling of large, challenging point clouds without the need for training data. If training data is available, covariance-based features, point histogram features, and dataset-dependent features, as well as combinations thereof, are applied to classify points. Results achieved with a Random Forest classifier and non-local optimization using Markov Random Fields are analyzed for two challenging datasets: an airborne laser scan and a photogrammetrically reconstructed point cloud. As an application, surface reconstruction from the cleaned point sets is demonstrated.
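
As an illustration of the RANSAC component, a minimal plane fit on an outlier-rich point set might look as follows. In the paper, RANSAC operates on superpoints rather than raw points; the iteration count and inlier tolerance below are placeholder values:

```python
import numpy as np

def ransac_plane(points, n_iter=500, tol=0.05, rng=None):
    """Fit a plane n.x + d = 0 to an outlier-rich (N, 3) point set with RANSAC.

    Returns (normal, d, inlier_mask) of the best model found."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:          # degenerate (collinear) sample, resample
            continue
        n /= norm
        d = -n.dot(p0)
        inliers = np.abs(points @ n + d) < tol  # point-to-plane distance test
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    if best_model is None:
        raise ValueError("no non-degenerate sample found")
    return best_model[0], best_model[1], best_inliers
```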

Publication

Automatic Extrinsic Self-Calibration of Mobile Mapping Systems Based on Geometric 3D Features

2019; Hillemann, Markus; Weinmann, Martin; Mueller, Markus S.; Jutzi, Boris

Mobile Mapping is an efficient technology to acquire spatial data of the environment. Such spatial data is fundamental for applications in crisis management, civil engineering or autonomous driving. The extrinsic calibration of the Mobile Mapping System is a decisive factor that affects the quality of the spatial data. Many existing extrinsic calibration approaches require the use of artificial targets in a time-consuming calibration procedure. Moreover, they are usually designed for a specific combination of sensors and are thus not universally applicable. We introduce a novel extrinsic self-calibration algorithm which is fully automatic and completely data-driven. The fundamental assumption of the self-calibration is that the calibration parameters are best estimated when the derived point cloud best represents the real physical circumstances. The cost function we use to evaluate this is based on geometric features which rely on the 3D structure tensor derived from the local neighborhood of each point. We compare different cost functions based on geometric features, as well as a cost function based on the Rényi quadratic entropy, to evaluate their suitability for the self-calibration. Furthermore, we test the self-calibration on a synthetic dataset and on two different real datasets, which differ in terms of the environment, the scale and the utilized sensors. We show that the self-calibration is able to extrinsically calibrate Mobile Mapping Systems with different combinations of mapping and pose estimation sensors, such as a 2D laser scanner paired with a Motion Capture System, or a 3D laser scanner paired with a stereo camera and ORB-SLAM2. For the first dataset, the parameters estimated by our self-calibration lead to a more accurate point cloud than two comparative approaches. For the second dataset, which was acquired via vehicle-based mobile mapping, our self-calibration achieves results comparable to a manually refined reference calibration, while being universally applicable and fully automated.
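
The geometric features mentioned above are typically eigenvalue-based measures of the 3D structure tensor, i.e. the covariance matrix of a local point neighborhood. A hedged sketch of such features for one neighborhood; the exact feature set and the way the paper aggregates them into a cost function may differ:

```python
import numpy as np

def geometric_features(neighborhood):
    """Eigenvalue-based features of the 3D structure tensor of a local
    point neighborhood (N x 3 array), as used in covariance-based feature sets."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)       # 3x3 structure tensor
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]          # l1 >= l2 >= l3 >= 0
    l1, l2, l3 = np.maximum(lam, 1e-12)                   # guard against zeros
    return {
        "linearity":    (l1 - l2) / l1,
        "planarity":    (l2 - l3) / l1,
        "sphericity":   l3 / l1,
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
    }
```

A well-calibrated system yields crisp local structures (high planarity on walls, low sphericity), so minimizing a cost built from such features over the calibration parameters is a natural fit for the data-driven assumption stated above.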

Publication

Classification of airborne 3D point clouds regarding separation of vegetation in complex environments

2021; Bulatov, Dimitri; Stütz, Dominik; Hacker, Jorg; Weinmann, Martin

Classification of outdoor point clouds is an intensely studied topic, particularly with respect to the separation of vegetation from the terrain and man-made structures. In the presence of many overhanging and vertical structures, the (relative) height is no longer a reliable criterion for such a separation. An alternative would be to apply supervised classification; however, thousands of examples are typically required for appropriate training. In this paper, an unsupervised and rotation-invariant method is presented and evaluated for three datasets with very different characteristics. The method detects planar patches by filtering and clustering so-called superpoints, whereby the well-known but suitably modified random sample consensus (RANSAC) approach plays a key role for plane estimation in outlier-rich data. The performance of our method is compared to that of supervised classifiers common in remote sensing settings: a Random Forest learner combined with feature sets for point cloud processing, such as covariance-based features or point descriptors. It is shown that for point clouds resulting from airborne laser scans, the detection accuracy of the proposed method is over 96% and, as such, higher than that of standard supervised classification approaches. Because of artifacts caused by interpolation during 3D stereo matching, the overall accuracy was lower for photogrammetric point clouds (74-77%). However, using additional salient features, such as the normalized green-red difference index, the results became more accurate and less dependent on the data source.
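
The normalized green-red difference index (NGRDI) mentioned as a salient feature is a simple per-point radiometric measure on which vegetation tends toward positive values. A minimal sketch, with the function name chosen for illustration:

```python
import numpy as np

def ngrdi(red, green):
    """Normalized green-red difference index, (G - R) / (G + R),
    computed per point from RGB attributes; returns values in [-1, 1]."""
    red = red.astype(np.float64)
    green = green.astype(np.float64)
    return (green - red) / np.maximum(green + red, 1e-12)
```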

Publication

Automatic Generation of Training Data for Land Use and Land Cover Classification by Fusing Heterogeneous Data Sets

2020; Schmitz, Sylvia; Weinmann, Martin; Weidner, Uwe; Hammer, Horst; Thiele, Antje

Nowadays, the automatic classification of remote sensing data can efficiently produce maps of land use and land cover, which provide an essential source of information in the field of environmental sciences. Most state-of-the-art algorithms use supervised learning methods that require a large amount of annotated training data. In order to avoid time-consuming manual labelling, we propose a method for the automatic annotation of remote sensing data that relies on available land use and land cover information. Using the example of automatic labelling of SAR data, we show how the Dempster-Shafer evidence theory can be used to fuse information from different land use and land cover products into one training data set. Our results confirm that the combination of information from OpenStreetMap, CORINE Land Cover 2018, Global Surface Water and the SAR data itself leads to reliable class assignments, and that this combination outperforms each individual land use and land cover product considered.
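
The core of the fusion step is Dempster's rule of combination: two mass functions m1 and m2 over the same frame of discernment are combined as m(A) proportional to the sum of m1(B) * m2(C) over all B, C with B ∩ C = A, normalized by the total non-conflicting mass. A minimal Python sketch; the class names in the usage comment are illustrative, not taken from the paper:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the same frame of
    discernment; focal elements are frozensets, each mass function sums to 1."""
    combined, conflict = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + mB * mC
        else:
            conflict += mB * mC  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict, sources cannot be combined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Illustrative usage with two sources over the classes {water, land}:
# m_a = {frozenset({"water"}): 0.7, frozenset({"water", "land"}): 0.3}
# m_b = {frozenset({"water"}): 0.6, frozenset({"land"}): 0.2,
#        frozenset({"water", "land"}): 0.2}
# dempster_combine(m_a, m_b)
```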

Publication

Fusion of Camera Data and Laser Scanner Data for Mobile Applications (original title: Fusion von Kameradaten und Laserscanner-Daten für mobile Anwendungen)

2019; Meidow, Jochen; Weinmann, Martin; Jutzi, Boris