  • Publication
    Improving Semantic Image Segmentation via Label Fusion in Semantically Textured Meshes
    Models for semantic segmentation require a large amount of hand-labeled training data which is costly and time-consuming to produce. For this purpose, we present a label fusion framework that is capable of improving semantic pixel labels of video sequences in an unsupervised manner. We make use of a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures. Rendering the semantic mesh using the original intrinsic and extrinsic camera parameters yields a set of improved semantic segmentation images. Due to our optimized CUDA implementation, we are able to exploit the entire c-dimensional probability distribution of annotations over c classes in an uncertainty-aware manner. We evaluate our method on the Scannet dataset where we improve annotations produced by the state-of-the-art segmentation network ESANet from 52.05% to 58.25% pixel accuracy. We publish the source code of our framework online to foster future research in this area (https://github.com/fferflo/semantic-meshes). To the best of our knowledge, this is the first publicly available label fusion framework for semantic image segmentation based on meshes with semantic textures.
  • Publication
    Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey
    Deep Learning is a state-of-the-art technique for making inferences on extensive or complex data. Due to their multilayer nonlinear structure, Deep Neural Networks are black box models, often criticized as non-transparent and with predictions that are not traceable by humans. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to this lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between the input and the output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks in Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies, compare several survey papers that deal with explainability in general, work out the drawbacks and gaps, and summarize further research ideas.
  • Publication
    DecaWave ultra-wideband warm-up error correction
    (2021) Sidorenko, Juri; …; Hugentobler, Urs
    In the field of indoor localization, ultra-wideband (UWB) technology has become indispensable. The market demands UWB hardware that is cheap, precise, and accurate; these requirements have made the DecaWave UWB system popular. The great majority of publications about this system deal with the correction of the signal power, the hardware delay, or the clock drift. A further error source is the warm-up error, which has traditionally been assumed to appear only at the beginning of operation and to be caused by the warm-up process of the crystal. In this article, we show that the warm-up error is influenced by the same error source as the signal power. To our knowledge, no scientific publication has explicitly examined the warm-up error before. This work aims to close this gap and, moreover, to present a solution that does not require any external measuring equipment and only has to be carried out once. We show that the empirically obtained warm-up correction curve significantly increases the accuracy of two-way ranging (TWR).
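    Once the correction curve has been recorded, applying it reduces to a table lookup. A minimal sketch, assuming a hypothetical warm-up curve sampled at a few times since power-on (the values below are made up for illustration, not measured data from the paper):

    ```python
    import numpy as np

    # Hypothetical empirically obtained correction curve: range bias in
    # meters as a function of time since power-on (seconds). In practice
    # this curve would be recorded once per device.
    warmup_t = np.array([0.0, 30.0, 60.0, 120.0, 300.0, 600.0])
    warmup_bias_m = np.array([0.12, 0.08, 0.05, 0.02, 0.005, 0.0])

    def correct_twr_range(raw_range_m, t_since_poweron_s):
        """Subtract the interpolated warm-up bias from a raw TWR range."""
        bias = np.interp(t_since_poweron_s, warmup_t, warmup_bias_m)
        return raw_range_m - bias

    print(correct_twr_range(4.37, 45.0))  # corrected range in meters
    ```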
  • Publication
    Generating Synthetic Training Data for Deep Learning-Based UAV Trajectory Prediction
    Deep learning-based models, such as recurrent neural networks (RNNs), have been applied to various sequence learning tasks with great success. Following this, these models are increasingly replacing classic approaches to motion prediction in object tracking applications. On the one hand, these models can capture complex object dynamics with less modeling effort, but on the other hand, they depend on a large amount of training data for parameter tuning. Towards this end, we present an approach for generating synthetic trajectory data of unmanned aerial vehicles (UAVs) in image space. Since UAVs, or rather quadrotors, are dynamical systems, they cannot follow arbitrary trajectories. With the prerequisite that UAV trajectories fulfill a smoothness criterion corresponding to a minimal change of higher-order motion, methods for planning aggressive quadrotor flights can be utilized to generate optimal trajectories through a sequence of 3D waypoints. By projecting these maneuver trajectories, which are suitable for controlling quadrotors, to image space, a versatile trajectory dataset is realized. To demonstrate the applicability of the synthetic trajectory data, we show that an RNN-based prediction model trained solely on the generated data can outperform classic reference models on a real-world UAV tracking dataset. The evaluation is done on the publicly available ANTI-UAV dataset.
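    The projection from 3D maneuver trajectories to image space is a standard pinhole-camera operation. A minimal NumPy sketch, with illustrative intrinsics and extrinsics that are not taken from the paper:

    ```python
    import numpy as np

    def project_trajectory(points_3d, K, R, t):
        """Project a 3D trajectory into image coordinates (pinhole model).

        points_3d: (n, 3) trajectory waypoints in world coordinates.
        K: (3, 3) camera intrinsics; R, t: world-to-camera rotation and
        translation (extrinsics).
        Returns (n, 2) pixel coordinates.
        """
        cam = points_3d @ R.T + t          # world -> camera frame
        uvw = cam @ K.T                    # camera -> homogeneous pixels
        return uvw[:, :2] / uvw[:, 2:3]    # perspective division

    # Example with illustrative parameters: level flight past the camera.
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 5.0])   # camera 5 m from origin
    traj = np.stack([np.linspace(-1, 1, 5),
                     np.zeros(5),
                     np.zeros(5)], axis=1)
    print(project_trajectory(traj, K, R, t))
    ```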
  • Publication
    Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction
    Methods to quantify the complexity of trajectory datasets are still a missing piece in benchmarking human trajectory prediction models. In order to gain a better understanding of the complexity of trajectory prediction tasks, and following the intuition that more complex datasets contain more information, an approach for quantifying the amount of information contained in a dataset from a prototype-based dataset representation is proposed. The dataset representation is obtained by first employing a non-trivial spatial sequence alignment, which enables a subsequent learning vector quantization (LVQ) stage. A large-scale complexity analysis is conducted on several human trajectory prediction benchmarking datasets, followed by a brief discussion of indications for human trajectory prediction and benchmarking.
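    For the LVQ stage, a minimal sketch of the classic LVQ1 update rule gives the flavor of how prototypes are fit. This is only an illustration: in the paper the inputs are aligned trajectory representations rather than the toy 2D points used here, and the exact LVQ variant may differ.

    ```python
    import numpy as np

    def lvq1_fit(samples, labels, prototypes, proto_labels,
                 lr=0.05, epochs=20, seed=0):
        """Minimal LVQ1: pull the nearest prototype toward same-class
        samples and push it away from other-class samples."""
        rng = np.random.default_rng(seed)
        protos = prototypes.copy()
        for _ in range(epochs):
            for i in rng.permutation(len(samples)):
                x, y = samples[i], labels[i]
                j = np.argmin(np.linalg.norm(protos - x, axis=1))
                sign = 1.0 if proto_labels[j] == y else -1.0
                protos[j] += sign * lr * (x - protos[j])
        return protos

    # Toy example: two 2-D classes, one prototype each.
    X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([0, 0, 1, 1])
    P = np.array([[0.5, 0.5], [0.6, 0.6]])
    print(lvq1_fit(X, y, P, np.array([0, 1])))
    ```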
  • Publication
    Information Acquisition on Pedestrian Movements in Urban Traffic with a Mobile Multi-Sensor System
    This paper presents an approach that combines LiDAR sensors and cameras of a mobile multi-sensor system to obtain information about pedestrians in the vicinity of the sensor platform. Such information can be used, for example, in the context of driver assistance systems. Our approach starts by using LiDAR sensor data to detect and track pedestrians, benefiting from LiDAR's capability to directly provide accurate 3D data. After LiDAR-based detection, the approach leverages the typically higher data density provided by 2D cameras to determine the body pose of the detected pedestrians. The approach combines several state-of-the-art machine learning techniques: it uses a neural network and a subsequent voting process to detect pedestrians in LiDAR sensor data. Based on the known geometric constellation of the different sensors and the knowledge of the intrinsic parameters of the cameras, image sections are generated with the respective regions of interest showing only the detected pedestrians. These image sections are then processed with a method for image-based human pose estimation to determine keypoints for different body parts. These keypoints are finally projected from 2D image coordinates to 3D world coordinates using the assignment of the original LiDAR points to a particular pedestrian.
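    The final 2D-to-3D lifting step can be sketched as follows, assuming the LiDAR points assigned to a pedestrian have already been transformed into the camera frame. Taking the median depth is an illustrative robust choice, not necessarily the paper's exact procedure.

    ```python
    import numpy as np

    def keypoint_to_3d(uv, lidar_points_cam, K):
        """Back-project a 2D keypoint to 3D camera coordinates.

        uv: (2,) pixel coordinates of the keypoint.
        lidar_points_cam: (n, 3) LiDAR points assigned to this pedestrian,
        already in the camera frame.
        K: (3, 3) camera intrinsics.
        """
        # Depth comes from the LiDAR points of this pedestrian; the
        # median is a simple robust choice (an assumption for this sketch).
        depth = np.median(lidar_points_cam[:, 2])
        # Unproject the pixel ray and scale it to the LiDAR depth.
        ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
        return ray * (depth / ray[2])

    K = np.array([[700.0, 0.0, 640.0],
                  [0.0, 700.0, 360.0],
                  [0.0, 0.0, 1.0]])
    points = np.array([[0.1, -0.2, 8.1], [0.0, 0.1, 7.9], [0.2, 0.0, 8.0]])
    print(keypoint_to_3d(np.array([660.0, 380.0]), points, K))
    ```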
  • Publication
    LiDAR-based localization and automatic control of UAVs for mobile remote reconnaissance
    Sensor-based monitoring of the surroundings of civilian vehicles is primarily relevant for driver assistance in road traffic, whereas in military vehicles, far-reaching reconnaissance of the environment is crucial for accomplishing the respective mission. Modern military vehicles are typically equipped with electro-optical sensor systems for such observation or surveillance purposes. However, especially when the line-of-sight to the onward route is obscured or visibility conditions are generally limited, more enhanced methods for reconnaissance are needed. The obvious benefit of micro-drones (UAVs) for remote reconnaissance is well known. The spatial mobility of UAVs can provide additional information that cannot be obtained on the vehicle itself. For example, the UAV could keep a fixed position in front of and above the vehicle to gather information about the area ahead, or it could fly above or around obstacles to clear hidden areas. In a military context, this is usually referred to as manned-unmanned teaming (MUM-T). In this paper, we propose the use of vehicle-based electro-optical sensors as an alternative way to automatically control (cooperative) UAVs in the vehicle's vicinity. In its most automated form, the external control of the UAV only requires a 3D nominal position, either relative to the vehicle or in absolute geocoordinates. The flight path there and the maintaining of this position, including obstacle avoidance, are automatically calculated on board the vehicle and permanently communicated to the UAV as control commands. We show first results of an implementation of this approach using 360° scanning LiDAR sensors mounted on a mobile sensor unit. The control loop of detection, tracking, and guidance of a cooperative UAV in the local environment is demonstrated in two experiments. We show the automatic LiDAR-controlled navigation of a UAV from a starting point A to a destination point B, with and without an obstacle between A and B. The obstacle in the direct path is detected, and an alternative flight route is calculated and used.
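    One cycle of such a control loop might look like the following sketch: a velocity command toward the nominal position plus a simple repulsive term for nearby obstacles. This is a generic illustration under stated assumptions, not the implementation demonstrated in the experiments.

    ```python
    import numpy as np

    def guidance_step(uav_pos, nominal_pos, obstacles, v_max=2.0,
                      safe_dist=3.0):
        """One cycle of a detect-track-guide loop (illustrative only):
        command a velocity toward the nominal position, deflecting around
        any obstacle closer than safe_dist."""
        direction = nominal_pos - uav_pos
        dist = np.linalg.norm(direction)
        if dist < 1e-6:
            return np.zeros(3)
        cmd = direction / dist * min(v_max, dist)
        for obs in obstacles:
            offset = uav_pos - obs
            d = np.linalg.norm(offset)
            if d < safe_dist:
                # Simple repulsive term pushing the UAV off the obstacle.
                cmd += (offset / d) * v_max * (safe_dist - d) / safe_dist
        return cmd

    # Example: UAV en route from A to B with one obstacle near the path.
    print(guidance_step(np.array([0.0, 0.0, 2.0]),
                        np.array([10.0, 0.0, 2.0]),
                        [np.array([2.0, 0.5, 2.0])]))
    ```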
  • Publication
    A comparison of deep saliency map generators on multispectral data in object detection
    Deep neural networks, especially convolutional neural networks, are state-of-the-art methods to classify, segment, or even generate images, movies, or sounds. However, these methods lack a good semantic understanding of what happens internally. The question of why a COVID-19 detector has classified a stack of lung CT images as positive is sometimes more interesting than the overall specificity and sensitivity, especially when human domain expert knowledge disagrees with the given output. This way, human domain experts could also be advised to reconsider their choice in light of the information pointed out by the system. In addition, the deep learning model can be checked, and a present dataset bias can be found. Currently, most explainable AI methods in the computer vision domain are applied purely to image classification, where the images are ordinary images in the visible spectrum. As a result, there is no comparison of how these methods behave with multimodal image data, and most of them have not been investigated for object detection. This work tries to close these gaps by investigating how the maps of three saliency map generator methods differ across the different spectra. This is achieved via accurate and systematic training. Additionally, we examine how these methods perform when used for object detection. As a practical problem, we chose object detection in the infrared and visual spectrum for autonomous driving. The dataset used in this work is the Multispectral Object Detection Dataset, where each scene is available in the long-wave (FIR), mid-wave (MIR), and short-wave (NIR) infrared as well as the visual (RGB) spectrum. The results show that there are differences between the infrared and visual activation maps. Further, an advanced training with both the infrared and visual data not only improves the network's output but also leads to more focused spots in the saliency maps.
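    As a concrete example of a saliency map generator, a minimal vanilla-gradient implementation in PyTorch is shown below. The paper compares three generator methods; this is only one simple representative and not necessarily among them.

    ```python
    import torch

    def gradient_saliency(model, image, target_class):
        """Vanilla-gradient saliency: |d score / d input|, max over channels.

        image: (1, C, H, W) tensor; works for RGB as well as infrared
        inputs, provided the model's first layer matches C.
        """
        image = image.clone().requires_grad_(True)
        score = model(image)[0, target_class]
        score.backward()
        return image.grad.abs().max(dim=1)[0]  # (1, H, W) saliency map

    # Illustrative usage with a toy convolutional classifier.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 2))
    saliency = gradient_saliency(model, torch.randn(1, 3, 64, 64), 1)
    print(saliency.shape)  # torch.Size([1, 64, 64])
    ```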
  • Publication
    Handling Missing Observations with an RNN-based Prediction-Update Cycle
    In tasks such as tracking, time-series data inevitably carry missing observations. While traditional tracking approaches can handle missing observations, recurrent neural networks (RNNs) are designed to receive input data in every step. Furthermore, current solutions for RNNs, like omitting the missing data or data imputation, are not sufficient to account for the resulting increased uncertainty. Towards this end, this paper introduces an RNN-based approach that provides a full temporal filtering cycle for motion state estimation. The Kalman filter-inspired approach is able to deal with missing observations and outliers. To provide a full temporal filtering cycle, a basic RNN is extended to take observations and the associated belief about their accuracy into account when updating the current state. An RNN prediction model, which generates a parametrized distribution to capture the predicted states, is combined with an RNN update model, which relies on the prediction model's output and the current observation. By providing the model with masking information, i.e. binary-encoded missing events, the model can overcome limitations of standard techniques for dealing with missing input values. The model's abilities are demonstrated on synthetic data reflecting prototypical pedestrian tracking scenarios.
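    The prediction-update cycle with masking can be sketched in PyTorch as follows; architecture, dimensions, and the deterministic prediction head are illustrative simplifications, not the paper's exact model (which predicts a parametrized distribution).

    ```python
    import torch
    import torch.nn as nn

    class PredictUpdateRNN(nn.Module):
        """Minimal prediction-update cycle with masking. Each step predicts
        the next state from the hidden state; the update feeds in the
        observation where the mask marks it as present, and falls back to
        the prediction where it is missing."""

        def __init__(self, obs_dim=2, hidden_dim=32):
            super().__init__()
            self.cell = nn.GRUCell(obs_dim + 1, hidden_dim)
            self.predict = nn.Linear(hidden_dim, obs_dim)

        def forward(self, observations, mask):
            # observations: (T, B, obs_dim); mask: (T, B, 1), 1 = observed.
            B = observations.shape[1]
            h = observations.new_zeros(B, self.cell.hidden_size)
            outputs = []
            for obs_t, m_t in zip(observations, mask):
                pred = self.predict(h)                 # prediction step
                inp = m_t * obs_t + (1 - m_t) * pred   # update step
                h = self.cell(torch.cat([inp, m_t], dim=-1), h)
                outputs.append(pred)
            return torch.stack(outputs)

    model = PredictUpdateRNN()
    obs = torch.randn(10, 4, 2)                  # 10 steps, 4 tracks
    mask = (torch.rand(10, 4, 1) > 0.3).float()  # ~30% missing
    print(model(obs, mask).shape)                # torch.Size([10, 4, 2])
    ```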
  • Publication
    Efficient Tour Planning for a Measurement Vehicle by Combining Next Best View and Traveling Salesman
    Path planning for a measurement vehicle requires solving two popular problems from computer science, namely the search for the optimal tour and the search for the optimal viewpoint. Combining both problems results in a new variation of the Traveling Salesman Problem, which we refer to as the Explorational Traveling Salesman Problem. The solution to this problem is the optimal tour with a minimum number of observations. In this paper, we formulate the basic problem, discuss it in the context of the existing literature, and present an iterative solution algorithm. We demonstrate how the method can be applied directly to LiDAR data using an occupancy grid. The ability of our algorithm to generate suitably efficient tours is verified on two synthetic benchmark datasets, using a ground truth determined by an exhaustive search.
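    A toy greedy heuristic conveys the problem structure: repeatedly move to the nearest viewpoint that still uncovers unknown grid cells, so that tour cost and coverage are coupled. The paper's iterative algorithm is more sophisticated; this sketch, with illustrative names and data, only shows the coupling.

    ```python
    import numpy as np

    def greedy_exploration_tour(viewpoints, visible_cells, start=0):
        """Toy heuristic for the explorational TSP.

        viewpoints: (n, 2) positions; visible_cells: list of sets, the
        occupancy-grid cells each viewpoint observes.
        Returns the order of visited viewpoints.
        """
        uncovered = set().union(*visible_cells)
        tour, current = [start], start
        uncovered -= visible_cells[start]
        while uncovered:
            # Candidates are viewpoints that still uncover new cells.
            candidates = [i for i in range(len(viewpoints))
                          if visible_cells[i] & uncovered]
            if not candidates:
                break
            nxt = min(candidates, key=lambda i: np.linalg.norm(
                viewpoints[i] - viewpoints[current]))
            uncovered -= visible_cells[nxt]
            tour.append(nxt)
            current = nxt
        return tour

    vp = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
    vis = [{1, 2}, {2, 3}, {3, 4}, {5}]
    print(greedy_exploration_tour(vp, vis))  # [0, 1, 2, 3]
    ```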