  • Publication
    Utilizing Dataset Affinity Prediction in Object Detection to Assess Training Data
    Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. To this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
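    A minimal sketch of the idea, not the authors' code: the data-source prediction module can be read as an extra classification branch on the detection head whose softmax output acts as the per-detection affinity score. The PyTorch framing, names, and dimensions below are assumptions.
    ```python
    import torch
    import torch.nn as nn

    class DetectionHeadWithAffinity(nn.Module):
        """Detection head with an additional data-source (affinity) branch."""
        def __init__(self, feat_dim: int, num_classes: int, num_datasets: int):
            super().__init__()
            self.cls_branch = nn.Linear(feat_dim, num_classes)        # object classes
            self.box_branch = nn.Linear(feat_dim, 4)                  # box regression
            self.affinity_branch = nn.Linear(feat_dim, num_datasets)  # data source

        def forward(self, roi_feats: torch.Tensor):
            # roi_feats: (num_detections, feat_dim) pooled per-detection features
            return {
                "cls_logits": self.cls_branch(roi_feats),
                "box_deltas": self.box_branch(roi_feats),
                # softmax over source datasets = per-detection affinity score
                "affinity": self.affinity_branch(roi_feats).softmax(dim=-1),
            }

    head = DetectionHeadWithAffinity(feat_dim=256, num_classes=3, num_datasets=4)
    out = head(torch.randn(10, 256))
    print(out["affinity"].shape)  # (10, 4), one distribution per detection
    ```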
  • Publication
    Center point-based feature representation for tracking
    Center points are commonly the results of anchor-free object detectors. Starting from this initial representation, a regression scheme is utilized to determine a target point set that captures object properties such as enclosing bounding boxes and further attributes such as class labels. When trained only for the detection task, the encoded center point feature representations are not well suited for tracking objects, since the embedded features are not stable over time. To tackle this problem, we present an approach for joint detection and feature embedding for multiple object tracking. The proposed approach applies an anchor-free detection model to pairs of images to extract single-point feature representations. To generate temporally stable features that are suitable for track association across short time intervals, auxiliary losses are applied to reduce the distance of tracked identities in the embedded feature space. The abilities of the presented approach are demonstrated on real-world data reflecting prototypical object tracking scenarios.
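    A minimal sketch, under assumptions, of how such an auxiliary loss can look: embeddings of the same identity in a frame pair are pulled together while the nearest other identity is pushed beyond a margin. The margin formulation and all names are illustrative, not the paper's loss.
    ```python
    import torch
    import torch.nn.functional as F

    def auxiliary_embedding_loss(emb_t, emb_t1, margin: float = 0.5):
        """emb_t, emb_t1: (N, D) center-point embeddings of the same N identities
        in frames t and t+1, row i in both tensors belonging to identity i."""
        emb_t = F.normalize(emb_t, dim=-1)
        emb_t1 = F.normalize(emb_t1, dim=-1)
        dists = torch.cdist(emb_t, emb_t1)         # all pairwise distances
        pos = dists.diag()                         # same identity across frames
        neg = dists + torch.eye(len(dists)) * 1e6  # mask out the positives
        # pull positives together, push the hardest negative beyond the margin
        return (pos + F.relu(margin - neg.min(dim=1).values)).mean()

    loss = auxiliary_embedding_loss(torch.randn(8, 128), torch.randn(8, 128))
    ```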
  • Publication
    Companion Paper: Deep Saliency Map Generators for Multispectral Video Classification
    This is the companion paper for the ICPR 2022 paper "Deep Saliency Map Generators for Multispectral Video Classification", which investigates the applicability of three saliency map generators to multispectral video input data. In addition to implementation details of the modifications to the investigated methods and the neural network implementations used, the influence of the parameters is discussed and a more detailed insight into the training and evaluation process is given.
  • Publication
    Eigenpatches - Adversarial Patches from Principal Components
    Adversarial patches are still a simple yet powerful white-box attack that can be used to fool object detectors by suppressing possible detections. The patches of these so-called evasion attacks are computationally expensive to produce and require full access to the attacked detector. This paper addresses the problem of computational expense by analyzing 375 generated patches, calculating their principal components, and showing that traversing the subspace spanned by the resulting "eigenpatches" can be used to create patches that successfully fool the attacked YOLOv7 object detector. Furthermore, the influence of the number of principal components used for patch recreation and of the sampling size for the principal component analysis on the mean average precision is investigated. Patches generated this way can either be used as a starting point for further optimization or directly as adversarial patches.
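    The core computation can be sketched in a few lines, assuming flattened RGB patches and NumPy; this is an illustration of the eigenpatch construction, not the paper's code:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    patches = rng.random((375, 64 * 64 * 3))       # stand-in for generated patches

    mean = patches.mean(axis=0)
    # principal components via SVD of the centered patch matrix
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    k = 16                                         # number of components kept
    eigenpatches = vt[:k]                          # (k, 64*64*3)

    # recreate a patch from coefficients in the eigenpatch subspace
    coeffs = (patches[0] - mean) @ eigenpatches.T  # project an existing patch
    recreated = mean + coeffs @ eigenpatches       # reconstruction from k comps
    print(np.linalg.norm(patches[0] - recreated))  # reconstruction error
    ```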
  • Publication
    Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
    This paper proposes a novel method for vision-based metric cross-view geolocalization (CVGL) that matches the camera images captured from a ground-based vehicle with an aerial image to determine the vehicle's geo-pose. Since aerial images are globally available at low cost, they represent a potential compromise between two established paradigms of autonomous driving, i.e. using expensive high-definition prior maps or relying entirely on the sensor data captured at runtime. We present an end-to-end differentiable model that uses the ground and aerial images to predict a probability distribution over possible vehicle poses. We combine multiple vehicle datasets with aerial images from orthophoto providers on which we demonstrate the feasibility of our method. Since the ground truth poses are often inaccurate w.r.t. the aerial images, we implement a pseudo-label approach to produce more accurate ground truth poses and make them publicly available. While previous works require training data from the target region to achieve reasonable localization accuracy (i.e. same-area evaluation), our approach overcomes this limitation and outperforms previous results even in the strictly more challenging cross-area case. We improve the previous state-of-the-art by a large margin even without ground or aerial data from the test region, which highlights the model's potential for global-scale application. We further integrate the uncertainty-aware predictions in a tracking framework to determine the vehicle's trajectory over time, resulting in a mean position error of 0.78 m on KITTI-360.
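    One way to picture the pose-distribution output, as a hedged sketch with invented shapes: a ground-image descriptor is correlated with every cell of an aerial feature map and the scores are normalized into a probability map. The paper's actual model is end-to-end differentiable and richer than this.
    ```python
    import torch

    ground_desc = torch.randn(128)            # descriptor from the ground images
    aerial_feats = torch.randn(128, 64, 64)   # per-cell aerial feature map

    # similarity of the ground descriptor with each aerial location
    scores = torch.einsum("c,chw->hw", ground_desc, aerial_feats)
    pose_prob = scores.flatten().softmax(dim=0).view(64, 64)

    # the distribution can feed a tracker; its spread expresses uncertainty
    print(pose_prob.sum(), pose_prob.argmax())
    ```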
  • Publication
    Bézier Curve Gaussian Processes
    Probabilistic models for sequential data are the basis for a variety of applications concerned with processing temporally ordered information. The predominant approach in this domain is given by recurrent neural networks, implementing either an approximate Bayesian approach (e.g. Variational Autoencoders or Generative Adversarial Networks) or a regression-based approach, i.e. variations of Mixture Density Networks (MDNs). In this paper, we focus on the N-MDN variant, which parameterizes (mixtures of) probabilistic Bézier curves (N-Curves) for modeling stochastic processes. While favorable in terms of computational cost and stability, MDNs generally fall behind approximate Bayesian approaches in terms of expressiveness. To close this gap, we present an approach for enabling full Bayesian inference on top of N-MDNs. For this, we show that N-Curves are a special case of Gaussian processes (denoted as N-GP) and then derive corresponding mean and kernel functions for different modalities. Following this, we propose the use of the N-MDN as a data-dependent generator for N-GP prior distributions. We show the advantages granted by this combined model in an application context, using human trajectory prediction as an example.
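    The key observation, that a Bézier curve with Gaussian control points is a Gaussian process, can be illustrated in the 1-D case with independent control points (an assumption made here for brevity): the curve value is a Bernstein-weighted sum of Gaussians, so it is Gaussian at every t, with mean and kernel built from the Bernstein polynomials.
    ```python
    import numpy as np
    from math import comb

    def bernstein(n, i, t):
        # Bernstein basis polynomial b_{i,n}(t)
        return comb(n, i) * t**i * (1 - t) ** (n - i)

    mu = np.array([0.0, 1.5, -0.5, 2.0])      # control-point means
    var = np.array([0.1, 0.2, 0.2, 0.1])      # control-point variances
    n = len(mu) - 1

    def gp_mean(t):
        # E[curve(t)] = sum_i b_{i,n}(t) * mu_i
        return sum(bernstein(n, i, t) * mu[i] for i in range(n + 1))

    def gp_kernel(t, s):
        # Cov[curve(t), curve(s)] for independent control points
        return sum(bernstein(n, i, t) * bernstein(n, i, s) * var[i]
                   for i in range(n + 1))

    print(gp_mean(0.3), gp_kernel(0.3, 0.7))
    ```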
  • Publication
    Continuous Self-Localization on Aerial Images Using Visual and Lidar Sensors
    This paper proposes a novel method for geo-tracking, i.e. continuous metric self-localization in outdoor environments by registering a vehicle's sensor information with aerial imagery of an unseen target region. Geo-tracking methods offer the potential to supplant noisy signals from global navigation satellite systems (GNSS) and expensive, hard-to-maintain prior maps that are typically used for this purpose. The proposed geo-tracking method aligns data from on-board cameras and lidar sensors with geo-registered orthophotos to continuously localize a vehicle. We train a model in a metric learning setting to extract visual features from ground and aerial images. The ground features are projected into a top-down perspective via the lidar points and are matched with the aerial features to determine the relative pose between vehicle and orthophoto. Our method is the first to utilize on-board cameras in an end-to-end differentiable model for metric self-localization on unseen orthophotos. It exhibits strong generalization, is robust to changes in the environment, and requires only geo-poses as ground truth. We evaluate our approach on the KITTI-360 dataset and achieve a mean absolute position error (APE) of 0.94 m. We further compare with previous approaches on the KITTI odometry dataset and achieve state-of-the-art results on the geo-tracking task.
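    A minimal sketch, with invented shapes and parameters, of the projection step: each lidar point carries the visual feature of the camera pixel it falls on and is binned into a metric top-down grid that can then be matched against the aerial features.
    ```python
    import numpy as np

    H = W = 100                                  # BEV grid of 0.5 m cells
    cell = 0.5
    points = np.random.rand(5000, 2) * 50 - 25   # lidar x,y in vehicle frame [m]
    feats = np.random.rand(5000, 16)             # feature sampled at each point's pixel

    bev = np.zeros((H, W, 16))
    count = np.zeros((H, W, 1))
    ix = ((points[:, 0] + 25) / cell).astype(int).clip(0, W - 1)
    iy = ((points[:, 1] + 25) / cell).astype(int).clip(0, H - 1)
    np.add.at(bev, (iy, ix), feats)              # scatter features into cells
    np.add.at(count, (iy, ix), 1)
    bev /= np.maximum(count, 1)                  # average feature per cell
    # 'bev' can now be correlated with the aerial feature map to find the pose
    ```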
  • Publication
    Improving Semantic Image Segmentation via Label Fusion in Semantically Textured Meshes
    Models for semantic segmentation require a large amount of hand-labeled training data, which is costly and time-consuming to produce. To reduce this effort, we present a label fusion framework that is capable of improving semantic pixel labels of video sequences in an unsupervised manner. We make use of a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures. Rendering the semantic mesh using the original intrinsic and extrinsic camera parameters yields a set of improved semantic segmentation images. Due to our optimized CUDA implementation, we are able to exploit the entire c-dimensional probability distribution of annotations over c classes in an uncertainty-aware manner. We evaluate our method on the ScanNet dataset, where we improve annotations produced by the state-of-the-art segmentation network ESANet from 52.05% to 58.25% pixel accuracy. We publish the source code of our framework online to foster future research in this area (https://github.com/fferflo/semantic-meshes). To the best of our knowledge, this is the first publicly available label fusion framework for semantic image segmentation based on meshes with semantic textures.
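    The fusion rule itself is not spelled out in the abstract; as one plausible uncertainty-aware aggregation (an assumption, not necessarily the paper's rule), the per-frame c-dimensional class distributions observed for a mesh texel can be combined multiplicatively:
    ```python
    import numpy as np

    c = 5                                # number of semantic classes
    texel_log_probs = np.zeros(c)        # accumulator for one mesh texel

    def fuse(frame_probs: np.ndarray):
        """frame_probs: c-dimensional softmax output of the segmentation
        network for the pixel this texel projects to in one frame."""
        global texel_log_probs
        texel_log_probs += np.log(frame_probs + 1e-8)   # Bayesian-style product

    for _ in range(10):                  # ten frames observing the same texel
        fuse(np.random.dirichlet(np.ones(c)))

    fused = np.exp(texel_log_probs - texel_log_probs.max())
    fused /= fused.sum()
    print(fused.argmax(), fused)         # fused label and its distribution
    ```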
  • Publication
    Change detection in street environments based on mobile laser scanning
    Automated change detection based on urban mobile laser scanning data is the foundation for a whole range of applications such as building model updates, map generation for autonomous driving, and natural disaster assessment. The challenge with mobile LiDAR data is that various sources of error, such as localization errors, lead to uncertainties and contradictions in the derived information. This paper presents an approach to automatic change detection using a new category of generic evidence grids that addresses the above problems. This technique, referred to as fuzzy spatial reasoning, solves common problems of state-of-the-art evidence grids and also provides a method of inference utilizing fuzzy Boolean reasoning. Based on this, logical operations are used to determine changes and combine them with semantic information. A quantitative evaluation based on a hand-annotated version of the TUM-MLS dataset shows that the proposed method is able to identify confirmed and changed elements of the environment with F1-scores of 0.93 and 0.89, respectively.
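    A minimal sketch of fuzzy Boolean reasoning on evidence grids, assuming the standard Zadeh operators (fuzzy AND = min, OR = max, NOT = complement); the paper's evidence grids are more elaborate than these random stand-ins:
    ```python
    import numpy as np

    occ_before = np.random.rand(200, 200)   # fuzzy occupancy evidence, epoch 1
    occ_after = np.random.rand(200, 200)    # fuzzy occupancy evidence, epoch 2

    confirmed = np.minimum(occ_before, occ_after)        # occupied in both epochs
    appeared = np.minimum(1 - occ_before, occ_after)     # new structure
    disappeared = np.minimum(occ_before, 1 - occ_after)  # removed structure

    changed = np.maximum(appeared, disappeared)          # fuzzy OR of both cases
    print((changed > 0.5).mean(), (confirmed > 0.5).mean())
    ```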
  • Publication
    Extraction and matching of 3D features for LiDAR-based self-localization in an urban environment
    Geolocation of vehicles, objects, or people is commonly done using global navigation satellite system (GNSS) receivers. Such a receiver for GNSS-based positioning is either built into the vehicle, or a separate handheld device such as a smartphone is used. Self-localization in this way is simple and accurate up to a few meters. Environments where no GNSS service is available require other strategies for self-localization. Especially in the military domain, it is necessary to be prepared for such GNSS-denied scenarios. Awareness of one's own position in relation to other units is crucial in military operations, especially where joint operations have to be coordinated geographically and temporally. However, even if a common map-like representation of the terrain is available, precise self-localization relative to this map is not necessarily easy. In this paper, we propose an approach for LiDAR-based localization of a vehicle-based sensor platform in an urban environment. Our approach is to use 360° scanning LiDAR sensors to generate short-duration point clouds of the local environment. In these point clouds, we detect pole-like 3D features such as traffic sign poles, lampposts, or tree trunks. The relative distances and orientations of these features to each other are rather distinctive, and the matrix of these individual distances and orientations can be used to determine the position of the sensor relative to a current map. This map can either be created in advance for the entire area, or a cooperative preceding vehicle with an equivalent sensor setup can generate it. By matching the found LiDAR-based 3D features with those of the map, not only the position of the sensor platform but also its orientation can be determined. We provide initial experimental results of the proposed method, which were achieved with measurements by Fraunhofer IOSB's sensor-equipped vehicle MODISSA.
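    A minimal sketch of the matching idea, with synthetic pole positions as stand-ins: pairwise distances between poles are invariant to the sensor pose, so each observed pole can be matched to the map pole with the most similar sorted distance signature.
    ```python
    import numpy as np

    def distance_matrix(pts):
        return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    map_poles = np.random.rand(30, 2) * 100       # pole positions in the map
    # simulate an observation: the same poles, reordered, rotated, translated
    perm = np.random.permutation(30)
    theta = 0.4
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    obs = map_poles[perm] @ R.T + np.array([5.0, -3.0]) + np.random.randn(30, 2) * 0.05

    d_obs = distance_matrix(obs)                  # pose-invariant signatures
    d_map = distance_matrix(map_poles)
    matches = []
    for i in range(30):
        sig = np.sort(d_obs[i])
        best = int(np.argmin([np.linalg.norm(np.sort(d_map[j]) - sig)
                              for j in range(30)]))
        matches.append(best)
    print(sum(m == p for m, p in zip(matches, perm)), "of 30 correct")
    ```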